I couldn't find it clearly explained anywhere, so I hope somebody here can shed some light on it. A few questions: 1) Where are Delta tables stored? The docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage." So where exactly i...
Hi All, we are facing one unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load 15 million rows into it, but it's not loading ...
@Kaniz Fatma @Parker Temple I found the root cause: it's because of serialization. We are using a UDF to derive a column on a DataFrame; when we try to load data into the Delta table or write data into a Parquet file, we hit a serialization issue ...
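A minimal sketch of the pattern being described (the column, UDF, and path below are hypothetical, not taken from the thread): a Python UDF pushes every row through Python serialization, which is where pickling errors tend to surface, while an equivalent built-in expression stays in the JVM and avoids that class of failure entirely.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.range(5).withColumnRenamed("id", "code")

# Python UDF: rows are serialized to a Python worker; if the function
# closes over a non-serializable driver-side object, the write to
# Delta/Parquet fails with a serialization error.
label_udf = F.udf(lambda c: f"code-{c}", StringType())
df_with_udf = df.withColumn("label", label_udf(F.col("code")))

# Built-in equivalent: runs entirely in the JVM, no Python
# serialization involved.
df_builtin = df.withColumn(
    "label", F.concat(F.lit("code-"), F.col("code").cast("string"))
)

df_builtin.write.format("delta").mode("overwrite").save("/tmp/demo_delta")  # hypothetical path
```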
Databricks Office Hours: our next Office Hours session is scheduled for May 18th from 8:00 am - 9:00 am PT. Do you have questions about how to set up or use Databricks? Do you want to learn more about the best practices for deploying your use case or tip...
Hi. Do you know if it is possible to use the Iceberg table format instead of Delta Lake? Ideally, I would like to see the tables in Databricks stored as Iceberg and use them as usual in the notebooks. I read that there is also an option to link an external metasto...
Hi @Wojtek J, here's a thorough comparison of Delta Lake, Iceberg and Hudi. This talk shares the research that we did comparing the key features and designs these table formats hold, the maturity of the features, such as the APIs exposed to end u...
There are mechanisms (like DMS) to get data from RDS to a delta lake and store the data in Parquet format, but is it possible to do the reverse of this in AWS? I want to send data from the data lake to MySQL RDS tables in batch mode. And the next step is to send th...
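For the batch direction, one common approach is a plain Spark JDBC write from the Delta table into MySQL. A minimal sketch, with placeholder bucket, host, database, table, and credentials (the MySQL JDBC driver must be on the cluster's classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the Delta table from the data lake (hypothetical path).
df = spark.read.format("delta").load("s3://my-bucket/delta/orders")

# Batch-write it into a MySQL RDS table over JDBC.
(df.write
   .format("jdbc")
   .option("url", "jdbc:mysql://my-rds-host:3306/mydb")
   .option("dbtable", "orders")
   .option("user", "etl_user")
   .option("password", "<secret>")
   .mode("append")
   .save())
```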
Hi, I'm having this error too frequently on a few tables. I checked on S3 and the partition exists and the file is there in the partition. Error: Spectrum Scan Error: DeltaManifest; code: 15005; context: Error fetching Delta Lake manifest delta/product/sub_...
@Hubert Dudek, I'll add that sometimes just running GENERATE symlink_format_manifest FOR TABLE schema.table solves it, but how can the symlink get broken? Thanks!
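One likely cause (hedging, since the thread doesn't confirm it): the manifest is a static snapshot of the table's files, so any write that doesn't regenerate it leaves external readers like Spectrum pointing at stale Parquet paths. Delta exposes a table property to regenerate the manifest automatically on every write; a sketch, reusing the schema.table name from above:

```python
# One-off regeneration of the symlink manifest.
spark.sql("GENERATE symlink_format_manifest FOR TABLE schema.table")

# Keep the manifest in sync with every subsequent write so it
# can't drift out of date again.
spark.sql("""
    ALTER TABLE schema.table
    SET TBLPROPERTIES ('delta.compatibility.symlinkFormatManifest.enabled' = 'true')
""")
```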
Would like a deeper dive/explanation into the difference. When I write to a table with the following code: spark_df.write.mode("overwrite").saveAsTable("db.table") the table is created and can be viewed in the Data tab. It can also be found in some DBF...
Tables in Spark, Delta Lake-backed or not, are basically just semantic views on top of the actual data. On Databricks, the data itself is stored in DBFS, which is an abstraction layer on top of the actual storage (like S3, ADLS, etc.). This can be Parq...
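A quick way to see this for yourself is Delta's DESCRIBE DETAIL command; a sketch, with db.table standing in for whatever table you created:

```python
# Shows the physical location backing the table, e.g. a
# dbfs:/user/hive/warehouse/... path for a managed table.
spark.sql("DESCRIBE DETAIL db.table") \
     .select("location", "format") \
     .show(truncate=False)
```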
Hi DB Support, can we use DB's Delta Lake as our target DB? Here's our situation... We have hundreds of ETL jobs pulling from these sources (SAP, Siebel/Oracle, Cognos, Postgres). Our ETL process has all of the logic, and our target DB is an MPP syst...
Hi, yes you can. The best option is to create a SQL endpoint in a Premium workspace and just write to Delta Lake as you would to SQL. This is a community forum, not support. You can contact Databricks via https://databricks.com/company/contact or via AWS/Azure if you have su...
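If the ETL jobs can run on Spark, the target side reduces to an ordinary Delta write. A minimal sketch, with placeholder DataFrame and table names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the output of your existing ETL logic.
transformed_df = spark.range(10).withColumnRenamed("id", "sale_id")

# Append the batch into a Delta table acting as the warehouse target.
(transformed_df.write
    .format("delta")
    .mode("append")
    .saveAsTable("target_db.fact_sales"))
```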
Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...
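For reference, a minimal Auto Loader sketch in Python (the paths and target table name are placeholders, and cloudFiles.format would match your actual source files):

```python
# Incrementally ingest new files from a folder into a Delta table.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")
    .load("s3://my-bucket/landing/events")
    .writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")
    .trigger(once=True)  # batch-style run; drop for continuous ingestion
    .toTable("bronze.events"))
```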
Can anyone tell me how I can access the customer_t1 dataset that is referenced in the book "Delta Lake: The Definitive Guide"? I am trying to follow along with one of the examples.
Some files are visualized here: https://github.com/vinijaiswal/delta_time_travel/blob/main/Delta%20Time%20Travel.ipynb, but it is quite strange that there is no source data in the repository. I think the only way is to write to Vini Jaiswal on GitHub.
@dennylee
The Delta Rust API seems like a good option to query a Delta table without spinning up a Spark cluster, so I am trying it out - https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html - using a Python app: "Read...
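For context, the Rust engine is exposed to Python through the deltalake package (delta-rs); a minimal sketch, with a placeholder table path:

```python
from deltalake import DeltaTable

# Open the table straight from storage - no Spark cluster involved.
dt = DeltaTable("s3://my-bucket/delta/product")  # hypothetical path

print(dt.version())   # current table version
df = dt.to_pandas()   # materialize the table as a pandas DataFrame
print(df.head())
```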
Hi everyone.
I am trying to learn the keyword OPTIMIZE from this blog using scala: https://docs.databricks.com/delta/optimizations/optimization-examples.html#delta-lake-on-databricks-optimizations-scala-notebook.
But my local Spark seems not able t...
Hi Jigao,
OPTIMIZE isn't in the open-source Delta API, so it won't run on your local Spark instance - https://docs.delta.io/latest/api/scala/io/delta/tables/index.html?search=optimize
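On a Databricks cluster the command is available in SQL, so it can be issued from any language via spark.sql; a sketch, with placeholder table and column names:

```python
# Compact small files and co-locate related data by a column
# (runs on Databricks, not on open-source Delta per the reply above).
spark.sql("OPTIMIZE events ZORDER BY (eventType)")
```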