Topics with Label: Delta lake table

Forum Posts

by Anonymous • Not applicable

06-23-2022 10:38:14 AM

16152 Views
8 replies
14 kudos

Resolved! MetadataChangedException

A Delta Lake table was created with an identity column, and I'm not able to load data into it in parallel from four processes; I get a MetadataChangedException error. I don't want to load the data into a temp table. I need to load it directly and in parallel into Delta...

Data Engineering

Latest Reply

cpc0707
New Contributor II

12-05-2024 2:43:30 PM

14 kudos

I'm having the same issue. I need to load a large amount of data from separate files into a Delta table, and I want to do it with a for-each loop so I don't have to run it sequentially, which would take days. There should be a way to handle this.
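
A likely cause, as far as we know: Databricks documentation states that declaring an identity column on a Delta table disables concurrent transactions, since each insert also updates the column's high-water mark in the table metadata, so parallel writers collide with MetadataChangedException. If dropping the identity column is not an option, one workaround is to retry each failed write with backoff, which keeps the loop parallel but effectively serializes the commits. The helper below is a hypothetical sketch (retry_on_conflict is our name, not a Databricks API); the commented usage assumes your delta-spark version exposes MetadataChangedException in delta.exceptions, and batch_df and target_table are placeholders.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_conflict(
    write: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
) -> T:
    """Run write(); if it raises one of `retryable`, wait and try again.

    Hypothetical helper: exponential backoff with jitter spreads out
    parallel writers so their commits stop colliding.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return write()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last conflict
            delay = base_delay_s * 2 ** (attempt - 1)
            time.sleep(delay * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable")


# Usage sketch (assumption: delta-spark provides this exception class;
# batch_df and target_table are placeholders):
# from delta.exceptions import MetadataChangedException
# retry_on_conflict(
#     lambda: batch_df.write.mode("append").saveAsTable("target_table"),
#     retryable=(MetadataChangedException,),
# )
```

Non-retryable errors propagate immediately, so genuine failures are not masked by the retry loop.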

by isaac_gritz • Databricks Employee

08-23-2022 12:10:35 AM

8372 Views
1 replies
2 kudos

Change Data Capture with Databricks

How to leverage Change Data Capture (CDC) from your databases to Databricks
Change Data Capture allows you to ingest and process only changed records from database systems to dramatically reduce data processing costs and enable real-time use cases suc...

Data Engineering

Latest Reply

prasad95
New Contributor III

02-12-2024 9:29:46 AM

2 kudos

Hi @isaac_gritz, can you provide any reference resources for implementing AWS DynamoDB CDC to Delta tables? Thank you,
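
For the general pattern behind CDC ingestion: a feed of change records (inserts, updates, deletes) is applied to a target keyed by primary key, in change order; on Delta tables this is typically done with MERGE INTO. The pure-Python sketch below models only those semantics on a dictionary. The record fields op, key, seq, and row are hypothetical and do not correspond to the DynamoDB Streams format or any Databricks format.

```python
from typing import Any, Dict, Iterable


def apply_cdc(target: Dict[Any, dict], changes: Iterable[dict]) -> Dict[Any, dict]:
    """Return a new snapshot with the change records applied in `seq` order.

    Each change is a hypothetical record:
    {"op": "I" | "U" | "D", "key": ..., "seq": int, "row": dict or None}
    """
    snapshot = dict(target)  # leave the caller's snapshot untouched
    for change in sorted(changes, key=lambda c: c["seq"]):
        op, key = change["op"], change["key"]
        if op in ("I", "U"):
            snapshot[key] = change["row"]  # upsert: insert or replace the row
        elif op == "D":
            snapshot.pop(key, None)  # delete the row if it exists
        else:
            raise ValueError(f"unknown op: {op!r}")
    return snapshot


before = {1: {"name": "a"}, 2: {"name": "b"}}
feed = [
    {"op": "U", "key": 1, "seq": 2, "row": {"name": "a2"}},
    {"op": "D", "key": 2, "seq": 1, "row": None},
    {"op": "I", "key": 3, "seq": 3, "row": {"name": "c"}},
]
after = apply_cdc(before, feed)  # {1: {"name": "a2"}, 3: {"name": "c"}}
```

Only keys that appear in the feed are touched, which is where the cost saving over full reloads comes from.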

by pvignesh92 • Honored Contributor

05-22-2023 12:10:59 AM

1752 Views
1 replies
3 kudos

lnkd.in

Databricks Auto Loader is an interesting feature that can be used to load data incrementally.
✳ It can process new data files as they arrive in the cloud object stores
✳ It can be used to ingest JSON, CSV, PARQUET, AVRO, ORC, TEXT and even Binary file ...

Data Engineering

Latest Reply

Ajay-Pandey
Esteemed Contributor III

05-22-2023 1:51:07 AM

3 kudos

Thanks for sharing
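
Conceptually, incremental loading means remembering which files have already been ingested and processing only the new ones on each run; Auto Loader keeps that state in the stream's checkpoint location. The pure-Python sketch below models only the idea, using a local directory and an in-memory set in place of a checkpoint; discover_new_files is a hypothetical name, and this is not how Auto Loader is implemented.

```python
import os
from typing import List, Set


def discover_new_files(directory: str, seen: Set[str]) -> List[str]:
    """Return files in `directory` not processed before, and mark them as seen.

    `seen` stands in for the checkpoint state. On Databricks the real
    equivalent is a stream read with format("cloudFiles") plus a
    checkpointLocation.
    """
    new_files = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)) and name not in seen
    )
    seen.update(new_files)  # remember them so the next run skips them
    return new_files
```

Calling it twice on the same directory returns each file exactly once, which is the property that lets a scheduled job pick up only newly arrived data.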

by Hunter1604 • New Contributor II

03-30-2023 11:44:11 AM

11077 Views
5 replies
0 kudos

How to remove checkpoints from a Delta Lake table?

I see that a few checkpoints exist on my Delta table, and I want to remove the oldest one. It seems that its existence is blocking the removal of the oldest _delta_log entries.

Data Engineering

Latest Reply

Anonymous
Not applicable

03-31-2023 7:09:15 PM

0 kudos

Hi @Pawel Woj, hope everything is going great. Just wanted to check in to see whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
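
On the underlying question: Delta Lake cleans up old _delta_log entries automatically each time a checkpoint is written, deleting entries older than the table's delta.logRetentionDuration (30 days by default). As far as we can tell from the open-source code, it never deletes the latest checkpoint or any commit after it, which may be why checkpoints appear to "block" cleanup. The sketch below is a simplified pure-Python model of that rule, not Delta's actual code (which, among other things, adjusts file timestamps); LogFile and expired_log_files are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogFile:
    version: int
    is_checkpoint: bool
    timestamp: float  # modification time, epoch seconds


def expired_log_files(files: List[LogFile], cutoff: float) -> List[LogFile]:
    """Simplified cleanup rule: a file is deletable when it is older than
    `cutoff` and its version is below the latest checkpoint, so the newest
    checkpoint and every commit after it always survive."""
    checkpoints = [f.version for f in files if f.is_checkpoint]
    if not checkpoints:
        return []  # no checkpoint yet: nothing can be cleaned up
    latest_checkpoint = max(checkpoints)
    return sorted(
        (f for f in files if f.version < latest_checkpoint and f.timestamp <= cutoff),
        key=lambda f: (f.version, f.is_checkpoint),
    )
```

In this model an older checkpoint disappears on its own once it falls outside the retention window and a newer checkpoint exists, so there is usually no need to delete it by hand. To change the window itself, the table property can be set, for example ALTER TABLE my_table SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days'), where my_table is a placeholder.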

by rami-lv • New Contributor II

12-13-2022 7:02:12 AM

4064 Views
3 replies
3 kudos

What gets overwritten when overwriting a Delta Lake table?

I just tried to write to a Delta Lake table using overwrite mode, and I found that the history is preserved. It's unclear to me how the data is overwritten and how long the history can be preserved. As they say, code is better than a thousand words: my...

Data Engineering

Latest Reply

Ajay-Pandey
Esteemed Contributor III

12-13-2022 10:40:09 PM

3 kudos

Hi @Rami ALZEBAK, overwrite means it first removes the existing data and then writes the whole dataset again. If you want to see the history, you can use the DESCRIBE HISTORY command.
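
To add detail on why history survives: a Delta overwrite does not delete data files right away. It commits a new table version whose log entry marks the old files as removed and adds the new ones, so older versions remain readable (for example with SELECT * FROM my_table VERSION AS OF 0, where my_table is a placeholder) until VACUUM physically deletes the unreferenced files after the retention period. The toy model below replays hypothetical add/remove actions to rebuild any version; it is a pure-Python illustration, not the Delta log's real JSON format.

```python
from typing import List, Set, Tuple

# Each commit is a list of (action, file_name) pairs; list index = version.
Commit = List[Tuple[str, str]]


def files_at_version(log: List[Commit], version: int) -> Set[str]:
    """Replay add/remove actions up to and including `version`."""
    live: Set[str] = set()
    for commit in log[: version + 1]:
        for action, name in commit:
            if action == "add":
                live.add(name)
            elif action == "remove":
                live.discard(name)  # logically removed; file still on storage
    return live


def overwrite(log: List[Commit], new_files: List[str]) -> None:
    """Append a commit that removes every live file and adds `new_files`."""
    current = files_at_version(log, len(log) - 1)
    log.append([("remove", f) for f in sorted(current)] + [("add", f) for f in new_files])


log: List[Commit] = [[("add", "part-0.parquet")]]  # version 0: initial write
overwrite(log, ["part-1.parquet"])                 # version 1: overwrite
```

After the overwrite, version 1 sees only part-1.parquet, while replaying up to version 0 still yields part-0.parquet, which is exactly what time travel relies on.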

by isaac_gritz • Databricks Employee

08-23-2022 12:14:12 AM

1499 Views
0 replies
2 kudos

Connecting Applications and BI Tools to Databricks SQL

Access Data in Databricks Using an Application or your Favorite BI Tool
You can leverage Partner Connect for easy, low-configuration connections to some of the most popular BI tools through our optimized connectors. Alternatively, you can follow these...

Data Engineering

by Dicer • Valued Contributor

07-02-2022 4:27:46 AM

23169 Views
12 replies
13 kudos

Resolved! Failed to convert a spark.sql result to a pandas DataFrame using .toPandas()

I wrote the following code: data = spark.sql("SELECT A_adjClose, AA_adjClose, AAL_adjClose, AAP_adjClose, AAPL_adjClose FROM deltabase.a_30min_delta, deltabase.aa_30min_delta, deltabase.aal_30min_delta, deltabase.aap_30min_delta, deltabase.aapl_30m...

Data Engineering

Latest Reply

Dicer
Valued Contributor

07-18-2022 11:39:47 PM

13 kudos

I just discovered a solution. Today, I opened Azure Databricks. When I imported Python libraries, Databricks told me that toPandas() was deprecated and suggested that I use toPandas. The following solution works: use toPandas instead of toPandas() da...
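
A caution for readers: in Python, df.toPandas written without parentheses never runs the method; it only evaluates to the bound method object, so no pandas DataFrame is created. The stand-in class below is hypothetical (not PySpark) and just demonstrates that general Python behavior. Separately, toPandas() collects the entire result to the driver, so a very large result can make the real call fail.

```python
class FakeSparkDataFrame:
    """Hypothetical stand-in for a Spark DataFrame; not the PySpark class."""

    def __init__(self, rows):
        self._rows = rows

    def toPandas(self):
        # The real method collects every row to the driver; here we copy a list.
        return list(self._rows)


df = FakeSparkDataFrame([(1, 2.5), (2, 3.5)])

not_converted = df.toPandas  # no call: just the bound method object
converted = df.toPandas()    # the call actually runs the conversion

print(callable(not_converted))  # True
print(converted)                # [(1, 2.5), (2, 3.5)]
```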

by Autel • New Contributor II

01-08-2022 9:31:05 PM

4285 Views
3 replies
0 kudos

Resolved! Concurrent updates to the same Hive or Delta Lake table

Hi, I'm interested to know whether having multiple executors append to the same Hive table using saveAsTable or insertInto in Spark SQL will cause any data corruption. What configuration do I need to enable concurrent writes to the same Hive table? What about the s...

Data Engineering

Latest Reply

-werners-
Esteemed Contributor III

01-10-2022 1:21:46 AM

0 kudos

The Hive table will not like this, as the underlying data is in Parquet format, which is not ACID compliant. Delta Lake, however, is: https://docs.delta.io/0.5.0/concurrency-control.html That page shows that inserts do not conflict with each other.
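
The linked page describes optimistic concurrency control: each writer notes the table version it read, and at commit time it checks the commits that landed in the meantime. Blind appends do not conflict with each other, whereas a concurrent metadata change makes the later commit fail. The toy model below illustrates only those two cases with hypothetical "append" and "metadata" commit kinds; Delta's real conflict checker considers many more (deletes, updates, partitions, protocol changes), and ToyLog and ConflictError are made-up names.

```python
from typing import List


class ConflictError(Exception):
    """Stand-in for a Delta concurrent-modification exception (hypothetical)."""


class ToyLog:
    def __init__(self) -> None:
        self.commits: List[str] = []  # each entry is "append" or "metadata"

    @property
    def version(self) -> int:
        return len(self.commits) - 1  # -1 means the table has no commits yet

    def commit(self, read_version: int, kind: str) -> int:
        """Try to commit `kind` on top of the snapshot read at `read_version`."""
        winners = self.commits[read_version + 1:]  # commits that landed meanwhile
        if "metadata" in winners:
            raise ConflictError("metadata changed since read")
        self.commits.append(kind)  # appends never block one another
        return self.version
```

Two writers that both read version 0 and only append will both succeed (versions 1 and 2), but a writer that read version 2 fails if someone else committed a metadata change as version 3 in the meantime.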

by MadelynM • Databricks Employee

12-06-2021 12:24:26 PM

924 Views
0 replies
1 kudos

vimeo.com

Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...

Data Engineering

by Data_Bricks1 • New Contributor III

10-13-2021 11:47:18 AM

4401 Views
7 replies
0 kudos

Loading data from 10 Blob containers, each with hierarchical folders (one per day and one per hour), into a Delta Lake table in Parquet format: incremental loading of the latest data only (inserts, no updates)

I am able to load data from a single container by hard-coding the path, but not from multiple containers. I used a for loop, but the DataFrame ends up containing only the records from the last container's last folder. One more issue is that I have to flatten the data; when I ...
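
When a loop reassigns the same DataFrame variable on every iteration, only the last iteration's data remains, which matches the symptom described. One alternative is to build the full list of folder paths first and read them in one call; PySpark's spark.read.json accepts a list of paths, and spark.read.parquet accepts several paths as arguments. The sketch below only builds that list, and its abfss:// URL shape, account name, and yyyy/MM/dd/HH folder layout are assumptions to adapt to your containers.

```python
from datetime import datetime, timedelta
from typing import List


def hourly_paths(account: str, containers: List[str], start: datetime, end: datetime) -> List[str]:
    """Build one folder path per container per hour, from start's hour up to end.

    Assumption: ADLS Gen2 abfss:// URLs and a yyyy/MM/dd/HH folder layout.
    """
    paths: List[str] = []
    for container in containers:
        hour = start.replace(minute=0, second=0, microsecond=0)
        while hour < end:
            paths.append(
                f"abfss://{container}@{account}.dfs.core.windows.net/"
                f"{hour:%Y/%m/%d/%H}/"
            )
            hour += timedelta(hours=1)
    return paths


# In a notebook, read everything at once instead of overwriting one
# DataFrame per iteration, e.g. df = spark.read.json(paths)
```

For incremental loads, start can be set to the last hour already loaded, so each run only lists the new folders.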

Data Engineering

Latest Reply

Hubert-Dudek
Esteemed Contributor III

10-14-2021 3:48:17 AM

0 kudos

For sure, the function (def) should be declared outside the loop; move it to right after the library imports. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and by validating the JSON after each iteration (using break, sleep, etc.).
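
For the flattening step, one common approach is to turn nested objects into dotted column names. The pure-Python function below sketches that logic on a single parsed JSON record, which makes it easy to check the output record by record; flatten_record is a hypothetical helper. On Spark DataFrames you would instead select nested struct fields (for example col("device.name")) and explode arrays with pyspark.sql.functions.explode.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with `sep`.

    Lists are kept as-is (in Spark they would typically be exploded).
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name, sep))  # recurse into nested object
        else:
            flat[name] = value
    return flat


row = {"id": 1, "device": {"name": "x", "loc": {"lat": 1.0}}, "tags": ["a"]}
flat_row = flatten_record(row)
```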

by User16783853906 • Contributor III

06-10-2021 2:49:06 PM

3031 Views
2 replies
0 kudos

VACUUM during read/write

Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time? Will it impact the job result/performance?

Data Engineering

Latest Reply

User16783853906
Contributor III

06-23-2021 2:26:03 PM

0 kudos

In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended or updated to the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not effect...
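
The answer hinges on which files VACUUM may delete. Roughly: files that the current table version no longer references and that were removed longer ago than the retention threshold (delta.deletedFileRetentionDuration, 7 days by default). A concurrent append only adds new, referenced files, and files that a running job has written but not yet committed are newer than the threshold, so neither is deleted. The pure-Python model below captures only the rule for removed files; DataFile and vacuum_candidates are hypothetical names, and real VACUUM also deals with untracked files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataFile:
    path: str
    referenced: bool                    # still part of the current table version?
    removed_at: Optional[float] = None  # epoch seconds when a commit removed it


def vacuum_candidates(files: List[DataFile], now: float, retention_hours: float = 168.0) -> List[str]:
    """Return paths that are unreferenced and were removed before the cutoff.

    168 hours mirrors the 7-day default retention threshold.
    """
    cutoff = now - retention_hours * 3600
    return [
        f.path
        for f in files
        if not f.referenced and f.removed_at is not None and f.removed_at < cutoff
    ]
```

Because a file that was removed recently is kept until the threshold passes, readers and writers working from a slightly older snapshot can still find the files they need.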

by User16826994223 • Honored Contributor III

06-17-2021 1:46:26 AM

550 Views
0 replies
0 kudos

databricks.com

Can I create a Delta Lake table on Databricks and query it with open-source Spark? Yes. To do this, you would install open-source Spark and Delta Lake, both of which are open source. Delta Engine, which is only available on Databricks, will make delta...

Data Engineering
