- 16152 Views
- 8 replies
- 14 kudos
A Delta Lake table is created with an identity column, and I'm not able to load the data in parallel from four processes; I'm getting a metadata exception error. I don't want to load the data into a temp table. I need to load directly and in parallel into the Delta...
Latest Reply
I'm having the same issue. I need to load a large amount of data from separate files into a Delta table, and I want to do it with a for-each loop so I don't have to run it sequentially, which would take days. There should be a way to handle this.
7 More Replies
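One pattern that tends to avoid the metadata conflict (not from the thread, just a sketch): identity values are assigned per transaction, so doing the parallel load inside a single Spark write, rather than from four separate processes, keeps everything in one commit. The paths and table name below are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read all four inputs in one call: Spark still parallelizes the work
# across executors, but the load becomes a single Delta transaction,
# so the identity column is assigned without concurrent-writer conflicts.
df = spark.read.parquet(
    "/mnt/landing/batch1",  # hypothetical paths
    "/mnt/landing/batch2",
    "/mnt/landing/batch3",
    "/mnt/landing/batch4",
)

df.write.format("delta").mode("append").saveAsTable("target_with_identity")
```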
- 8372 Views
- 1 replies
- 2 kudos
How to leverage Change Data Capture (CDC) from your databases to Databricks. Change Data Capture allows you to ingest and process only changed records from database systems to dramatically reduce data processing costs and enable real-time use cases such...
Latest Reply
Hi @isaac_gritz, can you provide any reference resource for achieving AWS DynamoDB CDC to Delta tables? Thank you.
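Once the change records land somewhere readable, Delta's MERGE INTO is the usual way to apply them. A minimal PySpark sketch, assuming a hypothetical `updates` DataFrame with an `op` column marking the change type and a shared `id` key:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical change feed: one row per changed record, tagged with its operation.
updates = spark.createDataFrame(
    [(1, "alice", "UPDATE"), (2, "bob", "DELETE"), (3, "carol", "INSERT")],
    ["id", "name", "op"],
)

target = DeltaTable.forName(spark, "cdc_target")  # hypothetical target table

(target.alias("t")
 .merge(updates.alias("s"), "t.id = s.id")
 .whenMatchedDelete(condition="s.op = 'DELETE'")        # apply deletes
 .whenMatchedUpdateAll(condition="s.op <> 'DELETE'")    # apply updates
 .whenNotMatchedInsertAll(condition="s.op <> 'DELETE'") # apply inserts
 .execute())
```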
- 1752 Views
- 1 replies
- 3 kudos
Databricks Auto Loader is an interesting feature that can be used to load data incrementally.
✳ It can process new data files as they arrive in cloud object stores.
✳ It can be used to ingest JSON, CSV, Parquet, Avro, ORC, text, and even binary file ...
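A rough sketch of the ingestion pattern described above; the input path, schema/checkpoint locations, and table name are placeholders, not from the post:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader source: picks up only files it hasn't processed yet.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")  # also csv, parquet, avro, orc, text, binaryFile
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")
          .load("/mnt/landing/events"))

# Write incrementally into a Delta table; the checkpoint tracks progress.
(stream.writeStream
 .option("checkpointLocation", "/mnt/checkpoints/events")
 .trigger(availableNow=True)  # drain pending files, then stop
 .toTable("bronze_events"))
```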
- 11077 Views
- 5 replies
- 0 kudos
How to remove checkpoints from a Delta Lake table? I see that a few checkpoints exist on my Delta table, and I want to remove the oldest one. Their existence seems to be blocking removal of the oldest _delta_log entries.
Latest Reply
Hi @Pawel Woj, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
4 More Replies
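For context on the question above: checkpoint and _delta_log cleanup is driven by table properties rather than by deleting files by hand, so shortening the retention windows is the supported lever. A sketch with illustrative values; the table name is a placeholder, and the defaults are 30 days for the log and 2 days for checkpoints:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Old log entries and checkpoints are removed automatically on the next
# checkpoint write once they age past these retention windows.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 7 days',
        'delta.checkpointRetentionDuration' = 'interval 2 days'
    )
""")
```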
- 4064 Views
- 3 replies
- 3 kudos
I just tried to write to a Delta Lake table using overwrite mode, and I found that history is preserved. It's unclear to me how the data is overwritten and how long the history could be preserved. As they say, code is better than a thousand words: my...
Latest Reply
Hi @Rami ALZEBAK, overwrite means it will first remove the data and then write the whole data again. If you want to see the history, you can use the DESCRIBE HISTORY command.
2 More Replies
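A small sketch of what the reply describes (table name is illustrative): overwrite replaces the current snapshot, but earlier versions remain queryable through the table history until they are vacuumed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.range(10).write.format("delta").mode("overwrite").saveAsTable("demo")  # version 0
spark.range(20).write.format("delta").mode("overwrite").saveAsTable("demo")  # version 1 replaces the data

# Every write, including the overwrite, shows up as a new version.
spark.sql("DESCRIBE HISTORY demo").show(truncate=False)

# Time travel: the overwritten snapshot is still queryable by version.
old = spark.sql("SELECT * FROM demo VERSION AS OF 0")
```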
- 1499 Views
- 0 replies
- 2 kudos
Access Data in Databricks Using an Application or your Favorite BI Tool. You can leverage Partner Connect for easy, low-configuration connections to some of the most popular BI tools through our optimized connectors. Alternatively, you can follow these...
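Beyond BI tools, an application can query Databricks directly over SQL. A minimal sketch using the open-source databricks-sql-connector package; the hostname, HTTP path, and token are placeholders you would take from a SQL warehouse's connection details:

```python
# pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890.1.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",                  # placeholder
    access_token="dapi...",                                  # placeholder
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())
```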
by Dicer • Valued Contributor
- 23169 Views
- 12 replies
- 13 kudos
I wrote the following code: data = spark.sql("SELECT A_adjClose, AA_adjClose, AAL_adjClose, AAP_adjClose, AAPL_adjClose FROM deltabase.a_30min_delta, deltabase.aa_30min_delta, deltabase.aal_30min_delta, deltabase.aap_30min_delta, deltabase.aapl_30m...
Latest Reply
I just discovered a solution. Today, I opened Azure Databricks. When I imported Python libraries, Databricks told me that toPandas() was deprecated and suggested that I use toPandas. The following solution works: use toPandas instead of toPandas(). da...
11 More Replies
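One note on the query itself: listing several tables in FROM without a join condition produces a Cartesian product, which is a common reason such a query misbehaves. A hedged sketch of the explicit-join form, assuming the *_30min_delta tables share a timestamp column; the `ts` key name is a guess, not from the thread:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Explicit joins on a shared key instead of an implicit cross join.
data = spark.sql("""
    SELECT a.A_adjClose, aa.AA_adjClose, aal.AAL_adjClose
    FROM deltabase.a_30min_delta   AS a
    JOIN deltabase.aa_30min_delta  AS aa  ON a.ts = aa.ts   -- 'ts' is hypothetical
    JOIN deltabase.aal_30min_delta AS aal ON a.ts = aal.ts
""")
```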
by Autel • New Contributor II
- 4285 Views
- 3 replies
- 0 kudos
Hi, I'm interested to know whether multiple executors can append to the same Hive table using saveAsTable or insertInto in Spark SQL. Will that cause any data corruption? What configuration do I need to enable concurrent writes to the same Hive table? What about the s...
Latest Reply
The Hive table will not like this, as the underlying data is in Parquet format, which is not ACID compliant. Delta Lake, however, is: https://docs.delta.io/0.5.0/concurrency-control.html. You can see that inserts do not give conflicts.
2 More Replies
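A sketch of the Delta behavior the reply links to: blind appends commit through optimistic concurrency control, so running the same append from several jobs at once does not conflict. The path is a placeholder.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Safe to run concurrently from multiple jobs: appends that don't read
# the table first ("blind inserts") don't conflict with each other.
spark.range(1000).write.format("delta").mode("append").save("/mnt/delta/shared")  # placeholder path
```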
- 924 Views
- 0 replies
- 1 kudos
Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...
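The Auto Loader sketch above used the default directory-listing mode; switching to the file-notification mode this blurb mentions is a single option. Paths here are the same kind of placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.useNotifications", "true")  # file notifications instead of directory listing
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")
          .load("/mnt/landing/files"))
```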
- 4401 Views
- 7 replies
- 0 kudos
I am able to load data for a single container by hard-coding, but not able to load from multiple containers. I used a for loop, but the data frame loads only the last container's last folder record. One more issue: I have to flatten the data, and when I ...
Latest Reply
For sure, the function (def) should be declared outside the loop; move it after the library imports. The logic is a bit complicated; you need to debug it using display(Flatten_df2) (or .show()) and validating the JSON after each iteration (using break or sleep, etc.).
6 More Replies
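The symptom in the question, only the last container's data surviving, usually means the same DataFrame variable is reassigned on every loop iteration. A sketch under that assumption; the account, container names, and paths are placeholders:

```python
from functools import reduce
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

containers = ["container1", "container2", "container3"]  # placeholders

# Accumulate one DataFrame per container instead of overwriting a single variable.
frames = [
    spark.read.json(f"abfss://{c}@myaccount.dfs.core.windows.net/data/")
    for c in containers
]

# Union them once the loop is done; flattening can then run on the combined frame.
all_df = reduce(DataFrame.unionByName, frames)
```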
- 3031 Views
- 2 replies
- 0 kudos
Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time? Will it impact the job result/performance?
Latest Reply
In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended or updated to the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...
1 More Reply
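For reference, a sketch of the command itself (table name is a placeholder). The retention window, 7 days by default, is what keeps a concurrent VACUUM from deleting files that active readers or writers still need:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Remove unreferenced files older than the retention window (168 hours = the 7-day default).
spark.sql("VACUUM my_table RETAIN 168 HOURS")
```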
- 550 Views
- 0 replies
- 0 kudos
Can I create a Delta Lake table on Databricks and query it with open-source Spark? Yes; to do this, you would install open-source Spark and Delta Lake, both of which are open source. Delta Engine, which is only available on Databricks, will make Delta...
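A sketch of the open-source setup this answer describes, using Delta Lake's documented configuration for a stock Spark session; the read path is a placeholder:

```python
# pip install pyspark delta-spark
import pyspark
from delta import configure_spark_with_delta_pip

# Documented OSS setup: register Delta's SQL extension and catalog.
builder = (pyspark.sql.SparkSession.builder.appName("oss-delta")
           .config("spark.sql.extensions",
                   "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))

spark = configure_spark_with_delta_pip(builder).getOrCreate()

# A Delta table written on Databricks can be read here by path.
df = spark.read.format("delta").load("/path/to/delta-table")  # placeholder path
```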