- 1205 Views
- 1 replies
- 2 kudos
How to leverage Change Data Capture (CDC) from your databases to Databricks
Change Data Capture allows you to ingest and process only changed records from database systems, dramatically reducing data processing costs and enabling real-time use cases suc...
Latest Reply
Hi @isaac_gritz, can you provide any reference resources for achieving AWS DynamoDB CDC to Delta tables? Thank you.
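To make the CDC idea above concrete, here is a minimal pure-Python sketch of applying a stream of change events (insert/update/delete) to a keyed target instead of reprocessing the full dataset. All names and the event shape are illustrative assumptions, not a DynamoDB or Databricks API.

```python
# Minimal sketch of applying CDC events to a keyed target (all names are
# illustrative assumptions). Inserts and updates upsert the new row image;
# deletes remove the key. Only changed records are touched.
def apply_cdc(target, events):
    for event in events:
        op, key, row = event["op"], event["id"], event.get("row")
        if op == "delete":
            target.pop(key, None)
        else:  # "insert" or "update": upsert the new row image
            target[key] = row
    return target

target = {1: {"name": "a"}}
events = [
    {"op": "insert", "id": 2, "row": {"name": "b"}},
    {"op": "update", "id": 1, "row": {"name": "a2"}},
    {"op": "delete", "id": 2},
]
apply_cdc(target, events)
```

In Delta Lake, the same upsert/delete semantics are typically expressed with a `MERGE INTO` statement against the target table.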
- 6962 Views
- 9 replies
- 13 kudos
A Delta Lake table is created with an identity column, and I'm not able to load data in parallel from four processes; I'm getting a metadata exception error. I don't want to load the data into a temp table. I need to load directly and in parallel into the Delta...
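For context on why the parallel loads above conflict: identity value allocation is tracked in the table's metadata, so each concurrent writer updates that metadata and the commits can collide. A hedged sketch of such a table (the table and column names are assumptions, and `spark` is assumed to be a predefined Databricks notebook session):

```python
# Sketch only: a Delta table with an identity column. Identity allocation
# updates table metadata on each write, which is why concurrent writers can
# hit metadata conflicts. Table/column names are assumptions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        payload STRING
    ) USING DELTA
""")
```

One common workaround is to serialize the writers, or to generate surrogate keys outside the table (e.g. with `monotonically_increasing_id` plus an offset) so appends stay conflict-free.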
- 557 Views
- 1 replies
- 3 kudos
Databricks Auto Loader is an interesting feature that can be used to load data incrementally.
✳ It can process new data files as they arrive in cloud object stores.
✳ It can be used to ingest JSON, CSV, Parquet, Avro, ORC, text, and even binary file ...
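The incremental-load behavior described above can be sketched as follows. This is a hedged example, not a complete pipeline: the paths, schema location, and table name are assumptions, and `spark` is assumed to be a predefined Databricks notebook session.

```python
# Sketch of Auto Loader (cloudFiles) ingesting JSON incrementally.
# All paths and the table name are hypothetical.
df = (spark.readStream
      .format("cloudFiles")                               # Auto Loader source
      .option("cloudFiles.format", "json")                # file type to ingest
      .option("cloudFiles.schemaLocation", "/tmp/schema") # hypothetical path
      .load("/mnt/raw/events"))                           # hypothetical path

(df.writeStream
   .option("checkpointLocation", "/tmp/checkpoints")      # hypothetical path
   .trigger(availableNow=True)                            # process new files, then stop
   .toTable("bronze_events"))                             # hypothetical table
```

Swapping `cloudFiles.format` to `csv`, `parquet`, `avro`, `orc`, `text`, or `binaryFile` selects the other formats mentioned above.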
- 3158 Views
- 5 replies
- 0 kudos
How to remove checkpoints from a Delta Lake table? I see that a few checkpoints exist on my Delta table, and I want to remove the oldest one. It seems that its existence is blocking removal of the oldest _delta_log entries.
Latest Reply
Hi @Pawel Woj, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
4 More Replies
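On the checkpoint question above: checkpoints and `_delta_log` entries are normally cleaned up automatically once they fall outside the table's retention windows, rather than by deleting files by hand. A hedged sketch of tuning those windows via table properties (the table name is an assumption, and `spark` is assumed to be a predefined Databricks notebook session):

```python
# Sketch only: shorten the log and checkpoint retention windows so old
# checkpoints and _delta_log entries are removed automatically. Table name
# and intervals are assumptions; deleting _delta_log files by hand can
# corrupt the table.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 7 days',
        'delta.checkpointRetentionDuration' = 'interval 2 days'
    )
""")
```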
- 1650 Views
- 3 replies
- 3 kudos
I just tried to write to a Delta Lake table using overwrite mode, and I found that history is preserved. It's unclear to me how the data is overwritten, and how long the history is preserved. As they say, code is better than a thousand words: my...
Latest Reply
Hi @Rami ALZEBAK, overwrite means it first removes the existing data and then writes the whole dataset again. If you want to see the history, you can use the DESCRIBE HISTORY command.
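A hedged sketch of the reply above (the DataFrame and table names are assumptions, and `spark` is assumed to be a predefined Databricks notebook session). The key point is that an overwrite is just another transaction, so earlier versions remain readable until they are vacuumed:

```python
# Sketch only: overwrite replaces the table contents in a new transaction;
# prior versions stay in the transaction log until vacuumed. Names are
# assumptions.
df.write.format("delta").mode("overwrite").saveAsTable("my_table")

# Inspect the version history, including the overwrite operation.
spark.sql("DESCRIBE HISTORY my_table").show(truncate=False)

# Time travel: read the table as it was before the overwrite.
old = spark.read.format("delta").option("versionAsOf", 0).table("my_table")
```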
2 More Replies
- 704 Views
- 1 replies
- 3 kudos
Access Data in Databricks Using an Application or your Favorite BI Tool
You can leverage Partner Connect for easy, low-configuration connections to some of the most popular BI tools through our optimized connectors. Alternatively, you can follow these...
Latest Reply
Thank you, @Isaac Gritz, for sharing this fantastic post!
by Dicer • Valued Contributor
- 10483 Views
- 13 replies
- 13 kudos
I wrote the following code: data = spark.sql("SELECT A_adjClose, AA_adjClose, AAL_adjClose, AAP_adjClose, AAPL_adjClose FROM deltabase.a_30min_delta, deltabase.aa_30min_delta, deltabase.aal_30min_delta, deltabase.aap_30min_delta, deltabase.aapl_30m...
Latest Reply
I just discovered a solution. Today I opened Azure Databricks, and when I imported Python libraries, Databricks told me that toPandas() was deprecated and suggested using toPandas. The following solution works: use toPandas instead of toPandas(). da...
12 More Replies
by Autel • New Contributor II
- 2150 Views
- 4 replies
- 1 kudos
Hi, I'm interested to know whether having multiple executors append to the same Hive table using saveAsTable or insertInto in Spark SQL will cause any data corruption. What configuration do I need to enable concurrent writes to the same Hive table? What about the s...
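One commonly recommended alternative to the setup asked about above, offered here as a hedged sketch rather than a definitive answer: use a Delta table as the target, since blind appends from multiple writers are each committed as their own transaction and generally do not conflict, whereas a plain Hive table has no transaction log to coordinate concurrent writes. The table name is an assumption, and `spark`/`df` are assumed to exist in a Databricks notebook.

```python
# Sketch only: multiple writers appending to the same Delta table. Each
# append is an independent transaction, so blind appends generally do not
# conflict. Table name is an assumption; this does not make a plain
# (non-Delta) Hive table safe for concurrent writes.
df.write.format("delta").mode("append").saveAsTable("shared_events")
```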
Latest Reply
Hi @Weide Zhang, does @werners' reply answer your question?
3 More Replies
- 320 Views
- 0 replies
- 1 kudos
Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...
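The file-notification mode mentioned above can be sketched like this (a hedged example: paths are assumptions, and `spark` is assumed to be a predefined Databricks notebook session). With notifications enabled, Auto Loader subscribes to cloud storage events instead of listing the directory on each batch, which scales better for large folders:

```python
# Sketch only: Auto Loader in file-notification mode. Paths are
# hypothetical; the default (useNotifications false) is directory listing.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.useNotifications", "true")      # event-driven discovery
      .option("cloudFiles.schemaLocation", "/tmp/schema") # hypothetical path
      .load("s3://my-bucket/landing/"))                   # hypothetical path
```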
- 1912 Views
- 7 replies
- 0 kudos
I am able to load data for a single container by hard-coding, but I'm not able to load from multiple containers. I used a for loop, but the data frame is loading only the last container's last folder record. One more issue is that I have to flatten the data, and when I ...
Latest Reply
For sure, the function (def) should be declared outside the loop; move it to just after the library imports. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and validate the JSON after each iteration (using break or sleep, etc.).
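The structure suggested in the reply above can be sketched in pure Python (all names, including the container names and record shape, are illustrative assumptions). The two points it illustrates: the flatten function is defined once, outside the loop, and each iteration's result is appended rather than overwriting the previous one, which is why the original code kept only the last container's record:

```python
# Flatten function defined once, outside the loop (names are assumptions).
def flatten(record, parent_key="", sep="_"):
    """Recursively flatten a nested dict into a single-level dict."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

rows = []
for container in ["container1", "container2"]:  # hypothetical container names
    record = {"container": container, "meta": {"folder": "f1", "size": 10}}
    rows.append(flatten(record))  # append each result instead of overwriting
```

In the Spark version, the per-container DataFrames would similarly be collected and unioned rather than reassigned on each iteration.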
6 More Replies
- 1533 Views
- 2 replies
- 0 kudos
Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time? Will it impact the job result/performance?
Latest Reply
In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended or updated in the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...
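A hedged sketch of the command discussed above (the table name is an assumption, and `spark` is assumed to be a predefined Databricks notebook session). Because VACUUM only removes files that are both unreferenced by the transaction log and older than the retention window, files written by concurrent appends are untouched:

```python
# Sketch only: remove unreferenced data files older than the retention
# window (168 hours = the default 7 days). Table name is an assumption.
spark.sql("VACUUM my_table RETAIN 168 HOURS")
```

Note that shortening the retention window below the default also shortens how far back time travel can reach.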
1 More Replies
- 243 Views
- 0 replies
- 0 kudos
Can I create a Delta Lake table on Databricks and query it with open-source Spark?
Yes. To do this, install open-source Spark and Delta Lake; both are open source. Delta Engine, which is only available on Databricks, will make Delta...
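A hedged sketch of the open-source setup described above. It assumes the delta-spark package is on the classpath (for example via `--packages io.delta:delta-spark_2.12:3.1.0`, a version chosen for illustration), and the table path is hypothetical:

```python
# Sketch only: reading a Delta table from open-source Spark. Assumes
# delta-spark is on the classpath; the path is hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("oss-delta")
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

df = spark.read.format("delta").load("/path/to/delta-table")
```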