Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by New Contributor III
  • 7233 Views
  • 6 replies
  • 5 kudos

Resolved! Override and Merge mode write using AutoLoader in Databricks

We are reading files using Auto Loader in Databricks. The source system provides a full snapshot of the complete data in files, so we want to read the data and write it to a Delta table in overwrite mode so that all old data is replaced by the new data. Similarly for oth...

Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @Ranjeet Jaiswal, does @Werner Stinckens's solution work in your case?

5 More Replies
elgeo
by Valued Contributor II
  • 1483 Views
  • 1 replies
  • 1 kudos

Merge didn't fail while inserting wrong data type values

Hello. While running some test cases to see how Databricks handles mistakes we might make, we noticed that MERGE doesn't fail when inserting values whose data types differ from the ones in the corresponding table....

Latest Reply
elgeo
Valued Contributor II
  • 1 kudos

Hello. Any update on this please? Thank you in advance

User16826994223
by Honored Contributor III
  • 9351 Views
  • 2 replies
  • 3 kudos

How to Prevent Duplicate Entries to enter to delta lake of Azure Storage

I have a DataFrame stored in Delta format in ADLS. Now I'm trying to append new, updated rows to that Delta table. Is there any way I can delete the old existing record in Delta and add the new updated record? There is a uni...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 3 kudos

You should use a MERGE command on this table to match records on the unique column. Delta Lake does not enforce primary keys, so if you only append, the duplicate IDs will appear. MERGE will give you the functionality you want. https://docs.databr...
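As a rough illustration of the MERGE suggestion above, here is a minimal sketch in plain Python that only builds the SQL text for an upsert keyed on the unique column. The helper `build_merge_sql` and the names `customers`, `customer_updates`, and `customer_id` are hypothetical and do not come from the thread; the statement itself uses the standard Delta Lake `MERGE INTO ... WHEN MATCHED ... WHEN NOT MATCHED` form.

```python
def build_merge_sql(target: str, source: str, key_cols: list[str]) -> str:
    """Return a Delta Lake MERGE statement that upserts `source` into `target`."""
    if not key_cols:
        raise ValueError("at least one key column is required")
    # Match target (t) and source (s) rows on every key column
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"      # replace the old version of the row
        "WHEN NOT MATCHED THEN INSERT *"        # add rows whose key is new
    )


# Hypothetical names: `customers` is the Delta table, `customer_updates` a view over the new rows
stmt = build_merge_sql("customers", "customer_updates", ["customer_id"])
print(stmt)
# In a Databricks notebook the statement could then be run with spark.sql(stmt)
```

Note that MERGE expects at most one source row per key, so the incoming rows would likely need to be de-duplicated on the unique column before merging.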

1 More Replies
Constantine
by Contributor III
  • 3460 Views
  • 3 replies
  • 2 kudos

Resolved! OPTIMIZE throws an error after doing MERGE on the table

I have a table on which I do an upsert, i.e. MERGE INTO table_name ... After that I run OPTIMIZE table_name, which throws an error: java.util.concurrent.ExecutionException: io.delta.exceptions.ConcurrentDeleteReadException: This transaction attempted to read...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

You can try changing the isolation level: https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/isolation-level
In a merge, it is good to specify all partitions in the merge conditions.
It can also happen when the script is running concurrently.
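The two suggestions above could look roughly like the following sketch, which only builds SQL text in plain Python. The helpers `isolation_level_sql` and `partition_scoped_condition`, the partition column `event_date`, and the key column `id` are assumptions for illustration. `delta.isolationLevel` is the Delta table property that accepts `Serializable` or `WriteSerializable`; the reply does not say which level to pick, so the value used here is only a placeholder.

```python
ISOLATION_LEVELS = {"Serializable", "WriteSerializable"}


def isolation_level_sql(table: str, level: str) -> str:
    """Build an ALTER TABLE statement that sets the Delta isolation level."""
    if level not in ISOLATION_LEVELS:
        raise ValueError(f"unknown isolation level: {level}")
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.isolationLevel' = '{level}')"


def partition_scoped_condition(key_cols, partition_col, partition_values):
    """Build a MERGE ON clause that also names the partitions being touched."""
    keys = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    # Double any single quotes so the values are valid SQL string literals
    quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in partition_values)
    return f"{keys} AND t.{partition_col} IN ({quoted})"


# Placeholder level and hypothetical column names, for illustration only
alter_stmt = isolation_level_sql("table_name", "Serializable")
on_clause = partition_scoped_condition(["id"], "event_date", ["2024-01-01", "2024-01-02"])
print(alter_stmt)
print(on_clause)
```

Stating the partitions explicitly in the ON clause narrows the set of files the merge has to read, which is the usual reason this advice is given for reducing conflicts between concurrent operations.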

2 More Replies
Sumeet_Dora
by New Contributor II
  • 1744 Views
  • 2 replies
  • 4 kudos

Resolved! Write mode features in BigQuery using Databricks notebook.

Currently, using df.write.format("bigquery"), Databricks only supports append and overwrite modes for writing to BigQuery tables. Does Databricks have any option for executing DML statements such as MERGE against BigQuery from Databricks notebooks?

Latest Reply
mathan_pillai
Valued Contributor
  • 4 kudos

@Sumeet Dora, unfortunately there is no direct "merge into" option for writing to BigQuery from a Databricks notebook. You could write to an intermediate Delta table using the "merge into" option in Delta. Then read from the delta table and pe...

1 More Replies
brickster_2018
by Esteemed Contributor
  • 845 Views
  • 2 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Delta has significant value beyond the DML/ACID capabilities. Delta's data organization strategies that @Ryan Chynoweth mentions also offer an advantage even for read-only use cases for querying and joining the data. Delta also supports in-place con...

1 More Replies
User16826992666
by Valued Contributor
  • 1691 Views
  • 1 replies
  • 0 kudos

Resolved! When running a Merge, if records from the table are deleted are the underlying files that contain the records deleted as well?

I know I have the option to delete rows from a Delta table when running a merge. But I'm confused about how that would actually affect the files that contain the deleted records. Are those files deleted, or are they rewritten, or what?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta implements MERGE by physically rewriting existing files. It is implemented in two steps:
1. Perform an inner join between the target table and source table to select all files that have matches.
2. Perform an outer join between the selected files in t...
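To make the file-rewrite behaviour described above more concrete, here is a toy simulation in plain Python. It is not Delta's implementation: files are modelled as a dict of file name to rows, the function `merge_copy_on_write` and the file names are hypothetical, and only updates and inserts are covered (deletes are left out for brevity).

```python
def merge_copy_on_write(files, source, key="id", new_file="part-merged"):
    """Simulate an upsert: rewrite only the files that contain matching keys.

    files:  dict mapping file name -> list of row dicts (the target table)
    source: list of row dicts to upsert
    Returns (new snapshot of files, names of files removed from the snapshot).
    """
    source_by_key = {row[key]: row for row in source}

    # Step 1 (inner-join analogue): select the files that hold at least one match
    touched = [
        name for name, rows in files.items()
        if any(row[key] in source_by_key for row in rows)
    ]

    # Step 2 (outer-join analogue): combine rows of the touched files with the source
    rewritten, matched_keys = [], set()
    for name in touched:
        for row in files[name]:
            if row[key] in source_by_key:
                matched_keys.add(row[key])
                rewritten.append(source_by_key[row[key]])  # updated row
            else:
                rewritten.append(row)                      # copied unchanged
    inserted = [row for k, row in source_by_key.items() if k not in matched_keys]

    # Untouched files stay in the snapshot as they are; touched files are replaced
    snapshot = {name: rows for name, rows in files.items() if name not in touched}
    if rewritten or inserted:
        snapshot[new_file] = rewritten + inserted
    return snapshot, touched


files = {
    "part-0": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "part-1": [{"id": 3, "v": "c"}],
}
source = [{"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
snapshot, removed = merge_copy_on_write(files, source)
print(removed)
print(snapshot)
```

In this model the file holding a matched row is replaced as a whole by a newly written file, while files without matches are left alone. In Delta Lake itself, the replaced files are only marked as removed in the transaction log and stay on storage until VACUUM cleans them up.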

SwapanSwapandee
by New Contributor II
  • 7536 Views
  • 2 replies
  • 0 kudos

How to pass column names in selectExpr through one or more string parameters in spark using scala?

I am using a script for CDC merge in Spark Streaming. I want to pass the columns to selectExpr through a parameter, since the column names change for each table. When I pass the columns and struct field through a string variable, I get an error as...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Swapan Swapandeep Marwaha, can you pass them as a Seq, as in the code below? keyCols = Seq("col1", "col2"), structCols = Seq("struct(offset,KAFKA_TS) as otherCols")

1 More Replies