Slow updates/upserts in Delta tables

SS0201
New Contributor II

When using Delta tables with DBR jobs, or even with DLT pipelines, upserts (especially updates) matched on key and timestamp take much longer than expected to update the underlying files/table data: about 2 minutes even when a poll brings in a single record. Inserts, by contrast, are lightning fast. The backend Parquet file being updated for that one record also contains other records.
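Fast inserts combined with slow single-row updates are consistent with copy-on-write storage. Delta keeps data in immutable Parquet files, so an update generally means writing a new copy of every file that holds a changed row, including all the other rows in that file. The toy model below is plain Python with no Spark or Delta dependency. It only counts rows physically written. `DataFile` and `ToyTable` are hypothetical names, and real Delta behavior (file sizes, how touched files are located) is more involved.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """An immutable data file in the toy model: key -> value."""
    rows: dict


@dataclass
class ToyTable:
    files: list = field(default_factory=list)
    rows_written: int = 0  # total rows physically written so far

    def insert(self, new_rows: dict) -> int:
        # Appends only write the new rows, into a brand-new file.
        self.files.append(DataFile(dict(new_rows)))
        self.rows_written += len(new_rows)
        return len(new_rows)

    def update(self, key, value) -> int:
        """Update one key; return how many rows had to be rewritten."""
        for i, f in enumerate(self.files):
            if key in f.rows:
                new_rows = dict(f.rows)
                new_rows[key] = value
                # Files are never modified in place: the whole file is
                # rewritten with the one changed row plus all the others.
                self.files[i] = DataFile(new_rows)
                self.rows_written += len(new_rows)
                return len(new_rows)
        return 0  # key not found, nothing rewritten


table = ToyTable()
table.insert({k: 0 for k in range(1000)})  # one file holding 1,000 rows
insert_cost = table.insert({1000: 0})      # appending 1 row writes 1 row
update_cost = table.update(42, 1)          # updating 1 row rewrites 1,000 rows
print(insert_cost, update_cost)            # 1 1000
```

In this model, update cost scales with the size of the file holding the row, not with the number of changed rows.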

What we tried:

Partitioning on the key proved to be a very bad idea and made even the inserts too slow.

ZORDER on the key was not helpful either.
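A toy model of min/max data skipping shows one reason ZORDER may not help. Delta records per-file column statistics such as minimum and maximum values, and a file whose range cannot contain a key can be skipped. This sketch assumes skipping is driven by the incoming batch's keys. Whether a real MERGE prunes files this way depends on the runtime and on the match condition. The function names are hypothetical.

```python
def file_stats(files):
    """Return the (min_key, max_key) range recorded for each file."""
    return [(min(f), max(f)) for f in files]


def files_to_scan(stats, keys):
    """Indices of files whose [min, max] range could contain any incoming key."""
    return [
        i for i, (lo, hi) in enumerate(stats)
        if any(lo <= k <= hi for k in keys)
    ]


all_keys = list(range(1000))

# Unclustered layout: every file holds keys from across the whole key range.
scattered = [all_keys[i::10] for i in range(10)]
# Clustered layout (roughly what ZORDER BY key aims for): contiguous key ranges.
clustered = [all_keys[i * 100:(i + 1) * 100] for i in range(10)]

narrow_batch = [5, 7, 12]     # hypothetical micro-batch with nearby keys
spread_batch = [5, 505, 905]  # hypothetical micro-batch with far-apart keys

print(len(files_to_scan(file_stats(scattered), narrow_batch)))  # 10
print(len(files_to_scan(file_stats(clustered), narrow_batch)))  # 1
print(len(files_to_scan(file_stats(clustered), spread_batch)))  # 3
```

With clustered files, a batch of nearby keys touches one file. A batch whose keys are spread across the key space still touches several files, and in a copy-on-write table each touched file is rewritten.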

Please advise on what we can improve to update a Delta table in (near) real time, with a Kafka topic as the source and Spark Streaming, treating additional compute as the last option.
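A common pattern for streaming upserts is to collapse each micro-batch to the newest record per key and then run one MERGE that overwrites a row only when the incoming timestamp is newer. Deduplicating first matters because Delta's MERGE fails when several source rows try to update the same target row. It also reduces the work per batch. In a Structured Streaming job this logic would typically run inside `foreachBatch`, with the deduplicated batch registered as a temporary view and the statement passed to `spark.sql`. The Spark/Delta calls are left out so the sketch runs on its own. The table, view, and column names (`target_table`, `updates`, `key`, `ts`) are hypothetical.

```python
from typing import Iterable


def latest_per_key(records: Iterable[dict], key: str = "key", ts: str = "ts") -> list:
    """Keep only the record with the highest timestamp for each key."""
    latest = {}
    for r in records:
        cur = latest.get(r[key])
        if cur is None or r[ts] > cur[ts]:
            latest[r[key]] = r
    return list(latest.values())


def merge_sql(target: str, source_view: str, key: str = "key", ts: str = "ts") -> str:
    """Build a MERGE that only overwrites a row when the incoming timestamp is newer."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s "
        f"ON t.{key} = s.{key} "
        f"WHEN MATCHED AND s.{ts} > t.{ts} THEN UPDATE SET * "
        f"WHEN NOT MATCHED THEN INSERT *"
    )


batch = [
    {"key": 1, "ts": 10, "v": "a"},
    {"key": 1, "ts": 12, "v": "b"},  # newer version of key 1 wins
    {"key": 2, "ts": 11, "v": "c"},
]
deduped = latest_per_key(batch)
print(deduped)
print(merge_sql("target_table", "updates"))
```

The timestamp guard in `WHEN MATCHED` also keeps a late-arriving older record from overwriting a newer one already in the table.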

4 replies

Debayan
Esteemed Contributor III

Hi, how big are the files/tables?

SS0201
New Contributor II

There is only one target table, the Delta table (approx. 45 million records in dev). The backend Parquet files (on ABFS) are laid out by the internal Databricks Runtime algorithms.

Also, after ZORDER on the primary key, the files ended up almost the same size, but the upserts were still slow.

Result after running ZORDER:

[image: ZORDER result, not available]

This is the dev result. Prod data size is more than 10x larger.

Which DBR version are you using? Low shuffle merge might help; see the docs: https://docs.databricks.com/optimizations/low-shuffle-merge.html

Anonymous
Not applicable

Hi @Surya Agarwal,

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!