Slow updates/upserts in Delta tables
02-01-2023 08:43 AM
When using Delta tables with DBR jobs, or even with DLT pipelines, upserts on key and timestamp (especially the updates) take much longer than expected to update the table data: about 2 minutes even when a poll contains a single record. Inserts are lightning fast. The backing Parquet file that gets rewritten for that one record also contains other records.
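To make the asymmetry concrete, here is a minimal toy sketch in plain Python. It is not Delta Lake code. It assumes copy-on-write behavior (no deletion vectors), where updating a row means rewriting the whole file that holds it, while an insert only appends a new file. The `ToyTable` class and its `rows_written` counter are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model of a copy-on-write table (an assumption, not the Delta API)."""
    files: list = field(default_factory=list)  # each "file" is a list of row dicts
    rows_written: int = 0                      # total rows physically written so far

    def insert(self, rows):
        # An insert appends a new file; existing files are left untouched.
        self.files.append(list(rows))
        self.rows_written += len(rows)

    def upsert(self, key, ts, value):
        # Look for the file that already holds this key.
        for i, f in enumerate(self.files):
            if any(r["key"] == key for r in f):
                # Copy-on-write: the whole file is rewritten, with the matching
                # row replaced only if the incoming timestamp is newer.
                new_file = [
                    {"key": key, "ts": ts, "value": value}
                    if r["key"] == key and r["ts"] < ts
                    else r
                    for r in f
                ]
                self.files[i] = new_file
                self.rows_written += len(new_file)
                return
        # Key not found anywhere: this upsert is a plain insert.
        self.insert([{"key": key, "ts": ts, "value": value}])


table = ToyTable()
table.insert([{"key": k, "ts": 1, "value": "a"} for k in range(1000)])
before = table.rows_written
table.upsert(5, 2, "b")          # update of one existing record
update_cost = table.rows_written - before
before = table.rows_written
table.upsert(5000, 2, "c")       # upsert of a new key, i.e. an insert
insert_cost = table.rows_written - before
print(update_cost, insert_cost)
```

In this toy model the single-record update rewrites every row in the file that contains the key, while the insert writes only the new row. A real merge also has to locate the matching files first, which the sketch does not model.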
What we tried:
Partitioning on key proved to be a very bad idea and made even the inserts too slow.
ZORDER on key was also not helpful.
Please help with what we can improve to update the Delta table in real time, with a Kafka topic as the source and Spark Streaming, keeping additional compute as the last option.
Labels: Delta Tables, DLT, Real time data