cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Matt_L
by New Contributor III
  • 4811 Views
  • 3 replies
  • 3 kudos

Resolved! Slow performance loading checkpoint file?

Using OSS Delta, hopefully this is the right forum for this question:Hey all, I could use some help as I feel like I’m doing something wrong here.I’m streaming from Kafka -> Delta on EMR/S3FS, and am seeing ever-increasingly slow batches.When looking...

  • 4811 Views
  • 3 replies
  • 3 kudos
Latest Reply
Matt_L
New Contributor III
  • 3 kudos

Found the answer through the Slack user group, courtesy of an Adam Binford.I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...

  • 3 kudos
2 More Replies
gbrueckl
by Contributor II
  • 4013 Views
  • 10 replies
  • 9 kudos

Slow performance of VACUUM on Azure Data Lake Store Gen2

We need to run VACCUM on one of our biggest tables to free the storage. According to our analysis using VACUUM bigtable DRY RUN this affects 30M+ files that need to be deleted.If we run the final VACUUM, the file-listing takes up to 2h (which is OK) ...

  • 4013 Views
  • 10 replies
  • 9 kudos
Latest Reply
Deepak_Bhutada
Contributor III
  • 9 kudos

@Gerhard Brueckl​ we have seen near 80k-120k file deletions in Azure per hour while doing a VACUUM on delta tables, it's just that the vacuum is slower in azure and S3. It might take some time as you said when deleting the files from the delta path. ...

  • 9 kudos
9 More Replies
Labels