Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by AndriusVitkausk
New Contributor III
  • 928 Views
  • 2 replies
  • 1 kudos

Autoloader event vs directory ingestion

For a production workload of around 15k gzip-compressed JSON files per hour, all in a YYYY/MM/DD/HH/id/timestamp.json.gz directory structure, what would be the better approach to ingesting this into a Delta table, in terms of not only the incremental load...

Latest Reply
AndriusVitkausk
New Contributor III
  • 1 kudos

@Kaniz Fatma I've not found a fix for the small-file problem using Auto Loader. It seems to struggle badly with large directories: I had a cluster running for 8h stuck on the "listing directory" step with no end in sight, and the cluster seemed completely idle to...
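For readers comparing the two modes discussed in this thread, here is a minimal sketch of how Auto Loader can be switched between directory listing and file notification (event) mode through its options. It is not taken from the thread and is not offered as a confirmed fix for the stall described above. The helper names `autoloader_options` and `start_ingest`, and every path and table name passed to them, are hypothetical; the `cloudFiles.*` option keys are standard Auto Loader options, and file notification mode additionally needs cloud-side permissions that are not shown here.

```python
from typing import Dict


def autoloader_options(schema_location: str, use_notifications: bool = True) -> Dict[str, str]:
    """Build Auto Loader options for JSON input (hypothetical helper).

    use_notifications=True selects file notification (event) mode;
    False keeps the default directory listing mode.
    """
    return {
        # .json.gz files are decompressed by Spark based on the file extension.
        "cloudFiles.format": "json",
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
    }


def start_ingest(spark, source_path: str, target_table: str,
                 checkpoint_path: str, options: Dict[str, str]):
    """Start an incremental load into a Delta table (hypothetical helper).

    `spark` is assumed to be an existing SparkSession on a Databricks cluster.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    return (
        reader.load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

On a cluster this would be called as `start_ingest(spark, source_path, target_table, checkpoint_path, autoloader_options(schema_location))`, with all five values supplied by the user. Keeping the options in a plain dictionary makes it easy to flip `use_notifications` and compare the two modes against the same source.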
