Databricks Autoloader best practices
06-23-2021 02:28 PM
Databricks Autoloader is a popular mechanism for ingesting files from cloud storage into Delta. For a very high-throughput source, what are the best practices to follow when scaling an Autoloader-based pipeline to millions of events per minute?
Tuning "cloudFiles.fetchParallelism" is one thing to look at, but are there any other configurations that need tuning? Presumably an increase in the fetch rate should be paired with a matching delete rate from SQS/AQS as well?
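For context, here is a minimal sketch of where these settings sit in a notification-mode Autoloader stream. The option keys are standard `cloudFiles` options; the helper names (`autoloader_options`, `build_stream`), the JSON format, the schema location path, and every numeric value are assumptions for illustration, not recommended settings.

```python
from typing import Dict


def autoloader_options(
    fetch_parallelism: int = 8,            # assumed value, tune per workload
    max_files_per_trigger: int = 10000,    # assumed value, tune per workload
    schema_location: str = "/tmp/autoloader/_schema",  # hypothetical path
) -> Dict[str, str]:
    """Collect the Autoloader options discussed above into one dict."""
    if fetch_parallelism < 1:
        raise ValueError("fetch_parallelism must be at least 1")
    if max_files_per_trigger < 1:
        raise ValueError("max_files_per_trigger must be at least 1")
    return {
        "cloudFiles.format": "json",  # assumed source format
        # Discover files through the queue (SQS/AQS) instead of directory listing.
        "cloudFiles.useNotifications": "true",
        # Number of threads used to fetch messages from the queue.
        "cloudFiles.fetchParallelism": str(fetch_parallelism),
        # Upper bound on files processed in each micro-batch.
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
        "cloudFiles.schemaLocation": schema_location,
    }


def build_stream(spark, source_path: str, options: Dict[str, str]):
    """Apply the options to a streaming reader; needs a Databricks Spark session."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)
```

On a cluster this would be called as `build_stream(spark, "<source path>", autoloader_options(fetch_parallelism=16))`, with the resulting DataFrame written to a Delta sink using its own checkpoint location.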
Labels:
- Autoloader