Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

When should you use directory listing vs. file notification?

BenLambert
Contributor

We are using Delta Live Tables to run ingestion pipelines and have come across two Auto Loader options for detecting new files: "file notification" and "directory listing". This is reflected in the option cloudFiles.useIncrementalListing. What are the best practices for choosing between them, and when should we use one rather than the other?

1 REPLY

Anonymous
Not applicable

@Bennett Lambert:

The choice between "file notification" and "directory listing" for Auto Loader in Delta Live Tables depends on your use case and requirements. Here are some general guidelines:

  1. Use file notification if you need real-time ingestion: File notification uses event-based triggers to detect new files in a source storage location, which allows for real-time ingestion as soon as new files are added.
  2. Use directory listing if you need to control the ingestion frequency: Directory listing periodically scans the source storage location for new files, which allows you to control the frequency of ingestion. This can be useful if you need to limit the number of ingested files or control the timing of ingestion.
  3. Use file notification for a large number of small files: File notification is more efficient here because it learns about each new file from an event and avoids scanning the entire directory for changes.
  4. Use directory listing for a small number of large files: With few files in the source location, listing the directory is cheap, and it avoids the overhead of handling a separate notification for each file.
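As a rough illustration of how the two modes are selected in configuration, here is a minimal sketch. To my knowledge the switch between the modes is the Auto Loader option `cloudFiles.useNotifications`, while `cloudFiles.useIncrementalListing` only tunes how directory listing itself behaves. The helper name `autoloader_options` and the mode labels are made up for this example, and the default format `json` is just an assumption.

```python
def autoloader_options(mode: str, file_format: str = "json") -> dict:
    """Build Auto Loader reader options for a file-detection mode.

    `mode` is a hypothetical label used only in this sketch:
    "file_notification" or "directory_listing".
    """
    if mode not in ("file_notification", "directory_listing"):
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        # Format of the incoming files (assumed to be JSON here).
        "cloudFiles.format": file_format,
        # "true" selects file notification; "false" selects directory listing.
        "cloudFiles.useNotifications": (
            "true" if mode == "file_notification" else "false"
        ),
    }


opts = autoloader_options("file_notification")
print(opts)

# Inside a pipeline the options would be passed to the stream reader,
# for example (not executed here, the path is a placeholder):
#   spark.readStream.format("cloudFiles").options(**opts).load("<source-path>")
```

File notification mode typically also needs cloud-side permissions to set up the event and queue resources, so check that before switching a pipeline over.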

In summary, if you need real-time ingestion and have a large number of small files, use file notification. If you need to control the ingestion frequency or have a small number of large files, use directory listing.
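The summary can be written down as a small decision rule. This is only a sketch of the guideline in this reply, not an official Databricks recommendation. The function name and its two boolean inputs are invented for the example, and treating every other combination as "directory listing" is my reading of the summary.

```python
def choose_detection_mode(needs_real_time: bool, many_small_files: bool) -> str:
    """Encode the rule of thumb from the summary above.

    File notification when ingestion must be real-time AND the source
    receives a large number of small files; directory listing otherwise.
    """
    if needs_real_time and many_small_files:
        return "file notification"
    return "directory listing"


# A scheduled pipeline over a handful of large files:
print(choose_detection_mode(needs_real_time=False, many_small_files=False))
# A continuously fed landing zone with many small files:
print(choose_detection_mode(needs_real_time=True, many_small_files=True))
```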
