12-24-2024 02:14 AM
For the Databricks file arrival trigger, the documentation mentions the following limitation:
A storage location configured for a file arrival trigger can contain only up to 10,000 files. Locations with more files cannot be monitored for new file arrivals. If the configured storage location is a subpath of a Unity Catalog external location or volume, the 10,000 file limit applies to the subpath and not the root of the storage location. For example, the root of the storage location can contain more than 10,000 files across its subdirectories, but the configured subdirectory must not exceed the 10,000 file limit.
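The subpath rule in the quoted limitation can be illustrated with a small sketch. It assumes you already have a flat list of file paths (for example, from a storage listing); `files_under`, `within_trigger_limit` and the sample paths are hypothetical names used only to show that files anywhere below the configured path count toward the limit, while files elsewhere under the root do not.

```python
# Hypothetical helpers: illustrate that the 10,000-file limit applies to the
# configured (sub)path of the trigger, not to the root of the storage location.

FILE_TRIGGER_LIMIT = 10_000  # limit quoted in the Databricks documentation


def files_under(configured_path: str, all_files: list[str]) -> int:
    """Count files anywhere below configured_path, subdirectories included."""
    prefix = configured_path.rstrip("/") + "/"
    return sum(1 for f in all_files if f.startswith(prefix))


def within_trigger_limit(configured_path: str, all_files: list[str]) -> bool:
    """True if the configured path holds at most FILE_TRIGGER_LIMIT files."""
    return files_under(configured_path, all_files) <= FILE_TRIGGER_LIMIT


if __name__ == "__main__":
    # Root holds 15,000 files across two subdirectories; each one holds 7,500.
    files = [f"root/a/{i}.json" for i in range(7_500)] + [
        f"root/b/{i}.json" for i in range(7_500)
    ]
    print(files_under("root", files))             # 15000
    print(within_trigger_limit("root", files))    # False: root exceeds the limit
    print(within_trigger_limit("root/a", files))  # True: the subpath is under it
```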
1. Does this mean that if files are moved from one container to another, the file counter is reset?
2. If we set up a dir_name/YYYYMMDD directory structure under the external location, do we have to change the external location path each month for the trigger to keep working?
12-24-2024 04:13 AM
Effectively, yes. The 10,000-file limit is not a running count of every file that has ever arrived; it applies to the files currently present in the monitored storage location. Once files are moved out of that location, they no longer count toward its limit (they count toward the destination instead, if that location is also monitored by a file arrival trigger).
If you set up a structure like dir_name/YYYYMMDD and the trigger monitors dir_name/, every dated subdirectory counts toward the same 10,000-file limit, so the monitored path will eventually exceed it. You then need to either move older folders out of the monitored path or periodically (for example, each month) point the trigger at a new path that stays within the limit. The external location itself does not have to change: as the quoted documentation notes, a trigger can be configured on a subpath of a Unity Catalog external location, and the limit then applies only to that subpath.
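To estimate how quickly a dir_name/YYYYMMDD layout fills up when the trigger watches dir_name/, a back-of-the-envelope calculation is enough. This sketch assumes a steady number of new files per day and that no files are moved out; `days_until_limit` is a hypothetical helper, not part of any Databricks API.

```python
# Hypothetical estimate: how many daily YYYYMMDD folders fit under the monitored
# dir_name/ path before it holds more than the file arrival trigger allows.

FILE_TRIGGER_LIMIT = 10_000  # limit quoted in the Databricks documentation


def days_until_limit(files_per_day: int, limit: int = FILE_TRIGGER_LIMIT) -> int:
    """Number of full days of files that fit within `limit` (assumes a steady rate)."""
    if files_per_day <= 0:
        raise ValueError("files_per_day must be positive")
    return limit // files_per_day


if __name__ == "__main__":
    print(days_until_limit(400))  # 25: the 26th day would push dir_name/ past 10,000
    print(days_until_limit(300))  # 33
```

If the result is below the number of days you plan to keep in the monitored path (for example, roughly 30 for a monthly rotation), older folders need to be moved out more often.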
03-13-2025 07:43 AM
Hi Walter, thank you for the information. In our current project we use DLT with ADLS for storage. I am able to move folders to another location to keep the file arrival trigger location under 10,000 files, but our pipeline does a full refresh. Can you suggest an approach for this?
01-06-2025 08:47 AM
I would like to confirm something. We are using Azure Databricks and Azure BLOB storage.
We have a `landing` container that has directories such as `request_type_a` and `request_type_b`, each receiving files that trigger different jobs in Databricks. We are starting to consider what happens when these directories get to 10,000 BLOBs.
We are thinking about moving older BLOBs out of these directories into another archive directory that is not monitored by Databricks, creating a structure like:
landing/request_type_a/file.json
landing/request_type_a_archive/old_file.json
landing/request_type_b/file.json
landing/request_type_b_archive/old_file.json
Is this a reasonable way to ensure we do not exceed the 10,000-file limit, or do you foresee it causing issues?
Additionally, do you know if changing older files to use the archive tier would result in these files not being counted in the 10,000 limit?
01-07-2025 09:39 AM
Your approach is reasonable: moving older blobs into an archive directory that is not monitored keeps the number of files in the request_type_a and request_type_b directories below the 10,000-file limit. It only works if each trigger is configured on the specific directory (for example, landing/request_type_a/) rather than on the landing container root. Files in subdirectories of a monitored path count toward its limit, and the _archive directories are siblings of the monitored directories, not children of them, but they are children of landing/.
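A minimal sketch of the archive step, assuming you already have (path, timestamp) pairs for one monitored directory from a storage listing. `archive_path`, `plan_archive_moves` and the `keep` threshold are hypothetical names, not a Databricks or Azure API. The sketch only plans the moves; executing them (for example with `dbutils.fs.mv` in a Databricks notebook, using full `abfss://` URIs) is left out.

```python
from pathlib import PurePosixPath


def archive_path(path: str) -> str:
    """Map landing/<dir>/<file> to landing/<dir>_archive/<file> (hypothetical layout)."""
    p = PurePosixPath(path)
    archive_dir = p.parent.with_name(p.parent.name + "_archive")
    return str(archive_dir / p.name)


def plan_archive_moves(files: list[tuple[str, float]], keep: int) -> list[tuple[str, str]]:
    """Return (source, destination) pairs that move the oldest files out,
    leaving at most `keep` of the newest files in the monitored directory.

    `files` holds (path, modification_timestamp) pairs for ONE monitored directory.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    oldest_first = sorted(files, key=lambda f: f[1])
    excess = max(0, len(oldest_first) - keep)
    return [(path, archive_path(path)) for path, _ in oldest_first[:excess]]


if __name__ == "__main__":
    listing = [
        ("landing/request_type_a/f1.json", 1.0),
        ("landing/request_type_a/f2.json", 3.0),
        ("landing/request_type_a/f3.json", 2.0),
    ]
    # Keep only the newest file; the two oldest are planned for the archive directory.
    for src, dst in plan_archive_moves(listing, keep=1):
        print(src, "->", dst)
    # landing/request_type_a/f1.json -> landing/request_type_a_archive/f1.json
    # landing/request_type_a/f3.json -> landing/request_type_a_archive/f3.json
```

In practice you would probably choose a `keep` value somewhat below 10,000 (say 9,000, an arbitrary choice) so that a burst of new arrivals between archive runs does not push the directory over the limit.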
Regarding your question about the archive tier: the 10,000-file limit belongs to the Databricks file arrival trigger and applies to the monitored location. It is not an Azure Blob Storage limit, and Azure does not cap a container at 10,000 blobs. Changing a blob's access tier to Archive only changes how it is stored and billed; the blob stays in the same container and directory and still appears in listings. It would therefore most likely still count toward the trigger's 10,000-file limit. To take files out of the count, move them out of the monitored path, as your archive-directory approach does.