NandiniN
Databricks Employee

Hello @Ovidiu Eremia,

To find out which folders on S3 contain Delta tables, you can look for the files that only Delta tables have. Delta Lake stores its transaction log (the table's metadata) in a folder named _delta_log at the root of every Delta table, so you can check for this folder.
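As a minimal sketch of that check, the helper below takes a single S3 object key and returns the Delta table root folder if the key sits under a _delta_log folder, or None otherwise. The function name delta_table_root is hypothetical (it is not part of boto3 or Delta Lake), and it only parses the key string, so it needs no AWS access.

```python
from typing import Optional


def delta_table_root(key: str) -> Optional[str]:
    """Return the Delta table root folder for an S3 object key, or None.

    Hypothetical helper: S3 has no real directories, so a Delta table is only
    visible through keys such as 'path/to/table/_delta_log/00000000000000000000.json'.
    """
    parts = key.split('/')
    # The last component is the object (file) name, so only earlier parts are folders
    folders = parts[:-1]
    if '_delta_log' not in folders:
        return None
    # Everything before the first '_delta_log' folder is the table root
    idx = folders.index('_delta_log')
    return '/'.join(parts[:idx])


print(delta_table_root('path/to/folders/sales/_delta_log/00000000000000000000.json'))  # path/to/folders/sales
print(delta_table_root('path/to/folders/sales/part-00000.snappy.parquet'))  # None
```

A key such as _delta_log/00000000000000000010.checkpoint.parquet yields an empty string, meaning the table sits at the root of the bucket.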

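In the code below, we list the objects under the specified prefix in the S3 bucket. Because S3 has no real directories, the listing returns individual files such as my_table/_delta_log/00000000000000000000.json rather than folders, so we keep every key that has a _delta_log folder in its path. (Checking only for keys that end with _delta_log/ usually finds nothing, because such "folder" keys exist only if something created them explicitly.) We then take the part of each key before _delta_log, which is the table's root folder, remove duplicates, and print the list of folders that contain Delta tables.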

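```python
import boto3

s3 = boto3.resource('s3')
bucket_name = 'your-bucket-name'
prefix = 'path/to/folders/'  # trailing '/' so 'path/to/folders2' is not matched

# Get the S3 bucket and the objects under the specified prefix
# (the collection handles pagination automatically)
bucket = s3.Bucket(bucket_name)
objects = bucket.objects.filter(Prefix=prefix)

# S3 has no real folders, so a Delta table appears as keys such as
# 'path/to/folders/my_table/_delta_log/00000000000000000000.json'.
# Keep every key with a '_delta_log' folder component and record the
# path before it, which is the root folder of the Delta table.
delta_folders = set()
for obj in objects:
    parts = obj.key.split('/')
    if '_delta_log' in parts[:-1]:
        delta_folders.add('/'.join(parts[:parts.index('_delta_log')]))

print(sorted(delta_folders))
```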
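Listing every object can be slow when the prefix holds large tables, since each Delta table may contain thousands of Parquet files. A cheaper alternative, sketched here under the assumption that the tables are the immediate sub-folders of the prefix, lists one level of folders with Delimiter='/' and then asks S3 for at most one key under each folder's _delta_log/. The name find_delta_tables is hypothetical, and s3_client is assumed to be a boto3 S3 client passed in by the caller, which also lets the function be exercised with a stub.

```python
from typing import List


def find_delta_tables(s3_client, bucket: str, prefix: str) -> List[str]:
    """Return the immediate sub-folders of `prefix` that contain a Delta table.

    Hypothetical helper. `s3_client` is assumed to be a boto3 S3 client,
    e.g. boto3.client('s3'). Only one folder level below `prefix` is checked.
    """
    if prefix and not prefix.endswith('/'):
        prefix += '/'
    tables = []
    paginator = s3_client.get_paginator('list_objects_v2')
    # Delimiter='/' groups keys into one level of sub-folders (CommonPrefixes)
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        for common in page.get('CommonPrefixes', []):
            folder = common['Prefix']  # e.g. 'path/to/folders/sales/'
            # One key under '<folder>_delta_log/' is enough to mark a Delta table
            resp = s3_client.list_objects_v2(
                Bucket=bucket, Prefix=folder + '_delta_log/', MaxKeys=1
            )
            if resp.get('KeyCount', 0) > 0:
                tables.append(folder)
    return tables
```

For example, find_delta_tables(boto3.client('s3'), 'your-bucket-name', 'path/to/folders') returns folder paths with a trailing '/'. Unlike the full listing, it does not find tables nested more than one level below the prefix.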

References: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/collections.html

Hope this helps.

Thanks & Regards,

Nandini