Copy Into: Pattern for sub-folders

Anup
New Contributor III

While trying to ingest data from an S3 bucket, we are running into a situation where the data is organized into sub-folders of varying depths.

Is there a good way of specifying patterns for this case?

We tried the following for a depth of 4, and it works:

 

%sql
COPY INTO table_name
FROM 's3://bucket_name'
FILEFORMAT = BINARYFILE
PATTERN = '/*/*/*/*'
FORMAT_OPTIONS ('mergeSchema' = 'true', 'header' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
 
However, it is not always possible to know the exact depth in advance for a large amount of arbitrarily organized data.
Has anyone run into this problem and been able to solve it?
2 REPLIES

Kaniz
Community Manager

Hi @Anup, when dealing with data in S3 buckets that is organized into sub-folders of varying depths, specifying patterns can be challenging.

However, there are some approaches you can consider:

Wildcard Patterns:

  • You’ve already used a wildcard pattern in your example: '/*/*/*/*'. This works well when you know the exact depth, but as you mentioned, it’s not always feasible.
  • Note that a fixed pattern such as '/*/*/*/*/*' still matches only files at that exact depth. If you know the range of possible depths, you can enumerate them in a single glob using brace alternation, as in the sketch after this list.
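A minimal sketch of that idea in a notebook cell, assuming your runtime's PATTERN glob supports brace alternation ('{a,b}' matches either alternative) and reusing the placeholder table and bucket names from the question:

# Sketch: one COPY INTO covering depths 2 through 4.
# Assumes PATTERN accepts brace alternation; verify on your runtime.
spark.sql("""
    COPY INTO table_name
    FROM 's3://bucket_name'
    FILEFORMAT = BINARYFILE
    PATTERN = '{*/*,*/*/*,*/*/*/*}'
    COPY_OPTIONS ('mergeSchema' = 'true')
""")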

Programmatic Approaches:

  • Write a custom script or use an AWS SDK (such as Boto3 for Python) to explore the bucket structure programmatically and determine the depth dynamically; a sketch follows this list.
  • You can recursively list objects in a bucket and process them accordingly.
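A minimal sketch of that approach, assuming boto3 credentials are already configured and reusing the placeholder bucket and table names from the question (the helper name max_key_depth is hypothetical):

import boto3

def max_key_depth(bucket: str, prefix: str = "") -> int:
    """Return the deepest folder nesting among object keys under prefix."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    depth = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Each '/' in the key marks one folder level.
            depth = max(depth, obj["Key"][len(prefix):].count("/"))
    return depth

bucket = "bucket_name"  # placeholder from the question
# One '*' per folder level, plus one more for the file name itself.
pattern = "/".join(["*"] * (max_key_depth(bucket) + 1))

spark.sql(f"""
    COPY INTO table_name
    FROM 's3://{bucket}'
    FILEFORMAT = BINARYFILE
    PATTERN = '{pattern}'
    COPY_OPTIONS ('mergeSchema' = 'true')
""")

Note that this pattern matches only files at the maximum depth; if files can sit at several levels, combine it with the brace-alternation idea above.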

Flat File Structure:

  • Remember that S3 itself has a flat structure with no inherent hierarchy like a typical file system. It uses key name prefixes for objects.
  • While the Amazon S3 console presents folders for organizational purposes, these are not actual directories but key prefixes; the short listing example after this list illustrates the difference.
  • Consider organizing your data in a way that minimizes the need for deep nesting.
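To make that concrete, a small illustration (placeholder bucket name; assumes boto3 credentials are configured):

import boto3

s3 = boto3.client("s3")
# With a delimiter, S3 groups keys by shared prefix; the "folders" come
# back as CommonPrefixes entries, not as real directory objects.
resp = s3.list_objects_v2(Bucket="bucket_name", Delimiter="/")
for cp in resp.get("CommonPrefixes", []):
    print(cp["Prefix"])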

Remember that flexibility and scalability are essential when dealing with large amounts of data. Experiment with different approaches to find the one that best fits your specific use case. 🚀

Anup
New Contributor III
(Accepted Solution)

Thanks @Kaniz,

We ended up implementing the programmatic approach to calculate the depth (using boto3).
