Data Engineering

COPY INTO: Pattern for sub-folders

Anup
New Contributor III

While trying to ingest data from an S3 bucket, we are running into a situation where the data sits in sub-folders of varying depths.

Is there a good way of specifying patterns for the above case?

We tried using the following for a depth of 4, and it works.

 

%sql
-- The four-segment glob below covers only the depth of 4 mentioned above.
COPY INTO table_name
FROM 's3://bucket_name'
FILEFORMAT = BINARYFILE
PATTERN = '/*/*/*/*'
FORMAT_OPTIONS ('mergeSchema' = 'true',
                'header' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
 
However, it is not always possible to know the exact depth in advance for a large amount of arbitrarily nested data.
Has anyone run into this problem and been able to solve it?
Accepted Solution

Anup
New Contributor III

Thanks @Retired_mod,

We ended up implementing the programmatic approach to calculate the depth (using boto3).
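
The code used for this is not shown in the thread. As a rough illustration only, a depth calculation with boto3 might look like the sketch below. The helper names (iter_keys, key_depths, patterns_for) are hypothetical, and the sketch assumes that "depth" counts every path segment of an object key, including the file name, so that a key such as a/b/c/file.bin has depth 4 and corresponds to the pattern '/*/*/*/*' from the question.

```python
from typing import Iterable, Iterator, List, Set


def iter_keys(bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix`, following S3 pagination."""
    # Imported lazily so the pure helpers below can be used without AWS access.
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def key_depths(keys: Iterable[str]) -> Set[int]:
    """Return the distinct depths (number of path segments) among the keys."""
    depths: Set[int] = set()
    for key in keys:
        if key.endswith("/"):
            # Skip zero-byte "folder" marker objects; they are not data files.
            continue
        depths.add(len(key.strip("/").split("/")))
    return depths


def patterns_for(depths: Iterable[int]) -> List[str]:
    """Build one glob per depth in the style of the question (4 -> '/*/*/*/*')."""
    return ["/*" * depth for depth in sorted(depths)]
```

Under those assumptions, patterns_for(key_depths(iter_keys("bucket_name"))) would give one glob per depth actually present in the bucket, and each could be substituted into the PATTERN clause of the COPY INTO statement above, one run per depth. Listing every key can be slow on very large buckets, so restricting the prefix may be worthwhile.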

