Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Glob pattern for copy into

rkand
New Contributor

I am trying to load some files from my Azure storage container using the COPY INTO command. The files follow the naming convention "2023-<month>-<date> <timestamp>.csv.gz", and they are all in one folder. I want to load only the files for month 2 (February).

So I used COPY INTO with a glob pattern, but it doesn't seem to pick up the files. I get an error saying "Relative path in absolute URI".

Any input on this?

2 REPLIES

agallard
Contributor

Hi @rkand,

You can update the pattern to target only files whose names start with 2023-02. This will match all files from February, regardless of the specific date and timestamp.

Try with PATTERN = '2023-02-*.csv.gz'

This pattern matches any file whose name starts with 2023-02-, continues with any date and timestamp, and ends in .csv.gz.
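Before running the load, the pattern can be sanity-checked locally. The following is a minimal sketch, assuming Python's `fnmatch` is a close enough stand-in for the glob matching COPY INTO applies to a simple `*` pattern on file names within one folder. The sample file names and the `matching_files` helper are made up for illustration; the real timestamp format may differ.

```python
from fnmatch import fnmatchcase

# Pattern suggested above: any file name starting with "2023-02-" and ending in ".csv.gz".
PATTERN = "2023-02-*.csv.gz"


def matching_files(names, pattern=PATTERN):
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical file names following "2023-<month>-<date> <timestamp>.csv.gz".
sample = [
    "2023-01-31 235959.csv.gz",
    "2023-02-01 000000.csv.gz",
    "2023-02-28 120000.csv.gz",
    "2023-12-02 080000.csv.gz",
]

for name in matching_files(sample):
    print(name)
```

Only the two February names are printed. The December file is skipped because the pattern anchors "02" in the month position, not the day position.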

Give it a try and let me know how it goes!

Regards.

Alfonso Gallardo
-------------------
I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft solutions, focusing on developing scalable and efficient solutions with Apache Spark.

VZLA
Databricks Employee

TL;DR: Try removing the trailing slash from the FROM value. The trailing slash confuses the URI parser, making it treat the PATTERN as an absolute path rather than a relative one.

The error message points to a problem not with the pattern itself, but with how FROM and PATTERN are interpreted together. Removing the trailing slash should help the Path constructor interpret the FROM path as a clean base URI, so that PATTERN works as a relative path and the "Relative path in absolute URI" error is avoided.

For more detail, you may refer to https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/ap... (find the source matching your Hadoop version).
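Putting both suggestions together, here is a minimal sketch of how the statement could be assembled with the trailing slash stripped from the FROM path. The table name, the storage path, and the `build_copy_into` helper are hypothetical placeholders, and whether the trailing slash is the actual cause in your environment is an assumption based on the explanation above.

```python
def build_copy_into(target_table, source_dir, pattern):
    """Build a COPY INTO statement, dropping any trailing slash from the source path."""
    base = source_dir.rstrip("/")  # "…/landing/" becomes "…/landing"
    return (
        f"COPY INTO {target_table}\n"
        f"FROM '{base}'\n"
        "FILEFORMAT = CSV\n"
        f"PATTERN = '{pattern}'"
    )


# Hypothetical table and container path; replace with your own values.
statement = build_copy_into(
    "my_catalog.my_schema.events",
    "abfss://mycontainer@myaccount.dfs.core.windows.net/landing/",
    "2023-02-*.csv.gz",
)
print(statement)
```

In a Databricks notebook, the resulting string could then be passed to `spark.sql(statement)`, or the same statement could be typed directly into a SQL cell.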
