Glob pattern for copy into
10-30-2024 12:05 AM
I am trying to load files from my Azure storage container using COPY INTO. The files follow the naming convention "2023-<month>-<date> <timestamp>".csv.gz, and they are all in one folder. I want to load only the files for month 2 (February).
I used COPY INTO with a glob pattern, but it doesn't seem to match the files. Instead, I get a "Relative path in absolute URI" error.
Any input on this?
Labels: Delta Lake
10-31-2024 02:52 AM - edited 10-31-2024 02:53 AM
Hi @rkand,
You can update the pattern to target only files whose names start with 2023-02. This matches all files from February, regardless of the specific date and timestamp.
Try PATTERN = '2023-02-*.csv.gz'
This pattern matches any file whose name starts with 2023-02-, continues with any date and timestamp, and ends in .csv.gz.
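To sanity-check the glob before running the load, here is a minimal Python sketch. The file names are hypothetical (the exact timestamp format isn't given in the question), and `fnmatchcase` only approximates the glob matching that COPY INTO performs, so treat it as an illustration rather than a guarantee.

```python
from fnmatch import fnmatchcase

# Hypothetical file names following the
# "2023-<month>-<date> <timestamp>.csv.gz" convention from the question.
files = [
    "2023-01-31 235959.csv.gz",
    "2023-02-01 000000.csv.gz",
    "2023-02-14 120500.csv.gz",
    "2023-12-02 081500.csv.gz",
]

# The glob suggested above: only names starting with "2023-02-" qualify.
pattern = "2023-02-*.csv.gz"

# Keep only the names that the glob accepts (case-sensitive match).
february = [name for name in files if fnmatchcase(name, pattern)]
print(february)
```

Note that the December file dated the 2nd is not picked up, because the pattern anchors "02" to the month position right after "2023-".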
Give it a try and let us know how it goes.
Regards.
-------------------
I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft solutions, focusing on developing scalable and efficient solutions with Apache Spark
10-31-2024 04:54 AM
TL;DR Try removing the trailing slash in the FROM value. The trailing slash in FROM confuses the URI parser, making it think that PATTERN might be an absolute path rather than a relative one.
The error message points to a problem not with the pattern itself, but with how FROM and PATTERN are interpreted together. Removing the trailing slash should help the Path constructor treat the FROM path as a clean base URI, so that PATTERN is resolved as a relative path and the "Relative path in absolute URI" error is avoided.
For more detail, you may refer to https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/ap... (find your matching Hadoop version).
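Putting both suggestions together, here is a rough Python sketch that builds the COPY INTO statement with the trailing slash stripped from FROM. The helper `build_copy_into`, the table name, and the storage path are all hypothetical placeholders, and whether the trailing slash is really the cause in your case is an assumption to verify.

```python
def build_copy_into(table: str, source: str, pattern: str) -> str:
    """Build a COPY INTO statement (hypothetical helper for illustration).

    The trailing slash is stripped from the source path so that FROM is a
    clean base location and PATTERN is a separate, relative glob.
    """
    base = source.rstrip("/")
    return (
        f"COPY INTO {table}\n"
        f"FROM '{base}'\n"
        "FILEFORMAT = CSV\n"
        f"PATTERN = '{pattern}'"
    )


# Placeholder table and container names; substitute your own.
statement = build_copy_into(
    "my_catalog.my_schema.my_table",
    "abfss://my-container@mystorageaccount.dfs.core.windows.net/landing/",
    "2023-02-*.csv.gz",
)
print(statement)
```

In a Databricks notebook, where `spark` is already defined, the resulting string could be passed to `spark.sql(statement)`. Add FORMAT_OPTIONS (for example a header setting) as your files require.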

