<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Glob pattern for copy into in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/glob-pattern-for-copy-into/m-p/96758#M39342</link>
    <description>&lt;P&gt;I am trying to load some files from my Azure storage container using the COPY INTO command. The files follow the naming convention "2023-&amp;lt;month&amp;gt;-&amp;lt;date&amp;gt; &amp;lt;timestamp&amp;gt;.csv.gz", and all of them are in one folder. I want to load only the files for month 2.&lt;/P&gt;&lt;P&gt;I used COPY INTO with a glob pattern, but it doesn't seem to recognize it. I get an error saying "Relative path in absolute URI".&lt;/P&gt;&lt;P&gt;Any input on this?&lt;/P&gt;</description>
    <pubDate>Wed, 30 Oct 2024 07:05:10 GMT</pubDate>
    <dc:creator>rkand</dc:creator>
    <dc:date>2024-10-30T07:05:10Z</dc:date>
    <item>
      <title>Glob pattern for copy into</title>
      <link>https://community.databricks.com/t5/data-engineering/glob-pattern-for-copy-into/m-p/96758#M39342</link>
      <description>&lt;P&gt;I am trying to load some files from my Azure storage container using the COPY INTO command. The files follow the naming convention "2023-&amp;lt;month&amp;gt;-&amp;lt;date&amp;gt; &amp;lt;timestamp&amp;gt;.csv.gz", and all of them are in one folder. I want to load only the files for month 2.&lt;/P&gt;&lt;P&gt;I used COPY INTO with a glob pattern, but it doesn't seem to recognize it. I get an error saying "Relative path in absolute URI".&lt;/P&gt;&lt;P&gt;Any input on this?&lt;/P&gt;</description>
      <pubDate>Wed, 30 Oct 2024 07:05:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/glob-pattern-for-copy-into/m-p/96758#M39342</guid>
      <dc:creator>rkand</dc:creator>
      <dc:date>2024-10-30T07:05:10Z</dc:date>
    </item>
    <item>
      <title>Re: Glob pattern for copy into</title>
      <link>https://community.databricks.com/t5/data-engineering/glob-pattern-for-copy-into/m-p/96950#M39369</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9284"&gt;@rkand&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;You can update the pattern to target only files whose names start with the 2023-02 prefix. This matches all files from February, regardless of the specific date and timestamp.&lt;/P&gt;&lt;P&gt;Try&amp;nbsp;&lt;STRONG&gt;PATTERN = '2023-02-*.csv.gz'&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;This pattern matches any file whose name starts with 2023-02-, followed by any date and timestamp, and ends in .csv.gz.&lt;/P&gt;&lt;P&gt;Try it and let me know how it goes!&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2024 09:53:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/glob-pattern-for-copy-into/m-p/96950#M39369</guid>
      <dc:creator>agallard</dc:creator>
      <dc:date>2024-10-31T09:53:13Z</dc:date>
    </item>
    <item>
      <title>Re: Glob pattern for copy into</title>
      <link>https://community.databricks.com/t5/data-engineering/glob-pattern-for-copy-into/m-p/96974#M39378</link>
      <description>&lt;P&gt;&lt;STRONG&gt;TL;DR&amp;nbsp;&lt;/STRONG&gt;Try removing the trailing slash from the &lt;STRONG&gt;FROM&lt;/STRONG&gt; value. A trailing slash in &lt;STRONG&gt;FROM&lt;/STRONG&gt; confuses the URI parser, making it think that PATTERN might be an absolute path rather than a relative one.&lt;/P&gt;
&lt;P&gt;The error message points not to a problem with the pattern itself, but to how FROM and PATTERN are interpreted together, which results in the exception. Removing the trailing slash should let the Hadoop Path constructor interpret the FROM path as a clean base URI, so that PATTERN works as a relative path and the "Relative path in absolute URI" error is avoided.&lt;/P&gt;
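&lt;P&gt;If you build the statement from a Python notebook, the sketch below combines both suggestions from this thread: it strips any trailing slash from the FROM location and previews which file names the PATTERN '2023-02-*.csv.gz' would select. The storage path, table name, and sample file names are hypothetical placeholders. Python's fnmatch is only an approximation of the glob matching COPY INTO performs (for example, fnmatch's * also matches "/"), which should be acceptable here because all the files sit in one flat folder.&lt;/P&gt;

```python
from fnmatch import fnmatchcase

# Hypothetical placeholders: replace with your own container path,
# target table, and pattern.
SOURCE = "abfss://container@account.dfs.core.windows.net/landing/"
TARGET = "my_catalog.my_schema.raw_files"
PATTERN = "2023-02-*.csv.gz"


def normalize_source(path: str) -> str:
    """Remove trailing slashes from the FROM location (the TL;DR fix)."""
    return path.rstrip("/")


def preview_matches(file_names, pattern):
    """Return the names that fnmatch's glob rules would select.

    Approximation only: fnmatch's * also matches "/", unlike a path glob,
    which is harmless for files sitting in one flat folder.
    """
    return [name for name in file_names if fnmatchcase(name, pattern)]


def build_copy_into(target: str, source: str, pattern: str) -> str:
    """Assemble a COPY INTO statement using the normalized FROM path."""
    return (
        f"COPY INTO {target}\n"
        f"FROM '{normalize_source(source)}'\n"
        "FILEFORMAT = CSV\n"
        f"PATTERN = '{pattern}'"
    )


if __name__ == "__main__":
    # Made-up names that follow the naming convention from the question.
    sample = [
        "2023-02-01 101500.csv.gz",
        "2023-02-28 235959.csv.gz",
        "2023-03-01 000000.csv.gz",
    ]
    # Prints only the two February names.
    print(preview_matches(sample, PATTERN))
    print(build_copy_into(TARGET, SOURCE, PATTERN))
```

&lt;P&gt;In a Databricks notebook you could then pass the generated string to spark.sql(...); that call is left out of the sketch so it runs without a Spark session.&lt;/P&gt;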
&lt;P&gt;For details, see the Hadoop Path.java source:&amp;nbsp;&lt;A href="https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Path.java" target="_blank"&gt;https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Path.java&lt;/A&gt;&amp;nbsp;(check the version that matches your Hadoop runtime).&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2024 11:54:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/glob-pattern-for-copy-into/m-p/96974#M39378</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-10-31T11:54:44Z</dc:date>
    </item>
  </channel>
</rss>

