<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: COPY INTO: How to add a partitioning? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22411#M15340</link>
    <description>&lt;P&gt;If you're looking to partition your `target_path` table, then it's recommended to define the partition keys prior to the COPY INTO command (at the DDL level)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;E.g.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;// Drop table if it already exists without the partition key defined (OPTIONNAL)
DROP TABLE IF EXISTS delta.`target_path`;
&amp;nbsp;
// Create with partition key (If table doesn't already exist with partitionning)
CREATE TABLE delta.`target_path`
  PARTITION BY key;
&amp;nbsp;
COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')
&amp;nbsp;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 18 Jun 2021 21:46:07 GMT</pubDate>
    <dc:creator>Mooune_DBU</dc:creator>
    <dc:date>2021-06-18T21:46:07Z</dc:date>
    <item>
      <title>COPY INTO: How to add a partitioning?</title>
      <link>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22410#M15339</link>
      <description>&lt;P&gt;The command&amp;nbsp;&lt;/P&gt;&lt;P&gt;COPY INTO&lt;/P&gt;&lt;P&gt;&amp;nbsp;from Databricks provides an idempotent file ingestion into a delta table, see&amp;nbsp;&lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html" alt="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html" target="_blank"&gt;here&lt;/A&gt;. From the docs, an example command looks like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, I want my data to be ingested via a partition, which can be accomplished with the&amp;nbsp;&lt;/P&gt;&lt;P&gt;PARTITIONED BY&lt;/P&gt;&lt;P&gt;&amp;nbsp;command in a normal ingestion. Is it possible to execute a&amp;nbsp;&lt;/P&gt;&lt;P&gt;COPY INTO&lt;/P&gt;&lt;P&gt;&amp;nbsp;command with partitioning? Simply adding the partitioning clause yields the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Error in SQL statement: ParseException: 
mismatched input 'PARTITIONED' expecting {&amp;lt;EOF&amp;gt;, ';'}(line 8, pos 0)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 20:27:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22410#M15339</guid>
      <dc:creator>User16765131552</dc:creator>
      <dc:date>2021-06-18T20:27:31Z</dc:date>
    </item>
    <item>
      <title>Re: COPY INTO: How to add a partitioning?</title>
      <link>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22411#M15340</link>
      <description>&lt;P&gt;If you're looking to partition your `target_path` table, then it's recommended to define the partition keys prior to the COPY INTO command (at the DDL level)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;E.g.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;// Drop table if it already exists without the partition key defined (OPTIONNAL)
DROP TABLE IF EXISTS delta.`target_path`;
&amp;nbsp;
// Create with partition key (If table doesn't already exist with partitionning)
CREATE TABLE delta.`target_path`
  PARTITION BY key;
&amp;nbsp;
COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')
&amp;nbsp;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 21:46:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22411#M15340</guid>
      <dc:creator>Mooune_DBU</dc:creator>
      <dc:date>2021-06-18T21:46:07Z</dc:date>
    </item>
    <item>
      <title>Re: COPY INTO: How to add a partitioning?</title>
      <link>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22412#M15341</link>
      <description>&lt;P&gt;should it be `IF &lt;B&gt;NOT&lt;/B&gt; EXISTS` in CREATE TABLE? &lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 05:33:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22412#M15341</guid>
      <dc:creator>alexott</dc:creator>
      <dc:date>2021-06-23T05:33:44Z</dc:date>
    </item>
    <item>
      <title>Re: COPY INTO: How to add a partitioning?</title>
      <link>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22413#M15342</link>
      <description>&lt;P&gt;Yes you are correct.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 14:43:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/copy-into-how-to-add-a-partitioning/m-p/22413#M15342</guid>
      <dc:creator>Mooune_DBU</dc:creator>
      <dc:date>2021-06-23T14:43:18Z</dc:date>
    </item>
  </channel>
</rss>

