COPY INTO: How to add a partitioning?

User16765131552 — Fri, 18 Jun 2021 20:27:31 GMT

The `COPY INTO` command from Databricks provides idempotent file ingestion into a Delta table. From the docs, an example command looks like this:

COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')

However, I want my data to be ingested into a partitioned table, which can be accomplished with the `PARTITIONED BY` clause in a normal ingestion. Is it possible to execute a `COPY INTO` command with partitioning? Simply adding the partitioning clause yields the following error:

Error in SQL statement: ParseException: 
mismatched input 'PARTITIONED' expecting {<EOF>, ';'}(line 8, pos 0)

Re: COPY INTO: How to add a partitioning?

Mooune_DBU — Fri, 18 Jun 2021 21:46:07 GMT

If you're looking to partition your `target_path` table, the recommended approach is to define the partition keys at the DDL level, before running the `COPY INTO` command.

E.g.

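-- Drop the table if it already exists without the partition key defined (OPTIONAL)
DROP TABLE IF EXISTS delta.`target_path`;
 
-- Create the table with a partition key (only needed if it doesn't already exist with partitioning).
-- Column names follow the SELECT list below; the STRING types are an assumption, adjust them to your data.
CREATE TABLE delta.`target_path` (key STRING, index STRING, textData STRING, constant_value STRING)
  USING DELTA
  PARTITIONED BY (key);
 
-- Load the files; rows are written into the partitions defined on the target table
COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')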
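
To confirm that the target really ended up partitioned after the load, a check along these lines may help. This is a minimal sketch that assumes the same `target_path` placeholder as above. `DESCRIBE DETAIL` reports table metadata for a Delta table, including a `partitionColumns` field.

```sql
-- Sketch: inspect the Delta table metadata after the COPY INTO has run.
-- `target_path` is the same placeholder used in the example above.
DESCRIBE DETAIL delta.`target_path`;
-- In the result, the partitionColumns field should list the partition key (here: key).
```

If `partitionColumns` comes back empty, the table most likely existed before without a partition key, and the drop-and-recreate step above would be needed.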

Re: COPY INTO: How to add a partitioning?

alexott — Wed, 23 Jun 2021 05:33:44 GMT

Should it be `IF NOT EXISTS` in the `CREATE TABLE`?

Re: COPY INTO: How to add a partitioning?

Mooune_DBU — Wed, 23 Jun 2021 14:43:18 GMT

Yes, you are correct.
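
Putting the suggestion together, the DDL plus load could look roughly like the sketch below. The column list and the `STRING` types are assumptions made for illustration (the column names mirror the `SELECT` list from the question, with the constant column taking the name of its literal), so adjust them to the actual data.

```sql
-- Sketch only: the column types are assumptions, not taken from the original post.
-- IF NOT EXISTS makes the statement safe to re-run before every load.
CREATE TABLE IF NOT EXISTS delta.`target_path` (
  key            STRING,
  index          STRING,
  textData       STRING,
  constant_value STRING
)
USING DELTA
PARTITIONED BY (key);

-- The idempotent load itself is unchanged from the question.
COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true');
```

Note that with `IF NOT EXISTS`, the `CREATE TABLE` is skipped when a table already exists at `target_path`. If that existing table is unpartitioned, it stays unpartitioned, so the optional `DROP TABLE` step still applies in that case.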