Re: COPY INTO: How to add a partitioning?

User16765131552 · ‎06-18-2021

The command

COPY INTO

from Databricks provides an idempotent file ingestion into a delta table, see here. From the docs, an example command looks like this:

COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')

However, I want my data to be ingested via a partition, which can be accomplished with the

PARTITIONED BY

command in a normal ingestion. Is it possible to execute a

COPY INTO

command with partitioning? Simply adding the partitioning clause yields the following error:

Error in SQL statement: ParseException: 
mismatched input 'PARTITIONED' expecting {<EOF>, ';'}(line 8, pos 0)

Mooune_DBU · ‎06-18-2021

If you're looking to partition your `target_path` table, then it's recommended to define the partition keys prior to the COPY INTO command (at the DDL level)

E.g.

// Drop table if it already exists without the partition key defined (OPTIONNAL)
DROP TABLE IF EXISTS delta.`target_path`;
 
// Create with partition key (If table doesn't already exist with partitionning)
CREATE TABLE delta.`target_path`
  PARTITION BY key;
 
COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')

alexott · ‎06-22-2021

should it be `IF NOT EXISTS` in CREATE TABLE?

Mooune_DBU · ‎06-23-2021

Yes you are correct.