COPY INTO: How to add a partitioning?

User16765131552
Databricks Employee
Databricks Employee

The command 

COPY INTO

 from Databricks provides an idempotent file ingestion into a delta table, see here. From the docs, an example command looks like this:

COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')

However, I want my data to be ingested via a partition, which can be accomplished with the 

PARTITIONED BY

 command in a normal ingestion. Is it possible to execute a 

COPY INTO

 command with partitioning? Simply adding the partitioning clause yields the following error:

Error in SQL statement: ParseException: 
mismatched input 'PARTITIONED' expecting {<EOF>, ';'}(line 8, pos 0)

Mooune_DBU
Databricks Employee
Databricks Employee

If you're looking to partition your `target_path` table, then it's recommended to define the partition keys prior to the COPY INTO command (at the DDL level)

E.g.

// Drop table if it already exists without the partition key defined (OPTIONNAL)
DROP TABLE IF EXISTS delta.`target_path`;
 
// Create with partition key (If table doesn't already exist with partitionning)
CREATE TABLE delta.`target_path`
  PARTITION BY key;
 
COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')
 

alexott
Databricks Employee
Databricks Employee

should it be `IF NOT EXISTS` in CREATE TABLE?

Mooune_DBU
Databricks Employee
Databricks Employee

Yes you are correct.