COPY INTO: How to add a partitioning?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-18-2021 01:27 PM
The command
COPY INTO
from Databricks provides an idempotent file ingestion into a delta table, see here. From the docs, an example command looks like this:
COPY INTO delta.`target_path`
FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
FILEFORMAT = CSV
PATTERN = 'folder1/file_[a-g].csv'
FORMAT_OPTIONS('header' = 'true')
However, I want my data to be ingested via a partition, which can be accomplished with the
PARTITIONED BY
command in a normal ingestion. Is it possible to execute a
COPY INTO
command with partitioning? Simply adding the partitioning clause yields the following error:
Error in SQL statement: ParseException:
mismatched input 'PARTITIONED' expecting {<EOF>, ';'}(line 8, pos 0)
- Labels:
-
CSV
-
Delta
-
Delta table
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-18-2021 02:46 PM
If you're looking to partition your `target_path` table, then it's recommended to define the partition keys prior to the COPY INTO command (at the DDL level)
E.g.
// Drop table if it already exists without the partition key defined (OPTIONNAL)
DROP TABLE IF EXISTS delta.`target_path`;
// Create with partition key (If table doesn't already exist with partitionning)
CREATE TABLE delta.`target_path`
PARTITION BY key;
COPY INTO delta.`target_path`
FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
FILEFORMAT = CSV
PATTERN = 'folder1/file_[a-g].csv'
FORMAT_OPTIONS('header' = 'true')
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-22-2021 10:33 PM
should it be `IF NOT EXISTS` in CREATE TABLE?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-23-2021 07:43 AM
Yes you are correct.