How do I use the COPY INTO command to copy data into a Delta table? Looking for examples where you want to have a pre-defined schema

aladda
Honored Contributor II

I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples - but there's only one simple example. I'm looking for some additional examples that show loading data from CSV: with an explicitly defined schema vs. inferring it, etc.

1 ACCEPTED SOLUTION

aladda
Honored Contributor II

Here's an example for a predefined schema.

1. Using COPY INTO with a predefined table schema – the trick here is to CAST the CSV columns to your desired types in the SELECT clause of COPY INTO. Example below; a schema-inference variant is sketched after it.

%sql
CREATE OR REPLACE TABLE copy_into_bronze_test
(id STRING, reading_time TIMESTAMP, device_type STRING, device_id STRING, device_operational_status STRING, reading_1 DOUBLE, reading_2 DOUBLE, reading_3 DOUBLE)
USING DELTA

%sql
COPY INTO copy_into_bronze_test
FROM
  (
    SELECT
      _c0 AS id,
      CAST(_c1 AS TIMESTAMP) AS reading_time,
      _c2 AS device_type,
      _c3 AS device_id,
      _c4 AS device_operational_status,
      CAST(_c5 AS DOUBLE) AS reading_1,
      CAST(_c6 AS DOUBLE) AS reading_2,
      CAST(_c7 AS DOUBLE) AS reading_3
    FROM 'dbfs:/FileStore/flight/anand.ladda/*_no_headers.csv'
  )
FILEFORMAT = CSV
FORMAT_OPTIONS('sep' = ',', 'inferSchema' = 'false', 'header' = 'false')
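
For the second half of the original question (inferred rather than predefined schema), here's a minimal sketch of the inference-based variant, written as a Python notebook cell. The target table name copy_into_bronze_inferred is hypothetical; the source path reuses the sample location above.

# Hedged sketch: COPY INTO with schema inference instead of a predefined schema.
# Create a schemaless Delta table; COPY INTO can then evolve its schema
# via the mergeSchema options below.
spark.sql("CREATE TABLE IF NOT EXISTS copy_into_bronze_inferred")

spark.sql("""
  COPY INTO copy_into_bronze_inferred
  FROM 'dbfs:/FileStore/flight/anand.ladda/*_no_headers.csv'
  FILEFORMAT = CSV
  FORMAT_OPTIONS('sep' = ',', 'header' = 'false',
                 'inferSchema' = 'true', 'mergeSchema' = 'true')
  COPY_OPTIONS('mergeSchema' = 'true')
""")

Note that with header-less files the inferred column names come out as _c0, _c1, ..., and the types are whatever inference decides from the data, which is exactly why the CAST approach above is the safer choice when you need a guaranteed schema.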


2 REPLIES


DataInsight
New Contributor II

How does this work when you have 200 tables to load, each with 50+ columns? How do you build this code in a generic way? The CSV files don't have headers, and when they're loaded without a SELECT, the predefined schema isn't applied.
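
One way to approach this, as a hedged sketch rather than a definitive answer: create each target table with its schema up front, then derive the positional CAST projection from the table's own declared schema, so the COPY INTO statement can be generated in a loop. The table list and path convention below are assumptions for illustration.

# Hypothetical sketch: generate COPY INTO statements for many header-less CSV
# sources by deriving the CAST list from each pre-created table's schema.
# Assumes a Databricks notebook (where `spark` is predefined) and one source
# folder per table; adjust the path convention to your layout.
tables = ["copy_into_bronze_test"]  # extend to your ~200 table names
base_path = "dbfs:/FileStore/flight/anand.ladda"  # assumed path convention

for table in tables:
    schema = spark.table(table).schema
    # Header-less CSVs expose columns as _c0, _c1, ...; cast each one
    # positionally to the type declared on the target table.
    projection = ",\n          ".join(
        f"CAST(_c{i} AS {field.dataType.simpleString()}) AS {field.name}"
        for i, field in enumerate(schema.fields)
    )
    spark.sql(f"""
      COPY INTO {table}
      FROM (
        SELECT
          {projection}
        FROM '{base_path}/{table}/*.csv'
      )
      FILEFORMAT = CSV
      FORMAT_OPTIONS('sep' = ',', 'header' = 'false', 'inferSchema' = 'false')
    """)

Since the projection is derived from each table's declared schema, onboarding another table only requires creating it with CREATE TABLE and appending its name to the list.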
