cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How do I use the Copy Into command to copy data into a Delta Table? Looking for examples where you want to have a pre-defined schema

aladda
Honored Contributor II
Honored Contributor II

I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with explicitly defined schema vs inferring it, etc

1 ACCEPTED SOLUTION

Accepted Solutions

aladda
Honored Contributor II
Honored Contributor II

Here's an example for predefined schema

  1. Using COPY INTO with a predefined table schema โ€“ Trick here is to CAST the CSV dataset into your desired schema in the select statement of COPY INTO. Example below

%sql 

CREATE OR REPLACE TABLE copy_into_bronze_test 

(id STRING, reading_time TIMESTAMP, device_type STRING, device_id STRING, device_operational_status STRING, reading_1 double, reading_2 double, reading_3 double)

USING DELTA

 

%sql 

COPY INTO copy_into_bronze_test

FROM

 (

   SELECT

     _c0 as ID,

     CAST (_c1 AS timestamp) AS reading_time,

     _c2 as device_type,

     _c3 as device_id,

     _c4 as device_operational_status,

     CAST(_c5 as double) as reading_1,

     CAST(_c6 as double) as reading_2,

     CAST(_c7 as double) as reading_3

  

FROM

 'dbfs:/FileStore/flight/anand.ladda/*_no_headers.csv'

 )

FILEFORMAT = CSV FORMAT_OPTIONS('sep' = ',','inferSchema' = 'false','header' = 'false')

View solution in original post

2 REPLIES 2

aladda
Honored Contributor II
Honored Contributor II

Here's an example for predefined schema

  1. Using COPY INTO with a predefined table schema โ€“ Trick here is to CAST the CSV dataset into your desired schema in the select statement of COPY INTO. Example below

%sql 

CREATE OR REPLACE TABLE copy_into_bronze_test 

(id STRING, reading_time TIMESTAMP, device_type STRING, device_id STRING, device_operational_status STRING, reading_1 double, reading_2 double, reading_3 double)

USING DELTA

 

%sql 

COPY INTO copy_into_bronze_test

FROM

 (

   SELECT

     _c0 as ID,

     CAST (_c1 AS timestamp) AS reading_time,

     _c2 as device_type,

     _c3 as device_id,

     _c4 as device_operational_status,

     CAST(_c5 as double) as reading_1,

     CAST(_c6 as double) as reading_2,

     CAST(_c7 as double) as reading_3

  

FROM

 'dbfs:/FileStore/flight/anand.ladda/*_no_headers.csv'

 )

FILEFORMAT = CSV FORMAT_OPTIONS('sep' = ',','inferSchema' = 'false','header' = 'false')

DataInsight
New Contributor II

How does this work when you have 200 tables to load with each table having 50+ columns how to build this code in a generic way the csv file do not have the headers and when loaded without select its not taking predefined schema

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group