How do I use the COPY INTO command to copy data into a Delta table? Looking for examples where you want to have a pre-defined schema

aladda
Honored Contributor II

I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples - but there's only one simple example. I'm looking for some additional examples that show loading data from CSV: with an explicitly defined schema vs. inferring it, etc.

1 ACCEPTED SOLUTION

aladda
Honored Contributor II

Here's an example for a predefined schema.

1. Using COPY INTO with a predefined table schema – the trick here is to CAST the CSV columns to your desired types in the SELECT clause of COPY INTO. Example below; a schema-inference variant is sketched after it.

%sql
CREATE OR REPLACE TABLE copy_into_bronze_test
(id STRING, reading_time TIMESTAMP, device_type STRING, device_id STRING, device_operational_status STRING, reading_1 DOUBLE, reading_2 DOUBLE, reading_3 DOUBLE)
USING DELTA

%sql
COPY INTO copy_into_bronze_test
FROM
  (
    SELECT
      _c0 AS id,
      CAST(_c1 AS TIMESTAMP) AS reading_time,
      _c2 AS device_type,
      _c3 AS device_id,
      _c4 AS device_operational_status,
      CAST(_c5 AS DOUBLE) AS reading_1,
      CAST(_c6 AS DOUBLE) AS reading_2,
      CAST(_c7 AS DOUBLE) AS reading_3
    FROM 'dbfs:/FileStore/flight/anand.ladda/*_no_headers.csv'
  )
FILEFORMAT = CSV
FORMAT_OPTIONS('sep' = ',', 'inferSchema' = 'false', 'header' = 'false')
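
For the second half of the original question (inferred rather than predefined schema), here's a minimal sketch of the inference-based variant, written as a Python notebook cell. The target table name copy_into_bronze_inferred is hypothetical; the source path reuses the sample location above.

# Hedged sketch: COPY INTO with schema inference instead of a predefined schema.
# Create a schemaless Delta table; COPY INTO can then evolve its schema
# via the mergeSchema options below.
spark.sql("CREATE TABLE IF NOT EXISTS copy_into_bronze_inferred")

spark.sql("""
  COPY INTO copy_into_bronze_inferred
  FROM 'dbfs:/FileStore/flight/anand.ladda/*_no_headers.csv'
  FILEFORMAT = CSV
  FORMAT_OPTIONS('sep' = ',', 'header' = 'false',
                 'inferSchema' = 'true', 'mergeSchema' = 'true')
  COPY_OPTIONS('mergeSchema' = 'true')
""")

Note that with header-less files the inferred column names come out as _c0, _c1, ..., and the types are whatever inference decides from the data, which is exactly why the CAST approach above is the safer choice when you need a guaranteed schema.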


2 REPLIES


DataInsight
New Contributor II

How does this work when you have 200 tables to load, each with 50+ columns? How do you build this code in a generic way? The CSV files don't have headers, and when they're loaded without a SELECT, the predefined schema isn't applied.
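
One way to approach this, as a hedged sketch rather than a definitive answer: create each target table with its schema up front, then derive the positional CAST projection from the table's own declared schema, so the COPY INTO statement can be generated in a loop. The table list and path convention below are assumptions for illustration.

# Hypothetical sketch: generate COPY INTO statements for many header-less CSV
# sources by deriving the CAST list from each pre-created table's schema.
# Assumes a Databricks notebook (where `spark` is predefined) and one source
# folder per table; adjust the path convention to your layout.
tables = ["copy_into_bronze_test"]  # extend to your ~200 table names
base_path = "dbfs:/FileStore/flight/anand.ladda"  # assumed path convention

for table in tables:
    schema = spark.table(table).schema
    # Header-less CSVs expose columns as _c0, _c1, ...; cast each one
    # positionally to the type declared on the target table.
    projection = ",\n          ".join(
        f"CAST(_c{i} AS {field.dataType.simpleString()}) AS {field.name}"
        for i, field in enumerate(schema.fields)
    )
    spark.sql(f"""
      COPY INTO {table}
      FROM (
        SELECT
          {projection}
        FROM '{base_path}/{table}/*.csv'
      )
      FILEFORMAT = CSV
      FORMAT_OPTIONS('sep' = ',', 'header' = 'false', 'inferSchema' = 'false')
    """)

Since the projection is derived from each table's declared schema, onboarding another table only requires creating it with CREATE TABLE and appending its name to the list.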
