Copy Into command to copy into delta table with predefined schema and csv file has no headers

DataInsight
New Contributor II

How do I use the COPY INTO command to load 200+ tables, each with 50+ columns, into Delta Lake tables with predefined schemas? I am looking for a more generic approach that can be handled in PySpark code.

I am aware that we can pass column expressions into the SELECT clause, but writing out the column names in the SELECT clause for every table seems like a very tedious task.

Any help with this is really appreciated.
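One generic approach (a minimal sketch; the table name, path, and column names below are placeholders, not from the thread) is to generate the COPY INTO statement in Python rather than hand-writing the SELECT list. When a CSV has no header, Spark exposes its columns as _c0, _c1, ..., so the predefined target column names can be applied positionally:

```python
def build_copy_into(table, source_path, columns):
    """Build one COPY INTO statement for a headerless CSV source.

    Headerless CSV columns surface as _c0, _c1, ..., so each one is
    aliased positionally onto the target table's predefined column names.
    All identifiers here are hypothetical examples.
    """
    select_list = ", ".join(f"_c{i} AS {col}" for i, col in enumerate(columns))
    return (
        f"COPY INTO {table}\n"
        f"FROM (SELECT {select_list} FROM '{source_path}')\n"
        f"FILEFORMAT = CSV\n"
        f"FORMAT_OPTIONS ('header' = 'false', 'delimiter' = ',')"
    )

stmt = build_copy_into(
    "sales.orders", "s3://my-bucket/orders", ["order_id", "customer_id", "amount"]
)
# On Databricks, the generated statement would be executed with spark.sql(stmt).
```

Because the SELECT list is derived from the target table's column list, the same function covers all 200+ tables without per-table SQL.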

1 REPLY

Lakshay
Esteemed Contributor

Does your source data have the same number of columns as your target Delta tables? In that case, you can do it this way:
COPY INTO my_pipe_data
  FROM 's3://my-bucket/pipeData'
  FILEFORMAT = CSV
  FORMAT_OPTIONS (
    'mergeSchema' = 'true',
    'delimiter' = '|',
    'header' = 'true'
  )
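To scale this pattern to 200+ tables, one option (a sketch under assumed names; the registry contents and paths are hypothetical) is to keep a small mapping of target table to source path and ordered column list, and generate one COPY INTO per entry. Since the asker's CSVs have no headers, this variant uses 'header' = 'false' with positional aliases instead of mergeSchema:

```python
# Hypothetical registry: target table -> (source path, ordered target columns).
TABLE_REGISTRY = {
    "sales.orders": ("s3://my-bucket/orders", ["order_id", "customer_id", "amount"]),
    "sales.customers": ("s3://my-bucket/customers", ["customer_id", "name"]),
}

def build_all_copy_statements(registry):
    """Yield one COPY INTO statement per registered table.

    Headerless CSV columns appear as _c0, _c1, ..., so each is aliased
    positionally onto the target table's predefined column names.
    """
    for table, (path, columns) in registry.items():
        select_list = ", ".join(f"_c{i} AS {c}" for i, c in enumerate(columns))
        yield (
            f"COPY INTO {table} "
            f"FROM (SELECT {select_list} FROM '{path}') "
            f"FILEFORMAT = CSV "
            f"FORMAT_OPTIONS ('header' = 'false')"
        )

statements = list(build_all_copy_statements(TABLE_REGISTRY))
# On Databricks you would submit each one, e.g.:
# for stmt in statements:
#     spark.sql(stmt)
```

Keeping the registry in a config table or file means new tables only need a registry entry, not new code.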
