Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Does Databricks plan to create a Python API for the COPY INTO Spark SQL statement in the future?

gb_dbx
New Contributor II

Hi,

I am wondering whether Databricks plans to create a Python API for Spark SQL's COPY INTO statement.

At my company we built a kind of Python wrapper around the SQL COPY INTO statement, but it has a lot of design issues and is hard to maintain. I believe it would be better for Databricks to maintain such an API, but maybe I am not looking in the right place?
For example, the MERGE INTO statement has a Python API, but it is maintained by the Delta Lake project, since MERGE INTO is a Delta Lake feature.
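
For context, our wrapper is roughly along these lines (a simplified, illustrative sketch, not our actual code; the table and path names are made up). It just builds a COPY INTO string and runs it through spark.sql():

from pyspark.sql import SparkSession

def copy_into(spark, target_table, source_path, file_format="PARQUET", format_options=None):
    # Build a COPY INTO statement and run it through spark.sql().
    # This is the "wrapper" approach, not an official Python API.
    options_clause = ""
    if format_options:
        opts = ", ".join(f"'{k}' = '{v}'" for k, v in format_options.items())
        options_clause = f"FORMAT_OPTIONS ({opts})"
    spark.sql(f"""
        COPY INTO {target_table}
        FROM '{source_path}'
        FILEFORMAT = {file_format}
        {options_clause}
    """)

spark = SparkSession.builder.getOrCreate()
copy_into(
    spark,
    target_table="my_catalog.my_schema.events",
    source_path="/Volumes/landing/events/",
    file_format="CSV",
    format_options={"header": "true", "inferSchema": "true"},
)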

4 REPLIES

Retired_mod
Esteemed Contributor III

Hi @gb_dbx, As of now, Databricks does not have an official Python API for the COPY INTO statement similar to the MERGE INTO API maintained by the Delta Lake project. The COPY INTO command is primarily used through SQL to load data into Delta tables.

Witold
Honored Contributor

I'm wondering: what does COPY INTO offer that Auto Loader doesn't?

gb_dbx
New Contributor II

Okay, maybe I should take a look at Auto Loader then. I didn't know Auto Loader could basically do the same thing as COPY INTO; I originally thought it was only used for streaming, not batch ingestion.
And does Auto Loader have a dedicated Python API then?

Also, if both Auto Loader and COPY INTO exist and do roughly the same thing, why do both exist instead of keeping only one and deprecating the other? It's a bit confusing haha.

szymon_dybczak
Esteemed Contributor III

Hi @gb_dbx,

COPY INTO is essentially the same as Auto Loader in directory listing mode, but Auto Loader also supports file notification mode; that's the main difference. Auto Loader has a PySpark API and can be scheduled to run in Databricks Jobs as a batch job by using Trigger.AvailableNow.
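
For example, a minimal Auto Loader sketch run as a one-shot batch (the paths and target table are just placeholders, assuming CSV files landing in a volume):

# Auto Loader (cloudFiles) read, processed as a batch via Trigger.AvailableNow.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/Volumes/landing/_schemas/events")
    .option("header", "true")
    .load("/Volumes/landing/events/")
)

(
    df.writeStream
    .option("checkpointLocation", "/Volumes/landing/_checkpoints/events")
    .trigger(availableNow=True)   # process everything available, then stop
    .toTable("my_catalog.my_schema.events")
)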
