Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Does Databricks plan to create a Python API for the COPY INTO Spark SQL statement in the future?

gb_dbx
New Contributor II

Hi,

I am wondering if Databricks plans to create a Python API for Spark SQL's COPY INTO statement?

In my company we created some kind of Python wrapper around the SQL COPY INTO statement, but it has lots of design issues and is hard to maintain. I believe it would be better for Databricks to maintain such an API, but maybe I am not looking in the right place?
For example, the MERGE INTO statement has a Python API, but it is maintained by the Delta Lake project, since MERGE INTO is a Delta Lake feature.
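For reference, the Delta Lake Python MERGE API mentioned above looks roughly like this. This is a sketch, not production code: the function name, the `id` join key, and the paths are illustrative, and it requires the `delta-spark` package plus a running Spark session on the cluster.

```python
def upsert_events(spark, target_table_path, updates_df):
    """Upsert rows into a Delta table via the Delta Lake Python MERGE API."""
    # Deferred import: needs the delta-spark package at run time.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, target_table_path)
    (target.alias("t")
           .merge(updates_df.alias("s"), "t.id = s.id")  # join condition is illustrative
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())
```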

4 REPLIES

Kaniz_Fatma
Community Manager

Hi @gb_dbx, as of now, Databricks does not have an official Python API for the COPY INTO statement similar to the MERGE INTO API maintained by the Delta Lake project. The COPY INTO command is primarily used through SQL to load data into Delta tables.
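Until such an API exists, a common workaround is a thin wrapper that assembles the COPY INTO SQL text and executes it with `spark.sql()`. A minimal sketch of the string-building part (the helper name, parameters, and the example table/path are all illustrative, not an official API):

```python
def build_copy_into_sql(target_table, source_path, file_format,
                        format_options=None, copy_options=None):
    """Assemble a COPY INTO statement for execution via spark.sql()."""
    def render(opts):
        # Render {'mergeSchema': 'true'} as "'mergeSchema' = 'true'"
        return ", ".join(f"'{k}' = '{v}'" for k, v in opts.items())

    sql = (f"COPY INTO {target_table}\n"
           f"FROM '{source_path}'\n"
           f"FILEFORMAT = {file_format}")
    if format_options:
        sql += f"\nFORMAT_OPTIONS ({render(format_options)})"
    if copy_options:
        sql += f"\nCOPY_OPTIONS ({render(copy_options)})"
    return sql

# On a Databricks cluster you would then run (table/path names made up):
# spark.sql(build_copy_into_sql("main.bronze.events", "/Volumes/landing/events",
#                               "JSON", copy_options={"mergeSchema": "true"}))
```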

Witold
Contributor

I'm wondering, what does COPY INTO offer, which Auto Loader doesn't have?

gb_dbx
New Contributor II

Okay, maybe I should take a look at Auto Loader then. I didn't know Auto Loader could basically do the same as COPY INTO; I originally thought it was only used for streaming, not batch ingestion.
And does Auto Loader have a dedicated Python API, then?

Also, if both Auto Loader and COPY INTO exist and do roughly the same thing, why do both exist instead of keeping one and deprecating the other? It's a bit confusing, haha.

Hi @gb_dbx,

COPY INTO is essentially the same as Auto Loader in directory listing mode, but Auto Loader also supports file notification mode; that's the main difference. Auto Loader supports the PySpark API and can be scheduled to run in Databricks Jobs as a batch job by using Trigger.AvailableNow.
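The batch-style Auto Loader pattern described above looks roughly like this. It is a sketch that only runs on a Databricks cluster (the `cloudFiles` source and the available-now trigger are Databricks Runtime features), and the paths and table name are made up:

```python
def ingest_batch(spark, source_path, checkpoint_path, target_table):
    """Incrementally load new files as a one-shot batch job via Auto Loader."""
    (spark.readStream
          .format("cloudFiles")                          # Auto Loader source
          .option("cloudFiles.format", "json")           # source file format (illustrative)
          .option("cloudFiles.schemaLocation", checkpoint_path)
          .load(source_path)
          .writeStream
          .option("checkpointLocation", checkpoint_path)
          .trigger(availableNow=True)                    # process pending files, then stop
          .toTable(target_table))
```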
