Does Databricks plan to create a Python API for the COPY INTO Spark SQL statement in the future?
08-07-2024 06:44 AM
Hi,
I am wondering whether Databricks plans to create a Python API for Spark SQL's COPY INTO statement.
At my company we built a Python wrapper around the SQL COPY INTO statement, but it has many design issues and is hard to maintain. I believe it would be better for Databricks to maintain such an API, but maybe I am not looking in the right place?
For example, the MERGE INTO statement has a Python API, but it is maintained by the Delta Lake project because MERGE INTO is a Delta Lake feature.
08-09-2024 05:05 AM
Hi @gb_dbx, As of now, Databricks does not have an official Python API for the COPY INTO statement similar to the MERGE INTO API maintained by the Delta Lake project. The COPY INTO command is primarily used through SQL to load data into Delta tables.
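Since COPY INTO is only exposed through SQL, a thin wrapper usually comes down to building the statement as a string and passing it to `spark.sql`. The following is a minimal sketch of that idea, not an official API: the helper names `build_copy_into` and `_options_clause`, and the table and path in the usage comment, are hypothetical, and the helper does no quoting or escaping of its inputs.

```python
def _options_clause(keyword, options):
    """Render e.g. FORMAT_OPTIONS ('header' = 'true'); empty string if no options."""
    if not options:
        return ""
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in options.items())
    return f"\n{keyword} ({pairs})"


def build_copy_into(target_table, source_path, file_format,
                    format_options=None, copy_options=None):
    """Build a COPY INTO statement as a string (hypothetical helper).

    Assumes trusted inputs: nothing is escaped or validated here.
    """
    return (
        f"COPY INTO {target_table}\n"
        f"FROM '{source_path}'\n"
        f"FILEFORMAT = {file_format.upper()}"
        + _options_clause("FORMAT_OPTIONS", format_options)
        + _options_clause("COPY_OPTIONS", copy_options)
    )


# Usage on a cluster where `spark` is an active SparkSession
# (table name and path are placeholders):
# spark.sql(build_copy_into(
#     "main.sales.orders", "/mnt/raw/orders", "csv",
#     format_options={"header": "true"},
#     copy_options={"mergeSchema": "true"},
# ))
```

Keeping the statement builder separate from the `spark.sql` call makes the generated SQL easy to unit test without a cluster.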
08-09-2024 08:25 AM
I'm wondering, what does COPY INTO offer that Auto Loader doesn't?
08-12-2024 06:10 AM
Okay, maybe I should take a look at Auto Loader then. I didn't know Auto Loader could basically do the same thing as COPY INTO; I originally thought it was only used for streaming, not batch ingestion.
So Auto Loader has a dedicated Python API?
Also, if both Auto Loader and COPY INTO exist and do more or less the same thing, why keep both instead of deprecating one of them? It's a bit confusing, haha.
08-12-2024 06:42 AM - edited 08-12-2024 06:45 AM
Hi @gb_dbx,
COPY INTO behaves much like Auto Loader in directory listing mode. Auto Loader additionally supports file notification mode, and that is the main difference. Auto Loader has a PySpark API and can be scheduled in Databricks Jobs as a batch job by using Trigger.AvailableNow.
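As a rough illustration of the batch-style Auto Loader usage described above, here is a minimal PySpark sketch. The function name `ingest_available_files` and its parameters are hypothetical; it assumes a Databricks runtime where the `cloudFiles` source is available and a Spark version whose Python API accepts `trigger(availableNow=True)`.

```python
def ingest_available_files(spark, source_path, target_table,
                           checkpoint_path, file_format="json"):
    """Load all files currently in source_path into target_table, then stop.

    Hypothetical helper: `spark` is an active SparkSession on Databricks.
    """
    stream = (
        spark.readStream.format("cloudFiles")            # Auto Loader source
        .option("cloudFiles.format", file_format)        # format of the input files
        .option("cloudFiles.schemaLocation", checkpoint_path)  # where the inferred schema is stored
        .load(source_path)
    )
    query = (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)   # tracks which files were already ingested
        .trigger(availableNow=True)                      # process the backlog, then terminate
        .toTable(target_table)
    )
    query.awaitTermination()                             # block until the batch-style run finishes
    return query
```

Because the checkpoint records which files have been processed, rerunning the same job on a schedule should pick up only new files, which is what gives this pattern its COPY INTO-like incremental behavior.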

