Assistance with Capturing Auto-Generated IDs in Databricks SQL
11-24-2024 07:15 PM
Hello,
I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated ID for each row. I am using the Databricks SQL Connector for Python.
Here is a simplified version of my current workflow:
- I create a temporary view from a DataFrame.
- I insert data from this temporary view into a target table.
- I need to capture the auto-generated IDs (e.g., id_col) for each inserted row to log additional information in another table.
I have tried a `returning` clause, which would work perfectly; however, it is not available in Databricks SQL.
The `id_col` column is auto-generated upon data insertion.
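For context, `id_col` is an identity column on the target Delta table. A simplified sketch of how such a table is typically declared (illustrative only; my real table definition has more columns):

```python
# Simplified, illustrative sketch only - a Delta table with an
# auto-generated identity column (not my exact table definition).
cursor = conn.cursor()
cursor.execute("""
    create table if not exists my_table (
        id_col bigint generated always as identity,
        col2   string,
        col3   string
    ) using delta
""")
```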
```python
# Create a temporary view from the local DataFrame (the notebook runs locally;
# this pushes the compute to Databricks and returns only the results).
data_tuples = [tuple(x) for x in df.to_numpy()]

# format_value2 (defined elsewhere in my code) renders each value as a SQL literal.
values_str = ", ".join(
    f"({', '.join(format_value2(item) for item in row)})" for row in data_tuples
)
column_names_str = ", ".join(df.columns.tolist())
create_view_sql = (
    "create or replace temporary view my_temp_view as select * from values "
    f"{values_str} as t({column_names_str})"
)

cursor = conn.cursor()
cursor.execute(create_view_sql)

# This is where I need help - the returning clause doesn't exist in Databricks SQL.
query = """
insert into my_table (col2, col3)
select col2, col3
from my_temp_view
returning id_col
"""
cursor.execute(query)
new_ids = cursor.fetchall()
```
Thanks for your help and time!
11-25-2024 04:12 AM
Hi @vanverne,
Unfortunately, as of now, Databricks SQL does not support the RETURNING clause directly when inserting rows into a table. This limitation makes it tricky to capture auto-generated IDs during an INSERT operation.
However, you can achieve the desired functionality by using a combination of temporary tables, merge queries, or other approaches. Below are a few alternatives to help you capture the auto-generated IDs for your use case.
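For example, one common workaround is to tag every row of the insert with a unique batch key and then select the generated IDs back in a follow-up query. The sketch below is illustrative and assumes you add a `batch_id` column to `my_table` for this purpose (it is not in your current schema):

```python
import uuid

# Unique key for this particular insert batch.
batch_id = uuid.uuid4().hex

cursor = conn.cursor()

# Tag every inserted row with the batch key (assumes my_table has an
# extra batch_id string column added for this workaround).
cursor.execute(f"""
    insert into my_table (col2, col3, batch_id)
    select col2, col3, '{batch_id}'
    from my_temp_view
""")

# Read back the identity values generated for this batch.
cursor.execute(f"""
    select id_col, col2, col3
    from my_table
    where batch_id = '{batch_id}'
""")
new_ids = cursor.fetchall()
```

A MERGE-based variant follows the same idea: stage the rows together with the batch key, merge them into the target table, and then select the newly generated IDs by filtering on that key.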
Let me know if you need further clarification or assistance!
Regards!
-------------------
I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft technologies, focusing on building scalable and efficient solutions with Apache Spark.
12-05-2024 02:09 PM
Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives..."; however, I am not seeing those. Please let me know if I am missing something. Also, do you know if Databricks plans to support the RETURNING clause soon? If not, could we open a case to request that functionality?