cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Assistance with Capturing Auto-Generated IDs in Databricks SQL

vanverne
New Contributor II

Hello,

I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using databricks sql connector. 

Here is a simplified version of my current workflow:

  1. I create a temporary view from a DataFrame.
  2. I insert data from this temporary view into a target table.
  3. I need to capture the auto-generated IDs (e.g., id_col) for each inserted row to log additional information in another table.

I have tried a `returning` statement, which would work perfectly, however, it's not available. 

`id_col` column is auto-generated upon data insertion. 

# creating a temporary table (using local notebook, this helps pass off compute to databricks and return only the results)

data_tuples = [tuple(x) for x in df.to_numpy()]
create_view_sql = "create or replace temporary view my_temp_view as select * from values"
values_str = ", ".join([f"({', '.join([format_value2(item) for item in row])})" for row in data_tuples])
column_names_str = ", ".join(df.columns.tolist())
create_view_sql += values_str + f" as t({column_names_str})"
cursor = conn.cursor()
cursor.execute(create_view_sql)

# this is where I need help - the returning functionality doesn't exist for databricks 

query = """
insert into my_table (col2, col3)
select col2, col3
from my_temp_view
returning id_col
"""

cursor.execute(query)
new_ids = cursor.fetchall()

 

Thanks for your help and time! 

2 REPLIES 2

agallard
Contributor

Hi @vanverne,

Unfortunately, as of now, Databricks SQL does not support the RETURNING clause directly when inserting rows into a table. This limitation makes it tricky to capture auto-generated IDs during an INSERT operation.

However, you can achieve the desired functionality by using a combination of temporary tables, merge queries, or other approaches. Below are a few alternatives to help you capture the auto-generated IDs for your use case.

Let me know if you need further clarification or assistance!

Regards!

 
 
 
 
 
Alfonso Gallardo
-------------------
 I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft solutions, focusing on developing scalable and efficient solutions with Apache Spark

vanverne
New Contributor II

Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives...", however, I am not seeing those. Please let me know if I am missing something. Also, do you know if Databricks is working on supporting the RETURNING clause soon? If not, could we open a case to add that functionality? 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group