Assistance with Capturing Auto-Generated IDs in Databricks SQL
11-24-2024 07:15 PM
Hello,
I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated ID for each row. I am using the Databricks SQL Connector for Python.
Here is a simplified version of my current workflow:
- I create a temporary view from a DataFrame.
- I insert data from this temporary view into a target table.
- I need to capture the auto-generated IDs (e.g., id_col) for each inserted row to log additional information in another table.
I have tried a `returning` clause, which would work perfectly; however, it is not available in Databricks SQL.
The `id_col` column is auto-generated upon data insertion.
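For context, `id_col` is an identity column on the target Delta table. A simplified sketch of how such a table is typically declared (illustrative only; my real table definition has more columns):

```python
# Simplified, illustrative sketch only - a Delta table with an
# auto-generated identity column (not my exact table definition).
cursor = conn.cursor()
cursor.execute("""
    create table if not exists my_table (
        id_col bigint generated always as identity,
        col2   string,
        col3   string
    ) using delta
""")
```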
```python
# Create a temporary view from the local DataFrame (the notebook runs locally;
# this pushes the compute to Databricks and returns only the results).
data_tuples = [tuple(x) for x in df.to_numpy()]

# format_value2 (defined elsewhere in my code) renders each value as a SQL literal.
values_str = ", ".join(
    f"({', '.join(format_value2(item) for item in row)})" for row in data_tuples
)
column_names_str = ", ".join(df.columns.tolist())
create_view_sql = (
    "create or replace temporary view my_temp_view as select * from values "
    f"{values_str} as t({column_names_str})"
)

cursor = conn.cursor()
cursor.execute(create_view_sql)

# This is where I need help - the returning clause doesn't exist in Databricks SQL.
query = """
insert into my_table (col2, col3)
select col2, col3
from my_temp_view
returning id_col
"""
cursor.execute(query)
new_ids = cursor.fetchall()
```
Thanks for your help and time!
11-25-2024 04:12 AM
Hi @vanverne,
Unfortunately, as of now, Databricks SQL does not support the RETURNING clause directly when inserting rows into a table. This limitation makes it tricky to capture auto-generated IDs during an INSERT operation.
However, you can achieve the desired functionality by using a combination of temporary tables, merge queries, or other approaches. Below are a few alternatives to help you capture the auto-generated IDs for your use case.
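For example, one common workaround is to tag every row of the insert with a unique batch key and then select the generated IDs back in a follow-up query. The sketch below is illustrative and assumes you add a `batch_id` column to `my_table` for this purpose (it is not in your current schema):

```python
import uuid

# Unique key for this particular insert batch.
batch_id = uuid.uuid4().hex

cursor = conn.cursor()

# Tag every inserted row with the batch key (assumes my_table has an
# extra batch_id string column added for this workaround).
cursor.execute(f"""
    insert into my_table (col2, col3, batch_id)
    select col2, col3, '{batch_id}'
    from my_temp_view
""")

# Read back the identity values generated for this batch.
cursor.execute(f"""
    select id_col, col2, col3
    from my_table
    where batch_id = '{batch_id}'
""")
new_ids = cursor.fetchall()
```

A MERGE-based variant follows the same idea: stage the rows together with the batch key, merge them into the target table, and then select the newly generated IDs by filtering on that key.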
Let me know if you need further clarification or assistance!
Regards!
-------------------
I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft technologies, focusing on building scalable and efficient solutions with Apache Spark.
12-05-2024 02:09 PM
Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives..."; however, I am not seeing those. Please let me know if I am missing something. Also, do you know if Databricks plans to support the RETURNING clause soon? If not, could we open a case to request that functionality?