Hello,
I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using databricks sql connector.
Here is a simplified version of my current workflow:
- I create a temporary view from a DataFrame.
- I insert data from this temporary view into a target table.
- I need to capture the auto-generated IDs (e.g., id_col) for each inserted row to log additional information in another table.
I have tried a `returning` statement, which would work perfectly, however, it's not available.
`id_col` column is auto-generated upon data insertion.
# creating a temporary table (using local notebook, this helps pass off compute to databricks and return only the results)
data_tuples = [tuple(x) for x in df.to_numpy()]
create_view_sql = "create or replace temporary view my_temp_view as select * from values"
values_str = ", ".join([f"({', '.join([format_value2(item) for item in row])})" for row in data_tuples])
column_names_str = ", ".join(df.columns.tolist())
create_view_sql += values_str + f" as t({column_names_str})"
cursor = conn.cursor()
cursor.execute(create_view_sql)
# this is where I need help - the returning functionality doesn't exist for databricks
query = """
insert into my_table (col2, col3)
select col2, col3
from my_temp_view
returning id_col
"""
cursor.execute(query)
new_ids = cursor.fetchall()
Thanks for your help and time!