Data Engineering

Surrogate key using identity column.

Dataengineer_mm
New Contributor

I want to create a surrogate key in a Delta table, and I used an identity column (id GENERATED BY DEFAULT AS IDENTITY).

Can I insert rows into the Delta table only with a spark.sql INSERT query, or can I also use the DataFrame write API with the Delta format? When I use df.write, I get a mismatched-columns error. Can someone help with how to implement this?

2 REPLIES

NandiniN
Valued Contributor II

Hello @Menaka Murugesan,

If you are using an identity column, I believe you created the table as below (the id starts at 1 and increments by 1):

CREATE TABLE my_table (
  id BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1),
  value STRING
) USING DELTA

You can insert values into the identity table as follows:

INSERT INTO my_table (value) VALUES ('Value 1'), ('Value 2'), ('Value 3')

The id column values will be automatically generated by the identity column with values of 1, 2, and 3 respectively.

Alternatively, you can also use Spark SQL to insert data into the Delta table with an identity column. Here's an example of how to do this:

spark.sql("INSERT INTO my_table (value) VALUES ('Value 1'), ('Value 2'), ('Value 3')")
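If the rows you want to insert already live in a DataFrame, one way to keep the identity column working is to register the DataFrame as a temporary view and drive the insert through SQL. This is a sketch, assuming an existing SparkSession named spark and the my_table schema above (id identity, value STRING):

```python
# Sketch: insert DataFrame rows through SQL so Delta generates the id values.
# Assumes `spark` (SparkSession) and a Delta table `my_table` with an
# identity column `id` and a STRING column `value`.
df = spark.createDataFrame([("Value 4",), ("Value 5",)], ["value"])
df.createOrReplaceTempView("staged_values")

# Listing only `value` in the column list lets the identity column fill itself in.
spark.sql("INSERT INTO my_table (value) SELECT value FROM staged_values")
```

Because the INSERT names only the value column, Delta assigns the id values the same way it does for a literal VALUES insert.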

To write data into a Delta table with an identity column using the df.write API in PySpark:

data = [("Value 1",), ("Value 2",), ("Value 3",)]
columns = ["value"]

df = spark.createDataFrame(data, columns)

# Append only the non-identity columns; Delta generates the id values.
df.write.format("delta").mode("append").saveAsTable("my_table")

The id column will still be auto-generated.

I suspect you are getting the mismatched-columns error because your DataFrame is trying to supply a value for id, or its columns do not line up with the table's schema. Please share the exact error message and how you are inserting the values.
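For illustration, here is a minimal sketch of the likely failure mode, assuming spark and the my_table schema above (id BIGINT identity, value STRING). A DataFrame whose columns do not match the table's writable schema is rejected on append, while one that supplies only the non-identity columns succeeds:

```python
# Sketch of the schema-mismatch scenario; assumes `spark` and `my_table`.

# Likely fails: the column name does not match the table schema, so the
# append is rejected with a schema-mismatch error.
bad_df = spark.createDataFrame([("Value 1",)], ["wrong_name"])
# bad_df.write.format("delta").mode("append").saveAsTable("my_table")

# Works: supply only `value`; Delta fills in `id` from the identity column.
good_df = spark.createDataFrame([("Value 1",)], ["value"])
good_df.write.format("delta").mode("append").saveAsTable("my_table")
```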

Hope this helps.

Kaniz
Community Manager

Hi @Menaka Murugesan, we haven't heard from you since the last response from @Nandini N, and I was checking back to see if her suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
