Data Engineering

Surrogate key using identity column.

Dataengineer_mm
New Contributor

I want to create a surrogate key in a Delta table, and I used an identity column (id GENERATED BY DEFAULT AS IDENTITY).

Can I insert rows into the Delta table only with a spark.sql INSERT query, or can I also use the DataFrame write API with the Delta format? When I use df.write, I get a mismatched-columns error. Can someone help with how to implement this?

2 REPLIES

NandiniN
Valued Contributor II

Hello @Menaka Murugesan,

If you are using an identity column, I believe you created the table as below (the id starts at 1 and increments by 1):

CREATE TABLE my_table (
  id BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1),
  value STRING
) USING DELTA

You can insert values into the identity table as follows:

INSERT INTO my_table (value) VALUES ('Value 1'), ('Value 2'), ('Value 3')

The id column values will be automatically generated by the identity column with values of 1, 2, and 3 respectively.

Alternatively, you can also use Spark SQL to insert data into the Delta table with an identity column. Here's an example of how to do this:

spark.sql("INSERT INTO my_table (value) VALUES ('Value 1'), ('Value 2'), ('Value 3')")
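If the rows you want to insert already live in a DataFrame, one way to keep the identity column working is to register the DataFrame as a temporary view and drive the insert through SQL. This is a sketch, assuming an existing SparkSession named spark and the my_table schema above (id identity, value STRING):

```python
# Sketch: insert DataFrame rows through SQL so Delta generates the id values.
# Assumes `spark` (SparkSession) and a Delta table `my_table` with an
# identity column `id` and a STRING column `value`.
df = spark.createDataFrame([("Value 4",), ("Value 5",)], ["value"])
df.createOrReplaceTempView("staged_values")

# Listing only `value` in the column list lets the identity column fill itself in.
spark.sql("INSERT INTO my_table (value) SELECT value FROM staged_values")
```

Because the INSERT names only the value column, Delta assigns the id values the same way it does for a literal VALUES insert.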

To write data into a Delta table with an identity column using the df.write API in PySpark:

data = [("Value 1",), ("Value 2",), ("Value 3",)]
columns = ["value"]

df = spark.createDataFrame(data, columns)

# Append only the non-identity columns; Delta generates the id values.
df.write.format("delta").mode("append").saveAsTable("my_table")

The id column will still be auto-generated.

I suspect you are getting the mismatched-columns error because your DataFrame is trying to supply a value for id, or its columns do not line up with the table's schema. Please share the exact error message and how you are inserting the values.
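For illustration, here is a minimal sketch of the likely failure mode, assuming spark and the my_table schema above (id BIGINT identity, value STRING). A DataFrame whose columns do not match the table's writable schema is rejected on append, while one that supplies only the non-identity columns succeeds:

```python
# Sketch of the schema-mismatch scenario; assumes `spark` and `my_table`.

# Likely fails: the column name does not match the table schema, so the
# append is rejected with a schema-mismatch error.
bad_df = spark.createDataFrame([("Value 1",)], ["wrong_name"])
# bad_df.write.format("delta").mode("append").saveAsTable("my_table")

# Works: supply only `value`; Delta fills in `id` from the identity column.
good_df = spark.createDataFrame([("Value 1",)], ["value"])
good_df.write.format("delta").mode("append").saveAsTable("my_table")
```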

Hope this helps.

Kaniz
Community Manager

Hi @Menaka Murugesan, we haven't heard from you since the last response from @Nandini N, and I was checking back to see if her suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
