Data Engineering

Not able to insert into a table with an identity column through Spark

Shivaprasad
New Contributor III

I have a Delta table with an identity column and I am not able to insert data using Spark. I am using DBR 15.4 LTS. Any idea what needs to be done?

 

Table Name : account

Column Def: account_dimension_id BIGINT GENERATED BY DEFAULT AS IDENTITY,

df = spark.read.format("csv").load("abfss://databricks-storage@sa14127e1dv0101.dfs.core.windows.net/catalogs/data-relationship-management-drm/onprem_acct_dimension.csv")

df.write.format("delta").mode("append").insertInto("account")

I am able to insert through SQL but not through Spark.

The file has 7 columns, but the table has 8 columns including the identity column.

Error: [DELTA_INSERT_COLUMN_ARITY_MISMATCH] Cannot write to 'data_relationship_managment_pdz.data_relationship_managment_schema.account', not enough data columns; target table has 8 column(s) but the inserted data has 7 column(s) SQLSTATE: 42802

1 ACCEPTED SOLUTION


Shivaprasad
New Contributor III

Thanks, that resolved the issue


7 REPLIES

Walter_C
Databricks Employee

The error states that there is a missing column in the data that you are adding. If you print the df and do a SELECT on the table, do you see that all the columns match?

Shivaprasad
New Contributor III

The first column in my table is the identity column, and my expectation is that it should get auto-incremented, so it's not in my DataFrame. I am able to insert data through SQL and the identity column gets auto-incremented, but not when I try to insert through Spark.
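This mismatch happens because `insertInto` matches columns by position and expects a value for every table column, including the identity column. A common workaround is to stage the DataFrame as a temp view and issue a SQL INSERT that names only the data columns, so Delta generates the identity value. Below is a minimal sketch; the helper `build_insert_sql` and the view name `staged_accounts` are illustrative, not from the thread.

```python
# Sketch: route the insert through SQL with an explicit column list,
# so the identity column is omitted and generated by Delta.
# Table name "account" mirrors the thread; the helper is hypothetical.

def build_insert_sql(table, columns, view):
    """Build an INSERT statement naming only the data columns,
    letting the identity column be auto-generated."""
    cols = ", ".join(columns)
    return f"INSERT INTO {table} ({cols}) SELECT {cols} FROM {view}"

# Usage inside Databricks (requires an active Spark session):
# df.createOrReplaceTempView("staged_accounts")
# spark.sql(build_insert_sql("account", df.columns, "staged_accounts"))
```

Because the column list is explicit, this also fails loudly if the CSV column names drift from the table schema.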

PiotrMi
New Contributor III

Hey @Shivaprasad 

Could you try switching the last part, "insertInto", to "saveAsTable"?

df.write.mode("append").format("delta").saveAsTable( "account")

Shivaprasad
New Contributor III

Thanks. It is still giving a data mismatch error:

table : account
account_dimension_id BIGINT GENERATED BY DEFAULT AS IDENTITY,

df = spark.read.format("csv").option("header", "true").load("abfss://databricks-storage@sa14127e1dv0101.dfs.core.windows.net/catalogs/data-relationship-management-drm/onprem_acct_dimension.csv")
for col in df.columns:
    df = df.withColumnRenamed(col, col.lower())
display(df)
df.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable("account")

Error:
[DELTA_FAILED_TO_MERGE_FIELDS] Failed to merge fields 'insert_date' and 'insert_date' SQLSTATE: 22005
File <command-2201570317935975>, line 8
 df = df.withColumnRenamed(col, col.lower())

 

 

PiotrMi
New Contributor III

Hey @Shivaprasad 

I think you have this error due to different data types for the column insert_date. It is probably coming in as a string from the CSV file. Please check both data types, in the table and in the DataFrame created from the file, and cast if needed. The error should be fixed then.
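One way to do that cast is to build `selectExpr` strings that coerce only the mismatched columns before the write. A minimal sketch follows; the target type TIMESTAMP for insert_date is an assumption, so check the table schema (e.g. with DESCRIBE TABLE) first, and note the helper `cast_exprs` is illustrative.

```python
# Sketch: align DataFrame column types with the target table before writing.
# The column name insert_date comes from the thread; the TIMESTAMP target
# type is an assumption -- verify it against the actual table schema.

def cast_exprs(df_columns, type_overrides):
    """Return selectExpr strings casting the named columns;
    all other columns pass through unchanged."""
    return [
        f"CAST({c} AS {type_overrides[c]}) AS {c}" if c in type_overrides else c
        for c in df_columns
    ]

# Usage inside Databricks (requires an active Spark session):
# df = df.selectExpr(*cast_exprs(df.columns, {"insert_date": "TIMESTAMP"}))
# df.write.format("delta").mode("append").saveAsTable("account")
```

Casting explicitly avoids relying on mergeSchema, which cannot reconcile a string column with an existing timestamp column.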

Shivaprasad
New Contributor III

Thanks, that resolved the issue

PiotrMi
New Contributor III

Hey @Shivaprasad 

You're welcome :). Please consider marking my last answer as a solution so others can find it easily.
