12-13-2024 09:48 AM
I have a Delta table with an identity column and I am not able to insert data into it using Spark. I am using Databricks Runtime 15.4 LTS. Any idea what needs to be done?
Table Name : account
Column Def: account_dimension_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
df = spark.read.format("csv").load("abfss://databricks-storage@sa14127e1dv0101.dfs.core.windows.net/catalogs/data-relationship-management-drm/onprem_acct_dimension.csv")
df.write.format("delta").mode("append").insertInto("account")
I am able to insert through SQL but not through Spark.
The file has 7 columns but the table has 8 columns, including the identity column.
Error: [DELTA_INSERT_COLUMN_ARITY_MISMATCH] Cannot write to 'data_relationship_managment_pdz.data_relationship_managment_schema.account', not enough data columns; target table has 8 column(s) but the inserted data has 7 column(s) SQLSTATE: 42802
12-13-2024 12:31 PM
The error states that a column is missing from the data you are adding. If you print the DataFrame's schema and run a SELECT on the table, do you see that all the columns match?
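To see the mismatch concretely, you can diff the two column lists. A minimal plain-Python sketch (only the identity column's name is known from the post; the other seven column names are assumptions):

```python
# Column lists as they would come from spark.table("account").columns and
# df.columns; hard-coded here because only the identity column's name is
# known from the post, the other seven are assumed.
table_cols = ["account_dimension_id", "account_code", "account_name",
              "parent_code", "account_type", "currency", "status", "insert_date"]
df_cols = ["account_code", "account_name", "parent_code",
           "account_type", "currency", "status", "insert_date"]

# Columns the table expects that the DataFrame does not supply.
missing = [c for c in table_cols if c not in df_cols]
print(missing)
```

If the only entry in `missing` is the identity column, the arity mismatch is exactly the 8-vs-7 gap from the error message.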
12-13-2024 01:08 PM
The first column in the table is the identity column, and my expectation is that it gets auto-incremented, so it's not in my DataFrame. I am able to insert data through SQL and the identity column is auto-incremented, but not when I try to insert through Spark.
12-14-2024 09:43 AM
Hey @Shivaprasad
Could you try switching the last part, "insertInto", to "saveAsTable"?
df.write.mode("append").format("delta").saveAsTable("account")
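For context on why this can help: `insertInto` resolves columns strictly by position and requires the DataFrame to have exactly as many columns as the table, while `saveAsTable` in append mode resolves columns by name, which lets Delta generate the unmatched identity column. A toy plain-Python illustration of the two matching modes (the table and row contents are assumptions, not the poster's real data):

```python
# Toy illustration of the two column-resolution modes; table and row
# contents are assumptions, not the poster's real data.
table_cols = ["account_dimension_id", "account_code", "insert_date"]  # identity column first

# Positional resolution (insertInto-style): the row must supply a value
# for every table column, identity included, so 2 values vs 3 columns fails.
row_by_position = ("ACC1", "2024-12-16")
arity_ok = len(row_by_position) == len(table_cols)  # False -> DELTA_INSERT_COLUMN_ARITY_MISMATCH

# By-name resolution (saveAsTable-style append): columns match on name,
# and the unmatched identity column is left for Delta to generate.
row_by_name = {"account_code": "ACC1", "insert_date": "2024-12-16"}
resolved = {c: row_by_name.get(c) for c in table_cols}
```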
12-15-2024 06:18 PM
Thanks. It is still giving a data mismatch error:
table : account
account_dimension_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
df = spark.read.format("csv").option("header", "true").load("abfss://databricks-storage@sa14127e1dv0101.dfs.core.windows.net/catalogs/data-relationship-management-drm/onprem_acct_dimension.csv")
for col in df.columns:
    df = df.withColumnRenamed(col, col.lower())
display(df)
df.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable("account")
Error:
[DELTA_FAILED_TO_MERGE_FIELDS] Failed to merge fields 'insert_date' and 'insert_date' SQLSTATE: 22005
File <command-2201570317935975>, line 8
df = df.withColumnRenamed(col, col.lower())
12-16-2024 02:31 AM
Hey @Shivaprasad
I think you have the error due to different data types for the column insert_date. It is probably coming in as a string from the CSV file. Please check both data types, in the table and in the DataFrame created from the file, and cast it if needed. That should fix the error.
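The likely root cause: `spark.read.csv` without an explicit schema reads every column as string, so the DataFrame's `insert_date` is a string while the table presumably declares it as DATE or TIMESTAMP, and mergeSchema refuses to merge the two types. On Databricks the fix would be a cast along the lines of `df = df.withColumn("insert_date", to_timestamp(col("insert_date")))` (column name and target type are assumptions here). A plain-Python sketch of the same idea, string in, typed value out:

```python
from datetime import datetime

# What the CSV reader effectively hands you: insert_date as a plain string.
row = {"account_code": "ACC1", "insert_date": "2024-12-16 09:13:00"}

# Cast the string to the type the table declares (timestamp assumed here),
# mirroring Spark's to_timestamp(col("insert_date")) before the write.
row["insert_date"] = datetime.strptime(row["insert_date"], "%Y-%m-%d %H:%M:%S")
```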
12-16-2024 09:13 AM
Thanks, that resolved the issue.
12-16-2024 09:42 AM
Hey @Shivaprasad
You're welcome :). Please consider marking my last answer as the solution so others can find it easily.

