12-13-2024 09:48 AM
I have a Delta table with an identity column and I am not able to insert data into it using Spark. I am using Databricks Runtime 15.4 LTS. Any idea what needs to be done?
Table Name : account
Column Def: account_dimension_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
df = spark.read.format("csv").load("abfss://databricks-storage@sa14127e1dv0101.dfs.core.windows.net/catalogs/data-relationship-management-drm/onprem_acct_dimension.csv")
df.write.format("delta").mode("append").insertInto("account")
I am able to insert through SQL but not through Spark.
The file has 7 columns, but the table has 8 columns including the identity column.
Error: [DELTA_INSERT_COLUMN_ARITY_MISMATCH] Cannot write to 'data_relationship_managment_pdz.data_relationship_managment_schema.account', not enough data columns; target table has 8 column(s) but the inserted data has 7 column(s) SQLSTATE: 42802
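For reference, a SQL insert with an explicit column list does work for me, since the identity column can be left out there. A rough sketch of that path (the non-identity column names below are just placeholders for the real ones):

# insertInto matches columns by position and expects all 8 target columns,
# so the 7-column DataFrame fails. A SQL INSERT that names only the data
# columns lets the identity column be generated automatically.
# Column names other than account_dimension_id are placeholders.
df.createOrReplaceTempView("acct_stage")
spark.sql("""
    INSERT INTO account (acct_code, acct_name, acct_type, parent_acct,
                         acct_status, insert_date, update_date)
    SELECT acct_code, acct_name, acct_type, parent_acct,
           acct_status, insert_date, update_date
    FROM acct_stage
""")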
12-13-2024 12:31 PM
The error states that a column is missing from the data you are adding. If you print the df and do a select on the table, do you see that all the columns match?
12-13-2024 01:08 PM
The first column in the table is the identity column, and my expectation is that it should get auto-incremented, so it's not in my DataFrame. I am able to insert data through SQL and the identity column gets auto-incremented, but not when I try to insert through Spark.
12-14-2024 09:43 AM
Hey @Shivaprasad
Could you try switching the last part, "insertInto", to "saveAsTable"?
df.write.mode("append").format("delta").saveAsTable( "account")
12-15-2024 06:18 PM
Thanks. It is still giving a data mismatch error:
table : account
account_dimension_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
df = spark.read.format("csv").option("header", "true").load("abfss://databricks-storage@sa14127e1dv0101.dfs.core.windows.net/catalogs/data-relationship-management-drm/onprem_acct_dimension.csv")
for col in df.columns:
    df = df.withColumnRenamed(col, col.lower())
display(df)
df.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable("account")
Error:
[DELTA_FAILED_TO_MERGE_FIELDS] Failed to merge fields 'insert_date' and 'insert_date' SQLSTATE: 22005
File <command-2201570317935975>, line 8
df = df.withColumnRenamed(col, col.lower())
12-16-2024 02:31 AM
Hey @Shivaprasad
I think you are getting this error because of different data types for the column insert_date. It is probably coming in as a string from the CSV file. Please check what both data types are, in the table and in the DataFrame created from the file, and cast if needed. The error should be fixed then.
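Something like this, assuming insert_date is a DATE column in the table and the CSV uses a yyyy-MM-dd format (adjust the cast and format if it is actually a timestamp):

from pyspark.sql.functions import to_date

# Compare the incoming schema with the target table's schema
df.printSchema()
spark.table("account").printSchema()

# CSV columns are read as strings; cast insert_date to match the table.
# to_date and the "yyyy-MM-dd" format are assumptions -- use to_timestamp
# and the real format if the table column is a TIMESTAMP.
df = df.withColumn("insert_date", to_date("insert_date", "yyyy-MM-dd"))

df.write.format("delta").mode("append").saveAsTable("account")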
12-16-2024 09:13 AM
Thanks, that resolved the issue
12-16-2024 09:42 AM
Hey @Shivaprasad
You're welcome :). Please consider marking my last answer as the solution so others can find it easily.