12-13-2024 09:48 AM
I have a Delta table with an identity column and I am not able to insert data into it using Spark. I am using Databricks Runtime 15.4 LTS. Any idea what needs to be done?
Table Name : account
Column Def: account_dimension_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
df = spark.read.format("csv").load("abfss://databricks-storage@sa14127e1dv0101.dfs.core.windows.net/catalogs/data-relationship-management-drm/onprem_acct_dimension.csv")
df.write.format("delta").mode("append").insertInto("account")
I am able to insert through SQL but not through Spark.
The file has 7 columns, but the table has 8 columns including the identity column.
Error: [DELTA_INSERT_COLUMN_ARITY_MISMATCH] Cannot write to 'data_relationship_managment_pdz.data_relationship_managment_schema.account', not enough data columns; target table has 8 column(s) but the inserted data has 7 column(s) SQLSTATE: 42802
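For reference, a SQL insert with an explicit column list does work for me, since the identity column can be left out there. A rough sketch of that path (the non-identity column names below are just placeholders for the real ones):

# insertInto matches columns by position and expects all 8 target columns,
# so the 7-column DataFrame fails. A SQL INSERT that names only the data
# columns lets the identity column be generated automatically.
# Column names other than account_dimension_id are placeholders.
df.createOrReplaceTempView("acct_stage")
spark.sql("""
    INSERT INTO account (acct_code, acct_name, acct_type, parent_acct,
                         acct_status, insert_date, update_date)
    SELECT acct_code, acct_name, acct_type, parent_acct,
           acct_status, insert_date, update_date
    FROM acct_stage
""")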
12-13-2024 12:31 PM
The error states that a column is missing from the data you are adding. If you print the df and do a select on the table, do you see that all the columns match?
12-13-2024 01:08 PM
The first column in the table is the identity column, and my expectation is that it should get auto-incremented, so it's not in my DataFrame. I am able to insert data through SQL and the identity column gets auto-incremented, but not when I try to insert through Spark.
12-14-2024 09:43 AM
Hey @Shivaprasad
Could you try switching the last part, "insertInto", to "saveAsTable"?
df.write.mode("append").format("delta").saveAsTable( "account")
12-15-2024 06:18 PM
Thanks. It is still giving a data mismatch error:
table : account
account_dimension_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
df = spark.read.format("csv").option("header", "true").load("abfss://databricks-storage@sa14127e1dv0101.dfs.core.windows.net/catalogs/data-relationship-management-drm/onprem_acct_dimension.csv")
for col in df.columns:
    df = df.withColumnRenamed(col, col.lower())
display(df)
df.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable("account")
Error:
[DELTA_FAILED_TO_MERGE_FIELDS] Failed to merge fields 'insert_date' and 'insert_date' SQLSTATE: 22005
File <command-2201570317935975>, line 8
df = df.withColumnRenamed(col, col.lower())
12-16-2024 02:31 AM
Hey @Shivaprasad
I think you are getting this error because of different data types for the column insert_date. It is probably coming in as a string from the CSV file. Please check what both data types are, in the table and in the DataFrame created from the file, and cast if needed. The error should be fixed then.
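Something like this, assuming insert_date is a DATE column in the table and the CSV uses a yyyy-MM-dd format (adjust the cast and format if it is actually a timestamp):

from pyspark.sql.functions import to_date

# Compare the incoming schema with the target table's schema
df.printSchema()
spark.table("account").printSchema()

# CSV columns are read as strings; cast insert_date to match the table.
# to_date and the "yyyy-MM-dd" format are assumptions -- use to_timestamp
# and the real format if the table column is a TIMESTAMP.
df = df.withColumn("insert_date", to_date("insert_date", "yyyy-MM-dd"))

df.write.format("delta").mode("append").saveAsTable("account")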
12-16-2024 09:13 AM
Thanks, that resolved the issue
12-16-2024 09:42 AM
Hey @Shivaprasad
You're welcome :). Please consider marking my last answer as the solution so others can find it easily.