DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Paul92S
New Contributor III

Hi,

I am having an issue loading source data into a Delta table / Unity Catalog. The error we are receiving is the following:

grpc_message:"[DELTA_EXCEED_CHAR_VARCHAR_LIMIT] Exceeds char/varchar type length limitation. Failed check: (isnull(\'metric_name) OR (length(\'metric_name) <= 0))

We get this issue when executing the line below:

source_df.write.format("delta").mode("overwrite").option("mergeSchema", "true").save(destination_mount_filepath)
 
For background: we are loading raw data into a Delta table in mounted storage.
I have tried unmounting/remounting the storage and recreating the container in the storage account. I have checked and validated the data, and there are no rows that would fail this check (a sketch of the check we ran is below). I have also tried dropping the Unity Catalog tables and using dbutils to remove the old data behind the catalog tables, to no avail.
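For reference, this is roughly the validation we ran (a sketch only; the 255 limit is a placeholder for whatever length the table actually declares for metric_name):

from pyspark.sql import functions as F

# Rows that would violate a hypothetical VARCHAR(255) limit on metric_name
suspect_rows = source_df.filter(
    F.col("metric_name").isNotNull()
    & (F.length("metric_name") > 255)  # placeholder limit
)
print(suspect_rows.count())  # returns 0 for us, yet the write still fails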

2 REPLIES

Palash01
Contributor III

 Hey @Paul92S 

Looking at the error message, it looks like the column "metric_name" is the culprit here:

Understanding the Error:

  • Character Limit Violation: The error indicates that values in the metric_name column are exceeding the maximum length allowed for char/varchar data types in Unity Catalog.
  • Schema Mismatch: This likely stems from a discrepancy between the source data schema and the existing Delta table schema.

Troubleshooting Steps:

  • Use source_df.printSchema() to verify the actual schema of your source data, paying close attention to the metric_name column's data type and length.
  • Use spark.read.format("delta").load(destination_mount_filepath).printSchema() to inspect the current schema of the Delta table.
  • If feasible, consider altering the metric_name column in the Delta table to accommodate longer values, e.g. ALTER TABLE table_name ALTER COLUMN column_name TYPE STRING; (or a larger VARCHAR(n)).
  • If increasing column size isn't viable, explore truncating values in the metric_name column before writing to the Delta table.
  • If the error message refers to isnull('metric_name'), ensure there are no unintended null values in that column.
  • If empty strings are valid for metric_name, explicitly set those values to empty strings in the source DataFrame.
  • Also, check whether the incremental files have a different schema than the file you used to create the table (a combined sketch of these checks follows this list).
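A combined sketch of the steps above (PySpark plus SQL run via spark.sql); the table name my_catalog.my_schema.my_table and the 255 limit are placeholders, not your actual objects:

from pyspark.sql import functions as F

# 1. Compare the source schema against the current Delta table schema.
source_df.printSchema()
spark.read.format("delta").load(destination_mount_filepath).printSchema()

# 2. Widen the column if the declared limit itself is the problem
#    (Databricks SQL supports widening a VARCHAR(n) or converting it to STRING).
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table
    ALTER COLUMN metric_name TYPE STRING
""")

# 3. Or truncate/normalize values before writing, so nothing exceeds the limit
#    and nulls become empty strings.
cleaned_df = source_df.withColumn(
    "metric_name",
    F.coalesce(F.substring("metric_name", 1, 255), F.lit(""))  # placeholder limit
)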

Follow-ups are appreciated!

Leave a like if this helps! Kudos,
Palash

Paul92S
New Contributor III (Accepted Solution)

Hi Palash, 

We have Unity Catalog tables created off external table locations, which are mounted to Databricks from Azure Data Lake. If a schema change comes from upstream, it causes this issue. To resolve it, we read the new source schema names from raw, ran ALTER TABLE to include the new columns, and then ran UPDATE ... SET on those columns to give them an empty string ''. This then allowed the DataFrame to overwrite the files in mounted storage without causing an error.
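A rough sketch of that resolution (table and column names are placeholders for the actual objects):

# 1. Add the new upstream columns to the existing table.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table
    ADD COLUMNS (new_col_1 STRING, new_col_2 STRING)
""")

# 2. Backfill the new columns with empty strings so the char/varchar
#    check passes on the next write.
spark.sql("""
    UPDATE my_catalog.my_schema.my_table
    SET new_col_1 = '', new_col_2 = ''
""")

# 3. Re-run the original overwrite.
source_df.write.format("delta") \
    .mode("overwrite") \
    .option("mergeSchema", "true") \
    .save(destination_mount_filepath)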
