DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Paul92S
New Contributor III

Hi,

I am having an issue loading source data into a Delta table / Unity Catalog. The error we are receiving is the following:

grpc_message:"[DELTA_EXCEED_CHAR_VARCHAR_LIMIT] Exceeds char/varchar type length limitation. Failed check: (isnull(\'metric_name) OR (length(\'metric_name) <= 0))

We get this issue when executing the line below:

source_df.write.format("delta").mode("overwrite").option("mergeSchema", "true").save(destination_mount_filepath)
 
For background: we are loading raw data into a Delta table in mounted storage.
I have tried unmounting/remounting the storage and recreating the container in the storage account. I have checked and validated the data, and there are no rows that would fail this check (a sketch of the check we ran is below). I have also tried dropping the Unity Catalog tables and using dbutils to remove the old data behind the catalog tables, to no avail.
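For reference, this is roughly the validation we ran (a sketch only; the 255 limit is a placeholder for whatever length the table actually declares for metric_name):

from pyspark.sql import functions as F

# Rows that would violate a hypothetical VARCHAR(255) limit on metric_name
suspect_rows = source_df.filter(
    F.col("metric_name").isNotNull()
    & (F.length("metric_name") > 255)  # placeholder limit
)
print(suspect_rows.count())  # returns 0 for us, yet the write still fails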

2 REPLIES

Palash01
Contributor III

 Hey @Paul92S 

Looking at the error message, it looks like the column "metric_name" is the culprit here:

Understanding the Error:

  • Character Limit Violation: The error indicates that values in the metric_name column are exceeding the maximum length allowed for char/varchar data types in Unity Catalog.
  • Schema Mismatch: This likely stems from a discrepancy between the source data schema and the existing Delta table schema.

Troubleshooting Steps:

  • Use source_df.printSchema() to verify the actual schema of your source data, paying close attention to the metric_name column's data type and length.
  • Use spark.read.format("delta").load(destination_mount_filepath).printSchema() to inspect the current schema of the Delta table.
  • If feasible, consider altering the metric_name column in the Delta table to accommodate longer values, e.g. ALTER TABLE table_name ALTER COLUMN column_name TYPE STRING; (or a larger VARCHAR(n)).
  • If increasing column size isn't viable, explore truncating values in the metric_name column before writing to the Delta table.
  • If the error message refers to isnull('metric_name'), ensure there are no unintended null values in that column.
  • If empty strings are valid for metric_name, explicitly set those values to empty strings in the source DataFrame.
  • Also, check whether the incremental files have a different schema than the file you used to create the table (a combined sketch of these checks follows this list).
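A combined sketch of the steps above (PySpark plus SQL run via spark.sql); the table name my_catalog.my_schema.my_table and the 255 limit are placeholders, not your actual objects:

from pyspark.sql import functions as F

# 1. Compare the source schema against the current Delta table schema.
source_df.printSchema()
spark.read.format("delta").load(destination_mount_filepath).printSchema()

# 2. Widen the column if the declared limit itself is the problem
#    (Databricks SQL supports widening a VARCHAR(n) or converting it to STRING).
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table
    ALTER COLUMN metric_name TYPE STRING
""")

# 3. Or truncate/normalize values before writing, so nothing exceeds the limit
#    and nulls become empty strings.
cleaned_df = source_df.withColumn(
    "metric_name",
    F.coalesce(F.substring("metric_name", 1, 255), F.lit(""))  # placeholder limit
)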

Follow-ups are appreciated!

Leave a like if this helps! Kudos,
Palash

Paul92S
New Contributor III (Accepted Solution)

Hi Palash, 

We have Unity Catalog tables created off external table locations, which are mounted to Databricks from Azure Data Lake. If a schema change comes from upstream, it causes this issue. To resolve it, we read the new source schema names from raw, ran ALTER TABLE to include the new columns, and then ran UPDATE ... SET on those columns to give them an empty string ''. This then allowed the DataFrame to overwrite the files in mounted storage without causing an error.
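A rough sketch of that resolution (table and column names are placeholders for the actual objects):

# 1. Add the new upstream columns to the existing table.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_table
    ADD COLUMNS (new_col_1 STRING, new_col_2 STRING)
""")

# 2. Backfill the new columns with empty strings so the char/varchar
#    check passes on the next write.
spark.sql("""
    UPDATE my_catalog.my_schema.my_table
    SET new_col_1 = '', new_col_2 = ''
""")

# 3. Re-run the original overwrite.
source_df.write.format("delta") \
    .mode("overwrite") \
    .option("mergeSchema", "true") \
    .save(destination_mount_filepath)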
