- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-20-2024 09:29 AM
Hi,
I am having an issue of loading source data into a delta table/ unity catalog. The error we are recieving is the following:
grpc_message:"[DELTA_EXCEED_CHAR_VARCHAR_LIMIT] Exceeds char/varchar type length limitation. Failed check: (isnull(\'metric_name) OR (length(\'metric_name) <= 0))
We get this issue when executing the line below:
- Labels:
-
Delta Lake
-
Spark
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-26-2024 07:43 AM
Hi Palash,
We have Unity Catalogs created off external table locations, which are mounted to Databricks from azure DL. If a change of schema has come from upstream this causes this issue. To resolve, read in new source schema names from raw. Performed ALTER TABLE include these new columns then had to run UPDATE SET on the columns to give them an empty string ''. This then allowed for dataframe to overwrite the files in mounted storage without causing an error
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-20-2024 05:49 PM
Hey @Paul92S
Looking at the error message it looks like column "metric_name" is the culprit here:
Understanding the Error:
- Character Limit Violation: The error indicates that values in the metric_name column are exceeding the maximum length allowed for char/varchar data types in Unity Catalog.
- Schema Mismatch: This likely stems from a discrepancy between the source data schema and the existing Delta table schema.
Troubleshooting Steps:
- Use source_df.printSchema() to verify the actual schema of your source data, paying close attention to the metric_name column's data type and length.
- Use spark.read.format("delta").load(destination_mount_filepath).printSchema() to inspect the current schema of the Delta table.
- If feasible, consider altering the metric_name column in the Delta table to accommodate longer values using ALTER TABLE TABLE_NAME ALTER COLUMN COLUMN_NAME;
- If increasing column size isn't viable, explore truncating values in the metric_name column before writing to the Delta table.
- If the error message refers to isnull('metric_name'), ensure there are no unintended null values in that column.
- If empty strings are valid for metric_name, explicitly set those values to empty strings in the source DataFrame.
- Also, check if the incremental files has a different schema vs the file you are using to create the table.
Follow-ups are appreciated!
Palash
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-26-2024 07:43 AM
Hi Palash,
We have Unity Catalogs created off external table locations, which are mounted to Databricks from azure DL. If a change of schema has come from upstream this causes this issue. To resolve, read in new source schema names from raw. Performed ALTER TABLE include these new columns then had to run UPDATE SET on the columns to give them an empty string ''. This then allowed for dataframe to overwrite the files in mounted storage without causing an error
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-22-2024 10:29 AM
Setting this config change before making the write command solved it for us: spark.conf.set("spark.sql.legacy.charVarcharAsString", True)