12-11-2025 05:46 AM
Hello,
I'm conducting research on utilizing CLS in a project. We are implementing a lookup table to determine what tags a user can see. The CLS function looks like this:
CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_column(value VARIANT, tag STRING)
RETURNS VARIANT
COMMENT 'ABAC utility: Completely mask string values with asterisks'
RETURN
CASE
WHEN EXISTS (
SELECT 1
FROM {catalog}.{schema}.lookup_access
WHERE (lookup_access.tag = mask_column.tag
AND principal = current_user())
AND can_view = true
) THEN value
WHEN schema_of_variant(value) = 'DATE' THEN '1970-01-01'::VARIANT
WHEN schema_of_variant(value) = 'TIMESTAMP' THEN '1970-01-01'::VARIANT
WHEN schema_of_variant(value) = 'INT' THEN 0::VARIANT
WHEN schema_of_variant(value) = 'BIGINT' THEN 0::VARIANT
WHEN schema_of_variant(value) = 'DOUBLE' THEN 0.00::VARIANT
WHEN schema_of_variant(value) = 'STRING' THEN '***'::VARIANT
ELSE NULL::VARIANT
END;Used alone, it works perfectly fine. It also works as intended when implemented with a policy. But, when using a policy with this function, and selecting a DATE type column from a table, the following error appears:
[INVALID_VARIANT_CAST] The variant value `0` cannot be cast into `"DATE"`. Please use `try_variant_get` instead. SQLSTATE: 22023If I remove the date fields from the select query, the function works as intended. Additionally, if I remove the lookup table query from the UDF, the policy also works for date-type columns.
Why is this happening? Are there any workarounds, or is this a known bug?
Thank you!
12-11-2025 09:00 AM
Your masking UDF returns a VARIANT, but it is being applied as a column-level security (CLS) policy on a column with a fixed data type (e.g., DATE).
Inside a CLS policy, Databricks requires that the final returned value must cast back into the target column’s data type.
This is not enforced when you call the function directly (because VARIANT can hold anything), but is enforced when CLS rewrites the query, because the engine injects something similar to:
CAST(mask_column(value, 'TAG') AS DATE)
Then this happens:
When CLS checks your lookup table, the EXISTS(...) subquery returns FALSE,
So the function moves to the masking cases,
For DATE, the default mask is '1970-01-01'::VARIANT — that part is OK,
But before hitting the DATE fallback, the engine evaluates all CASE branches,
and the INT fallback (0::VARIANT) is also part of the CASE expression tree,
The engine tries to determine a common supertype of all CASE branches,
It incorrectly concludes that the CASE can return an INT-like VARIANT,
And when casting back to DATE, the VARIANT 0 cannot be cast to DATE → error:
[INVALID_VARIANT_CAST] The variant value `0` cannot be cast into "DATE".
This only happens when:
✔ CLS is used
✔ AND the UDF queries another table (EXISTS)
✔ AND the masked column is DATE/TIMESTAMP
✔ AND the UDF returns numeric values for other types
This is not encountered when simply calling the UDF directly.
12-11-2025 09:00 AM
Your masking UDF returns a VARIANT, but it is being applied as a column-level security (CLS) policy on a column with a fixed data type (e.g., DATE).
Inside a CLS policy, Databricks requires that the final returned value must cast back into the target column’s data type.
This is not enforced when you call the function directly (because VARIANT can hold anything), but is enforced when CLS rewrites the query, because the engine injects something similar to:
CAST(mask_column(value, 'TAG') AS DATE)
Then this happens:
When CLS checks your lookup table, the EXISTS(...) subquery returns FALSE,
So the function moves to the masking cases,
For DATE, the default mask is '1970-01-01'::VARIANT — that part is OK,
But before hitting the DATE fallback, the engine evaluates all CASE branches,
and the INT fallback (0::VARIANT) is also part of the CASE expression tree,
The engine tries to determine a common supertype of all CASE branches,
It incorrectly concludes that the CASE can return an INT-like VARIANT,
And when casting back to DATE, the VARIANT 0 cannot be cast to DATE → error:
[INVALID_VARIANT_CAST] The variant value `0` cannot be cast into "DATE".
This only happens when:
✔ CLS is used
✔ AND the UDF queries another table (EXISTS)
✔ AND the masked column is DATE/TIMESTAMP
✔ AND the UDF returns numeric values for other types
This is not encountered when simply calling the UDF directly.
12-12-2025 01:24 AM
Thank you for an insightful answer @Poorva21.
I conclude from your reasoning that this is the result of an optimization/engine error. It seems like I will need to resort to a workaround for the date columns then...