cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

CLS function with lookup fails on dates

skuvisk
New Contributor

Hello,

I'm conducting research on utilizing CLS in a project. We are implementing a lookup table to determine what tags a user can see. The CLS function looks like this:

CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_column(value VARIANT, tag STRING)
  RETURNS VARIANT
  COMMENT 'ABAC utility: Completely mask string values with asterisks'
  RETURN 
  CASE
    WHEN EXISTS (
      SELECT 1
      FROM {catalog}.{schema}.lookup_access
        WHERE (lookup_access.tag = mask_column.tag 
        AND principal = current_user()) 
        AND can_view = true
    ) THEN value
    WHEN schema_of_variant(value) = 'DATE' THEN '1970-01-01'::VARIANT
    WHEN schema_of_variant(value) = 'TIMESTAMP' THEN '1970-01-01'::VARIANT
    WHEN schema_of_variant(value) = 'INT' THEN 0::VARIANT
    WHEN schema_of_variant(value) = 'BIGINT' THEN 0::VARIANT
    WHEN schema_of_variant(value) = 'DOUBLE' THEN 0.00::VARIANT
    WHEN schema_of_variant(value) = 'STRING' THEN '***'::VARIANT
    ELSE NULL::VARIANT
  END;

Used alone, it works perfectly fine. It also works as intended when implemented with a policy. But, when using a policy with this function, and selecting a DATE type column from a table, the following error appears:

[INVALID_VARIANT_CAST] The variant value `0` cannot be cast into `"DATE"`. Please use `try_variant_get` instead. SQLSTATE: 22023

If I remove the date fields from the select query, the function works as intended. Additionally, if I remove the lookup table query from the UDF, the policy also works for date-type columns.

Why is this happening? Are there any workarounds, or is this a known bug?

Thank you!

 

 

2 REPLIES 2

Poorva21
New Contributor II

Your masking UDF returns a VARIANT, but it is being applied as a column-level security (CLS) policy on a column with a fixed data type (e.g., DATE).

Inside a CLS policy, Databricks requires that the final returned value must cast back into the target columnโ€™s data type.

This is not enforced when you call the function directly (because VARIANT can hold anything), but is enforced when CLS rewrites the query, because the engine injects something similar to:

CAST(mask_column(value, 'TAG') AS DATE)

 

Then this happens:

When CLS checks your lookup table, the EXISTS(...) subquery returns FALSE,

So the function moves to the masking cases,

For DATE, the default mask is '1970-01-01'::VARIANT โ€” that part is OK,

But before hitting the DATE fallback, the engine evaluates all CASE branches,

and the INT fallback (0::VARIANT) is also part of the CASE expression tree,

The engine tries to determine a common supertype of all CASE branches,

It incorrectly concludes that the CASE can return an INT-like VARIANT,

And when casting back to DATE, the VARIANT 0 cannot be cast to DATE โ†’ error:

[INVALID_VARIANT_CAST] The variant value `0` cannot be cast into "DATE".

 

This only happens when:

โœ” CLS is used

โœ” AND the UDF queries another table (EXISTS)

โœ” AND the masked column is DATE/TIMESTAMP

โœ” AND the UDF returns numeric values for other types

This is not encountered when simply calling the UDF directly.

skuvisk
New Contributor

Thank you for an insightful answer @Poorva21

I conclude from your reasoning that this is the result of an optimization/engine error. It seems like I will need to resort to a workaround for the date columns then...

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now