10-12-2023 09:28 AM
Dear all,
We have recently been working on column masking, using the column mask feature of Unity Catalog.
We have run into a problem when masking a nested column (a sub-field within a STRUCT type column).
We wonder whether Unity Catalog can mask only this sub-field, or whether we have no option other than masking the whole STRUCT column.
Best regards
Tien Dat PHAN
10-12-2023 02:05 PM
As an update on this topic: not only the STRUCT type but also the ARRAY type has issues when applying a MASK function. My guess is that since STRUCT and ARRAY are not standard SQL types, the ALTER COLUMN statement that adds a MASK to those columns always fails.
It always results in the following exception, even when we try a simple ARRAY<INT> type:
"If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog. To tolerate the error on drop use DROP SCHEMA IF EXISTS."
Could you please take a look and see if we have any solution at all for masking those column types?
Best regards
Tien Dat PHAN
12-27-2023 05:27 AM
Maybe an old-fashioned dynamic view will do the job?
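A minimal sketch of what such a dynamic view could look like. It is not tested against the setup in this thread. The table main.default.customers, its columns (id, contact STRUCT<email: STRING, phone: STRING>, scores ARRAY<INT>) and the group pii_readers are all hypothetical names used only for illustration:

```sql
-- Hypothetical names throughout: main.default.customers, contact, scores, pii_readers.
CREATE OR REPLACE VIEW main.default.customers_masked AS
SELECT
  id,
  -- Rebuild the STRUCT field by field and redact only the sensitive sub-field.
  named_struct(
    'email', CASE WHEN is_account_group_member('pii_readers')
                  THEN contact.email ELSE '***' END,
    'phone', contact.phone
  ) AS contact,
  -- For an ARRAY, transform() rewrites each element; here every element becomes NULL.
  CASE WHEN is_account_group_member('pii_readers')
       THEN scores
       ELSE transform(scores, s -> CAST(NULL AS INT))
  END AS scores
FROM main.default.customers;
```

Users would then be granted SELECT on the view rather than on the underlying table. Because the STRUCT is rebuilt explicitly, every sub-field has to be listed in named_struct, so the view must be updated whenever the struct schema changes.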
12-27-2023 01:08 PM
Thanks for your suggestion.
The core concern of this thread remains the same, whether we use a VIEW or masking functions: to what extent does Databricks SQL support complex type manipulation? Can we write a function or a VIEW definition that uses or alters a sub-field of a complex-typed column for whatever purpose we want?
Is it supported at all? If it is not supported, is there any plan to support it in the future? Or what would be the alternatives?
12-26-2023 11:15 PM
I have the same concern. I tried to mask a column with the Decimal datatype, but it doesn't work either. The Databricks example for column masks may work well with simple datatypes like string, but it doesn't meet our requirements for data governance.
12-27-2023 01:02 PM
For your Decimal datatype masking, you just need to make sure that your SQL function declaration uses the correct datatype.
Something like: CREATE FUNCTION catalogName.default.my_function(input decimal(x, y)) RETURNS DECIMAL(x, y) RETURN input;
It should work.
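As a slightly fuller sketch with a concrete precision and scale: the table catalogName.default.payments, its column amount DECIMAL(10, 2), the function name mask_amount and the group finance are assumptions for illustration, not names from this thread. The point is that the parameter type and the return type both match the column's declared type exactly:

```sql
-- Hypothetical names: payments, amount, mask_amount, finance.
-- The parameter and return types match the column type DECIMAL(10, 2).
CREATE OR REPLACE FUNCTION catalogName.default.mask_amount(amount DECIMAL(10, 2))
RETURNS DECIMAL(10, 2)
RETURN CASE WHEN is_account_group_member('finance')
            THEN amount
            ELSE CAST(NULL AS DECIMAL(10, 2))
       END;

-- Attach the mask, referring to the function by its fully qualified name.
ALTER TABLE catalogName.default.payments
  ALTER COLUMN amount SET MASK catalogName.default.mask_amount;
```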
My concern, on the other hand, is about masking or filtering a sub-field of a nested column type like STRUCT or ARRAY.