Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Filter sensitive data on nested column

TienDat
New Contributor III

Dear all,

We have recently been working on column masking, using the column mask feature of Unity Catalog.

We have run into a problem masking a nested column (a sub-column within a STRUCT type column).

We wonder whether Unity Catalog can mask only this sub-column, or whether we have no option other than masking the whole STRUCT column.

Best regards

Tien Dat PHAN

5 REPLIES

TienDat
New Contributor III

An update on this topic: not only the STRUCT type but also the ARRAY type has issues when applying a MASK function. My guess is that because STRUCT and ARRAY are not standard SQL types, using ALTER COLUMN to add a MASK to those columns always fails.

It always results in the following exception, even when we try with a simple ARRAY<INT> type:

"If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog. To tolerate the error on drop use DROP SCHEMA IF EXISTS."

Could you please take a look and tell us whether there is any solution at all for masking those column types?

Best regards

Tien Dat PHAN

Wojciech_BUK
Valued Contributor III

Maybe an old-fashioned dynamic view will do the job?
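To make the dynamic view idea concrete, here is a minimal sketch. It assumes a hypothetical table main.default.customers with a column contact of type STRUCT<name: STRING, email: STRING>, and a hypothetical account group named pii_readers; none of these names come from the thread. The view rebuilds the struct field by field, so only the email sub-field is redacted.

```sql
-- Sketch only: table, column, and group names are illustrative assumptions.
CREATE OR REPLACE VIEW main.default.customers_masked AS
SELECT
  id,
  -- Rebuild the struct so that only the sensitive sub-field is redacted.
  named_struct(
    'name',  contact.name,
    'email', CASE
               WHEN is_account_group_member('pii_readers') THEN contact.email
               ELSE '***REDACTED***'
             END
  ) AS contact
FROM main.default.customers;
```

Users would then be granted SELECT on the view rather than on the underlying table. One thing to watch: if contact is NULL in a row, this expression returns a struct with NULL fields rather than a NULL struct.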

TienDat
New Contributor III

Thanks for your suggestion.

The core concern of this thread remains the same regardless of whether we use a VIEW or a masking function: to what extent does Databricks SQL support complex type manipulation? Can we write a function or a VIEW definition that uses or alters a sub-field of a complex-typed column for whatever purpose we want?

Is this supported at all? If not, is there any plan to support it in the future, and what would the alternatives be?
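For reference, the kind of sub-field rewrite being asked about might look like the sketch below. It is not a confirmed answer to the masking question. It assumes a hypothetical table main.default.customers with a column contact of type STRUCT<name: STRING, email: STRING> and a column phones of type ARRAY<STRUCT<kind: STRING, number: STRING>>. Whether a function like this can be attached as a column mask on a STRUCT column is exactly what this thread leaves open.

```sql
-- Sketch only: all object names and column types are illustrative assumptions.

-- A SQL function that takes a whole struct and returns it with one sub-field replaced.
CREATE OR REPLACE FUNCTION main.default.redact_email(c STRUCT<name: STRING, email: STRING>)
RETURNS STRUCT<name: STRING, email: STRING>
RETURN named_struct('name', c.name, 'email', '***REDACTED***');

-- Using the function, plus the higher-order function transform() to rewrite
-- one sub-field inside every element of an array of structs.
SELECT
  main.default.redact_email(contact) AS contact,
  transform(
    phones,
    p -> named_struct('kind', p.kind, 'number', '***REDACTED***')
  ) AS phones
FROM main.default.customers;
```

The pattern in both cases is the same: there is no in-place update of a single sub-field, so the struct (or each array element) is rebuilt with named_struct and the sensitive field is swapped out.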

 

DucNguyen
New Contributor II

I have the same concern. I tried to mask a column of Decimal datatype, but it doesn't work either. The DBX example for column masks may work well with simple datatypes like string, but it doesn't meet our requirements for data governance.

TienDat
New Contributor III

For your Decimal datatype masking, you just need to make sure that your SQL function declaration uses the correct datatype, with the same precision and scale as the column.

Something like: CREATE FUNCTION catalogName.default.my_function(input decimal(x, y)) RETURNS DECIMAL(x, y) RETURN input;

It should work.
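A slightly fuller sketch of the same idea, assuming a hypothetical table main.default.employees with a salary column declared as DECIMAL(10, 2) and a hypothetical account group hr_admins. The precision and scale in the function signature match the column's declared type.

```sql
-- Sketch only: table, column, and group names are illustrative assumptions.
CREATE OR REPLACE FUNCTION main.default.mask_salary(salary DECIMAL(10, 2))
RETURNS DECIMAL(10, 2)
RETURN CASE
         WHEN is_account_group_member('hr_admins') THEN salary
         ELSE CAST(NULL AS DECIMAL(10, 2))  -- hide the value from everyone else
       END;

-- Attach the function as a column mask on the DECIMAL(10, 2) column.
ALTER TABLE main.default.employees
  ALTER COLUMN salary SET MASK main.default.mask_salary;
```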

My concern, on the other hand, is masking or filtering a sub-field of a nested column type such as STRUCT or ARRAY.
