Filter sensitive data on nested column

TienDat
New Contributor III

Dear all,

We have recently been working on column masking, using the column mask feature of Unity Catalog.

We have now run into the problem of masking a nested column (a sub-field within a STRUCT-type column).

We are wondering whether Unity Catalog can mask only this sub-field, or whether we have no option other than masking the whole STRUCT column.
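A minimal sketch of one possible workaround, assuming the table, column, field, and group names are hypothetical placeholders: the mask function takes and returns the whole STRUCT, but rebuilds it with named_struct so that only one field is redacted. This has not been checked against the errors reported later in this thread, so treat it as a starting point rather than a confirmed solution.

```sql
-- Hypothetical names: main.default.customers, column address, group pii_readers.
-- The mask function must accept and return the full STRUCT type of the column.
CREATE OR REPLACE FUNCTION main.default.mask_address_street(
  address STRUCT<street: STRING, city: STRING>
)
RETURNS STRUCT<street: STRING, city: STRING>
RETURN CASE
  -- Authorized readers see the struct unchanged.
  WHEN is_account_group_member('pii_readers') THEN address
  -- Keep a NULL struct NULL instead of turning it into a struct of NULL fields.
  WHEN address IS NULL THEN NULL
  -- Everyone else gets the same struct with only the street field redacted.
  ELSE named_struct('street', '***', 'city', address.city)
END;

-- The mask is attached to the whole STRUCT column; only street is actually hidden.
ALTER TABLE main.default.customers
  ALTER COLUMN address SET MASK main.default.mask_address_street;
```

The mask still formally covers the whole column; the function simply passes the non-sensitive fields through unchanged.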

Best regards

Tien Dat PHAN


TienDat
New Contributor III

An update on this topic: not only STRUCT but also ARRAY columns run into problems when we apply a MASK function. My guess is that because STRUCT and ARRAY are not standard SQL types, adding a MASK to those columns with ALTER COLUMN always fails.

It always results in the following exception, even when we try a simple ARRAY<INT> column:

"If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog. To tolerate the error on drop use DROP SCHEMA IF EXISTS."

Could you please take a look and let us know whether there is any solution at all for masking these column types?
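The quoted text appears to be the generic message of a schema-not-found error, so one thing worth ruling out is an unqualified or mis-qualified function name. A minimal sketch for an ARRAY<INT> column, with every name fully qualified as catalog.schema.object; the catalog, schema, table, column, and group names are hypothetical:

```sql
-- Hypothetical names: main.default.orders, column item_ids ARRAY<INT>, group pii_readers.
-- Fully qualify the function both when creating it and when attaching it,
-- so resolution does not depend on current_schema().
CREATE OR REPLACE FUNCTION main.default.mask_int_array(ids ARRAY<INT>)
RETURNS ARRAY<INT>
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN ids
  -- Replace every element with 0 but keep the array length (a NULL array stays NULL).
  ELSE transform(ids, x -> 0)
END;

ALTER TABLE main.default.orders
  ALTER COLUMN item_ids SET MASK main.default.mask_int_array;

-- Check which catalog and schema unqualified names resolve against.
SELECT current_catalog(), current_schema();
```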

Best regards

Tien Dat PHAN

Maybe an old-fashioned dynamic view would do the job?
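A minimal sketch of such a dynamic view, assuming hypothetical table, view, column, and group names: the view rebuilds the STRUCT with named_struct so that only the street field depends on group membership.

```sql
-- Hypothetical names: main.default.customers, main.default.customers_v,
-- columns id and address STRUCT<street: STRING, city: STRING>, group pii_readers.
CREATE OR REPLACE VIEW main.default.customers_v AS
SELECT
  id,
  CASE
    -- Members, and rows with a NULL struct, get the column as stored.
    WHEN is_account_group_member('pii_readers') OR address IS NULL THEN address
    -- Everyone else gets the struct with only street redacted.
    ELSE named_struct('street', '***', 'city', address.city)
  END AS address
FROM main.default.customers;
```

Readers would then be granted access to the view rather than to the underlying table.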

TienDat
New Contributor III

Thanks for your suggestion.

The core concern of this thread remains the same, whether we use a VIEW or a MASK function: to what extent does Databricks SQL support manipulating complex types? Can we write a function or a VIEW definition that reads or alters a sub-field of a complex-typed column for whatever purpose we need?

Is this supported at all? If not, is there any plan to support it in the future, or what would the alternatives be?
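As one data point on sub-field manipulation, Spark SQL's higher-order function transform combined with named_struct can rewrite a field inside every element of an ARRAY of STRUCTs. A minimal sketch, assuming a hypothetical contacts column of type ARRAY<STRUCT<channel: STRING, handle: STRING>>:

```sql
-- Hypothetical table main.default.customers with columns id and
--   contacts ARRAY<STRUCT<channel: STRING, handle: STRING>>.
-- transform() visits each element; named_struct() rebuilds it with handle redacted.
SELECT
  id,
  transform(
    contacts,
    c -> named_struct('channel', c.channel, 'handle', '***')
  ) AS contacts
FROM main.default.customers;
```

The same expression could be wrapped in a CASE inside a mask function or dynamic view; whether Unity Catalog accepts it as a column mask on this type is an assumption we have not verified.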

 

DucNguyen
New Contributor II

I have the same concern. I tried to mask a DECIMAL column, but it doesn't work either. Databricks' column mask examples may work well for simple data types like STRING, but they don't meet our data governance requirements.

TienDat
New Contributor III

For DECIMAL masking, you just need to make sure that your SQL function declaration uses the correct data type, including precision and scale.

Something like: CREATE FUNCTION catalogName.default.my_function(input decimal(x, y)) RETURNS DECIMAL(x, y) RETURN input;

It should work.
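A concrete version of the function above, assuming a hypothetical DECIMAL(10, 2) column named amount in a hypothetical table main.default.payments and a hypothetical group finance; non-members see NULL:

```sql
-- The parameter and return types repeat the column's exact precision and scale.
CREATE OR REPLACE FUNCTION main.default.mask_amount(amount DECIMAL(10, 2))
RETURNS DECIMAL(10, 2)
RETURN CASE
  WHEN is_account_group_member('finance') THEN amount
  -- Non-members see NULL, typed to match the column.
  ELSE CAST(NULL AS DECIMAL(10, 2))
END;

ALTER TABLE main.default.payments
  ALTER COLUMN amount SET MASK main.default.mask_amount;
```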

My own concern, on the other hand, is masking or filtering a sub-field of nested column types such as STRUCT or ARRAY.
