Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Data linkage and analysis with masked data

whuang
New Contributor II

Hello

Looking for guidance on keeping data masked when it needs to be linked with unmasked data, or when an analysis requires a combination of masked and unmasked data. How have you tackled this challenge?

Thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions

MoJaMa
Databricks Employee

Then, unfortunately, you will have a problem, because from the perspective of the engine the principal is not allowed to see or use the real values from dataset1.

The principal needs to be able to unmask the data from dataset1 for the actual "value" to be able to join to dataset2.

If they only see the masked value, then you're joining *** to 123, which will not work.

The masking rules should be set up in a centralized fashion (ideally using Governed Tags) so that the same rules apply to the same classification of data on all the datasets in a catalog/schema (using ABAC). Otherwise you are going to run into these inconsistencies.


4 REPLIES

MoJaMa
Databricks Employee

Assuming you are using Databricks Table RLS/CLM or ABAC RLS/CLM, this will work out of the box for a principal/identity who is allowed to see both the unmasked version of the masked data in dataset1 and the unmasked data in dataset2.

Though I would question how the same data element can be tagged in a way that leaves it masked in one table and unmasked in another.

Maybe you can elaborate on the actual business scenario and personas.

whuang
New Contributor II

Thanks for your initial response. The use case is that the principal/identity is allowed to see dataset1 masked and dataset2 unmasked.

dataset1 is from UnityCatalog1, where the principal/identity does not have control over the masking policy.

dataset2 is from UnityCatalog2, where the principal/identity has control over the masking policy.

Any insight would be appreciated.

Thanks.

MoJaMa
Databricks Employee

Then, unfortunately, you will have a problem, because from the perspective of the engine the principal is not allowed to see or use the real values from dataset1.

The principal needs to be able to unmask the data from dataset1 for the actual "value" to be able to join to dataset2.

If they only see the masked value, then you're joining *** to 123, which will not work.
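To make the "joining *** to 123" point concrete, here is a rough plain-Python sketch. It is only an analogy, not Databricks or Unity Catalog code: the rows, the `mask` function, the `can_unmask` flag, and the `inner_join` helper are all made up for illustration.

```python
# Hypothetical in-memory stand-ins for the two tables (not real Databricks objects).
dataset1 = [
    {"customer_id": "123", "segment": "gold"},
    {"customer_id": "456", "segment": "silver"},
]
dataset2 = [
    {"customer_id": "123", "balance": 50},
    {"customer_id": "789", "balance": 75},
]


def mask(value):
    """Illustrative column mask: hide the real value entirely."""
    return "***"


def read_dataset1(can_unmask):
    """Return dataset1 as a given principal would see it (assumed behaviour)."""
    if can_unmask:
        return [dict(row) for row in dataset1]
    return [{**row, "customer_id": mask(row["customer_id"])} for row in dataset1]


def inner_join(left, right, key):
    """Simple hash join on an exact key match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Principal who only sees masked keys: "***" never equals "123", so no rows match.
masked_join = inner_join(read_dataset1(can_unmask=False), dataset2, "customer_id")

# Principal who may unmask dataset1: the real key "123" matches.
unmasked_join = inner_join(read_dataset1(can_unmask=True), dataset2, "customer_id")

print(masked_join)
print(unmasked_join)
```

The join itself is unchanged in both cases. Only what the principal is allowed to read from dataset1 differs, and that alone decides whether any keys line up.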

The masking rules should be set up in a centralized fashion (ideally using Governed Tags) so that the same rules apply to the same classification of data on all the datasets in a catalog/schema (using ABAC). Otherwise you are going to run into these inconsistencies.
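As a loose illustration of what "centralized" means here, the plain-Python sketch below keys one masking rule to each classification tag and looks the tag up per column. It does not use the Unity Catalog or ABAC API. The tag names, the `MASKING_RULES` and `COLUMN_TAGS` tables, and the `read_value` helper are assumptions invented for the example.

```python
# Hypothetical central policy: exactly one masking rule per classification tag.
MASKING_RULES = {
    "pii.customer_id": lambda value: "***",
    "pii.email": lambda value: "***@" + value.split("@", 1)[1],
}

# Hypothetical tag assignments: the same data element carries the same tag
# in every table where it appears.
COLUMN_TAGS = {
    ("dataset1", "customer_id"): "pii.customer_id",
    ("dataset2", "customer_id"): "pii.customer_id",
    ("dataset2", "email"): "pii.email",
}


def read_value(table, column, value, exempt_tags):
    """Return the value as seen by a principal exempt from `exempt_tags`."""
    tag = COLUMN_TAGS.get((table, column))
    if tag is None or tag in exempt_tags:
        return value  # untagged column, or the principal may see this classification
    return MASKING_RULES[tag](value)


# A principal with no exemptions sees the key masked in both tables.
restricted = [read_value(t, "customer_id", "123", set()) for t in ("dataset1", "dataset2")]

# A principal exempt from the tag sees the real key in both tables.
trusted = [
    read_value(t, "customer_id", "123", {"pii.customer_id"})
    for t in ("dataset1", "dataset2")
]

print(restricted)
print(trusted)
```

Because the rule hangs off the tag rather than off an individual table, a principal cannot end up seeing the same element masked in one dataset and unmasked in another, which is the inconsistency described above.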

whuang
New Contributor II

Thank you for responding. It definitely gives us something to think about regarding how far to extend "data mesh" principles into the physical data layer.