
Any Advice on Dynamic Masking while maintaining performance?

tana_sakakimiya
New Contributor III

I plan to mask columns that carry a specific tag such as "sensitive" or "PII", which indicates that the column values should be visible only to privileged user groups because they contain credentials or personal identity data.

To implement that, I plan to create a masking function and apply it to a catalog through a policy.

However, I am worried about performance. Has anyone tried this and run into performance issues?
Or does anyone have a better approach for this task?

Note that I have a requirement not to encrypt the data.

Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions

saurabh18cs
Honored Contributor

Hi @tana_sakakimiya 

Your approach, using Unity Catalog column tags (like "sensitive" or "PII") and applying masking policies based on those tags, is a recommended and scalable way to manage data access in Databricks, especially for compliance and privacy. Masking policies are evaluated at query time, so the performance impact is minimal if the logic is simple. Only complex masking expressions involving UDFs or regex may slow queries down.
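To make "simple logic" concrete, here is a minimal sketch of a masking function that only does a group-membership check and returns a constant. The catalog, schema, table, and column names (`business_unit.governance`, `business_unit.sales.customers`, `email`) are hypothetical, and the group name is borrowed from the rule example in this post. Note that this sketch attaches the mask to one column directly instead of through a tag-based policy, so treat it as an illustration of the function shape, not as the recommended rollout.

```sql
-- Hypothetical function name and location; adjust to your own catalog and schema.
-- The body is a single CASE expression with no regex and no Python UDF,
-- which is the kind of logic that should stay cheap at query time.
CREATE OR REPLACE FUNCTION business_unit.governance.mask_pii(val STRING)
RETURNS STRING
RETURN CASE
  -- Members of the privileged group see the real value.
  WHEN is_account_group_member('privileged_employees') THEN val
  -- Everyone else sees a fixed placeholder.
  ELSE '****'
END;

-- Hypothetical table and column; attaches the mask to a single column.
ALTER TABLE business_unit.sales.customers
  ALTER COLUMN email SET MASK business_unit.governance.mask_pii;
```

A function like this handles only STRING columns. Columns of other types would need their own function with a matching parameter and return type.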

I would suggest using ABAC (attribute-based access control), which is currently in private preview and should become more widely available soon. ABAC lets you control access to data based on attributes (tags, labels, or properties) of users, groups, or data objects, rather than only on roles (RBAC). Policies are evaluated dynamically and are highly flexible, so you avoid maintaining a large number of roles as the organisation changes.

A simple example of a column masking rule under ABAC:

SET RULE analyst_sales_mask
ON CATALOG business_unit
COLUMN MASK mask_pii
TO `privileged_employees`
FOR TABLES
WHEN has_tag('txn')
WHEN COLUMNS col_has_tag('pii')
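
A tag-based rule like the one above only matches columns that actually carry the tag, so the tagging step matters as much as the rule. As a rough sketch, assuming a hypothetical table `business_unit.sales.customers` with an `email` column, and assuming a tag value of `'true'` (the rule above only checks the tag key), tagging and then auditing tagged columns could look like this:

```sql
-- Hypothetical table and column; the tag key 'pii' matches the rule above.
-- The value 'true' is an assumption made for illustration.
ALTER TABLE business_unit.sales.customers
  ALTER COLUMN email SET TAGS ('pii' = 'true');

-- List every column in the catalog that currently carries the 'pii' tag,
-- to check which columns a tag-based masking rule would cover.
SELECT catalog_name, schema_name, table_name, column_name, tag_value
FROM system.information_schema.column_tags
WHERE catalog_name = 'business_unit'
  AND tag_name = 'pii';
```

Running the audit query before enabling a rule is a cheap way to spot sensitive columns that were never tagged.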

Br

Saurabh


