
Masking techniques for many PII columns

Phani1
Valued Contributor II

Hi Databricks Team,

We would appreciate it if you could tell us when Column-Masking, Row-Level Filtering, and Attribute-Based Masking should be used, and which technique is recommended for handling large data volumes containing hundreds (around 1,000) of PII columns.

Regards

Phanindra


Kaniz_Fatma
Community Manager

Hi @Phani1, I'd be happy to provide some guidance on when to use Column-Masking, Row-Level Filtering, and Attribute-Based Masking in Databricks, as well as how to handle large data volumes with numerous PII columns.


Column-Masking: Column-Masking is used when you want to control access to specific columns in your data. It's particularly useful when certain columns hold sensitive data that should not be accessible to all users. For example, you might want to mask personally identifiable information (PII) such as social security numbers or credit card numbers. In Databricks, you can use the MASK function to apply data masking to specific columns in a Delta table....
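
To make the idea concrete, here is a rough plain-Python model of the decision a column mask makes for each value. In Databricks itself the mask would be a SQL UDF attached to the column (with ALTER TABLE ... ALTER COLUMN ... SET MASK); the function name mask_ssn, the group name hr_admins, and the "keep the last four digits" rule are assumptions for illustration only, not anything from the original reply.

```python
from __future__ import annotations

# Rough SQL equivalent (hypothetical names, Unity Catalog syntax):
#   CREATE FUNCTION ssn_mask(ssn STRING)
#     RETURN CASE WHEN is_account_group_member('hr_admins') THEN ssn
#                 ELSE concat('***-**-', right(ssn, 4)) END;
#   ALTER TABLE employees ALTER COLUMN ssn SET MASK ssn_mask;


def mask_ssn(ssn: str, user_groups: set[str], privileged_group: str = "hr_admins") -> str:
    """Hypothetical mask: privileged users see the raw SSN, everyone else only the last 4 digits."""
    if privileged_group in user_groups:
        return ssn  # privileged group: the value passes through unmasked
    digits = [c for c in ssn if c.isdigit()]
    return "***-**-" + "".join(digits[-4:])


print(mask_ssn("123-45-6789", {"analysts"}))   # ***-**-6789
print(mask_ssn("123-45-6789", {"hr_admins"}))  # 123-45-6789
```

The key design point is that the mask runs at query time based on who is asking, so the stored data stays unchanged.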


Row-Level Filtering: Row-Level Filtering is used when you want to control access to specific rows in your data. This is useful when different users or user groups should have access to different subsets of the data. For instance, a regional manager might only need access to data from their specific region. In Databricks, you can use row filters to apply a filter to a table so that subsequent queries only ....
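
A row filter is essentially a boolean predicate evaluated for every row before results are returned. The sketch below models that in plain Python; in Databricks the predicate would be a SQL UDF attached with ALTER TABLE ... SET ROW FILTER ... ON (...). The group name admins, the region values, and the sample rows are hypothetical.

```python
from __future__ import annotations

# Rough SQL equivalent for a single fixed region (hypothetical names):
#   CREATE FUNCTION emea_filter(region STRING)
#     RETURN is_account_group_member('admins') OR region = 'EMEA';
#   ALTER TABLE sales SET ROW FILTER emea_filter ON (region);


def region_filter(row_region: str, user_region: str, user_groups: set[str]) -> bool:
    """Hypothetical predicate: admins see every row, others only rows from their own region."""
    return "admins" in user_groups or row_region == user_region


def apply_row_filter(rows: list[dict], user_region: str, user_groups: set[str]) -> list[dict]:
    # Keep only rows for which the predicate holds, as a row filter does on every query.
    return [r for r in rows if region_filter(r["region"], user_region, user_groups)]


sales = [
    {"id": 1, "region": "EMEA", "amount": 100},
    {"id": 2, "region": "APAC", "amount": 250},
    {"id": 3, "region": "EMEA", "amount": 75},
]

print([r["id"] for r in apply_row_filter(sales, "EMEA", {"regional_managers"})])  # [1, 3]
print([r["id"] for r in apply_row_filter(sales, "EMEA", {"admins"})])             # [1, 2, 3]
```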


Attribute-Based Masking: Attribute-Based Masking is a dynamic approach to data access control that c.... It's useful when access to data needs to be controlled based on the attributes of the user requesting the data. For example, an HR department might be allowed to see PII, while other departments are not.
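
The appeal of an attribute-based approach with many PII columns is that one policy is written against tags and user attributes rather than one rule per column. The sketch below is a plain-Python model of that idea only, not a Databricks API; the COLUMN_TAGS mapping, the pii tag, the department user attribute, and the REDACTED placeholder are all assumptions.

```python
from __future__ import annotations

# Hypothetical column tags; in practice these would come from catalog metadata.
COLUMN_TAGS: dict[str, set[str]] = {
    "name": {"pii"},
    "email": {"pii"},
    "office": set(),
    "salary": {"pii", "confidential"},
}


def can_see_pii(user_attrs: dict) -> bool:
    # Hypothetical policy: only users whose department attribute is HR may see PII.
    return user_attrs.get("department") == "HR"


def apply_abac(row: dict, user_attrs: dict, redacted: str = "REDACTED") -> dict:
    """Redact every column tagged 'pii' unless the user's attributes allow PII access."""
    allowed = can_see_pii(user_attrs)
    return {
        col: val if allowed or "pii" not in COLUMN_TAGS.get(col, set()) else redacted
        for col, val in row.items()
    }


row = {"name": "Ana", "email": "ana@example.com", "office": "Lisbon", "salary": 50000}
print(apply_abac(row, {"department": "Finance"}))
print(apply_abac(row, {"department": "HR"}))
```

Adding a new PII column then only requires tagging it; the policy itself does not change.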


When dealing with large data volumes with numerous PII columns, you might consider the following strategies:

  1. Encryption: Use libraries like Fernet for encryption, along with user-defined functions (UDFs), and ....
  2. Optimization: Use Delta Lake (formerly Databricks Delta), which is designed for large-scale data workloads. It provides mechanisms like data skipping and file compaction to optimize data processing. The OPTIMIZE command in Databricks can be used to compact files and get a file size of up to 1GB, wh....
  3. Masking: Apply column-masking to PII columns to protect sensitive data.
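
With hundreds or around a thousand PII columns, writing mask statements by hand does not scale well. One option is to generate them from a column inventory. The sketch below only builds SQL strings and assumes the Unity Catalog ALTER TABLE ... ALTER COLUMN ... SET MASK syntax; the table name, the mask function names, and the column-to-type mapping are hypothetical. Columns are grouped by data type because a mask function's input and return types need to be compatible with the column it masks.

```python
from __future__ import annotations


def build_mask_statements(
    table: str,
    pii_columns: dict[str, str],
    mask_functions: dict[str, str],
) -> list[str]:
    """Return one ALTER TABLE ... SET MASK statement per PII column.

    pii_columns maps column name -> data type (e.g. "STRING");
    mask_functions maps upper-case data type -> fully qualified mask UDF name.
    All names are hypothetical placeholders.
    """
    statements = []
    for column, data_type in pii_columns.items():
        func = mask_functions.get(data_type.upper())
        if func is None:
            raise ValueError(f"No mask function registered for type {data_type} (column {column})")
        # Backtick-quote the column name so reserved words or unusual names still parse.
        statements.append(f"ALTER TABLE {table} ALTER COLUMN `{column}` SET MASK {func}")
    return statements


stmts = build_mask_statements(
    "main.hr.employees",
    {"ssn": "STRING", "birth_date": "date"},
    {"STRING": "main.security.mask_string", "DATE": "main.security.mask_date"},
)
for s in stmts:
    print(s)
```

The generated list could then be executed from a notebook, for example by looping over it with spark.sql, which keeps the masking rules in a small number of reusable functions instead of a thousand hand-written statements.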

Kaniz_Fatma
Community Manager

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

@Kaniz_Fatma 

Thanks for the answers. To extend this question further: how can we apply column masks to DLT tables? DLT tables use materialized views under the hood, and Databricks has made column masks available for materialized views in Public Preview. However, the same code does not work for DLT tables yet. Any idea how, or when, column mask functions can be applied to DLT tables?


Column masks on DLT tables are not working yet.


Regards

Bing
