Get Started Discussions

Masking techniques for more PII columns

Phani1
Valued Contributor

Hi Databricks Team,

Could you advise on when Column Masking, Row-Level Filtering, and Attribute-Based Masking should each be used, and on the recommended technique for handling large data volumes containing hundreds (around 1,000) of PII columns?

Regards,

Phanindra

2 REPLIES

Kaniz
Community Manager

Hi @Phani1, I’d be happy to provide some guidance on when to use Column-Masking, Row-Level Filtering, and Attribute-Based Masking in Databricks, as well as how to handle large data volumes with numerous PII columns.

 

Column Masking: Column masking is used when you want to control access to specific columns in your data. It is particularly useful when certain columns hold sensitive data that should not be visible to all users; for example, you might want to mask personally identifiable information (PII) such as social security numbers or credit card numbers. In Databricks, you can attach a mask function to a column so that masking is applied to specific columns of a Delta table.
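The idea can be sketched in plain Python. This is an illustration only, not the Databricks API: the function name `mask_ssn`, the group name `pii_readers`, and the redaction format are all assumptions for the example; in Databricks the equivalent logic lives in a SQL mask function attached to the column.

```python
# Illustrative sketch of column masking: privileged users see the real
# value, everyone else sees a redacted form. The group name "pii_readers"
# is a hypothetical choice for this example.
PRIVILEGED_GROUPS = {"pii_readers"}

def mask_ssn(ssn: str, user_groups: set) -> str:
    """Return the SSN unmasked for privileged users; otherwise
    keep only the last four digits."""
    if user_groups & PRIVILEGED_GROUPS:
        return ssn
    return "***-**-" + ssn[-4:]

print(mask_ssn("123-45-6789", {"analysts"}))  # masked for non-privileged users
```

The same check-then-redact shape is what a column mask function expresses declaratively, so the masking policy travels with the table rather than with each query.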

 

Row-Level Filtering: Row-level filtering is used when you want to control access to specific rows in your data. This is useful when different users or user groups should see different subsets of the data; for instance, a regional manager might only need access to data from their own region. In Databricks, you can attach a row filter to a table so that subsequent queries only return the rows for which the filter predicate evaluates to true.
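The regional-manager example can be sketched like this. The user-to-region mapping and the row shape are hypothetical; in Databricks the equivalent predicate would be a row filter function bound to the table.

```python
# Illustrative sketch of row-level filtering: each user sees only the
# rows for their assigned region. The mapping below is an assumption
# standing in for whatever entitlement source you actually use.
USER_REGION = {"alice": "EMEA", "bob": "APAC"}

def filter_rows(rows, user):
    """Keep only rows whose region matches the user's assigned region.
    Users with no assignment see nothing."""
    region = USER_REGION.get(user)
    return [r for r in rows if r["region"] == region]

sales = [{"region": "EMEA", "amount": 100},
         {"region": "APAC", "amount": 250}]
print(filter_rows(sales, "alice"))  # only the EMEA row
```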

 

Attribute-Based Masking: Attribute-based masking is a dynamic approach to data access control in which the decision is derived from attributes of the user requesting the data, rather than from a fixed list of users or groups. It is useful when access needs to vary with who is asking. For example, an HR department might be allowed to see PII, while other departments are not.
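The distinction from plain column masking can be sketched as follows: the decision is computed from a bag of user attributes (here a `department` attribute, which is an assumption for the example) rather than from membership in one hard-coded group.

```python
# Illustrative sketch of attribute-based masking: access is decided from
# the caller's attributes. The attribute name "department" and the value
# "HR" are hypothetical examples of an attribute-driven policy.
def can_see_pii(user_attrs: dict) -> bool:
    """Policy: only members of the HR department may read PII."""
    return user_attrs.get("department") == "HR"

def read_column(value, user_attrs):
    """Return the value for users the policy allows; redact otherwise."""
    return value if can_see_pii(user_attrs) else "REDACTED"

print(read_column("123-45-6789", {"department": "Sales"}))  # redacted
```

Because the policy reads attributes instead of naming users, adding a new HR employee requires no change to the masking logic, only to the attribute store.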

 

When dealing with large data volumes with numerous PII columns, you might consider the following strategies:

  1. Encryption: Use libraries like Fernet for encryption, along with user-defined functions (UDFs), so that PII values are encrypted at write time and decrypted only for authorized readers.
  2. Optimization: Use Databricks Delta, which is designed for large-scale data workloads. It provides mechanisms like data skipping and file compaction to optimize data processing. The OPTIMIZE command in Databricks can be used to compact small files toward a target size of up to 1 GB, which improves read performance.
  3. Masking: Apply column-masking to PII columns to protect sensitive data.
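For strategy 1, the core Fernet round trip looks like the sketch below. This is a minimal illustration, not a complete pipeline: in practice the key would come from a secret manager rather than being generated inline, and the encrypt/decrypt calls would be wrapped in UDFs applied to each PII column.

```python
# Minimal sketch of Fernet symmetric encryption for a single PII value,
# using the `cryptography` package (assumed installed).
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in real use, load this from a secret store
f = Fernet(key)

token = f.encrypt(b"123-45-6789")   # ciphertext is what lands in the table
plain = f.decrypt(token)            # only holders of the key can recover it
```

Encrypting at rest this way complements masking: masking controls what a query returns, while encryption protects the stored bytes even if the files are accessed outside the query path.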

Kaniz
Community Manager

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
 
