Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Masking techniques for more PII columns

Phani1
Valued Contributor

Hi Databricks Team,

We would appreciate it if you could tell us when Column-Masking, Row-Level Filtering, and Attribute-Based Masking should each be used, as well as the recommended technique for handling large data volumes containing hundreds of PII columns (around 1,000).

Regards,

Phanindra

2 REPLIES

Kaniz
Community Manager

Hi @Phani1, I'd be happy to provide some guidance on when to use Column-Masking, Row-Level Filtering, and Attribute-Based Masking in Databricks, as well as how to handle large data volumes with numerous PII columns.

 

Column-Masking: Column-Masking is used when you want to control access to specific columns in your data. It's particularly useful when you have sensitive data in certain columns that should not be accessible to all users. For example, you might want to mask personally identifiable information (PII) like social security numbers or credit card numbers. In Databricks, you can use the MASK function to apply data masking to specific columns in a Delta ta....
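
As a rough illustration of the column-masking idea, here is a minimal Python sketch that only builds the SQL text for a mask function and the statement that attaches it to a column. It assumes Unity Catalog column masks (`ALTER TABLE ... ALTER COLUMN ... SET MASK`) and the `is_account_group_member` SQL function. The table, column, function, and group names are hypothetical placeholders, not anything from this thread.

```python
def mask_function_sql(function_name: str, privileged_group: str, redacted: str) -> str:
    """Build DDL for a SQL UDF that shows the real value to one group only."""
    return (
        f"CREATE OR REPLACE FUNCTION {function_name}(col_value STRING)\n"
        f"RETURN CASE WHEN is_account_group_member('{privileged_group}') "
        f"THEN col_value ELSE '{redacted}' END"
    )


def apply_mask_sql(table: str, column: str, function_name: str) -> str:
    """Build the statement that attaches the mask function to one column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {function_name}"


# Hypothetical names, for illustration only.
create_stmt = mask_function_sql("main.governance.mask_ssn", "hr_admins", "***-**-****")
alter_stmt = apply_mask_sql("main.hr.employees", "ssn", "main.governance.mask_ssn")

print(create_stmt)
print(alter_stmt)
```

In a Databricks notebook the two strings could then be run with `spark.sql(...)`, assuming the caller has the privileges needed to create functions and alter the table.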

 

Row-Level Filtering: Row-Level Filtering is used when you want to control access to specific rows in your data. This is useful when different users or user groups should have access to different subsets of the data. For instance, a regional manager might only need access to data from their specific region. In Databricks, you can use row filters to apply a filter to a table so that subsequent queries only ....
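
The regional-manager case could be sketched as follows. This is a minimal Python example that only generates SQL text, assuming Unity Catalog row filters (`ALTER TABLE ... SET ROW FILTER ... ON (...)`). The table, function, group, and region values are hypothetical.

```python
def row_filter_function_sql(function_name: str, admin_group: str, allowed_region: str) -> str:
    """Build DDL for a boolean SQL UDF: admins see every row, others one region."""
    return (
        f"CREATE OR REPLACE FUNCTION {function_name}(region STRING)\n"
        f"RETURN IF(is_account_group_member('{admin_group}'), true, "
        f"region = '{allowed_region}')"
    )


def apply_row_filter_sql(table: str, function_name: str, column: str) -> str:
    """Build the statement that binds the filter function to a table column."""
    return f"ALTER TABLE {table} SET ROW FILTER {function_name} ON ({column})"


# Hypothetical names, for illustration only.
filter_stmt = row_filter_function_sql("main.governance.emea_only", "sales_admins", "EMEA")
bind_stmt = apply_row_filter_sql("main.sales.orders", "main.governance.emea_only", "region")

print(filter_stmt)
print(bind_stmt)
```

Once such a filter is bound, queries against the table would only return rows for which the function evaluates to true for the current user.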

 

Attribute-Based Masking: Attribute-Based Masking is a dynamic approach to data access control that c.... It's useful when access to data needs to be controlled based on the attributes of the user requesting the data. For example, an HR department might be allowed to see PII, while other departments are not.
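
To make the HR example concrete, here is a small plain-Python sketch of the decision logic behind attribute-based masking. It is a conceptual illustration only and does not use any Databricks API. The `User` class, the `POLICY` mapping, and the tag names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    department: str


# Hypothetical policy: column tag -> departments allowed to see raw values.
POLICY = {"pii": {"HR"}}


def read_value(user: User, column_tags: set, value: str, redacted: str = "REDACTED") -> str:
    """Return the raw value only if the user's department is allowed for every tag."""
    for tag in column_tags:
        allowed = POLICY.get(tag)
        # Tags without a policy entry place no restriction on the column.
        if allowed is not None and user.department not in allowed:
            return redacted
    return value


hr_user = User("alice", "HR")
sales_user = User("bob", "Sales")

print(read_value(hr_user, {"pii"}, "123-45-6789"))
print(read_value(sales_user, {"pii"}, "123-45-6789"))
print(read_value(sales_user, set(), "Berlin"))
```

The point of the design is that the rule is written once against a tag, so newly tagged columns pick up the same behaviour without a per-column rule.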

 

When dealing with large data volumes with numerous PII columns, you might consider the following strategies:

  1. Encryption: Use libraries like Fernet for encryption, along with user-defined functions (UDFs), and ....
  2. Optimization: Use Databricks Delta, which is designed for large-scale data workloads. It provides mechanisms like data skipping and file compaction to optimize data processing. The OPTIMIZE command in Databricks can be used to compact files and get a file size of up to 1GB, wh....
  3. Masking: Apply column-masking to PII columns to protect sensitive data.
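
For a table with around a thousand PII columns, writing one statement per column by hand is impractical. One possible approach, sketched below in Python, is to keep an inventory that maps each column to a PII category and generate the `SET MASK` statements from it, reusing a small number of mask functions. The inventory, categories, table name, and function names are hypothetical, and the sketch assumes the mask functions already exist.

```python
# Hypothetical inventory: column name -> PII category.
PII_COLUMNS = {
    "ssn": "national_id",
    "passport_no": "national_id",
    "email": "contact",
    "phone": "contact",
}

# Hypothetical mapping: PII category -> existing mask function.
MASK_FUNCTIONS = {
    "national_id": "main.governance.mask_id",
    "contact": "main.governance.mask_contact",
}


def bulk_mask_statements(table: str, pii_columns: dict, mask_functions: dict) -> list:
    """Build one ALTER TABLE ... SET MASK statement per PII column."""
    statements = []
    for column in sorted(pii_columns):  # sorted for a stable, reviewable order
        function_name = mask_functions[pii_columns[column]]
        statements.append(
            f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {function_name}"
        )
    return statements


statements = bulk_mask_statements("main.crm.customers", PII_COLUMNS, MASK_FUNCTIONS)
for stmt in statements:
    print(stmt)
```

On Databricks the generated statements could be reviewed and then executed in a loop with `spark.sql(stmt)`. Keeping one mask function per category rather than per column means a policy change touches a few functions instead of a thousand columns.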

Kaniz
Community Manager

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
 
