How to enforce dynamic data Masking and access control directly in Databricks Unity Catalog

arifhassan
New Contributor II

Introduction

Data teams today face a constant challenge — balancing data accessibility and data privacy.
As organizations modernize analytics in the cloud, they must protect sensitive fields like identifiers, contracts, and regions dynamically — not through static SQL views.

Databricks Unity Catalog introduces a unified governance layer that simplifies this challenge.
It provides fine-grained access control, column-level masking, and row-level security (RLS) — enabling compliance and data protection directly within the platform.

This guide walks you through a complete, reusable process to automate both column masking and row-level filtering for Delta tables using Unity Catalog.
Each step includes clear code examples and an explanation so you can reproduce it end-to-end in your environment.

 

⚙️ Process Flow Overview

┌──────────────────────┐
│ 1. Input Parameters  │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ 2. Create Mask UDF   │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ 3. Tag Columns       │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ 4. Apply Column Mask │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ 5. Create Row Filter │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ 6. Apply RLS Policy  │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ 7. Validate & Audit  │
└──────────────────────┘
 

1. Input & Setup

Create notebook widgets and read configuration parameters dynamically.

# Widgets (no defaults)
dbutils.widgets.text("schema", "")
dbutils.widgets.text("table", "")
dbutils.widgets.text("tag_column", "") # e.g., "member_id, medicaid_id"
dbutils.widgets.text("lookup_column", "") # e.g., region or contract_name
dbutils.widgets.text("mask_policy", "") # e.g., mask_sensitive_data
dbutils.widgets.text("sensitivity_tag", "") # e.g., pii_sensitive
dbutils.widgets.text("CommonConfig_schema", "") # e.g., governance_config
# Read values
schema = dbutils.widgets.get("schema").strip()
table = dbutils.widgets.get("table").strip()
tag_column = dbutils.widgets.get("tag_column").strip()
lookup_column = dbutils.widgets.get("lookup_column").strip()
mask_policy = dbutils.widgets.get("mask_policy").strip()
sensitivity_tag = dbutils.widgets.get("sensitivity_tag").strip()
config_schema = dbutils.widgets.get("CommonConfig_schema").strip()
# Current catalog & derive names
catalog = spark.sql("SELECT current_catalog()").collect()[0][0]
table_name = f"{catalog}.{schema}.{table}"
mask_fn_name = f"{config_schema}.{mask_policy}"
rowfilter_fn = f"{config_schema}.{mask_policy}_row_rls"
tag_columns = [c.strip() for c in tag_column.split(",") if c.strip()]
print("Table:", table_name)
print("Mask Function:", mask_fn_name)
print("Row Filter:", rowfilter_fn)

💡 Purpose:
Provide flexible, reusable inputs for any Delta table in Unity Catalog.
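Because the widgets ship without defaults, a quick guard can fail fast when a required value is left blank. A minimal sketch (the `validate_params` helper and its error message are illustrative additions, not part of the notebook above):

```python
def validate_params(params):
    """Raise ValueError naming any blank required parameters."""
    missing = [name for name, value in params.items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing required parameters: {', '.join(missing)}")
    return params
```

Call it right after reading the widgets, e.g. `validate_params({"schema": schema, "table": table, "tag_column": tag_column})`.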

 

2. Create the Masking Function

Create a masking UDF that hides data for unauthorized users or restricted segments.

ALLOWED_GROUP = "<ALLOWED_GROUP>"
RESTRICTED_VALUES = ["<RESTRICTED_VALUE_1>", "<RESTRICTED_VALUE_2>"]
restricted_list_sql = ", ".join([f"'{v}'" for v in RESTRICTED_VALUES])
create_mask_fn_sql = f"""
CREATE OR REPLACE FUNCTION {mask_fn_name}(tag_column STRING, segment_attr STRING)
RETURNS STRING
RETURN
  CASE
    WHEN segment_attr IN ({restricted_list_sql})
         AND NOT is_account_group_member('{ALLOWED_GROUP}')
    THEN 'xxxx'
    ELSE tag_column
  END;
"""
spark.sql(create_mask_fn_sql)
print("Masking function created.")

💡 Purpose:
Enforces column-level data protection at query time, dynamically based on user identity.
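The UDF's CASE logic can be mirrored in plain Python to sanity-check its behavior before deployment. The mirror below is purely illustrative; at query time the enforcement happens inside Unity Catalog, not in your code:

```python
def mask_value(value, segment, is_allowed_member, restricted_values):
    """Mirror of the SQL CASE: redact when the row belongs to a restricted
    segment and the caller is not in the allowed group."""
    if segment in restricted_values and not is_allowed_member:
        return "xxxx"
    return value

# A restricted segment is redacted for outsiders but visible to the allowed group.
assert mask_value("M123", "region_a", False, {"region_a"}) == "xxxx"
assert mask_value("M123", "region_a", True, {"region_a"}) == "M123"
```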

 

3. Tag Sensitive Columns

Assign metadata tags to sensitive columns for governance and audit visibility.

for col in tag_columns:
    spark.sql(f"""
        ALTER TABLE {table_name}
        ALTER COLUMN `{col}`
        SET TAGS ('masking_group' = '{sensitivity_tag}');
    """)
print("Tags applied.")

💡 Purpose:
Centralized discovery and classification of sensitive data.
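Once tags are attached, they can be discovered through the catalog's `information_schema.column_tags` view (also used in step 7). A small helper that builds the lookup query; the default tag key `masking_group` matches the tagging code above, but the helper itself is an illustrative addition and the view's column names are worth verifying on your runtime:

```python
def build_tag_lookup_sql(catalog, schema, table, tag_name="masking_group"):
    """Build a query listing the columns that carry the governance tag."""
    return (
        f"SELECT column_name, tag_name, tag_value\n"
        f"FROM {catalog}.information_schema.column_tags\n"
        f"WHERE schema_name = '{schema}'\n"
        f"  AND table_name = '{table}'\n"
        f"  AND tag_name = '{tag_name}'"
    )

# display(spark.sql(build_tag_lookup_sql(catalog, schema, table)))
```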

 

4. Apply Column Mask Policy

Attach the masking UDF to each tagged column.

for col in tag_columns:
    spark.sql(f"""
        ALTER TABLE {table_name}
        ALTER COLUMN `{col}`
        SET MASK {mask_fn_name}
        USING COLUMNS ({lookup_column});
    """)
print("Column masks attached.")

💡 Purpose:
Automatically redacts data for unauthorized users while allowing full access for approved groups.

 

5. Create Row-Level Security Function

Define a Boolean UDF that controls which rows each user can see.

create_rowfilter_sql = f"""
CREATE OR REPLACE FUNCTION {rowfilter_fn}(segment_attr STRING)
RETURNS BOOLEAN
RETURN
  CASE
    WHEN is_account_group_member('{ALLOWED_GROUP}') THEN TRUE
    WHEN segment_attr IN ({restricted_list_sql}) THEN FALSE
    ELSE TRUE
  END;
"""

spark.sql(create_rowfilter_sql)
print("Row filter function created.")

💡 Purpose:
Implements row-level visibility rules based on user group membership.
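As with the mask, the row-filter CASE can be mirrored in plain Python for a quick sanity check (illustrative only; the real enforcement is server-side in Unity Catalog):

```python
def row_visible(segment, is_allowed_member, restricted_values):
    """Mirror of the row-filter CASE: allowed-group members see every row;
    everyone else loses rows whose segment is restricted."""
    if is_allowed_member:
        return True
    return segment not in restricted_values

# Allowed-group members see restricted rows; other users do not.
assert row_visible("region_a", True, {"region_a"}) is True
assert row_visible("region_a", False, {"region_a"}) is False
```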

 

6. Apply the Row Filter Policy

Attach the RLS function to the table.

apply_rowfilter_sql = f"""
ALTER TABLE {table_name}
SET ROW FILTER {rowfilter_fn} ON ({lookup_column});
"""

spark.sql(apply_rowfilter_sql)
print("Row filter applied.")

💡 Purpose:
Automatically excludes restricted rows from query results for unauthorized users.
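If you ever need to detach these policies (for testing or rollback), Unity Catalog provides `DROP ROW FILTER` and `DROP MASK` clauses. A helper that generates the statements; this helper is an illustrative addition, so verify the exact clause syntax against your Databricks Runtime version before relying on it:

```python
def build_rollback_sql(table_name, masked_columns):
    """Generate ALTER statements that detach the row filter and column masks."""
    statements = [f"ALTER TABLE {table_name} DROP ROW FILTER;"]
    statements += [
        f"ALTER TABLE {table_name} ALTER COLUMN `{col}` DROP MASK;"
        for col in masked_columns
    ]
    return statements

# for stmt in build_rollback_sql(table_name, tag_columns):
#     spark.sql(stmt)
```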

 

7. Validate & Audit

Run validation queries to verify masking and row filtering behavior.

# Example: check masked values for restricted users
first_col = tag_columns[0]
validation_sql = f"""
SELECT `{first_col}` AS masked_value, {lookup_column} AS segment
FROM {table_name}
WHERE {lookup_column} IN ({restricted_list_sql})
LIMIT 25
"""

display(spark.sql(validation_sql))

Inspect governance metadata:

SELECT * FROM information_schema.column_tags WHERE table_name = '<table>';
SELECT * FROM information_schema.column_masks WHERE table_name = '<table>';

💡 Purpose:
Confirm policy enforcement and ensure auditability via Unity Catalog metadata.

 

Conclusion

By combining column masking and row-level security, Databricks Unity Catalog allows teams to automate data protection without relying on complex application logic or multiple view layers.
This framework provides a reusable, metadata-driven approach that scales across datasets while maintaining compliance and transparency.

With just a few parameterized steps, you can ensure every Delta table in your Lakehouse is both governed and accessible — safely.
