cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
jeffreyaven
Databricks Employee
Databricks Employee

Stop Wiring Security Table by Table: How ABAC Changes Everything in Unity Catalog

If you have ever managed column masks or row filters across more than a handful of tables in Unity Catalog, every new table that lands with PII means another ALTER TABLE ... SET MASK command. Another function to create. Another thing to forget. ABAC (Attribute-Based Access Control) fixes this: you tag data once and let policies do the rest.

This article walks through the three approaches to fine-grained access control (FGAC) in Unity Catalog, explains why ABAC is the one that actually scales.

The Challenge: Scaling Fine-Grained Access Control (FGAC)

Imagine you have a customers table with email addresses and phone numbers. You need to mask those columns so that only members of the data_admins group can see them in clear text. Easy enough.

Now imagine you have 50 tables with email columns. Or 200. Every time a data engineer adds a new table with an email field, someone has to remember to create and attach a mask. If they forget, that PII sits there unprotected until an audit catches it. The security model becomes a to-do list instead of a system.

Unity Catalog gives you three ways to solve this. Let's look at each one, then talk about why only one of them holds up at scale.

Approach 1: Manual Column Masks and Row Filters

This is the most common starting point. You write a SQL function that defines both what to mask and who gets to see the original value, then you attach this function to a specific column on a specific table.

Here is a column mask for email addresses on samples.bakehouse.customers:

-- The function defines WHAT to mask AND WHO sees what
CREATE OR REPLACE FUNCTION silver.mask_email(email_address STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('data_admins') THEN email_address
  ELSE CONCAT('****@', SPLIT(email_address, '@')[1])
END;

Then you attach it:

ALTER TABLE silver.customers
ALTER COLUMN email_address SET MASK silver.mask_email;

Row filters follow the same pattern. You write a function that returns TRUE for rows the user is allowed to see, then attach it to the table:

-- Again: WHAT to filter AND WHO sees what
CREATE OR REPLACE FUNCTION silver.country_filter(country STRING)
RETURNS BOOLEAN
RETURN CASE
  WHEN is_account_group_member('data_admins') THEN TRUE
  WHEN is_account_group_member('account users')
    THEN country IN ('Australia', 'Japan')
  ELSE FALSE
END;

ALTER TABLE silver.franchises
SET ROW FILTER silver.country_filter ON (country);

This works fine for a few tables. The problem is that both the function and the binding are per-table, per-column. The function itself encodes the WHO (which groups see what) alongside the WHAT (how to transform the data). There is no separation of concerns.

Every column needs its own ALTER TABLE statement. Every new table needs manual attention.

Approach 2: Dynamic Views

Dynamic views embed security logic directly in a view definition using CASE expressions and context functions. They combine row filtering and column masking in a single object.

CREATE OR REPLACE VIEW silver.v_customers_secure AS
SELECT
  customerID,
  first_name,
  last_name,
  CASE
    WHEN is_account_group_member('data_admins') THEN email_address
    ELSE CONCAT('****@', SPLIT(email_address, '@')[1])
  END AS email_address,
  CASE
    WHEN is_account_group_member('data_admins') THEN phone_number
    ELSE CONCAT('***-***-', RIGHT(phone_number, 4))
  END AS phone_number,
  city,
  country
FROM silver.customers
WHERE is_account_group_member('data_admins') OR country = 'Australia';

Dynamic views are the recommended approach when sharing data via Delta Sharing, because row filters and column masks are not applied to Delta Sharing recipients. The security logic in a dynamic view is enforced because recipients query the view, not the underlying table.

But dynamic views still suffer from the same scaling problem. Each view is hand-crafted for a specific table. Add a new table with PII and you need a new view. The security logic is spread across dozens of view definitions with no central point of control.

Approach 3: ABAC (The One That Scales)

ABAC separates the concerns. The masking function defines only the WHAT. A policy object defines the WHO and the WHERE (which scope it applies to). A governed tag connects them.

abac_overview.png

Here is the key difference. Look at the mask function for ABAC:

-- The function defines ONLY the WHAT
-- No is_account_group_member() check, no WHO logic
CREATE OR REPLACE FUNCTION silver.mask_email(email_address STRING)
RETURNS STRING
RETURN CONCAT('****@', SPLIT(email_address, '@')[1]);

There is no is_account_group_member() call. The function does not care who is querying. It only knows how to mask an email address. The WHO is handled entirely by the policy.

Step 1: Tag the Column

Governed tags are the bridge between your data and your policies. You apply a tag to any column that contains a certain class of data:

ALTER TABLE silver.customers
ALTER COLUMN email_address SET TAGS ('class.email_address');

Unity Catalog ships with system-defined governed tags like class.email_address, class.us_ssn, and others. You can also create your own through the Catalog Explorer UI or programmatically via the Databricks API.

Step 2: Create the Policy

The policy ties it all together. It says: for a given scope, find all columns with a matching tag, apply this mask function, and apply it to these groups:

CREATE OR REPLACE POLICY pii_email_policy
ON SCHEMA silver
COLUMN MASK silver.mask_email
TO `account users`
EXCEPT `adminuser@mycompany.com`
FOR TABLES
MATCH COLUMNS hasTag('class.email_address') AS email_address
ON COLUMN email_address;

That is it. Every column in the silver schema that carries the class.email_address tag is now masked for all members of `account users` (everyone essentially) except for specific principals (users or service principals which are exempt from the policy).

You can also scope policies at the catalog level.  A single ON CATALOG policy would cover every schema and every table in the entire catalog.

Tag columns once (which is a good practice anyway) -> define the Policy once -> That's it!

ABAC for Row Filters Too

Everything above focused on column masking, but ABAC works for row filters as well. The pattern is the same: write a function that defines the filter logic (the WHAT), tag the relevant column, and create a policy.

-- Function: WHAT only (returns TRUE for permitted rows)
CREATE OR REPLACE FUNCTION silver.filter_apac_only(region STRING)
RETURNS BOOLEAN
RETURN region = 'APAC';

-- Tag the column
ALTER TABLE silver.customers
ALTER COLUMN region SET TAGS ('region_filter');

-- Policy: WHO + WHERE
CREATE OR REPLACE POLICY filter_apac_rows_policy
ON CATALOG bakehouse
ROW FILTER silver.filter_apac_only
TO `apac-users`
FOR TABLES
MATCH COLUMNS hasTag('region_filter') AS region
USING COLUMNS (region);

Now every table in the bakehouse catalog that has a column tagged with region_filter will automatically restrict apac-users to APAC rows only. No per-table wiring needed.

Deciding on the FGAC Approach

Manual column masks, row filters and dynamic views may still have their place in your FGAC strategy, the attached decision tree provides some guidance on what to use when and why.

fgac_decision_tree.png

Things to Know Before You Start

1. Governed tags are not the same as regular tags. Regular tags (the ones you create implicitly with SET TAGS) are for search and discovery. ABAC policies only work with governed tags, which are defined through the Catalog Explorer API/UI and have controlled allowed values.

2. Manual masks and ABAC policies cannot coexist on the same column.  If a column already has a manual mask, you need to drop it before an ABAC policy can take effect. Run ALTER TABLE ... ALTER COLUMN ... DROP MASK first.

3. Only one row filter policy can apply to a table per user at a time. Plan your policy scoping accordingly.

4. Policies inherit downward. A catalog-level policy covers all schemas and tables within it. A schema-level policy covers all tables in the schema. Use narrower scopes for exceptions.

5. DBR 16.4+ or Serverless is required. Users on older runtimes cannot query ABAC-secured tables at all. If you have mixed runtime versions, consider scoping policies to specific groups so that users outside those groups can still access the tables on older runtimes.

Wrapping Up

Manual masks and row filters are fine when you have a small number of tables. Dynamic views are useful when you are sharing data through Delta Sharing. But if you are operating at any real scale, ABAC is the approach that actually works long term.

The core insight is simple: separate the WHAT from the WHO. Let the function be a pure transformation. Let the policy handle targeting. Let the tag be the glue. When your next data engineer creates a table and tags the email column, the mask is already there waiting for it.

That is security that scales.

2 Comments