Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Masking of PII data

ShankarM
Contributor

We have the below requirement:

  • There is a history table into which data needs to be loaded incrementally.
  • This table contains a PII field that is masked using a custom masking function (visible to a specific user group, XXXX for everyone else).
  • When we run the ingestion job, we get an error saying the cluster is not able to load data into a table where a column is masked.

Can you let us know what the issue is?

2 REPLIES

Ale_Armillotta
Contributor III

Hi @ShankarM,

This is a known limitation with Unity Catalog column masking policies: write operations such as MERGE INTO and INSERT are not supported on tables that have a column mask policy applied. When your ingestion job tries to load data into a table with a masked column, Databricks blocks the write at the engine level; this is the error you're seeing.

There are two common root causes to check:

1. Compute access mode mismatch

Row filters and column masks require your cluster to run in Shared access mode (not Single User/Assigned mode). If you're using a Single User cluster, switch it to Shared access mode (Access mode -> Shared) or use a SQL Warehouse instead.
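As a sketch, the access mode maps to the `data_security_mode` field in the Clusters API; a Shared-mode cluster spec would include something like the following (cluster name, runtime version, and node type are placeholders for your environment):

```json
{
  "cluster_name": "ingestion-shared",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "data_security_mode": "USER_ISOLATION"
}
```

`USER_ISOLATION` is the API value for Shared access mode; `SINGLE_USER` is the value for Single User/Assigned mode.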

2. Unsupported write operation

If your incremental load uses MERGE INTO, this is the likely culprit. MERGE is not supported on tables with column mask policies. Consider these alternatives:

  • Option A: Use streaming append. If new rows are only appended (no upserts), use a Delta streaming write: df.writeStream.format("delta").option("checkpointLocation", "/path/to/checkpoint").outputMode("append").toTable("catalog.schema.your_history_table")
  • Option B: Stage, then merge without the mask. Load data into a staging table (without a mask policy), run your MERGE there, then copy the final results into the masked table using INSERT with a service principal that has the UNMASK privilege. This avoids running MERGE on the masked table directly.
  • Option C: Use INSERT OVERWRITE with partitions. For partition-based incremental loads, INSERT OVERWRITE on specific partitions is supported even on masked tables.
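To make Option B concrete, here is a minimal sketch that builds the two statements (the table names, key column, and the `updates` source view are hypothetical; in a real job you would pass each string to spark.sql):

```python
def build_staged_load_sql(staging_table: str, target_table: str, key_col: str) -> tuple[str, str]:
    """Build the MERGE (run against the unmasked staging table) and the
    INSERT (into the masked target) for a stage-then-insert load."""
    # MERGE runs on the staging table, which has no mask policy
    merge_sql = (
        f"MERGE INTO {staging_table} AS t "
        f"USING updates AS s ON t.{key_col} = s.{key_col} "
        f"WHEN MATCHED THEN UPDATE SET * "
        f"WHEN NOT MATCHED THEN INSERT *"
    )
    # Plain INSERT into the masked table; run as a principal with UNMASK
    insert_sql = f"INSERT INTO {target_table} SELECT * FROM {staging_table}"
    return merge_sql, insert_sql

merge_sql, insert_sql = build_staged_load_sql(
    "staging.history_stg", "main.hist.history", "customer_id"
)
```

In the pipeline you would run `spark.sql(merge_sql)` against the staging table first, then `spark.sql(insert_sql)` to publish the results into the masked history table.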

The column mask policy is enforced at read time (users without privilege see XXXX), but certain write paths are restricted to prevent policy bypass. The recommended long-term pattern is to keep masking at the read layer and use unmasked staging tables for ingestion pipelines.
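For reference, attaching a read-time mask in Unity Catalog looks roughly like this (the function, table, column, and group names below are assumptions standing in for your custom masking setup); the DDL is held in Python strings you would pass to spark.sql:

```python
# Hypothetical names throughout: pii_mask, pii_readers, ssn; adapt to your setup.
create_mask_fn = """
CREATE OR REPLACE FUNCTION main.hist.pii_mask(val STRING)
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN val
  ELSE 'XXXX'
END
"""

attach_mask = """
ALTER TABLE main.hist.history
  ALTER COLUMN ssn SET MASK main.hist.pii_mask
"""
# Run each statement with spark.sql(...) on Shared-access compute.
```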


emma_s
Databricks Employee

Hi, I believe this is a permissions issue related to the service principal that runs the job. If the pipeline is running a MERGE INTO statement, it needs to be able to see the unmasked rows so it can tell how to match them.

First thing to do is to make sure the service principal is in the account level group that the masking function uses.
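To illustrate why this matters, here is a toy emulation of the masking function's CASE logic (plain Python, not Databricks code): if the service principal is not in the group, the MERGE condition compares against the masked value rather than the real key.

```python
def apply_mask(value: str, caller_in_group: bool) -> str:
    # Mirrors the masking function: group members see the value, others see XXXX
    return value if caller_in_group else "XXXX"

# A service principal outside the group sees only "XXXX", so a MERGE
# cannot pair source and target rows on the real key.
source_key = "123-45-6789"
seen_by_sp = apply_mask(source_key, caller_in_group=False)
```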

I hope this helps.


Thanks,

Emma