<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Masking of PII data in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/masking-of-pii-data/m-p/152264#M53797</link>
    <description></description>
    <pubDate>Fri, 27 Mar 2026 09:07:13 GMT</pubDate>
    <dc:creator>Ale_Armillotta</dc:creator>
    <dc:date>2026-03-27T09:07:13Z</dc:date>
    <item>
      <title>Masking of PII data</title>
      <link>https://community.databricks.com/t5/data-engineering/masking-of-pii-data/m-p/152248#M53793</link>
      <description>&lt;P&gt;We have the following requirement:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;There is a history table into which data needs to be loaded incrementally.&lt;/LI&gt;&lt;LI&gt;This table contains a PII field that has been masked using a custom masking function (visible to a specific user group, XXXX for everyone else).&lt;/LI&gt;&lt;LI&gt;When we run the ingestion job, we get an error saying the cluster is not able to load data into the table where a column is masked.&lt;P&gt;Can you let us know what the issue is?&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Fri, 27 Mar 2026 06:44:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/masking-of-pii-data/m-p/152248#M53793</guid>
      <dc:creator>ShankarM</dc:creator>
      <dc:date>2026-03-27T06:44:35Z</dc:date>
    </item>
    <item>
      <title>Re: Masking of PII data</title>
      <link>https://community.databricks.com/t5/data-engineering/masking-of-pii-data/m-p/152264#M53797</link>
      <description>&lt;P class=""&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/89037"&gt;@ShankarM&lt;/a&gt;,&lt;/P&gt;&lt;P class=""&gt;This looks like a known limitation of Unity Catalog column masks: &lt;STRONG&gt;some write paths, MERGE INTO in particular, can be rejected on tables that have a column mask applied&lt;/STRONG&gt;, depending on the compute access mode, the runtime version, and how the mask function is written. When your ingestion job tries to load data into a table with a masked column, Databricks blocks the write at the engine level, which is the error you are seeing.&lt;/P&gt;&lt;P class=""&gt;There are two common root causes to check:&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;1. Compute access mode mismatch&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;Row filters and column masks have historically required &lt;STRONG&gt;Shared access mode&lt;/STRONG&gt; (not Single User / Assigned mode); whether single-user compute is supported depends on your Databricks Runtime version. If you are on a Single User cluster, switch it to Shared access mode (&lt;STRONG&gt;Access Mode -&amp;gt; Shared&lt;/STRONG&gt;) or use a SQL warehouse instead.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;2. Unsupported write operation&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;If your incremental load uses MERGE INTO, this is the likely culprit: MERGE can be rejected on tables with column masks. 
Consider these alternatives:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;Option A – Use streaming append&lt;/STRONG&gt;: If new rows are only appended (no upserts), use a Delta streaming write: &lt;STRONG&gt;(df.writeStream.format("delta").option("checkpointLocation", "/path/to/checkpoint").outputMode("append").toTable("catalog.schema.your_history_table"))&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Option B – Stage then merge without the mask&lt;/STRONG&gt;: Load data into a staging table (without a mask), run your MERGE there, then copy the final results into the masked table with INSERT, running as a service principal that the masking function allows to see unmasked values. This avoids running MERGE on the masked table directly.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Option C – Use INSERT OVERWRITE with partitions&lt;/STRONG&gt;: For partition-based incremental loads, INSERT OVERWRITE on specific partitions is supported even on masked tables.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The column mask is enforced at &lt;STRONG&gt;read time&lt;/STRONG&gt; (users outside the allowed group see XXXX), but certain write paths are restricted to prevent policy bypass. 
The recommended long-term pattern is to keep masking at the read layer and use unmasked staging tables for ingestion pipelines.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;References:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;A class="" href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/filters-and-masks/" rel="noopener" target="_blank"&gt;Row filters and column masks – Databricks AWS docs&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A class="" href="https://learn.microsoft.com/en-us/azure/databricks/tables/row-and-column-filters" rel="noopener" target="_blank"&gt;Azure Databricks – Row and column filters&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Fri, 27 Mar 2026 09:07:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/masking-of-pii-data/m-p/152264#M53797</guid>
      <dc:creator>Ale_Armillotta</dc:creator>
      <dc:date>2026-03-27T09:07:13Z</dc:date>
    </item>
    <item>
      <title>Re: Masking of PII data</title>
      <link>https://community.databricks.com/t5/data-engineering/masking-of-pii-data/m-p/152275#M53799</link>
      <description>&lt;P&gt;Hi, I believe this is a permissions issue related to the service principal that runs the job. If the pipeline runs a MERGE INTO statement, it needs to be able to see the unmasked rows so that it can match them.&lt;/P&gt;
&lt;P&gt;The first thing to do is make sure the service principal is in the account-level group that the masking function uses.&lt;/P&gt;
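&lt;P&gt;For reference, a group-based mask of the kind described in the question might look roughly like the sketch below. This is only an illustration under assumptions: the function name, group name, and column name are hypothetical placeholders, and the actual masking function in this thread was not posted. The Python code only builds the SQL text; in a notebook each statement could be passed to spark.sql.&lt;/P&gt;

```python
# Hypothetical names: the mask function, group, and column below are
# placeholders for illustration and do not come from the original post.
MASK_FUNCTION = "catalog.schema.mask_pii"
ALLOWED_GROUP = "pii_readers"
TABLE = "catalog.schema.your_history_table"
PII_COLUMN = "pii_field"


def mask_function_sql(function_name, allowed_group):
    """Return SQL for a mask that shows the value only to one account group."""
    return (
        f"CREATE OR REPLACE FUNCTION {function_name}(pii_value STRING) "
        f"RETURN CASE WHEN is_account_group_member('{allowed_group}') "
        f"THEN pii_value ELSE 'XXXX' END"
    )


def apply_mask_sql(table, column, function_name):
    """Return SQL that attaches the mask function to one column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {function_name}"


statements = [
    mask_function_sql(MASK_FUNCTION, ALLOWED_GROUP),
    apply_mask_sql(TABLE, PII_COLUMN, MASK_FUNCTION),
]

if __name__ == "__main__":
    # Assumption: in a Databricks notebook these would be run via spark.sql(...).
    for statement in statements:
        print(statement)
```

&lt;P&gt;With a mask written this way, the check is whether the service principal that runs the job belongs to the group named in the function (pii_readers in this sketch); if it does not, it reads XXXX like any other user outside the group.&lt;/P&gt;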
&lt;P&gt;I hope this helps.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;&lt;BR /&gt;Emma&lt;/P&gt;</description>
      <pubDate>Fri, 27 Mar 2026 09:36:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/masking-of-pii-data/m-p/152275#M53799</guid>
      <dc:creator>emma_s</dc:creator>
      <dc:date>2026-03-27T09:36:04Z</dc:date>
    </item>
  </channel>
</rss>

