Databricks Community

carlos_tasayco · Thursday

Hello,

My table in databricks is using a function to mask PII columns, in my power BI report I am using import mode.

I notice that my report in power BI will have the PII masked depending on the privileges of the user who publish the report, it means if the user who publish the report can see unmask data every user can see the report with unmask data.

I am trying to refresh the semantic model of the report in a databricks job, but despite the user who create the connection can see unmask data the report is still masked.

So, my question is: How can I solve this? is there a way to publish the report directly from databricks but having the report with the PII unmasked.

Thanks in advance.

balajij8 · Thursday

You are encountering a key intersection nuance of Unity Catalog Dynamic Data Masking and Power BI Import Mode semantics.

In Power BI Import mode, data is extracted and baked into the semantic model at refresh time. The extraction query is executed using the credentials of the identity configured for the dataset refresh. If that identity has UNMASKED privileges in Unity Catalog, the data is pulled unmasked into Power BI. Once imported, the data loses its Databricks-side masking context. Anyone with access to the Power BI report will see the unmasked data, which is a severe security and compliance risk.

Even though the user who created the connection can see unmasked data, the actual identity executing the refresh in the Power BI Service does not have the UNMASKED privilege. Power BI dataset refresh credentials are distinct from the person who authored the report. If the dataset is configured to refresh using a Service Principal, a Gateway account or a different user account that lacks the explicit UNMASKED grant, Unity Catalog will dynamically mask the data during the extraction phase.

You can follow below

Import Mode Refresh Identity Fix (If Import Mode is required) - If you must use Import mode for performance reasons, you must ensure the exact identity performing the refresh has the correct privileges and you must restrict report access. Find the exact account (User or Service Principal) being used for OAuth2 authentication under Data Source Credentials and give it relevant access to view unmasked data in report. You can run the Databricks job or Power BI refresh again and the data will now import unmasked. Because the data is now unmasked in Power BI, you must apply Power BI Object-Level Security (OLS) or Row-Level Security (RLS) to hide or restrict the PII columns for general report viewers.
DirectQuery Recommended- You can move away from Import mode for PII-heavy tables and the users can dynamically view unmasked data based on their privilege in the UC. Configure the Databricks data source in Power BI to use OAuth2 with SSO. When a user views the report, Power BI passes that specific user’s Entra ID identity to Databricks. Unity Catalog evaluates the Dynamic Data Masking policy. If the user has UNMASKED privilege, they see the data. If they dont, Databricks returns the masked data.

You can automate the publication of a Power BI report directly from a Databricks Workflow using the Power BI API or the Python client library. However, the publishing mechanism does not dictate the masking.

You can update the report to use DirectQuery with SSO as it delegates PII governance to Unity Catalog, ensures masking based on the viewer's identity and removes the security burden from Power BI.

KrisJohannesen · Friday

A couple of points below that I think are worth a clarification on the above answer, which is generally correct.

- There is no such thing as an UNMASKED privilege. If a masking function is applied, it gets executed no matter what. You can be exempt from the masking either in the function itself, or by being in the Except for clause if using ABAC

- Be aware that DirectQuery does not work if you are on AWS on the Databricks side. There is no direct integration between AWS and Microsoft to allow for the user context to come through correctly

- There are no Power BI native functions that work the same way as a masking function in Databricks. RLS works on rows, OLS can only hide the column not mask it fully/partially.

TLDR: If you use Import mode, the data is imported in the security context of the user used for the refresh - this determines masking. DirectQuery can execute the query on behalf of the user directly. Usually DirectQuery means a bit of a performance hit on the Power BI side + added cost on the Databricks side as well.

carlos_tasayco · Friday

Hi thanks for your response,
More context, I cannot use direct query is a very heavy dataset. I use import mode, of course I know in import mode the dataset keeps the masking if the user who publish is not exempt of it. I am testing to skip this using a job where the task is a power bi refresh, using the connection of a user who can see the data unmasked but is not working, the report is still masked.

Thanks in advance

emma_s · 12m ago

Hi,

I've done some research on this and I think you maybe updating the credentials in the wrong place. It's slightly confusing but when you update a power bi dashboard it uses the credentials in the Power BI connection set up not the job credentials.

Go to Power BI Service → Settings → Dataset → Data Source Credentials.
Change the authentication to use a Service Principal or user account that is explicitly exempt from the masking function in Unity Catalog.
In Databricks, ensure that identity is lwill not be applied to this user.

After updating credentials, refresh the dataset — the data should now import unmasked

Worth also noting that the recommended way to respect RLS with Power BI is to use Direct Query. When you say the dataset is too big. If your queries and tables are structured correctly then it should handle it better as you're using the power of spark sql to aggregate the data rather than Power BI. Unless you are actually needing to pull the detailed dataset into the dashboard with each query, but this is rarely needed. It's definitely worth some more testing on this.

Linking to docs for reference https://learn.microsoft.com/en-us/power-query/connectors/databricks-azure

Thanks,

Emma