Thursday
Hello,
My table in databricks is using a function to mask PII columns, in my power BI report I am using import mode.
I notice that my report in power BI will have the PII masked depending on the privileges of the user who publish the report, it means if the user who publish the report can see unmask data every user can see the report with unmask data.
I am trying to refresh the semantic model of the report in a databricks job, but despite the user who create the connection can see unmask data the report is still masked.
So, my question is: How can I solve this? is there a way to publish the report directly from databricks but having the report with the PII unmasked.
Thanks in advance.
Thursday
You are encountering a key intersection nuance of Unity Catalog Dynamic Data Masking and Power BI Import Mode semantics.
In Power BI Import mode, data is extracted and baked into the semantic model at refresh time. The extraction query is executed using the credentials of the identity configured for the dataset refresh. If that identity has UNMASKED privileges in Unity Catalog, the data is pulled unmasked into Power BI. Once imported, the data loses its Databricks-side masking context. Anyone with access to the Power BI report will see the unmasked data, which is a severe security and compliance risk.
Even though the user who created the connection can see unmasked data, the actual identity executing the refresh in the Power BI Service does not have the UNMASKED privilege. Power BI dataset refresh credentials are distinct from the person who authored the report. If the dataset is configured to refresh using a Service Principal, a Gateway account or a different user account that lacks the explicit UNMASKED grant, Unity Catalog will dynamically mask the data during the extraction phase.
You can follow below
You can automate the publication of a Power BI report directly from a Databricks Workflow using the Power BI API or the Python client library. However, the publishing mechanism does not dictate the masking.
You can update the report to use DirectQuery with SSO as it delegates PII governance to Unity Catalog, ensures masking based on the viewer's identity and removes the security burden from Power BI.
Friday
A couple of points below that I think are worth a clarification on the above answer, which is generally correct.
- There is no such thing as an UNMASKED privilege. If a masking function is applied, it gets executed no matter what. You can be exempt from the masking either in the function itself, or by being in the Except for clause if using ABAC
- Be aware that DirectQuery does not work if you are on AWS on the Databricks side. There is no direct integration between AWS and Microsoft to allow for the user context to come through correctly
- There are no Power BI native functions that work the same way as a masking function in Databricks. RLS works on rows, OLS can only hide the column not mask it fully/partially.
TLDR: If you use Import mode, the data is imported in the security context of the user used for the refresh - this determines masking. DirectQuery can execute the query on behalf of the user directly. Usually DirectQuery means a bit of a performance hit on the Power BI side + added cost on the Databricks side as well.
Friday
Hi thanks for your response,
More context, I cannot use direct query is a very heavy dataset. I use import mode, of course I know in import mode the dataset keeps the masking if the user who publish is not exempt of it. I am testing to skip this using a job where the task is a power bi refresh, using the connection of a user who can see the data unmasked but is not working, the report is still masked.
Thanks in advance
12m ago
Hi,
I've done some research on this and I think you maybe updating the credentials in the wrong place. It's slightly confusing but when you update a power bi dashboard it uses the credentials in the Power BI connection set up not the job credentials.
After updating credentials, refresh the dataset โ the data should now import unmasked
Worth also noting that the recommended way to respect RLS with Power BI is to use Direct Query. When you say the dataset is too big. If your queries and tables are structured correctly then it should handle it better as you're using the power of spark sql to aggregate the data rather than Power BI. Unless you are actually needing to pull the detailed dataset into the dashboard with each query, but this is rarely needed. It's definitely worth some more testing on this.
Linking to docs for reference https://learn.microsoft.com/en-us/power-query/connectors/databricks-azure
Thanks,
Emma