cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
Liphuan
Databricks Employee
Databricks Employee

Designer (5).jpeg

In today's data-driven world, organizations are increasingly relying on business intelligence tools like Power BI to make informed decisions. However, as data becomes more accessible, the risk of unauthorized access and data breaches also increases. Therefore, it's essential to ensure that data is protected and secure.

This blog post outlines how security features in Power BI and Databricks can be used to work together to ensure secure data access to Power BI on top of Databricks. We'll discuss the importance of authentication, access control patterns, and networking security features to ensure that your data remains secure.

Table of Contents


Authentication Methods

Before discussing access control with Power BI with Databricks, we need to understand authentication. Databricks needs to know who's accessing the data (authentication) and then determine what data they have access to (access control).

There are three different methods for Power BI to authenticate to Databricks: username and password, Personal Access Token (PAT), and Open Authentication (OAuth). It's worth noting that these authentication methods are named differently in Power BI Desktop versus Power BI Service.

Authentication Methods

Power BI Desktop

Power BI Service

Username/password

Username/password

Basic

Personal Access Token (PAT)

Personal Access Token (PAT)

Key

Open Authentication (OAuth)

Azure Active Directory (Azure Databricks Connector)

OIDC (Databricks Connector)

OAuth2


Username and Password

When considering the choice of authentication method, it is also worth noting that “username and password” does not support MFA or 2FA. If your organization requires MFA/2FA for authentication, you cannot use “username and password” to authenticate.

Personal Access Tokens

When it comes to Personal Access Tokens (PAT), it can be used for all clouds and it is also the only authentication method that supports Azure service principals (Azure service principals can only be used for Azure, and are not applicable for Databricks on AWS or GCP). In the case of Personal Access Token, authentication is performed by Databricks and not the identity provider.

As per Databricks' security best practice, you should set an expiration date for your Personal Access Token, as it is not safe for you to have a key that does not expire. If the key ends up in the wrong hands, your security access will be compromised. When you have an expiration date for your Personal Access Token, you will need to do key rotation. It's best practice to bake that key rotation as part of your CI/CD process for Power BI.

Open Authentication (OAuth)

The last - and probably most frequently used - authentication method is OAuth. This is worth noting that if you're using Databricks on AWS and you're using Azure Active Directory as OAuth method, Databricks recently rolled out AAD passthrough for Databricks on AWS, which means you can utilize Unity Catalog’s access control in Power BI, please see “access control patterns” later in this blog. In order to use AAD passthrough, you will need to set up account level SSO, please see self-enrolment for private preview.

If you are using the Azure Databricks connector, the OAuth method in Power BI Desktop is called Azure Active Directory (AAD). This is because AAD is the default of OAuth mechanism that Azure Databricks uses. If you're using the Databricks connector, the OAuth method will be called OIDC in Power BI Desktop.

Authentication in the Authoring Process 

Typically, a Power BI dashboard is developed and designed in Power BI Desktop, and once finalized, it is published to the Power BI Service for broader distribution and access. However, the authentication setup is usually different in the authoring process, as broken down in the following common scenarios:

  • Authoring a semantic model in Power BI Desktop
  • Refreshing semantic models in Power BI Service
  • Querying data in Power BI semantic models using Direct Query mode.

Avnish_Jain_0-1715092948350.png


Authentication in Power BI Desktop

As a Power BI developer using Power BI Desktop to access Databricks, you will authenticate with your personal account using one of the three previously described methods. Your authentication request is sent to the Databricks SQL Warehouse, which verifies data access privileges through Unity Catalog before returning the query results accordingly.

Avnish_Jain_1-1715094633478.png

 

Authentication in Power BI Service

Once you complete your semantic model, you publish it to Power BI Service. Here, you'll need to configure the data source credentials using one of three authentication methods. Typically, these credentials are associated with a service account—not your personal account. A service account, like powerbi_refresh@yourcompany.com, is created by an organization and isn't tied to an individual user. This approach helps prevent issues when users leave the organization, ensuring ongoing data refreshes without interruption.

Alternatively, you can use AAD passthrough, which allows the end user's identity to be used with Unity Catalog. More details on this will be covered later in the blog.

Note!

A service principal differs from a service account. It is created through app registration and is associated with a service rather than an individual. Service principals must authenticate to Databricks using a Personal Access Token, as other methods requiring interactive login are not supported for service principals.



Access Control Patterns from Power BI Service

In the Power BI Service, data refresh can occur in different modes such as Import, Composite, or Direct Query. For the Import and Composite modes, data refresh is typically handled using a service account. The same service account can also be used with Direct Query mode, or alternatively, users can opt to use their own Azure Active Directory (AAD) credentials for authentication and data access purposes.

When employing service accounts, it is critical to consider data security and governance, due to the elevated permissions these accounts often possess compared to regular business user accounts. Organizations must carefully implement robust data security measures to manage these elevated permissions effectively.

In this section, we will discuss access control patterns for each of the scenarios once the semantic model is published to Power BI Service.

Avnish_Jain_2-1715095047221.png


Import/Composite on Service Account

When a table in a semantic model uses Import or Composite mode, Power BI needs to perform refreshes. These refreshes authenticate to Databricks using a service account. During the refresh, the SQL Warehouse will check the data access permissions in Unity Catalog based on the privileges assigned to the connecting user - in this case the higher privileged service account - and will send the query results back to Power BI.

If you need to have finer grain access control for end users in Power BI, you must set up row-level security within Power BI itself. This is necessary because Databricks does not have the context to see which individual end users are accessing the semantic model; it only recognizes the access rights of the service account.

Direct Query on Service Account

In Direct Query mode, Power BI can also use service accounts to connect to a SQL warehouse. Similar to the Import mode, the SQL warehouse consults the Unity Catalog to verify the service account’s access to specific tables, rows, and columns, and then returns the query results.

Just as in Import mode, to manage access control for specific end users or groups within Power BI, setting up row-level security is essential. This ensures that even though the SQL warehouse checks access permissions for the service account, Power BI tailors data visibility based on the defined user or group privileges.

Direct Query on AAD Passthrough

In both of the above scenarios, fine grained access control policies would need to be set up both in Unity Catalog (for users that are consuming directly through DBSQL) and Power BI. If you do not want to set up a dual security model, and instead use Unity Catalog to govern everything without having to set up access control in Power BI, you can opt for AAD passthrough.

This is the preferred architecture if you want to simply your access control setup. You will need to set up a direct query semantic model. There will be a checkbox in “Edit data source credential” that says “Report viewers can only access data source with their own Power BI identities using DirectQuery”. If you tick that checkbox, Power BI will pass down the AAD identity to Databricks SQL Warehouse, which will enable Unity Catalog to check whether that end user's AAD identity has access to certain tables, views, and rows/columns.

Liphuan_2-1713544315708.png


Pragmatic Implementation: Combination of Methods

In reality, you will use a combination of these scenarios. You are unlikely to have 100% of your semantic models in Import or Direct Query, and consequently, you are likely going to set up security or access control in both Power BI and Unity Catalog.

If you are using Unity Catalog to govern Power BI’s end users data access in Databricks, you can govern the access to tables, to rows (row filter in Unity Catalog is equivalent to traditional row level security), to mask columns (for sensitive data), and to dynamic views. This governance is only available if you're using direct query with AAD passthrough.

Networking Security

For secure networking access to a Databricks workspace, you can set up Front-end Private Link or IP access list. This means access to a Databricks workspace, ODBC/JDBC connections, REST API calls and Power BI connections all need to originate from a private network or from certain IP ranges. Therefore you will need a Power BI Gateway when you have either Front-end Private Link or IP access list enabled. If you have a back-end only Private Link, you don't need a gateway.

You have two choices for the gateway. One is a VNet gateway, which is a feature that recently went to GA. This is a Microsoft managed solution and so you don't have to worry about maintenance. However, this is a premium only feature.

Another option is the On-premises data gateway which despite the name, your serving layer does not need to be on-premise. This is a traditional way of setting up a data gateway hosted on a VM which you would need to maintain yourself. The advantage is that it's fully controlled by you, so you can set up scaling and load balancing if you need to.

Please see
here for the considerations and documentation for On-premises gateway.

Summary

In conclusion, securely connecting Power BI to Databricks can be achieved using a combination of Power BI and Databricks features. By implementing proper authentication, access control patterns, and networking security features, you can ensure that your data remains secure and protected.

 

1 Comment
johnjohn
New Contributor

Hi,

My colleague (that has no access to Databricks) need to build a Power Bi Report using a table I created on Databricks. My idea was to generate a PAT and give it to him in order to be able to work on the Power BI Report.

In this way, he could possibly connect to all the table I have access to. How can I prevent this situation? Unfortunately I'm not able to create a service account on Databricks (if that could be a solution).

Thank you very much!