Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

What is the best practice for connecting Power BI to Azure Databricks?

AlbertWang
Contributor III

I referred to this document to connect Power BI Desktop and Power BI Service to Azure Databricks.

Connect Power BI to Azure Databricks - Azure Databricks | Microsoft Learn

However, I have a couple of questions and concerns. Can anyone kindly help?

  1. It seems there are only the three authentication options below when connecting Power BI Desktop and Power BI Service to Azure Databricks. On Power BI Desktop, the most secure one is Microsoft Entra ID. However, on Power BI Service (using a Microsoft Entra ID service principal), there is no client-credentials authentication option. That means Power BI Service cannot obtain and use a short-lived (1-hour) access token; it has to use either a PAT or the Databricks service principal's client ID and secret as username/password. Neither of these is good enough (see the sketch after this list).
    1. Personal Access Token: Enter your Azure Databricks personal access token.
    2. Microsoft Entra ID: Click Sign in and then follow the on-screen instructions.
    3. Username / Password: Not applicable.
  2. It seems that Power BI cannot use job compute to access Azure Databricks. What types of compute does Power BI Partner Connect support? I don't really want to start up an all-purpose compute every time Power BI loads data.
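For context, this is the kind of short-lived token I mean. A minimal sketch (it assumes the requests package; tenant, client ID, and secret are placeholders; the scope targets the well-known Azure Databricks resource application ID) of the client-credentials flow that Power BI Service does not expose:

import requests

# Sketch only: a service principal obtains a short-lived Microsoft Entra ID access
# token via the OAuth 2.0 client-credentials flow. All values below are placeholders.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<service-principal-client-id>"
CLIENT_SECRET = "<service-principal-client-secret>"

response = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        # Azure Databricks resource; the returned token is typically valid for about an hour.
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
    },
    timeout=30,
)
response.raise_for_status()
access_token = response.json()["access_token"]

Power BI Desktop effectively does this interactively when I sign in with Microsoft Entra ID; my concern is that Power BI Service offers no equivalent option for a service principal.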

Thank you.

Regards,

Albert

9 REPLIES

AlbertWang
Contributor III

Moreover, when I create an all-purpose compute for the Power BI Service, I realize that I cannot choose a service principal as the user for the single-user access option. Shall I use `No isolation shared`?


AlbertWang
Contributor III

Looks like this is the only option. I need to use multi-node mode; I cannot use single-node mode because I also need the compute to support Unity Catalog:

single-node mode does not support adding a service principal as the user for single-user access mode;

single-node mode does not support shared mode;

no-isolation shared mode does not support Unity Catalog.
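For reference, a minimal sketch (placeholder host, token, and application ID; it assumes the Clusters API accepts a service principal application ID in single_user_name, which the UI dropdown does not offer) of creating such a cluster through the API:

import requests

# Sketch only: create a Unity Catalog-capable single-user cluster and assign a
# service principal as the single user via the Clusters API. All values below
# are placeholders.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<databricks-admin-token>"
SP_APPLICATION_ID = "<service-principal-application-id>"

cluster_spec = {
    "cluster_name": "powerbi-refresh-cluster",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_D4ads_v5",
    "autoscale": {"min_workers": 1, "max_workers": 1},
    "autotermination_minutes": 10,          # clusters do not allow less than 10
    "data_security_mode": "SINGLE_USER",    # required for Unity Catalog access
    "single_user_name": SP_APPLICATION_ID,  # assign the service principal here
}

response = requests.post(
    f"{HOST}/api/2.1/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
    timeout=30,
)
response.raise_for_status()
print(response.json()["cluster_id"])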


AlbertWang
Contributor III

Hi guys, can anybody help?

h_h_ak
Contributor

Hi Albert, 

Why are you not using a Databricks SQL warehouse?

Here’s a list of benefits of using a Databricks SQL warehouse with Power BI, covering performance, scalability, caching, and more (a short connection sketch follows the list):

 

1. Optimized Performance for BI Queries: SQL Warehouses are tuned for analytical workloads, providing low-latency and high-performance query execution for Power BI dashboards and reports.

2. Scalability: Autoscaling ensures SQL Warehouses can adjust resources dynamically based on query demand, handling varying workloads without manual intervention.

3. Direct Query Support: Enables real-time data access in Power BI with DirectQuery, so users always see the latest data without requiring a local data cache.

4. Reduced Query Latency through Caching: SQL Warehouses cache query results, enabling faster response times for frequently accessed data, enhancing dashboard interactivity.

5. Resource Efficiency: Cached results lower the need for re-processing large datasets, reducing compute costs and optimizing resource usage.

6. Enhanced User Experience: With caching, Power BI users experience faster load times and smoother interactions, especially for filtering, drilling down, and refreshing views.

7. Automatic Cache Management: Databricks SQL Warehouse handles cache updates and invalidations automatically, ensuring data accuracy without manual cache clearing.

8. Query Consolidation: When multiple users or reports request similar data, the warehouse can serve these queries from cache, reducing redundant queries and improving performance during high-usage periods.

9. Connection Stability and Security: Managed connections offer robust security, including identity management, encryption, and fine-grained access controls, which are critical for compliance and secure data handling.….
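If it helps, here is a minimal sketch (it assumes the databricks-sql-connector package; hostname, HTTP path, and token are placeholders) of the same connection details Power BI asks for when it connects to a SQL warehouse:

# Sketch only: connect to a SQL warehouse with the three pieces of information
# Power BI needs, i.e. server hostname, HTTP path, and a token. Placeholders only.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="<personal-access-token-or-entra-id-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchall())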

Hi @h_h_ak, thank you for your reply. Do you mean SQL warehouses (serverless SQL)?

One reason we don't use it is the price. In our region, the cheapest all-purpose compute and SQL warehouse are:

  1. All-purpose compute: D4ads v5 - USD $0.81/hour (DBU+VMs)
  2. Serverless SQL warehouse: 2X-Small (Standard_E8ds_v4) - USD $3.80/hour (DBU)

I know 2X-Small is faster than D4ads v5.

However, we currently import data from Databricks into Power BI on a daily schedule, so we don't care too much about compute startup time or query performance.

Moreover, unlike Snowflake, the serverless SQL warehouse only auto-stops after 10 minutes of inactivity. That means we pay for 10 extra minutes of idle time for nothing.

Therefore, I think job compute would be the best option. However, the Power BI connector does not support using Databricks job compute.

By the way, whatever compute we use, we still need to use the service principal's PAT to authenticate the Power BI Service with the Databricks workspace.

Thank you.

Regards,

Albert

h_h_ak
Contributor

Yes, you can select serverless or classic mode.

You can also set auto-termination to 1 minute if the cluster is not used:

https://community.databricks.com/t5/warehousing-analytics/1-min-auto-termination/td-p/65534


But keep in mind that you also have to pay for the cluster start-up time in the case of an all-purpose cluster.
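A minimal sketch of that kind of edit (placeholder host, token, and warehouse ID; it assumes the SQL Warehouses API edit endpoint, and whether values below the UI minimum are accepted is exactly what the linked thread discusses):

import requests

# Sketch only: lower a SQL warehouse's auto-stop setting through the SQL
# Warehouses API. All values below are placeholders.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<databricks-token>"
WAREHOUSE_ID = "<warehouse-id>"

response = requests.post(
    f"{HOST}/api/2.0/sql/warehouses/{WAREHOUSE_ID}/edit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"auto_stop_mins": 1},
    timeout=30,
)
response.raise_for_status()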

And here is a recent post about seamless integration:

https://www.databricks.com/blog/announcing-general-availability-publish-microsoft-power-bi-service-u...

Thank you for your further assistance. However, it seems like the cluster autotermination time cannot be less than 10 minutes.


Using the CLI:

databricks clusters edit --json '{\"cluster_id\":\"1234-123456-abcdef1f\",\"spark_version\":\"15.4.x-scala2.12\",\"autotermination_minutes\":1,\"autoscale\":{\"max_workers\":1,\"min_workers\":1},\"node_type_id\":\"Standard_D4ads_v5\"}'
>>
Error: The cluster autotermination time cannot be less than 10 minutes.

KamilN
New Contributor II

For serverless SQL warehouses the minimum is actually 5 minutes.


In the example above, you tried to edit a classic compute cluster.

Kamil

Akshay_Petkar
Contributor

Hi @AlbertWang 

You have multiple options to connect Power BI with Databricks:

  1. Using Cluster Credentials: Under the cluster details, go to the Advanced Options and select JDBC/ODBC. Here, you’ll find the necessary credentials, such as hostname and HTTP path, for connecting.

  2. Using SQL Warehouse: Go to the SQL Warehouse section and select Connection Details, where you’ll find the hostname and HTTP path needed for the connection (see the sketch after this list).

  3. Using Partner Connect: Click on Power BI within Partner Connect. This will prompt you to download a Power BI connection file for easy setup.

  4. Using Delta Sharing: You can also connect Power BI with Databricks via Delta Sharing for data access.
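For option 2, a minimal sketch (placeholder host and token; it assumes the SQL Warehouses API returns an odbc_params object per warehouse) of pulling the hostname and HTTP path programmatically:

import requests

# Sketch only: list SQL warehouses and print the hostname and HTTP path that
# Power BI needs. Host and token are placeholders; odbc_params is assumed to be
# present in the API response.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<databricks-token>"

response = requests.get(
    f"{HOST}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
for warehouse in response.json().get("warehouses", []):
    odbc = warehouse.get("odbc_params", {})
    print(warehouse["name"], odbc.get("hostname"), odbc.get("path"))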
