Introduction
This article is a must-read if you manage data and analytics with Databricks on AWS, Power BI, and Microsoft Entra ID. Integrating Databricks on AWS with Power BI through Single Sign-On (SSO) using Microsoft Entra ID can significantly streamline your data analytics workflow. Leveraging SSO in Power BI enhances security, simplifies access management, and provides a more efficient, user-friendly experience. This comprehensive guide walks you through configuring the preview for SSO integration of Databricks on AWS, Power BI, and Microsoft Entra ID, and it includes guided demo content to help you follow along.
When to use Direct Query and SSO in Power BI
There are three scenarios in which Direct Query is more appropriate than Import mode:
- The semantic model is too big for the Power BI capacity (the Pro capacity model size limit is 1 GB and the P1 capacity limit is 25 GB; see the details of the model size limits here).
- Data in Databricks is frequently updated, e.g., streaming tables. In this scenario, Direct Query mode is recommended so that Power BI reflects the latest changes in the data.
- You want a single access control mechanism across Databricks and Power BI, i.e., Unity Catalog. With Import mode you have to set up RLS in Power BI, so you end up with dual access control mechanisms in Unity Catalog and Power BI. Direct Query mode requires no access control in Power BI and can use the ACLs in Unity Catalog (this requires Entra ID as the identity provider and the Azure Databricks connector); see the sketch after this list.
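To make the single-governance point concrete, here is a minimal sketch of what this looks like in practice: with Direct Query and SSO passthrough, the Unity Catalog grants you define for notebook and SQL users are the same ones enforced for Power BI report viewers. The catalog, schema, table, and group names below are hypothetical placeholders; run the statements in a Databricks notebook or adapt them as plain SQL.

```python
# Grants defined once in Unity Catalog govern Power BI viewers as well,
# because Direct Query + SSO passes each viewer's own identity through.
# `sales_demo`, `crm`, `customers`, and `analysts` are placeholder names.
spark.sql("GRANT USE CATALOG ON CATALOG sales_demo TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA sales_demo.crm TO `analysts`")
spark.sql("GRANT SELECT ON TABLE sales_demo.crm.customers TO `analysts`")
```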
Databricks & Power BI Connector Support & Considerations
Background
Databricks launched the first Power BI connector, Azure Databricks, in 2020, enabling seamless connectivity to Power BI for Azure Databricks users with Entra ID as their identity provider. Since then, we have expanded support for additional cloud deployments and authentication options by introducing a second connector, Databricks, in 2022 for Databricks users on AWS.
When using Azure Databricks
Since Azure's default OAuth mechanism is Entra ID, there is only one connector choice when using Azure Databricks: the Azure Databricks connector. In this connector, username/password and personal access token authentication are identical to those of the Databricks connector, but the default OAuth mechanism is Entra ID.
When using Databricks on AWS
For Databricks on AWS with Entra ID as the identity provider, you can use either the Azure Databricks connector or the Databricks connector, depending on your requirements. If you need the Direct Query viewer credential option (letting Power BI pass the end user's identity down to Databricks and using Unity Catalog for end-user access control), you must use the Azure Databricks connector. Otherwise, the recommendation is the Databricks connector, because it lets you control OAuth application settings such as the access and refresh token TTLs, as sketched below.
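As an illustration of that TTL control, token lifetimes for the Power BI OAuth integration can be adjusted at the account level through the Databricks OAuth app integration API. The sketch below is a hedged example, not a definitive recipe: it assumes the published-app-integrations endpoint and token_access_policy field names as we understand them from the Account API documentation, and the account ID and bearer token are placeholders.

```python
# Sketch: set access/refresh token TTLs for the Power BI OAuth app on a
# Databricks account. Endpoint and field names are our reading of the
# Account API's published-app-integrations resource; verify against the
# current documentation before relying on this.
import requests

ACCOUNT_ID = "<databricks-account-id>"  # placeholder
TOKEN = "<account-admin-token>"         # placeholder

resp = requests.post(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
    "/oauth2/published-app-integrations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "app_id": "power-bi",  # the published Power BI OAuth app
        "token_access_policy": {
            "access_token_ttl_in_minutes": 60,      # example value
            "refresh_token_ttl_in_minutes": 10080,  # example value
        },
    },
)
resp.raise_for_status()
print(resp.json())
```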
Private Preview - Configure Databricks on AWS sign-on from Power BI with Microsoft Entra ID
A private preview (documentation) is available that allows SSO to Databricks on AWS from Power BI using Microsoft Entra ID. This guide, written as of July 9, 2024, walks you through the configuration process. As always, refer to our documentation for more information on the level of support to expect for our various preview release types.
Requirements
Before you configure SSO to Databricks on AWS from Power BI with Microsoft Entra ID, make sure you have the following:
- Minimum Power BI version 2.124.581.0 (January 2024); we recommend using the latest version, which is June 2024 at the time of writing.
- A Microsoft Entra user with a valid email claim. This email claim must also be a user in your Databricks workspace.
- Unified login enabled for your Databricks workspace
- Your Microsoft Entra tenant ID. See How to find your Microsoft Entra tenant ID in the Microsoft Entra documentation.
The Microsoft Entra tenant ID must match the Power BI tenant ID. To find your Power BI tenant ID, open a tenant URL in the Power BI service; your tenant ID is the value after ctid= in the URL. A quick way to look up your Entra tenant ID programmatically is sketched below.
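If you prefer to look up the Entra tenant ID programmatically rather than through the portal, the tenant's public OpenID Connect metadata endpoint returns it. A minimal sketch, assuming requests is installed and yourdomain.onmicrosoft.com is replaced with any domain registered to your tenant:

```python
# Look up the Entra tenant ID from the tenant's public OpenID Connect
# metadata. "yourdomain.onmicrosoft.com" is a placeholder.
import requests

meta = requests.get(
    "https://login.microsoftonline.com/yourdomain.onmicrosoft.com"
    "/.well-known/openid-configuration"
).json()

# For the v1 endpoint, the issuer has exactly the form needed later:
# https://sts.windows.net/<tenant-id>/ - the GUID is your tenant ID.
print(meta["issuer"])
```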
Note
If your workspace doesn’t meet the requirements, contact your Databricks representative to enroll your Microsoft Entra tenant in the Private Preview. You’ll have to provide your Microsoft Entra tenant ID and your Databricks account ID.
Self-serve enrollment steps
To enroll your Microsoft Entra tenant in the Private Preview, follow the steps in SSO to Databricks with Microsoft Entra ID (formerly Azure Active Directory), but don’t paste the OpenID Connect metadata document value into the OpenID issuer URL field. Instead, your OpenID issuer URL must include your Microsoft Entra tenant ID and end with a forward slash. The format must be exactly the following: https://sts.windows.net/<tenant-id>/
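Before pasting the value into the account console, you can sanity-check it: the issuer URL must end with a slash, and its OpenID metadata document should echo the same issuer back. A minimal sketch, assuming <tenant-id> is replaced with your actual tenant ID:

```python
# Sanity-check the OpenID issuer URL before using it in the enrollment
# steps. Replace <tenant-id> with your Microsoft Entra tenant ID.
import requests

issuer = "https://sts.windows.net/<tenant-id>/"
assert issuer.endswith("/"), "the OpenID issuer URL must end with a slash"

meta = requests.get(issuer + ".well-known/openid-configuration").json()
print(meta["issuer"])  # should print the same issuer URL back
```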
Detailed step by step
The section below walks through the self-serve enrollment steps above in detail and shows how to configure the settings in the AWS Databricks Account Console and in the Azure Portal for Entra ID. Feel free to skip to the next section if you have already completed this.
AWS Databricks Account Console
Important Note: The following steps involve adjusting settings in the Single sign-on section of the Account Console. Prior to implementation, it is strongly recommended to test this in a sandbox environment. To avoid any potential issues with access to Databricks during single sign-on testing, it is suggested to keep the account console open in a separate browser window. You may also establish emergency access with security keys as a precaution against lockouts. Refer to the Configure emergency access section for instructions on how to set this up.
- Log in to Databricks Account Console as an account administrator.
- Navigate to Settings > Single sign-on
- Copy the value in Databricks redirect URL: https://accounts.cloud.databricks.com/oidc/consume
Azure Portal - Entra ID
- Log into Azure Portal
- Navigate to Microsoft Entra ID
- Add an App registration
- Give it a name
- On the Redirect URI select web and paste the Databricks redirect URL: https://accounts.cloud.databricks.com/oidc/consume
- Gather the required information from the Microsoft Entra ID application: Application (client) ID and the Directory (tenant) ID
- Create a secret - In the left pane, click Certificates & secrets
- Click + New client secret.
- Enter a description and choose an expiration and click Add.
- Copy the secret Value.
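At this point you can verify that the client ID, tenant ID, and secret value all line up by requesting a client-credentials token with the MSAL Python library. This is an optional check, not part of the official steps; all values below are placeholders.

```python
# Optional check: confirm the app registration's credentials are valid by
# acquiring a client-credentials token. All values are placeholders.
import msal  # pip install msal

app = msal.ConfidentialClientApplication(
    client_id="<application-client-id>",
    client_credential="<client-secret-value>",
    authority="https://login.microsoftonline.com/<tenant-id>",
)
result = app.acquire_token_for_client(
    scopes=["https://graph.microsoft.com/.default"]
)
print("credentials OK" if "access_token" in result
      else result.get("error_description"))
```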
AWS Databricks Console
- Return to the Databricks Account Console > Settings > Single sign-on tab and enter the values you copied from the identity provider application into the Client ID and Client secret fields.
- IMPORTANT - For the OpenID issuer URL field, this is where the change has happened. Use the following format: https://sts.windows.net/<tenant-id>/
- Press Save
- Click Test SSO and verify that you get "SSO Test Connection succeeded"
- Click Enable SSO
- Enable Unified login
- Confirm
- Choose either All workspaces or Selected Workspaces
Troubleshoot configuration
If the Test SSO validation step in SSO to Databricks with Microsoft Entra ID (formerly Azure Active Directory) fails, verify the following:
- The OpenID issuer URL contains your Microsoft Entra tenant ID and ends with a forward slash. For example:
https://sts.windows.net/<tenant-id>/
- Your Microsoft Entra user has a valid email claim. This email claim must also be a user in your Databricks workspace.
Make sure your user email property is consistent with your User principal name property. Both properties must belong to a user who is a member of your Databricks workspace.
You might have to configure optional claims. To do this, follow https://learn.microsoft.com/entra/identity-platform/optional-claims and make sure the email claim is present in both the ID token and the access token.
- One way to check is through Azure Portal > Microsoft Entra ID > select the App registration > Token configuration > Add optional claim > email > Add. If you add this optional claim as a non-admin, an admin will need to approve the scope. The same change can also be made programmatically, as sketched below.
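For reference, here is a hedged sketch of the same change made through Microsoft Graph. It assumes <app-object-id> is the application's object ID (not the client ID) and that the bearer token carries permission to update applications, e.g. Application.ReadWrite.All.

```python
# Add the optional "email" claim to both the ID token and the access
# token via Microsoft Graph. <app-object-id> and the bearer token are
# placeholders; the token needs Application.ReadWrite.All (or equivalent).
import requests

resp = requests.patch(
    "https://graph.microsoft.com/v1.0/applications/<app-object-id>",
    headers={
        "Authorization": "Bearer <graph-access-token>",
        "Content-Type": "application/json",
    },
    json={
        "optionalClaims": {
            "idToken": [{"name": "email"}],
            "accessToken": [{"name": "email"}],
        }
    },
)
resp.raise_for_status()  # Graph returns 204 No Content on success
```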
Demo - Databricks Notebook - Unity Catalog security
If you want to validate and follow along with some demo content, we have provided a GitHub repo that you can sync to your Databricks workspace. It contains a Databricks notebook and a Power BI template file that take you through an end-to-end demo. You need permissions to create catalogs, schemas, tables, and groups in your Databricks environment.
- Create a Git folder and sync the GitHub repo databricks-blogposts, using Sparse checkout mode to sync only the subfolder awsdb-pbi-sso
- Open the notebook file aws_databricks_powerbi_sso from the subfolder awsdb-pbi-sso and populate the parameters for catalog, schema, and repo_path in the widgets at the beginning of the notebook. To get the repo path, you can use the copy URL/path feature
- Create two groups, ap_demo_admin and ap_demo_fr, and add yourself to the group ap_demo_fr
- Attach an All Purpose Compute cluster (SQL warehouses and serverless compute do not work) and run the notebook cell by cell; it takes you through setting up Unity Catalog fine-grained access control using row-level and column-level security (see the sketch after this list)
- When you get to the section about Power BI Desktop, continue to the next section below.
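To give a flavor of what the notebook does, here is a minimal sketch of a Unity Catalog row filter of the kind it creates. The customers table and country column are illustrative stand-ins (the notebook defines its own schema); the group names match the ones created above.

```python
# Illustrative row-level security: admins see all rows, members of
# ap_demo_fr see only French rows. Table/column names are placeholders;
# catalog and schema should match the notebook widget values.
catalog, schema = "<catalog>", "<schema>"

spark.sql(f"""
    CREATE OR REPLACE FUNCTION {catalog}.{schema}.fr_row_filter(country STRING)
    RETURN is_account_group_member('ap_demo_admin')
        OR (is_account_group_member('ap_demo_fr') AND country = 'FR')
""")

spark.sql(f"""
    ALTER TABLE {catalog}.{schema}.customers
    SET ROW FILTER {catalog}.{schema}.fr_row_filter ON (country)
""")
```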
Enable SSO to access reports in Power BI
You can now enable SSO so that when you publish a report to the Power BI service, users can access reports built using DirectQuery storage mode by passing their Microsoft Entra ID (formerly Azure Active Directory) credentials to Databricks. If you want to follow along with the demo content, go to Option 1; if you want to see how to connect manually, proceed to Option 2.
Option 1: Use the Demo Power BI Template File
On your Windows machine with Power BI Desktop installed, download the Power BI template file from the GitHub repo (https://github.com/databricks-solutions/databricks-blogposts/tree/main/awsdb-pbi-sso) and open it. Populate the following parameters:
- Server_Hostname (From your SQL warehouse connection strings)
- HTTP_Path (From your SQL warehouse connection strings)
- Catalog_Name (What you defined in the RLS notebook)
- Schema_Name (What you defined in the RLS notebook)
- When asked for authentication, select the Azure Active Directory (Entra ID) option.
- Save the report, give it a name and Publish it to the Power BI service
Option 2: Build Your Own Report
- Open a blank report in Power BI Desktop and go to Get data.
- When creating a connection from Power BI Desktop, select the Azure Databricks connector. Although the connector is named Azure Databricks, it works with Databricks on AWS. Do not select the Databricks connector.
- Populate Server hostname and HTTP path (from your SQL warehouse connection strings)
- Make sure you are using Direct Query mode, then click OK.
- Populate your credentials to authenticate using Azure Active Directory (Entra ID).
- Select <catalog>.<schema>.customers and click Load (<catalog> and <schema> are what you defined in the RLS notebook)
- Build the visualization by dragging two card visualizations, a map visualization, and a table visualization into the report canvas.
- Save the report, give it a name and Publish it to the Power BI service
Power BI Service
Enable SSO access to the report and underlying data source.
- In the Power BI service, go to your workspace, find the semantic model you just published, click the three dots next to it, and click Settings
- Expand Data source credentials, and then click Edit credentials.
- On the configuration dialog, select Report viewers can only access this data source with their own Power BI identities using Direct Query, and then click Sign in.
- With this option selected, access to the data source is handled using DirectQuery and managed using the Microsoft Entra ID identity of the user who is accessing the report. If you don’t select this option, only you, as the user who published the report, have access to the Databricks data source.
Conclusion
Integrating Databricks on AWS with Power BI through Single Sign-On (SSO) using Microsoft Entra ID offers a robust solution for seamlessly managing access to Databricks data from Power BI. By following the steps outlined in this guide, you can enhance security, simplify access management, and create a more efficient and unified data analytics workflow. Leveraging SSO in Power BI streamlines access control for your end-to-end data platform, and the integration frees your team to focus on deriving insights from data rather than maintaining two separate governance models in Databricks and Power BI. With the demo notebook and .pbit file provided in the GitHub repo listed in the Resources section below, you can easily recreate this SSO passthrough demo and adapt it to your own access control use cases.
Resources
- Databricks Documentation - Configure Databricks on AWS sign-on from Power BI with Microsoft Entra ID – self-enrollment for this feature
- GitHub - Power BI on AWS Databricks using SSO Passthrough – GitHub repo to download the notebook and .pbit file used in this blog
- How to connect #PowerBI to #Databricks on AWS using SSO passthrough and use Unity Catalog ACL in PBI
- How to choose #databricks connector for #powerbi: Azure Databricks Connector vs Databricks Connector