alex_phu
Databricks Employee

title_logo2.png

Introduction

This article is a must-read if you manage data and analytics with Databricks on AWS, Power BI, and Microsoft Entra ID. Integrating Databricks on AWS with Power BI through Single Sign-On (SSO) using Microsoft Entra ID can significantly streamline your data analytics workflow. Leveraging SSO in Power BI enhances security, simplifies access management, and provides a more efficient, user-friendly experience. This comprehensive guide walks you through configuring the preview SSO integration of Databricks on AWS, Power BI, and Microsoft Entra ID, and includes guided demo content to help you follow along.

alex_phu_0-1722376071853.png

 

When to use Direct Query and SSO in Power BI

There are three scenarios in which Direct Query mode is more appropriate than Import mode:

  • The semantic model is too big for the Power BI capacity (the Pro capacity model size limit is 1 GB and the P1 capacity limit is 25 GB; see details of model size limits here).
  • Data in Databricks is frequently updated, e.g., streaming tables. In this scenario, Direct Query mode is recommended so that Power BI reflects the latest changes in the data.
  • You want a single access control mechanism across Databricks and Power BI, i.e., Unity Catalog. With Import mode you have to set up RLS in Power BI, so you end up with dual access control mechanisms in both Unity Catalog and Power BI. Direct Query mode does not require access control in Power BI and can use the ACLs in Unity Catalog (this requires Entra ID as the identity provider and the Azure Databricks connector); a sketch follows below.
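
For illustration, here is a minimal sketch of what that single access control mechanism can look like in Unity Catalog: a row filter function applied to a table, which Direct Query then inherits automatically. The catalog, schema, table, and group names are hypothetical placeholders, not part of the demo content later in this article.

    # Hypothetical names: main.demo.orders and the group fr_analysts.
    # Members of fr_analysts see only rows where region = 'FR'; everyone else sees all rows.
    spark.sql("""
        CREATE OR REPLACE FUNCTION main.demo.region_filter(region STRING)
        RETURN IF(is_account_group_member('fr_analysts'), region = 'FR', TRUE)
    """)
    spark.sql("ALTER TABLE main.demo.orders SET ROW FILTER main.demo.region_filter ON (region)")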

Databricks & Power BI Connector Support & Considerations

Background

Databricks launched the first Power BI connector, Azure Databricks, in 2020, enabling seamless connectivity to Power BI for Azure Databricks users with Entra ID as their identity provider. Since then, we have expanded support for additional cloud deployments and authentication options by introducing a second connector, Databricks, in 2022 for Databricks users on AWS.

When using Azure Databricks

Since Azure's default OAuth mechanism is Entra ID, there is only one connector choice when using Azure Databricks - the Azure Databricks Connector. In this connector, username/password and personal access token authentications are identical to those of the Databricks connector, but the default OAuth mechanism is Entra ID. 

When using Databricks on AWS

For Databricks on AWS with Entra ID as the identity provider, you can use either the Azure Databricks connector or the Databricks connector, depending on your requirements. If you need the Direct Query viewer credential option (letting Power BI pass the end user identity down to Databricks and use Unity Catalog for access control of end users), you will need the Azure Databricks connector. Otherwise, the recommendation is to use the Databricks connector, because it lets you control OAuth application settings such as the access and refresh token lifetimes (TTL); a sketch of this follows below.

alex_phu_1-1722376104616.png
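
A sketch of how those OAuth settings can be adjusted, assuming you use the Databricks Account custom OAuth app integration API; the endpoint, field names, account ID, token, and TTL values below are illustrative and should be checked against the current API documentation.

    # Sketch: create a custom OAuth app integration with explicit token TTLs.
    # <databricks-account-id> and <account-admin-token> are placeholders.
    import requests

    ACCOUNT_ID = "<databricks-account-id>"
    resp = requests.post(
        f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/oauth2/custom-app-integrations",
        headers={"Authorization": "Bearer <account-admin-token>"},
        json={
            "name": "power-bi-oauth-app",
            "redirect_urls": ["https://oauth.powerbi.com/views/oauthredirect.html"],
            "confidential": False,
            "scopes": ["all-apis"],
            "token_access_policy": {
                "access_token_ttl_in_minutes": 60,       # example access token TTL
                "refresh_token_ttl_in_minutes": 10080,   # example refresh token TTL (7 days)
            },
        },
    )
    resp.raise_for_status()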

 

Private Preview - Configure Databricks on AWS sign-on from Power BI with Microsoft Entra ID

A private preview (documentation) is available that allows SSO to Databricks on AWS from Power BI using Microsoft Entra ID. This guide, written as of July 9, 2024, will walk you through the configuration process. As always, refer to our documentation for more information on the level of support to expect for our various preview release types.

Requirements

Before you configure SSO to Databricks on AWS from Power BI with Microsoft Entra ID, make sure you have the following:

  • Minimum Power BI version 2.124.581.0 (Jan 2024); we recommend using the latest version, i.e., June 2024 at the time of writing.
  • A Microsoft Entra user with a valid email claim. This email claim must also be a user in your Databricks workspace.
  • Unified login enabled for your Databricks workspace
  • SAML SSO disabled for your workspace
  • Your Microsoft Entra tenant ID. See How to find your Microsoft Entra tenant ID in the Microsoft Entra documentation.

The Microsoft Entra tenant ID must match the Power BI tenant ID. To find your Power BI tenant ID, see link. Your tenant ID is the value after ctid= in the tenant URL.
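
If you want to extract the tenant ID programmatically, here is a tiny illustrative helper (the URL shape is a placeholder):

    # Pull the tenant ID out of the ctid= query parameter of a Power BI tenant URL.
    from urllib.parse import urlparse, parse_qs

    tenant_url = "https://app.powerbi.com/home?ctid=<tenant-id>"  # placeholder
    tenant_id = parse_qs(urlparse(tenant_url).query)["ctid"][0]
    print(tenant_id)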

Note
If your workspace doesn’t meet the requirements, contact your Databricks representative to enroll your Microsoft Entra tenant in the Private Preview. You’ll have to provide your Microsoft Entra tenant ID and your Databricks account ID.

 

Self-serve enrollment steps

To enroll your Microsoft Entra tenant in the Private Preview, follow the steps in SSO to Databricks with Microsoft Entra ID (formerly Azure Active Directory), but don’t paste the OpenID Connect metadata document value into the OpenID issuer URL field. Instead, your OpenID issuer URL must include your Microsoft Entra tenant ID and end with a forward slash. The format must be exactly the following: https://sts.windows.net/<tenant-id>/
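
Before saving, you can sanity-check that the issuer URL resolves and matches the discovery document. A minimal sketch (the tenant ID is a placeholder):

    # Fetch the Entra ID v1 OpenID discovery document and confirm the issuer matches
    # the exact value you will paste into the OpenID issuer URL field.
    import requests

    tenant_id = "<tenant-id>"
    issuer = f"https://sts.windows.net/{tenant_id}/"
    config = requests.get(issuer + ".well-known/openid-configuration").json()
    assert config["issuer"] == issuer, f"Issuer mismatch: {config['issuer']}"
    print("Issuer OK:", config["issuer"])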

Detailed step by step

The section below provides a detailed walkthrough of the self-serve enrollment steps above and shows how to configure settings in the AWS Account Console and the Azure Portal for Entra ID. Feel free to skip to the next section if you have already completed this.

AWS Databricks Account Console

Important Note: The following steps involve adjusting settings in the Single sign-on section of the Account Console. Prior to implementation, it is strongly recommended to test this in a sandbox environment. To avoid any potential issues with access to Databricks during single sign-on testing, it is suggested to keep the account console open in a separate browser window. You may also establish emergency access with security keys as a precaution against lockouts. Refer to the Configure emergency access section for instructions on how to set this up.

Azure Portal - Entra ID

  • Log into Azure Portal
  • Navigate to Microsoft Entra ID
    alex_phu_2-1722373641882.png
  • Add an App registration
  • Give it a name
  • On the Redirect URI select web and paste the Databricks redirect URL: https://accounts.cloud.databricks.com/oidc/consume
    alex_phu_4-1722373948314.png
  • Gather the required information from the Microsoft Entra ID application: Application (client) ID and the Directory (tenant) ID
    alex_phu_3-1722373913194.png
  • Create a secret - In the left pane, click Certificates & secrets
  • Click + New client secret.
  • Enter a description and choose an expiration and click Add.
    alex_phu_0-1722374104144.png
  • Copy the secret Value.
    alex_phu_1-1722374127464.png
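
As an optional sanity check before leaving the Azure Portal (not part of the official steps), you can confirm the client ID and secret pair works with a minimal MSAL sketch like the one below; all three values are placeholders.

    # Acquire a client-credentials token just to prove the app registration and
    # secret are valid. Requires: pip install msal
    import msal

    app = msal.ConfidentialClientApplication(
        client_id="<application-client-id>",
        client_credential="<client-secret-value>",
        authority="https://login.microsoftonline.com/<directory-tenant-id>",
    )
    result = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
    print("Token acquired" if "access_token" in result else result.get("error_description"))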

     



AWS Databricks Console

  • Return to the Databricks Account Console > Settings > Single sign-on tab and enter the values you copied from the identity provider application into the Client ID and Client secret fields.

  • IMPORTANT - The OpenID issuer URL field is where this preview differs from the standard setup. Use the following format:
    https://sts.windows.net/<tenant-id>/

  • Press Save
    alex_phu_2-1722374179815.png
  • Click Test SSO and verify that you get SSO Test Connection succeeded
  • Click Enable SSO
    alex_phu_3-1722374208924.png
  • Enable Unified login
    alex_phu_4-1722374250558.png
  • Confirm
    alex_phu_5-1722374268591.png
  • Choose either All workspaces or Selected Workspaces
    alex_phu_6-1722374297047.png

     

Troubleshoot configuration

If the Test SSO validation step in SSO to Databricks with Microsoft Entra ID (formerly Azure Active Directory) fails, verify the following:

  • The OpenID issuer URL contains your Microsoft Entra tenant ID and ends with a forward slash. For example:
    https://sts.windows.net/<tenant-id>/
  • Your Microsoft Entra user has a valid email claim. This email claim must also be a user in your Databricks workspace.
    Make sure the user's email property is consistent with the User principal name property. Both values must correspond to users in your Databricks workspace.

You might have to configure optional claims. To do this, follow https://learn.microsoft.com/entra/identity-platform/optional-claims. Make sure the email claim is present in both the ID token and the access token; the sketch after the screenshots below shows one way to inspect a token.

  • One way to check is through Azure Portal > Microsoft Entra ID > Select the App registration > Token configuration > Add optional claim > email > add. If you are adding this optional claim as a non-admin, you will need an admin to approve the scope.

    alex_phu_7-1722374365652.png
    alex_phu_8-1722374374156.png
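
If you have a token in hand (for example, one captured while testing), a small illustrative helper can decode its payload without signature verification and show whether the email claim is present:

    # Decode a JWT payload (no signature check) and print the email claim.
    import base64, json

    def jwt_claims(token: str) -> dict:
        payload = token.split(".")[1]
        payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
        return json.loads(base64.urlsafe_b64decode(payload))

    claims = jwt_claims("<paste-id-or-access-token-here>")  # placeholder
    print("email claim:", claims.get("email"))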

     


Demo - Databricks Notebook - Unity Catalog security

If you want to validate and follow along with some demo content, we have provided a GitHub repo that you can sync to your Databricks workspace. It contains a Databricks notebook and a Power BI template file that take you through an end-to-end demo. You need permissions to create catalogs, schemas, tables, and groups in your Databricks environment.

  • Create a Git folder, sync the GitHub repo databricks-blogposts, and use the Sparse checkout mode to sync only the subfolder awsdb-pbi-sso (a scripted alternative is sketched below the screenshot)

    temp1234.png
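
If you prefer to script this step, here is a hedged sketch using the Databricks Repos API; the workspace URL, token, and target path are placeholders, and the sparse checkout fields should be checked against the current API documentation.

    # Create the Git folder with sparse checkout limited to awsdb-pbi-sso.
    import requests

    resp = requests.post(
        "https://<workspace-url>/api/2.0/repos",
        headers={"Authorization": "Bearer <personal-access-token>"},
        json={
            "url": "https://github.com/databricks-solutions/databricks-blogposts",
            "provider": "gitHub",
            "path": "/Repos/<user>/databricks-blogposts",
            "sparse_checkout": {"patterns": ["awsdb-pbi-sso"]},
        },
    )
    resp.raise_for_status()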

  • Open the notebook file aws_databricks_powerbi_sso from the repo subfolder awsdb-pbi-sso and populate the parameters for catalog, schema, and repo_path in the widgets at the beginning of the notebook. To get the repo path, you can use the copy URL/path feature
    temp1345.png

    alex_phu_10-1722374513531.png

     

  • Create two groups, ap_demo_admin and ap_demo_fr, and add yourself to the group ap_demo_fr
  • Attach an All Purpose Compute cluster (SQL warehouses and serverless clusters do not work) and run the notebook cell by cell; it walks you through setting up Unity Catalog fine-grained access control using row-level and column-level security (a sketch of the column masking involved follows this list)
  • When you get to the section about Power BI Desktop, continue to the next section below.
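
To give a flavor of what the notebook sets up, here is a sketch of a Unity Catalog column mask of the kind used for column-level security; the table and column names are illustrative, and the notebook itself defines the real ones.

    # Redact the email column for everyone outside the ap_demo_admin group.
    spark.sql("""
        CREATE OR REPLACE FUNCTION <catalog>.<schema>.email_mask(email STRING)
        RETURN IF(is_account_group_member('ap_demo_admin'), email, '***REDACTED***')
    """)
    spark.sql("""
        ALTER TABLE <catalog>.<schema>.customers
        ALTER COLUMN email SET MASK <catalog>.<schema>.email_mask
    """)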

 

Enable SSO to access reports in Power BI

You can now enable SSO so that when you publish a report to the Power BI service, users can access reports built using DirectQuery storage mode by passing their Microsoft Entra ID (formerly Azure Active Directory) credentials to Databricks. If you want to follow along with some demo content, go to Option 1; if you want to see how to connect manually, proceed to Option 2.

Option 1: Use the Demo Power BI Template File

On your Windows machine with Power BI Desktop installed, download the Power BI template file from the GitHub repo (https://github.com/databricks-solutions/databricks-blogposts/tree/main/awsdb-pbi-sso) and open it. Populate the parameters below (a sketch after the list shows how to fetch the first two values programmatically):

  • Server_Hostname (From your SQL warehouse connection strings)
  • HTTP_Path (From your SQL warehouse connection strings)
  • Catalog_Name (What you defined in the RLS notebook)
  • Schema_Name (What you defined in the RLS notebook)
    alex_phu_12-1722374699079.png
    alex_phu_0-1722375035783.png
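
If you would rather fetch the first two values programmatically than copy them from the UI, here is a hedged sketch against the SQL Warehouses API; the workspace URL, token, and warehouse ID are placeholders.

    # Read Server_Hostname and HTTP_Path from the warehouse's ODBC parameters.
    import requests

    resp = requests.get(
        "https://<workspace-url>/api/2.0/sql/warehouses/<warehouse-id>",
        headers={"Authorization": "Bearer <personal-access-token>"},
    )
    odbc = resp.json()["odbc_params"]
    print("Server_Hostname:", odbc["hostname"])
    print("HTTP_Path:", odbc["path"])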

     

  • When asked for authentication, select the Azure Active Directory (Entra ID) option.
  • Save the report, give it a name and Publish it to the Power BI service
    alex_phu_14-1722374786448.png

     

Option 2: Build Your Own Report

  • Open a blank report in Power BI Desktop, go to Get data.
    alex_phu_15-1722374859253.png
  • When creating a connection from Power BI Desktop, select the Azure Databricks connector. Although the connector name is Azure Databricks, it works with Databricks on AWS. Do not select the Databricks connector. 

    alex_phu_1-1722375118435.png
  • Populate Server hostname and HTTP path (from your SQL warehouse connection strings)
    alex_phu_2-1722375173865.png
  • Make sure you are using Direct Query mode, then click OK.
    alex_phu_3-1722375199555.png
  • Populate your credentials to authenticate using Azure Active Directory (Entra ID).

  • Select <catalog>.<schema>.customers and click Load (<catalog> and <schema> are what you defined in the RLS notebook)
  • Build the visualization by dragging two card visualizations, a map visualization, and a table visualization into the report canvas.
  • Save the report, give it a name and Publish it to the Power BI service

    alex_phu_4-1722375497993.png

     

Power BI Service

Enable SSO access to the report and underlying data source.

  • In the Power BI service, go to your workspace, find the semantic model you just published, click the three dots next to it, and click Settings
    alex_phu_5-1722375708552.png
  • Expand Data source credentials, and then click Edit credentials.
  • On the configuration dialog, select Report viewers can only access this data source with their own Power BI identities using Direct Query, and then click Sign in.
    alex_phu_6-1722375752484.png
  • With this option selected, access to the data source is handled using DirectQuery and managed using the Microsoft Entra ID identity of the user who is accessing the report. If you don’t select this option, only you, as the user who published the report, have access to the Databricks data source.
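
As an optional check (a sketch, assuming you have a Power BI access token and the dataset ID at hand), you can list the semantic model's data sources through the Power BI REST API to confirm the Databricks connection is registered:

    # List the data sources bound to the published semantic model.
    import requests

    resp = requests.get(
        "https://api.powerbi.com/v1.0/myorg/datasets/<dataset-id>/datasources",
        headers={"Authorization": "Bearer <power-bi-access-token>"},
    )
    print(resp.json())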



Conclusion

Integrating Databricks on AWS with Power BI through Single Sign-On (SSO) using Microsoft Entra ID offers a robust, seamless solution for managing access to Databricks data through Power BI. By following the steps outlined in this guide, you can enhance security, simplify access management, and create a more efficient and unified data analytics workflow. Leveraging SSO in Power BI streamlines access control for your end-to-end data platform, and this integration empowers your team to focus on deriving insights from data rather than managing two separate governance models in Databricks and Power BI. With the demo notebook and .pbit file provided in the GitHub repo listed in the Resources section below, you can easily recreate this SSO passthrough demo and replicate it for your own access control use cases.

 

Resources