Authors: Abhishek Pratap (@aps) & Dipankar Kushari (@dkushari)
In this blog, we explore how to synchronize nested groups in Databricks from your organization’s identity provider - Azure Active Directory.
How to Sync nested Azure AD groups to Databricks
System for Cross-domain Identity Management, or SCIM, is an open standard that allows you to automate user provisioning in Databricks. SCIM lets you use an identity provider (IdP) to create users in Databricks, give them the proper level of access, and remove access (deprovision them) when they leave your organization or no longer need access to Databricks. You can use a SCIM provisioning connector in your IdP or invoke the Identity and Access Management SCIM APIs to manage provisioning. You can also use these APIs to manage identities in Databricks directly, without an IdP.
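To make the provisioning idea concrete, here is a minimal sketch of the kind of request body a SCIM client sends when it creates a user. The schema URN and the attribute names come from the SCIM 2.0 core schema; the helper name build_scim_user and the sample user are our own illustrative assumptions, and no call is made to any Databricks endpoint.

```python
import json

# Schema URN defined by the SCIM 2.0 core schema specification (RFC 7643).
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def build_scim_user(user_name: str, display_name: str, active: bool = True) -> dict:
    """Build a SCIM 2.0 User resource body (hypothetical helper, for illustration)."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": user_name,
        "displayName": display_name,
        "active": active,
    }


# Sample user is made up; a real client would send this body to a SCIM Users endpoint.
payload = build_scim_user("jane.doe@example.com", "Jane Doe")
print(json.dumps(payload, indent=2))
```

An IdP connector builds and sends bodies like this for you; you only need to construct them yourself if you call the SCIM APIs directly.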
Azure Active Directory (Azure AD) is a cloud-based identity and access management service that gives your employees access and single sign-on to external resources, such as Microsoft 365, the Azure portal, and applications such as Databricks. You can set up provisioning to Databricks using Azure AD at the Databricks account level or at the Databricks workspace level.
Single sign-on (SSO), on the other hand, enables you to authenticate your users using your organization’s identity provider. If your identity provider supports the SAML 2.0 protocol (or, in the case of account-level SSO, the OIDC protocol), you can use Databricks SSO to integrate with your identity provider. SSO makes it easy to centrally manage access to Databricks resources and business applications instead of having to sign in to Databricks using separate user credentials. With SSO enabled, users can access Databricks with their corporate credentials. This delivers a better user experience without the need to manage separate sets of credentials.
As a Databricks administrator, one of your important tasks when configuring access is to integrate and sync your corporate Users and Groups from Azure AD into your Databricks account.
Generally, the user and group mapping is carefully designed and reflects your organization's structure. You may often have nested groups defined in your organization (for example, a parent group representing a department with child groups representing its sub-departments), and you need to bring these Users and Groups into a Databricks account with their hierarchical relationships maintained.
There are certain advantages of keeping Databricks group structure the same as your Azure AD group structure
FIG 1: Sample Azure AD Group Structure in large organizations
But there are a few challenges to provisioning nested Azure AD groups in a Databricks account.
These challenges prevent organizations from syncing multiple levels of groups into Databricks unless they restructure their Users and Groups for Databricks. Moreover, nested groups may have variable depth, so a flexible solution must traverse them recursively so that a parent group is synced along with all its direct members and all of their child members.
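The recursive traversal described above can be sketched in a few lines. This is only an illustration under our own assumptions: the directory is a hypothetical in-memory dictionary rather than a live Azure AD tenant, and collect_nested is a name we made up, not a function from the utility.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory directory: group name -> direct members.
# A member that is also a key here is treated as a child group;
# any other member is treated as a user.
DIRECTORY: Dict[str, List[str]] = {
    "engineering": ["alice@example.com", "platform", "data"],
    "platform": ["bob@example.com"],
    "data": ["carol@example.com", "ml"],
    "ml": ["dave@example.com", "alice@example.com"],
}


def collect_nested(
    group: str,
    directory: Dict[str, List[str]],
    seen_groups: Optional[Set[str]] = None,
    seen_users: Optional[Set[str]] = None,
) -> Tuple[Set[str], Set[str]]:
    """Return every group and user reachable from `group`, at any depth."""
    seen_groups = set() if seen_groups is None else seen_groups
    seen_users = set() if seen_users is None else seen_users
    if group in seen_groups:
        # Already visited: protects against cycles and repeated subtrees.
        return seen_groups, seen_users
    seen_groups.add(group)
    for member in directory.get(group, []):
        if member in directory:
            collect_nested(member, directory, seen_groups, seen_users)
        else:
            seen_users.add(member)
    return seen_groups, seen_users


groups, users = collect_nested("engineering", DIRECTORY)
print(sorted(groups))
print(sorted(users))
```

Tracking visited groups keeps the walk safe even if two groups accidentally contain each other, and it means a user who belongs to several child groups is collected only once.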
If you have nested Azure AD groups in your organization that you want to sync with your Databricks account, follow this post. We show how to sync nested Azure AD groups to Databricks with a few lines of Python code and overcome the limitations in the core Azure AD sync infrastructure.
To sync nested groups from your Azure Active Directory to your Databricks account, we have put together a solution described below. This utility allows you to sync Users and Groups, including nested Users and Groups, from Azure AD to Databricks. The code for the solution is available on the GitHub repository.
Note: This is a custom solution (provided as-is) that replaces the Microsoft Enterprise Application.
Before you are ready to run the steps mentioned below, acquire the code and provision the required compute to run it.
The connector performs the actions shown in the diagram below.
FIG 2: Sync nested groups into Databricks
Each configuration step is described below.
Note - You will need to register an application in Azure Active Directory to enable user authentication.
Follow the steps below to register the application:
FIG 3: Sample Azure AD Group Structure in large organizations
Note - You'll need this key in your code's configuration files later. This key value will not be displayed again and is not retrievable by any other means, so make sure to save this key from the Azure portal before navigating away to any other screen or blade.
After registering the app, gather the Databricks SCIM details and prepare a config file. The template is here. You need to populate:
by extending the Python app or reusing the PyPI utility
Detailed code can be found at this GitHub location.
To run the solution, follow the steps mentioned below:
You can run this as a Standalone Python app. Follow the instructions below.
pip install nestedaaddb
from nestedaaddb.nested_groups import SyncNestedGroups
sn = SyncNestedGroups()
sn.loadConfig("<<Path of config.cfg>>")
sn.sync("<<Top level Group>>", <<Is Dry Run>>)
<<Top level Group>> : Denotes the top-level group in AAD to sync from.
<<Is Dry Run>> : Denotes whether this is a dry run. A dry run only prints the Users and Groups to be added and does not create them.
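To illustrate what a dry run means in practice, here is a small sketch of the pattern: compute what is missing in the target, then either report it or apply it. This is not the utility's actual implementation; plan_additions and sync_names are hypothetical names, and a plain Python set stands in for the Databricks account.

```python
from typing import Iterable, List, Set


def plan_additions(source: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return names present in the source but missing from the target, sorted."""
    return sorted(set(source) - set(existing))


def sync_names(source: Iterable[str], existing: Set[str], dry_run: bool) -> List[str]:
    """Report (dry run) or apply the additions needed to bring `existing` up to date."""
    to_add = plan_additions(source, existing)
    for name in to_add:
        if dry_run:
            print(f"[dry run] would add: {name}")
        else:
            existing.add(name)  # stand-in for a real create call
            print(f"added: {name}")
    return to_add


# The set below is a made-up stand-in for groups already in the target account.
target = {"engineering"}
planned = sync_names(["engineering", "platform", "data"], target, dry_run=True)
```

Because the dry run leaves the target untouched, you can review the planned additions first and then repeat the same call with dry_run=False.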
Source Code : GitHub
Warning: The provided code is offered on an "as-is" basis without any guarantee or warranty. It is strongly recommended to exercise caution and thoroughly test the code in your test environment before using it in a production environment.
You can view your group and its members (i.e., Users and Groups) in the account console groups tab. An example of such a nested group synced from Azure AD is shown below, where the parent group has another group called child as its member.
In this blog, we explored how to synchronize nested groups in Databricks from your organization’s identity provider - Azure Active Directory.
Here are some related links for your reference -
Databricks Workspace Administration – Best Practices for Account, Workspace and Metastore Admins
Manage users, service principals, and groups
Call to Action
Try out this solution today to sync your nested Users and Groups from Azure AD into your Databricks Account. You can refer to this video for step by step guidance on how to sync nested Azure AD groups to Databricks.