The rise of Software as a Service (SaaS) has accelerated the adoption of specialized software. As a result, enterprises now rely on many SaaS services to meet their business needs, and using these services requires placing a great deal of trust in them. Enterprises want to control authentication to their various SaaS applications through their own preferred identity management platform.
Single sign-on (SSO) is an authentication method that lets users securely authenticate with multiple applications using just one set of credentials. Instead of each application managing user identities and passwords itself, an Identity Provider (IDP) takes responsibility for managing user credentials in one place and integrates with individual applications to authenticate users centrally and provide a single sign-on experience. Besides managing passwords centrally, SSO also lets applications leverage the security capabilities provided by the IDP.
59% of users use similar passwords on multiple apps, and on average 68% of users switch between 10 applications, so enabling SSO on your application greatly improves both security and user productivity.
Several industry-standard protocols help implement SSO, including OpenID Connect (OIDC) and SAML.
The remainder of this blog explains how OIDC works and how it is implemented in Databricks with various IDPs.
OpenID Connect (OIDC) is an authentication protocol built on top of the OAuth 2.0 framework. OAuth 2.0 is designed only for authorization, that is, for granting one application access to data and features in another. OIDC is a thin layer on top of OAuth 2.0 that adds identity: it gives the application or service the relevant profile information about the user who has logged in.
The purpose of OIDC is to let users provide one set of credentials and access multiple sites. Each time users sign in to an application or service using OIDC, they are redirected to their Authorization Server (or Identity Provider), where they authenticate, and are then redirected back to the application or service. OIDC delegates user authentication to the identity provider that hosts the user account and authorizes third-party applications to access the user's account.
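To make the redirect concrete, the sketch below builds the kind of URL a client sends the browser to when it hands the user over to the identity provider. The endpoint, client ID, and redirect URI are made-up placeholders, and the function name is hypothetical; the query parameter names come from the OAuth 2.0 and OIDC specifications.

```python
import secrets
from urllib.parse import urlencode, urlsplit, parse_qs


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            scope="openid profile email"):
    """Return (url, state, nonce) for an OIDC authorization code request."""
    # state ties the callback to this browser session (CSRF protection);
    # nonce is echoed back inside the ID token to guard against replay.
    state = secrets.token_urlsafe(16)
    nonce = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "nonce": nonce,
    }
    return f"{authorization_endpoint}?{urlencode(params)}", state, nonce


# Hypothetical values, for illustration only.
url, state, nonce = build_authorization_url(
    "https://idp.example.com/authorize",
    client_id="my-client-id",
    redirect_uri="https://app.example.com/oidc/callback",
)
print(url)
```

The client would keep the state and nonce values in the user's session so it can compare them with what comes back after the user authenticates.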
Below is the list of the various components and the role they play in the OIDC protocol:
Resource Owner
The user who owns the identity stored in the identity provider. Ex: a user whose personal details are stored in Google. When you sign in to a website with your Google account, the website (the client) accesses your information from Google, and you are the resource owner.
Client
The application that wants to connect to the identity provider and access the user's profile information for authentication. Ex: the Databricks account console wants to access user info from Azure AD.
Authorization Server or Identity Provider
The system that holds the resource owner's information. The client connects to the identity provider to access the user information. Ex: Okta, Azure AD, Google Identity
Redirect URI
Once the identity provider authenticates the user (resource owner), it redirects the user back to this URL. The client provides this URL to the IDP.
Scope
The permissions that the client wants on behalf of the user are grouped into scopes. For authentication, the client requests the “openid” scope, usually together with the “profile” scope, which covers the user profile info (name, ID, email, etc.) stored in the identity provider. Other scopes are used for authorization, to control the level of access to a resource (OAuth).
Client ID and Secret
The ID and secret that the client uses to register and identify itself with the identity provider. Before sharing any user info, the identity provider uses the ID and secret to validate that the client is what it claims to be.
Authorization Code
A temporary code that the identity provider sends back to the user's browser. The browser passes it to the client, which then exchanges it with the identity provider for the access token and ID token.
Access Token
The token that the client uses to communicate with the resource server. It is like a badge or key card that gives the client permission to request data or perform actions on the resource server on your behalf.
ID Token (JWT)
An ID token contains information about the user and serves as proof that the user has authenticated. It is encoded as a JSON Web Token (JWT) and signed, so the client can verify that it comes from the issuer and has not been tampered with. The data inside the ID token are called claims.
OpenID Configuration Document
A document published by the identity provider that lists the endpoints the IDP makes available, which the client uses to implement the OIDC protocol.
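As an illustration of how a client might use the OpenID configuration document, the sketch below derives the well-known URL from an issuer and picks out the endpoints it needs. The issuer and the sample document are invented for the example and the helper names are hypothetical; the `/.well-known/openid-configuration` path and the field names follow the OpenID Connect Discovery specification.

```python
import json


def discovery_url(issuer):
    """Return the OpenID configuration URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def extract_endpoints(config_json):
    """Pick out the endpoints a client needs from a discovery document."""
    config = json.loads(config_json)
    wanted = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")
    missing = [key for key in wanted if key not in config]
    if missing:
        raise ValueError(f"discovery document is missing: {missing}")
    return {key: config[key] for key in wanted}


# A trimmed, made-up discovery document. A real one is fetched over HTTPS
# from discovery_url(issuer) and contains many more fields.
SAMPLE = """{
  "issuer": "https://idp.example.com",
  "authorization_endpoint": "https://idp.example.com/authorize",
  "token_endpoint": "https://idp.example.com/token",
  "jwks_uri": "https://idp.example.com/keys"
}"""

print(discovery_url("https://idp.example.com/"))
print(extract_endpoints(SAMPLE)["token_endpoint"])
```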
Now that we understand the terminology, let's look at how an OIDC authentication flow works.
Before the client can use the identity provider to access user information, a trust relationship must be established between the two. The identity provider does not share user information with an application it does not already know. The client first registers itself with the identity provider, which issues it a client ID and secret. The client ID identifies the client, and the secret is a value shared securely between the client and the identity provider. Every time the client requests a user's information, it presents its client ID and secret to identify itself to the identity provider.
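One common way for a client to present its ID and secret is HTTP Basic authentication on the token request. The sketch below shows how that header value could be built. The credentials are placeholders and the function name is hypothetical, and whether a given identity provider expects this method or credentials in the request body is an assumption to check against its documentation.

```python
import base64
from urllib.parse import quote_plus


def basic_auth_header(client_id, client_secret):
    """Build the HTTP Basic Authorization header value for a client."""
    # RFC 6749 section 2.3.1: form-urlencode each part, join them with ':',
    # then base64-encode the result.
    raw = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    token = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Hypothetical credentials, for illustration only.
header = basic_auth_header("my-client-id", "my-client-secret")
print(header)
```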
There are different ways a client can interact with the identity provider; these are referred to as grant types. The choice of grant type depends on the interaction between the client, the user, and the IDP, on the medium used to authenticate (a browser or another smart device such as a TV), and on whether the authentication is User to Machine (U2M) or Machine to Machine (M2M).
Authorization Code
This is used by apps that have a back end that can communicate with the IDP away from prying eyes. The browser only receives the authorization code; the actual access token and ID token requests are made between the client back end and the identity provider and are not exposed to the browser. This is more secure than the implicit flow.
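The back-channel step described above can be sketched as follows. The code only builds the token request and does not send it. All URLs and credentials are placeholders, the function name is hypothetical, and sending the client credentials in the request body (rather than in an HTTP Basic header) is an assumption about what the identity provider accepts.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def build_token_request(token_endpoint, code, redirect_uri,
                        client_id, client_secret):
    """Build (but do not send) the back-channel token request."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        # Must match the redirect URI used in the authorization request.
        "redirect_uri": redirect_uri,
        # Credentials go in the body here; some IDPs expect HTTP Basic instead.
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_token_request(
    "https://idp.example.com/token",
    code="abc123",
    redirect_uri="https://app.example.com/oidc/callback",
    client_id="my-client-id",
    client_secret="my-client-secret",
)
print(request.get_method(), request.full_url)
```

Because this request travels from the client's back end directly to the identity provider, the secret and the returned tokens never pass through the browser.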
Client Credentials
This is the simplest grant type and is used for server-to-server communication. No user is involved: the client accesses resources under its own direct control rather than a user's. Ex: an application calling other APIs to access data.
Device Code
This type is designed for browserless devices that are input constrained and unable to capture user credentials securely. The flow outsources user authentication to an external device.
Ex: Apple TV apps that need your phone to authenticate using a code.
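While the user authenticates on the second device, the input-constrained device typically polls the identity provider until tokens are issued. The sketch below shows that polling loop with simulated responses so it runs without a network. The `poll` callable and the sample tokens are hypothetical; the `authorization_pending` and `slow_down` error codes and the 5-second back-off follow RFC 8628.

```python
import time


def poll_for_tokens(poll, interval=5, expires_in=600, sleep=time.sleep):
    """Poll the token endpoint until the user approves on another device.

    `poll` is a caller-supplied function (hypothetical here) that performs one
    token request and returns the parsed JSON response as a dict.
    """
    waited = 0
    while waited < expires_in:
        response = poll()
        error = response.get("error")
        if error is None:
            return response              # contains the issued tokens
        if error == "authorization_pending":
            pass                         # user has not finished yet
        elif error == "slow_down":
            interval += 5                # back off by 5 seconds
        else:
            raise RuntimeError(f"device flow failed: {error}")
        sleep(interval)
        waited += interval
    raise TimeoutError("device code expired before the user approved")


# Simulated IDP responses so the sketch runs on its own.
responses = iter([
    {"error": "authorization_pending"},
    {"error": "slow_down"},
    {"access_token": "example-access-token", "id_token": "example-id-token"},
])
delays = []
tokens = poll_for_tokens(lambda: next(responses), sleep=delays.append)
print(tokens["access_token"], delays)
```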
Let us understand the steps involved when a user (Resource owner) accesses an application (Client) that has enabled SSO using an Identity Provider.
Databricks supports SSO setup at the account console level as well as at the individual workspace level. With the introduction of UnifiedLogin, customers need to set up SSO only at the account console level, and SSO is then enabled on all workspaces that are set up with identity federation. The account console supports both OIDC-based and SAML-based SSO. Databricks recommends using OIDC for SSO.
In the context of OIDC, the Databricks account console acts as the client: it sets up SSO with an identity provider and uses that provider to authenticate users. The first step in enabling SSO on the account console is to set up the trust relationship between the account console client app and the identity provider.
This starts with creating a client application in the chosen identity provider. When this is complete, you get the following three properties.
Once the above information is captured, it can be registered in the account console under Settings > Single Sign-On > OIDC Connect, as shown below.
When you click the ‘Enable SSO’ option, the trust relationship is established between the Databricks account console and the identity provider. The account console now knows how to redirect users to the IDP, retrieve the authorization code, request the ID token, and validate its signature. The identity provider, in turn, knows how to identify the account console client.
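To give a feel for what a client does with an ID token once it arrives, the sketch below decodes the claims and checks the issuer, audience, and expiry. It is a generic illustration, not how the account console is implemented, and it deliberately skips signature verification: a real client must verify the signature against the identity provider's published keys, normally with a JWT library. The token, issuer, and client ID are invented, and the function names are hypothetical.

```python
import base64
import json
import time


def decode_claims(id_token):
    """Return the claims from a JWT payload WITHOUT verifying its signature."""
    payload = id_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def check_claims(claims, issuer, client_id, now=None):
    """Check the issuer, audience, and expiry claims of an ID token."""
    now = time.time() if now is None else now
    audience = claims.get("aud")
    audiences = audience if isinstance(audience, list) else [audience]
    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    if client_id not in audiences:
        raise ValueError("token was not issued for this client")
    if claims.get("exp", 0) <= now:
        raise ValueError("token has expired")
    return claims


def _b64(data):
    """Demo helper: base64url-encode a dict as unpadded JSON."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Build a fake, unsigned token so the sketch runs on its own.
sample_claims = {"iss": "https://idp.example.com", "aud": "my-client-id",
                 "sub": "user-123", "email": "user@example.com", "exp": 2000}
fake_token = ".".join([_b64({"alg": "none"}), _b64(sample_claims), "signature"])

claims = check_claims(decode_claims(fake_token),
                      issuer="https://idp.example.com",
                      client_id="my-client-id", now=1000)
print(claims["email"])
```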
The diagram below illustrates the steps followed when a user authenticates to the account console using OIDC SSO.
The table below covers all the options available to enable OIDC SSO on the account console on the 3 cloud platforms.
SSO is prebuilt in Azure Databricks with Azure AD, and no additional steps are required from customers. Customers whose identity provider is not Azure AD can use an identity federation service (SAML and WS-Fed compatible) such as Okta or Ping on top of AAD for “external” identity management, but Azure Databricks still uses Azure AD for the access and ID tokens.
GSuite or Google Cloud Identity Account
Databricks account console users authenticate with their Google Cloud Identity account. As in Azure Databricks, customers can configure their Google Cloud Identity account to federate with an external SAML 2.0 Identity Provider (IDP) such as Azure AD, Okta, Ping, and other IDPs. However, Databricks only interacts directly with the Google Identity Platform APIs.
Use Azure AD as the identity provider to authenticate to the AWS Databricks account console
Use Okta as the identity provider to authenticate to the AWS Databricks account console
Use OneLogin as the identity provider to authenticate to the AWS Databricks account console
Others (Google cloud identity, Ping, KeyCloak etc)
Any identity provider that supports OIDC or SAML can be integrated with Databricks on AWS.
This blog explains the basics of the OIDC protocol for authentication and how it is used to enable single sign-on for the Databricks account console on various cloud platforms. SSO lets customers choose their own identity provider and use the tools it provides to securely authenticate their users and meet their enterprise requirements.
The next article in this series covers how SSO can be enabled using the SAML protocol.