Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
HariSelvarajan
Databricks Employee

What is Single Sign-On (SSO)?

The rise of Software as a Service (SaaS) has accelerated the adoption of specialized software. As a result, enterprises use many SaaS services to meet their business needs, and doing so requires placing a lot of trust in those services. Enterprises want to control authentication to their various SaaS applications using their own preferred identity management platform.

Single sign-on (SSO) is an authentication method that enables users to securely authenticate with multiple applications using just one set of credentials. Instead of each application managing user identities and passwords itself, an Identity Provider (IDP) takes responsibility for managing user credentials in one place and integrates with the individual applications to authenticate users centrally and provide a single sign-on experience. Besides managing passwords centrally, SSO also lets applications leverage the security capabilities provided by the IDP, namely:

  • Multifactor Authentication (MFA)
  • Risk-based access control (invoking additional identification based on user behavior)
  • Conditional access policies (ex: location-based access)

59% of users use similar passwords on multiple apps, and on average 68% of users switch between 10 applications. Enabling SSO on your application therefore greatly improves both security and user productivity.

Several industry-standard protocols help implement SSO:

  • OIDC - OpenID Connect
  • SAML - Security Assertion Markup Language
  • WS-Fed - Web Services Federation
  • LDAP - Lightweight Directory Access Protocol

The remainder of this blog explains how OIDC works and how it is implemented in Databricks with various IDPs.

 

Introduction to OIDC

OpenID Connect (OIDC) is an authentication protocol built on top of the OAuth 2.0 framework. OAuth 2.0 is designed only for authorization, that is, for granting one application access to data and features in another. OIDC is a thin layer on top of OAuth 2.0 that adds profile information about the logged-in user, giving the application or service the relevant information about that user.

The purpose of OIDC is to let users provide one set of credentials and access multiple sites. Each time users sign in to an application or service using OIDC, they are redirected to their Authorization Server (or Identity Provider), where they authenticate, and are then redirected back to the application or service. OIDC delegates user authentication to the identity provider that hosts the user account and authorizes third-party applications to access the user's account.

Components in the OIDC protocol

Below is the list of the various components and the role they play in the OIDC protocol:

 

 

| # | Component | Role |
|---|---|---|
| 1 | Resource Owner | The user who owns the identity stored in the identity provider. Ex: a user whose personal details are stored in Google. (When you sign in to a website with your Google email, the client accesses your information from Google, and you are the resource owner.) |
| 2 | Client | The application that wants to connect to the identity provider and access the user's profile information for authentication. Ex: the Databricks account console wants to access user info from Azure AD. |
| 3 | Authorization Server or Identity Provider | The system that holds the resource owner's information. The client connects to the identity provider to access the user information. Ex: Okta, Azure AD, Google Identity. |
| 5 | Redirect URL | Once the identity provider authenticates the user (resource owner), it redirects the user back to this URL. The client provides this URL to the IDP. |
| 6 | Scope | The permissions the client requests on behalf of the user are grouped into scopes. For authentication, the client requests the “openid” scope, usually together with scopes such as “profile” and “email”, which cover the user profile information (name, ID, email, etc.) stored in the identity provider. Other scopes are used for authorization, to control the level of access to a resource (OAuth). |
| 7 | Client ID and Secret | The ID and secret the client uses to register and identify itself with the identity provider. Before sharing user info, the identity provider uses the ID and secret to validate that the client is what it claims to be. |
| 9 | Authorization Code | A temporary code sent back from the identity provider to the user's browser. The browser passes this code to the client, which then sends it to the identity provider to get the access token and ID token. |
| 10 | Access Token | The token the client uses to communicate with the Resource Server. It is like a badge or key card that gives the client permission to request data or perform actions with the Resource Server on your behalf. |
| 11 | ID Token (JWT) | Contains information about the user and serves as proof that the user authenticated. The ID token is encoded as a JSON Web Token (JWT) and is signed, so the client can verify that it came from the issuer and was not tampered with. The data inside the ID token are called claims. |
| 12 | Issuer URL | The URL of the identity provider under which its OpenID configuration document is published. That document lists the endpoints the IDP exposes, which the client uses to implement the OIDC protocol. |
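To make the idea of claims concrete, here is a minimal sketch that reads the payload of an ID token using only the Python standard library. The token, issuer and client ID below are hypothetical values assembled locally for illustration, and the signature is a dummy. Note that this only decodes the token; it does not verify the signature, which a real client must always do before trusting the claims.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_claims(id_token: str) -> dict:
    """Return the claims in a JWT payload WITHOUT verifying the signature."""
    header_b64, payload_b64, signature_b64 = id_token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Hypothetical token built locally; a real one is issued and signed by the IDP.
header = {"alg": "RS256", "typ": "JWT"}
claims = {
    "iss": "https://idp.example.com",  # assumed issuer
    "sub": "user-123",                 # assumed user identifier
    "aud": "my-client-id",             # assumed client ID
    "email": "user@example.com",
}
sample_token = ".".join([
    b64url_encode(json.dumps(header).encode("utf-8")),
    b64url_encode(json.dumps(claims).encode("utf-8")),
    b64url_encode(b"not-a-real-signature"),
])

print(read_claims(sample_token)["email"])
```

A JWT is three base64url segments (header, payload, signature) joined by dots, which is why splitting on "." is enough to reach the claims.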



OIDC flow

Now that we understand the terminology, let's look at how an OIDC authentication flow works.

Establishing Trust

Before the client can use the identity provider to access user information, a trust relationship must be established between the client and the identity provider. The identity provider does not share user information with an application it does not know. The client first registers itself with the identity provider, and the identity provider issues a client ID and secret to the client. The client ID identifies the client, and the secret is shared securely between the client and the identity provider. Every time the client requests a user's information, it presents its client ID and secret to identify itself to the identity provider.

Grant Types

A client can interact with the identity provider in different ways, referred to as grant types. The choice of grant type depends on the interaction between the client, the user and the IDP, the medium used to authenticate (a browser or another smart device such as a TV), and whether the authentication is User to Machine (U2M) or Machine to Machine (M2M).

  • Implicit flow

This is used by applications that have no “back-end” logic on the web server, like a JavaScript app. All communication between the client and the identity provider, including the tokens, is visible to the browser.

  • Authorization code flow

This is used by apps that have a back-end that can communicate with the IDP away from prying eyes. Here the browser only receives the authorization code; the actual requests for the access token and ID token are made between the client back-end and the identity provider and are not exposed to the browser. This is more secure than the implicit flow.

  • Client credential grant

This is the simplest grant type and is used for server-to-server communication. It does not involve a user; the client uses it to access resources under its own direct control rather than a user's. Ex: an application calling other APIs to access data.

  • Device flow

This type is designed for browserless or input-constrained devices that cannot capture user credentials securely. This flow outsources user authentication to an external device.

Ex: Apple TV apps which need your phone to authenticate using a code.
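As an illustration of the simplest of these, the client credential grant, the sketch below builds the headers and form body a client might send to the IDP's token endpoint. It follows the general OAuth 2.0 conventions (HTTP Basic client authentication and a form-encoded `grant_type=client_credentials` body); the client ID, secret and scope are hypothetical, and a given IDP may expect a different client authentication method. Nothing is sent over the network.

```python
import base64
from urllib.parse import quote_plus, urlencode


def basic_auth_header(client_id: str, client_secret: str) -> str:
    """Build an HTTP Basic header from the client ID and secret."""
    # Each part is form-urlencoded before being joined with ':' and base64-encoded.
    pair = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    return "Basic " + base64.b64encode(pair.encode("utf-8")).decode("ascii")


def client_credentials_request(client_id: str, client_secret: str, scope: str):
    """Return (headers, body) for a client credential token request."""
    headers = {
        "Authorization": basic_auth_header(client_id, client_secret),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # No user is involved: the client asks for a token on its own behalf.
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return headers, body


# Hypothetical values for illustration only.
headers, body = client_credentials_request("my-client", "s3cret", "read:data")
print(body)
```

The other grant types differ mainly in what the client presents at the token endpoint: an authorization code in the authorization code flow, or a device code in the device flow.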

 

Steps involved

Let us understand the steps involved when a user (Resource owner) accesses an application (Client) that has enabled SSO using an Identity Provider. 

  1. The user accesses the Client application.
  2. The Client redirects the user to the Identity Provider. The Client sends its client ID (to identify itself to the IDP), the redirect URL, the response type and the scope information.
  3. The Identity Provider authenticates the user. The IDP can choose different ways to authenticate the user: MFA, YubiKey, passwords, etc.
  4. The Identity Provider redirects the user back to the Client using the redirect URL and includes the authorization code.
  5. The user's browser passes the authorization code to the Client application. The authorization code carries no user information or other access information; it is only a code that the Client uses in the next steps.
  6. The Client sends the authorization code directly to the Identity Provider, along with its client ID and secret to authenticate itself. This route does not involve the user or their browser session.
  7. The Identity Provider validates the authorization code and issues the access token and ID token back to the Client.
  8. The Client retrieves the ID token and uses it to establish the identity of the user.
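The steps above can be sketched from the client's point of view. The snippet below builds the redirect of step 2, reads the authorization code returned in steps 4 and 5, and prepares the back-channel request of step 6. All endpoints, IDs and secrets are hypothetical. The `state` parameter is not part of the steps listed above; it is a standard OAuth 2.0 safeguard against forged callbacks and is included here as an assumption about how a typical client behaves. No network calls are made.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            scope="openid profile email"):
    """Step 2: URL the client redirects the user's browser to."""
    state = secrets.token_urlsafe(16)  # random value echoed back by the IDP
    query = urlencode({
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorization_endpoint}?{query}", state


def read_callback(callback_url, expected_state):
    """Steps 4-5: extract the authorization code from the redirect back."""
    params = parse_qs(urlsplit(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: callback did not come from our request")
    return params["code"][0]


def build_token_request(code, redirect_uri, client_id, client_secret):
    """Step 6: form body the client back-end posts to the token endpoint."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    })


# Hypothetical values for illustration only.
redirect_uri = "https://app.example.com/callback"
url, state = build_authorization_url(
    "https://idp.example.com/authorize", "my-client", redirect_uri)
code = read_callback(f"{redirect_uri}?code=abc123&state={state}", state)
print(build_token_request(code, redirect_uri, "my-client", "s3cret"))
```

Only the authorization code travels through the browser; the client secret appears solely in the token request, which goes directly from the client back-end to the IDP.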

How Databricks supports OIDC SSO at Account Level

Databricks supports SSO setup at the account console level as well as at the individual workspace level. With the introduction of unified login, customers need to set up SSO only at the account console level, and SSO is then enabled on all workspaces that are set up with identity federation. The account console supports both OIDC and SAML based SSO. Databricks recommends using OIDC for SSO.

In the context of OIDC, the Databricks account console acts as the client: it sets up SSO with an identity provider and uses that provider to authenticate users. The first step in enabling SSO on the account console is to set up the trust relationship between the account console client app and the identity provider.

Establishing Trust Relationship

The first step is to create a client application in the chosen identity provider. When this is complete, you have the following three properties:

  • Client ID: The unique identifier of the application in the identity provider.
  • Client secret: The secret password generated for the application. The account console uses the client ID and secret to identify itself to the identity provider.
  • OpenID issuer URL: The URL under which your identity provider's OpenID Configuration Document can be found. The document must be available at {issuer-url}/.well-known/openid-configuration. It contains the following fields:
    • authorization_endpoint: The URL where end users authenticate. The account console uses it to redirect users to the right place.
    • claims_supported: An array containing the claims supported.
    • issuer: The identifier of the OIDC provider.
    • jwks_uri: Where the provider exposes the public keys that can be used to validate tokens. When the IDP sends the tokens back to the account console (client), it signs the content with its own private key. The client validates the signature using the public key available from the jwks_uri and confirms the token came from the right identity provider.
    • token_endpoint: The URL that apps can use to fetch tokens. The account console uses this endpoint to fetch the ID and access tokens from the identity provider.
  • The redirect URL is also provided while creating the application. It can be obtained from the account console under Settings -> Single Sign-On -> OIDC Connect. The IDP sends the user's browser back to this URL after successful authentication, and the application uses this URL to retrieve the authorization code.
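The relationship between the issuer URL and the configuration document can be sketched as follows. The snippet derives the document location from an issuer URL and pulls out the fields listed above. The sample document and all its URLs are hypothetical, and the document is parsed from a local string rather than fetched over the network; this is an illustration under those assumptions, not how the account console itself is implemented.

```python
import json

# Fields from the configuration document that a client needs for the flow.
REQUIRED_FIELDS = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def discovery_url(issuer_url: str) -> str:
    """Location of the OpenID Configuration Document for an issuer URL."""
    return issuer_url.rstrip("/") + "/.well-known/openid-configuration"


def parse_discovery_document(raw_json: str, issuer_url: str) -> dict:
    """Extract the endpoints a client needs, rejecting incomplete documents."""
    doc = json.loads(raw_json)
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError(f"configuration document is missing: {missing}")
    # The issuer in the document should match the issuer URL that was configured.
    if doc["issuer"] != issuer_url:
        raise ValueError("issuer in document does not match the configured issuer URL")
    return {field: doc[field] for field in REQUIRED_FIELDS}


# Hypothetical document for illustration; a real one is served by the IDP.
SAMPLE_DOCUMENT = json.dumps({
    "issuer": "https://idp.example.com",
    "authorization_endpoint": "https://idp.example.com/oauth2/authorize",
    "token_endpoint": "https://idp.example.com/oauth2/token",
    "jwks_uri": "https://idp.example.com/oauth2/keys",
    "claims_supported": ["sub", "email", "name"],
})

print(discovery_url("https://idp.example.com"))
print(parse_discovery_document(SAMPLE_DOCUMENT, "https://idp.example.com"))
```

Because every endpoint is discovered from this one document, the issuer URL, client ID and client secret are all the client needs to be given.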

Once the above information is captured, it can be registered in the account console under Settings -> Single Sign-On -> OIDC Connect as shown below.

[Image: HariSelvarajan_0-1693329917498.png]

 

On clicking the ‘Enable SSO’ option, the trust relationship is established between the Databricks account console and the identity provider. The account console now knows how to redirect users to the IDP, retrieve the authorization code, request the ID token and validate its signature. The identity provider knows how to identify the account console client.
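Besides the signature, an OIDC client normally also checks a few claims in the ID token before accepting it. The sketch below shows those checks as defined by the OIDC specification in general: the issuer, the audience and the expiry. It is an assumption about what a typical client does, not a description of the account console's internals, and it deliberately leaves out signature verification, which requires a cryptography library and the keys from the jwks_uri. All values are hypothetical.

```python
import time


def validate_id_token_claims(claims: dict, expected_issuer: str,
                             client_id: str, now: float = None) -> str:
    """Check issuer, audience and expiry; return the user identifier (sub)."""
    now = time.time() if now is None else now

    # The token must come from the identity provider we registered with.
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")

    # The token must have been issued for this client ("aud" may be a list).
    audience = claims.get("aud")
    audiences = audience if isinstance(audience, list) else [audience]
    if client_id not in audiences:
        raise ValueError("token was not issued for this client")

    # The token must not have expired ("exp" is seconds since the epoch).
    if claims.get("exp", 0) <= now:
        raise ValueError("token has expired")

    if "sub" not in claims:
        raise ValueError("token has no subject")
    return claims["sub"]


# Hypothetical claims for illustration only.
example_claims = {
    "iss": "https://idp.example.com",
    "aud": "my-client",
    "sub": "user-123",
    "exp": time.time() + 300,  # expires five minutes from now
}
print(validate_id_token_claims(example_claims, "https://idp.example.com", "my-client"))
```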



SSO flow

The diagram below illustrates the steps followed when a user authenticates to the account console using OIDC SSO.

[Image: HariSelvarajan_1-1693329917664.png]

 

Reference to OIDC in account console with IDPs

The table below covers all the options available to enable OIDC SSO on the account console on the 3 cloud platforms. 

 

| Cloud | Identity Provider | Remarks | Reference |
|---|---|---|---|
| Azure | Azure AD | SSO with Azure AD is built into Azure Databricks, and no additional steps are required from customers. Customers whose identity provider is not Azure AD can federate a SAML or WS-Fed compatible service such as Okta or Ping with Azure AD for “external” identity management; Azure Databricks still uses Azure AD for the access and ID tokens. | Azure Identity Federation |
| GCP | GSuite or Google Cloud Identity Account | Databricks account console users authenticate with their Google Cloud Identity account. As in Azure Databricks, customers can configure their Google Cloud Identity account to federate with an external SAML 2.0 Identity Provider (IDP) such as Azure AD, Okta, Ping, and other IDPs. However, Databricks only interacts directly with the Google Identity Platform APIs. | SSO |
| AWS | Azure AD | Use Azure AD as the identity provider to authenticate to the AWS Databricks account console. | SSO Azure AD |
| AWS | Okta | Use Okta as the identity provider to authenticate to the AWS Databricks account console. | SSO Okta |
| AWS | OneLogin | Use OneLogin as the identity provider to authenticate to the AWS Databricks account console. | SSO OneLogin |
| AWS | Others (Google Cloud Identity, Ping, Keycloak, etc.) | Any identity provider that supports OIDC or SAML can be integrated with Databricks on AWS. | SSO IDP |

Conclusion

This blog explains the basics of the OIDC protocol for authentication and how it is used to enable Single Sign-On on a Databricks account console on various cloud platforms. With SSO, customers can choose their own identity provider and use the tools it provides to securely authenticate their users and meet their enterprise requirements.

The next article in this series covers how SSO can be enabled using the SAML protocol.