Databricks Apps is the fastest and most secure way to build data and AI applications on the Databricks Data Intelligence Platform.
As you build and scale applications that rely on data stored in Databricks, implementing proper governance mechanisms, including access controls, becomes crucial for security and compliance.
Databricks Apps supports two complementary authorization models for securely interacting with Databricks resources: service principal authorization and on-behalf-of-user authorization (OBO).
This blog post explains the different authorization models available for Databricks Apps and demonstrates how you can leverage OBO authorization to implement fine-grained permissions in your Databricks Apps.
Let's start with a quick overview of the two authorization models.
By default, each Databricks App is associated with a service principal. A service principal is an identity that provides API-only access to Databricks resources. You can grant and restrict a service principal's access to resources in the same way as you would for a Databricks user.
An app’s service principal is uniquely associated with that app and cannot be reused across apps. This ensures auditability and that an app’s actions can always be traced back to a specific instance of the app.
During or after app creation, you assign the permissions to this service principal that are needed to power app functionality, such as reading data from a Unity Catalog table using a Databricks SQL warehouse.
When a user accesses the app, the service principal’s permissions are used to read the data.
This approach is straightforward but has a significant limitation: all users effectively share the same permissions within the app, regardless of their individual Unity Catalog permissions.
Consider the following scenario. Your Databricks App queries a table with global customer information available in Unity Catalog. Each user of the app should only be able to view data from their own country to satisfy data privacy regulations. In Unity Catalog, you can filter sensitive table data using row filters and column masks. However, as all users are sharing the service principal’s permissions, you cannot enforce these fine-grained user-level permissions.
There are two possible workarounds. The first is to implement custom logic in the app based on the individual user’s permissions. But this approach ends up duplicating permission logic that may already exist in Unity Catalog and adds responsibility for keeping permissions in sync. It also adds significant complexity to the app and is, therefore, error-prone and difficult to audit. When a user's permissions change in Unity Catalog, your app won't automatically reflect those changes.
The second workaround is to create a separate instance of the app per set of unique permissions. While this does offer clear permission isolation, it's not a scalable approach: in complex cases, you might end up with an app instance per user. It also suffers from the same issue of creating a mismatch between the source-of-truth permissions in Unity Catalog and the app.
Instead, let’s see how you can use OBO authorization to make use of Unity Catalog as the single governance layer across your data in Databricks.
With on-behalf-of-user authorization, your app performs actions on behalf of the individual user of the app instead of relying on the app's service principal. All actions are performed with the user's existing Unity Catalog permissions, for example when interacting with Databricks resources like SQL warehouses or accessing data in tables.
This also means that fine-grained permissions, such as row filters and column masks, are applied when a user queries data through your app. As a consequence, you do not have to implement custom permissions logic in your app, which reduces the risk of permissions getting out of sync with Unity Catalog or bugs leading to unwanted data exposure.
Service principal authorization and OBO authorization are complementary, and many apps are expected to use both models.
It’s best to use service principal authorization when your app needs its own identity for tasks requiring uniform access or for app-specific operations independent of user context. This can entail reading global app configuration settings or metadata, writing custom logs or audit events, displaying universally accessible information such as app usage instructions or usage stats, calling external services, or performing background maintenance tasks.
In contrast, OBO authorization is most suitable when your app must respect the individual user's existing Unity Catalog permissions. This applies to any interactions with Unity Catalog objects and Databricks resources, such as reading from and writing to tables or volumes, interacting with compute resources like SQL warehouses or clusters, calling machine learning models, or triggering workflows.
Let's walk through a practical example to demonstrate the difference between these authorization models.
First, let's create a sample table with customer data using the bakehouse sample dataset present in each Databricks workspace. Run the following command in the Databricks SQL Editor:
CREATE TABLE main.sandbox.sales_customers CLONE samples.bakehouse.sales_customers;
As the original table creator, you are the table owner and automatically have all the permissions needed to read its data.
We provide a minimal application for testing both authorization options. To set it up, clone the sample GitHub repository, load it as a Git folder in your Databricks workspace, and deploy the auth-demo sample as a custom app. The GitHub repository has detailed deployment instructions.
Initially, the app service principal does not have any permissions. You can find the name of your app’s unique service principal under the Authentication tab in the Databricks Apps UI.
To finish setting up the demo application, provide the app service principal with two permissions:
Afterwards, you should be able to query data from the sample table with both authorization options presented by the demo application. Both queries should have an identical result and return 300 rows.
Now, we’ll add a row filter function to our sample table to help us verify whether user-specific permissions are being applied correctly.
First, find out your username by running the following command in the SQL editor:
SELECT user();
To create a row filter function, replace name@example.com in the following statement with your own username:
CREATE OR REPLACE FUNCTION main.sandbox.country_filter(country STRING)
RETURN IF(user() = 'name@example.com', country='Japan', True);
This function returns true only for rows where the country is Japan when queried by name@example.com, and true for all rows for any other user.
Apply this row filter to the sample table:
ALTER TABLE main.sandbox.sales_customers SET ROW FILTER main.sandbox.country_filter ON (country);
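To make the row filter's behavior concrete, its logic can be mirrored in plain Python. This is a standalone illustration of the SQL function above, not code the app needs:

```python
def country_filter(current_user: str, country: str) -> bool:
    """Mirrors main.sandbox.country_filter: the user name@example.com
    only sees rows where country is Japan; every other user sees all rows."""
    return country == "Japan" if current_user == "name@example.com" else True

# The filtered user only sees Japan
assert country_filter("name@example.com", "Japan")
assert not country_filter("name@example.com", "Germany")
# Other users see everything
assert country_filter("someone@else.com", "Germany")
```

Unity Catalog evaluates this predicate per row at query time, so no query in the app needs to change for the filter to take effect.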
When the demo application queries data using service principal authorization, it uses code similar to the following:
from databricks import sql
from databricks.sdk.core import Config

# Config() picks up the app's service principal credentials from the app environment
cfg = Config()

conn = sql.connect(
    server_hostname=cfg.host,
    http_path="<your-warehouse-http-path>",
    credentials_provider=lambda: cfg.authenticate,
)

query = "SELECT * FROM main.sandbox.sales_customers LIMIT 1000"
with conn.cursor() as cursor:
    cursor.execute(query)
    df = cursor.fetchall_arrow().to_pandas()
    print(df.head())
conn.close()
There are two key elements to this code:
Let’s query our sample table in the demo application with service principal authorization:
Because the row filter applies only to you and not to the service principal, all 300 rows are returned. If you were to use service principal authorization in your application, all users accessing the app would also see the full table data.
The demo application can also query data using OBO authorization. Both service principal and OBO authorization can co-exist in the same application. For OBO authorization, it uses the following code:
from databricks import sql
from databricks.sdk.core import Config
from flask import request

cfg = Config()

# Databricks Apps forwards the user's access token in this request header
user_token = request.headers.get("X-Forwarded-Access-Token")

conn = sql.connect(
    server_hostname=cfg.host,
    http_path="<your-warehouse-http-path>",
    access_token=user_token,
)

query = "SELECT * FROM main.sandbox.sales_customers LIMIT 1000"
with conn.cursor() as cursor:
    cursor.execute(query)
    df = cursor.fetchall_arrow().to_pandas()
    print(df.head())
conn.close()
The key difference is the access_token being used instead of the credentials_provider. The databricks-sql-connector uses this provided token to authenticate the connection to the SQL Warehouse as the user associated with the token. No service principal client ID or secret is involved in this specific connection.
Any queries executed over this conn object are subject to the permissions of the user identified by user_token. The user must have the necessary privileges (e.g., CAN_USE on the warehouse, SELECT on the table) for the query to succeed.
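Since both authorization models can co-exist in one app, one way to structure the choice is a small helper that builds the connection arguments based on whether a forwarded user token is present. This is a sketch of our own (the function name and fallback behavior are not part of the connector's API):

```python
def sql_connect_kwargs(host, http_path, user_token=None, sp_credentials_provider=None):
    """Build keyword arguments for databricks.sql.connect, choosing OBO when a
    forwarded user token is available and the service principal otherwise."""
    kwargs = {"server_hostname": host, "http_path": http_path}
    if user_token:
        # On-behalf-of-user: the query runs with the user's permissions
        kwargs["access_token"] = user_token
    else:
        # Service principal: the query runs with the app's own permissions
        kwargs["credentials_provider"] = sp_credentials_provider
    return kwargs
```

In the app, this could be used as `conn = sql.connect(**sql_connect_kwargs(cfg.host, http_path, user_token))`, keeping the OBO-or-service-principal decision in one place.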
If you run the query using OBO authorization in the demo application, the row filter will now only return records where the country is Japan, enforcing our fine-grained permissions:
OBO authorization is a powerful feature for leveraging users’ existing Unity Catalog permissions through a Databricks App.
However, one complexity that needs to be managed with OBO authorization is privilege escalation. In this context, privilege escalation refers to the app using the user's credentials to access resources that it could not access on its own.
In the worst case, a malicious or careless app could use the user’s credentials in ways that weren’t intended. To prevent this, OBO authorization provides a mechanism to limit the scope of privilege escalation to specific subsets of services or APIs that are explicitly granted by the user of the app.
This concept, called downscoping, uses pre-defined authorization scopes to restrict the actions an app can perform on the user's behalf.
When creating or editing an app with OBO authorization enabled, you declare its minimum required scopes. For instance, the sql scope allows querying SQL warehouses and accessing data governed by Unity Catalog permissions associated with that scope.
By default, every app can use the iam.access-control:read and iam.current-user:read scopes to retrieve information about app users. These scopes are required to support on-behalf-of-user authorization in an app.
As a security best practice, follow the principle of least privilege and add only the scopes required by your app. You can monitor the authorization scopes requested by an app under the Authorization tab in the Databricks UI.
When a user first accesses an app that has OBO enabled, a consent process is triggered. The user is presented with a consent screen and must explicitly grant permission for the app to act within the requested scopes. This ensures awareness of how their permissions will be used.
Once consent is granted, the app’s effective OBO permissions are downscoped. It can only operate within the boundaries of the declared and consented scopes.
For example, if an app only has consent for the sql scope, it cannot use the user's identity to interact with Model Serving endpoints via OBO, even if the user has those permissions elsewhere.
This downscoping process allows Databricks Apps to securely leverage individual user permissions for fine-grained access control while simultaneously preventing the app from overreaching its intended functional boundaries.
OBO authorization makes it easy to build secure, permission-aware Databricks Apps with Unity Catalog as the central point of governance.
Use service principal authorization when your app performs background tasks or when all users should have the same, app-defined view of the data or resources. Use OBO authorization when your app needs to respect individual user permissions defined in Unity Catalog, ensuring users only see the data they are authorized to access.
Leveraging Unity Catalog's existing permission model enforces consistent access controls across data and AI use cases, provides a seamless user experience, and reduces application complexity.
To get started with OBO authorization in your own Databricks Apps, refer to the documentation and the code examples provided in the GitHub repository associated with this blog post.