Azure Databricks Streamlit app Unity Catalog access

ndw
New Contributor II

Hi all

I am developing a Databricks app. I will use Databricks asset bundles for deployment.

How can I connect a Databricks Streamlit app to Databricks Unity Catalog?

Where should I define the credentials? (Databricks host for dev, QA, and prod environments, users, passwords, etc.)

Which compute should I choose? (SQL Warehouse, All Purpose Compute, etc.)

Thanks

 

1 REPLY

emma_s
Databricks Employee

Hi, 

As a starting point, you may want to try deploying the Streamlit starter app from the Databricks Apps UI; it demonstrates the pattern for connecting to and pulling data into your Streamlit app. The following gives some best-practice guidelines on your questions:

1. Unity Catalog and Streamlit App Integration

Streamlit apps run as Databricks Apps within your workspace. To enable data access via Unity Catalog:

  • Grant your app access to Unity Catalog assets, such as tables and volumes, by referencing them in your asset bundle (YAML) as uc_securable, registered_model, schema, or volume types.
  • For tabular data access, you must use a Unity Catalog-enabled compute (more below).
  • If you need to access data in Unity Catalog Volumes (e.g., for file reads/writes), you can include a volume resource and grant the app read/write permission as needed in your bundle.

Note: As of late 2025, mounting UC volumes directly in Streamlit apps via /Volumes is not supported ("Can I mount a Unity Catalog volume in my app? Not today."), but you can use the SDK or direct REST APIs to interact with UC assets.
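
To make that concrete, here is a minimal Python sketch of the SDK-based pattern, assuming the databricks-sdk package is installed in the app environment; the catalog, schema, volume, and file names below are placeholders:

Example: Interacting with UC Assets via the Python SDK (illustrative sketch)
from databricks.sdk import WorkspaceClient

# Credentials and host are resolved automatically from the app environment,
# DATABRICKS_* environment variables, or a CLI profile; nothing is hardcoded.
w = WorkspaceClient()

# List the tables the app principal can see in a schema (placeholder names).
for table in w.tables.list(catalog_name="main", schema_name="my_schema"):
    print(table.full_name)

# Read a small file from a Unity Catalog volume without mounting /Volumes.
response = w.files.download("/Volumes/main/my_schema/my_volume/config.json")
contents = response.contents.read()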

 


2. Credential and Environment Management

Databricks Asset Bundles support robust environment (dev/qa/prod) isolation and credential management via configuration files and best practices:

a) Environment Separation with Targets

  • Define separate targets in your databricks.yml bundle for dev, qa, and prod. Each target can have its own workspace (host), compute resources, and variables.
  • For each target, you typically specify the corresponding Databricks workspace URL and authentication method (see below for securing credentials).

b) Authentication: Where to Place Credentials

  • DO NOT hardcode secrets or credentials in the bundle YAML.
  • Use the workspace/profile mapping in the target to reference Databricks CLI profiles (.databrickscfg), which are stored securely in your deployment environment (e.g., your CI/CD system). The CLI then picks up the appropriate host, token, client_id, etc., from the matching profile.
  • For sensitive values like personal access tokens or client secrets, use environment variables (e.g., DATABRICKS_TOKEN, DATABRICKS_CLIENT_SECRET) set in your CI/CD environment or local dev machine, not in code or bundle files. A short sketch at the end of this section shows how the SDK resolves these automatically.
  • You can define additional custom variables in your bundle for parameters (e.g., database names, resource paths), but never for secrets.
  • For secret management within Databricks itself (e.g., API keys to call external services), define secret scopes in your bundle and reference them in your app configuration. Secret scopes can be workspace-native or, on Azure, backed by Key Vault.
Example Target Configuration (YAML)
targets:
  dev:
    workspace:
      host: https://<dev-workspace-url>
  qa:
    workspace:
      host: https://<qa-workspace-url>
  prod:
    workspace:
      host: https://<prod-workspace-url>

Your CI/CD pipeline (e.g., GitHub Actions, Azure DevOps) should set up the corresponding .databrickscfg files or environment variables for each environment.
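
To make the no-hardcoded-credentials point concrete, here is a small Python sketch of how app or deployment code can resolve the host and auth through the SDK's unified authentication, assuming the databricks-sdk package is available; the profile name is a placeholder:

Example: Credential Resolution Without Hardcoding (Python, illustrative sketch)
from databricks.sdk.core import Config

# Resolves the workspace host and auth from DATABRICKS_HOST / DATABRICKS_TOKEN
# (or client id/secret) set by CI/CD, or from a named profile in ~/.databrickscfg.
# No secret ever appears in the repository or the bundle YAML.
cfg = Config(profile="dev")  # "dev" is a placeholder; omit profile to use env vars only
print(cfg.host)              # e.g. the dev workspace URL from the matching profile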

 


3. Compute Selection: SQL Warehouse vs. All Purpose Compute

The right compute depends on your workload:

a) For Streamlit Apps Accessing Unity Catalog

  • If your app primarily executes interactive SQL queries or serves BI-style workloads: Use a Databricks SQL Warehouse as the compute assigned in your asset bundle app resource. SQL Warehouses are optimized for concurrent, low-latency SQL workloads, and include serverless options for instant startup and cost efficiency. A short query sketch follows this list.
  • For workloads that require full Spark (PySpark, DataFrames, ML/AI workloads, or custom Python libraries): Use All Purpose Compute (cluster). If using clusters, make sure to enable Unity Catalog on the cluster with the proper access mode (Standard or Dedicated), as Unity Catalog tables require UC-enabled compute.
    • All Purpose Compute is typically used for development and interactive analytic workloads.
    • For production ETL/batch jobs, use Jobs Compute clusters.
  • For accessing Unity Catalog volumes, models, or data lineage features: Both SQL Warehouses and UC-enabled clusters are supported as long as the relevant permissions and data governance modes are set.
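
Here is the query sketch referenced above: a minimal Python example of reading a Unity Catalog table from a Streamlit app through a SQL warehouse, assuming the databricks-sql-connector and databricks-sdk packages are installed. The environment variable, catalog, schema, and table names are placeholders, and the warehouse's HTTP path would come from the resource you attach in the bundle (see the next example):

Example: Querying a UC Table from Streamlit via a SQL Warehouse (Python, illustrative sketch)
import os

import streamlit as st
from databricks import sql
from databricks.sdk.core import Config

cfg = Config()  # resolves the workspace host and auth from the app environment

@st.cache_data(ttl=300)
def load_table(limit: int = 100):
    # The warehouse HTTP path is assumed to be exposed as an environment
    # variable; the variable and table names below are placeholders.
    with sql.connect(
        server_hostname=cfg.host,
        http_path=os.environ["DATABRICKS_SQL_HTTP_PATH"],
        credentials_provider=lambda: cfg.authenticate,
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(f"SELECT * FROM main.my_schema.my_table LIMIT {limit}")
            return cursor.fetchall_arrow().to_pandas()

st.dataframe(load_table())
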
Example Asset Bundle App Resource for a SQL Warehouse (YAML)
apps:
  my_streamlit_app:
    source_code_path: ./app
    resources:
      - name: prod-sqlwh
        sql_warehouse:
          id: ${var.sql_warehouse_id}
          permission: CAN_USE

You can similarly define clusters (All Purpose Compute) and assign them via cluster_id.
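
For the Spark-native path, one option to explore is Databricks Connect, which gives client code a remote Spark session against a UC-enabled cluster; whether it fits a given app depends on your workspace setup. A minimal sketch, assuming the databricks-connect package is installed and the cluster id is supplied via configuration; the table name is a placeholder:

Example: Spark Access via Databricks Connect (Python, illustrative sketch)
from databricks.connect import DatabricksSession

# Builds a remote Spark session; host, auth, and cluster id are resolved from
# environment variables (e.g., DATABRICKS_CLUSTER_ID) or a CLI profile.
spark = DatabricksSession.builder.getOrCreate()

# Read a Unity Catalog table with full PySpark (placeholder name).
df = spark.read.table("main.my_schema.my_table")
print(df.limit(10).toPandas())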

 


4. Best Practices Summary

  • Separate environments: Use targets for dev, qa, prod; keep workspaces and catalogs isolated.
  • Credential hygiene: Rely on CLI profiles and CI/CD environment variables for host/auth; never commit secrets. Use secret scopes for app-level secrets.
  • Compute choice:
    • SQL Warehouse: Best for BI, analytics, and concurrent SQL (especially serverless).
    • All Purpose Compute: For interactive, Spark-native, or custom Python workloads.
  • Streamlit-Unity Catalog integration: Reference Unity Catalog resources in your app bundle; ensure the compute is UC-enabled.
  • Manage permissions strictly: apply the principle of least privilege in Unity Catalog and assign only the necessary grants to the app or the responsible groups.