Hi,
As a starter, you may want to try deploying the Streamlit starter app from the Apps UI; this will show you the pattern for connecting to and pulling data into your Streamlit app. The following gives some best practice guidelines on your questions:
1. Unity Catalog and Streamlit App Integration
Streamlit apps run as Databricks Apps within your workspace. To enable data access via Unity Catalog:
- Grant your app access to Unity Catalog assets, such as tables and volumes, by referencing them in your asset bundle (YAML) as uc_securable, registered_model, schema, or volume types.
- For tabular data access, you must use a Unity Catalog-enabled compute (more below).
- If you need to access data in Unity Catalog Volumes (e.g., for file reads/writes), you can include a volume resource and grant the app read/write permission as needed in your bundle.
Note: As of late 2025, mounting UC volumes directly in Streamlit apps via /Volumes is not supported ("Can I mount a Unity Catalog volume in my app? Not today."), but you can use the Databricks SDK or REST APIs to interact with UC assets, as sketched below.
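For example, here is a minimal sketch of reading a file from a UC volume inside the app with the Databricks SDK Files API; the catalog, schema, volume, and file names are placeholders, and it assumes the app's identity has READ VOLUME on the volume:

# Minimal sketch: read a CSV file from a Unity Catalog volume via the SDK Files API
# (volumes cannot be mounted in apps today). All names below are placeholders.
import io
import pandas as pd
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # resolves host and credentials from the app environment
resp = w.files.download("/Volumes/main/analytics/raw_files/example.csv")
df = pd.read_csv(io.BytesIO(resp.contents.read()))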
2. Credential and Environment Management
Databricks Asset Bundles support robust environment (dev/qa/prod) isolation and credential management through target configuration and standard authentication practices:
a) Environment Separation with Targets
- Define separate targets in your databricks.yml bundle for dev, qa, and prod. Each target can have its own workspace (host), compute resources, and variables.
- For each target, you typically specify the corresponding Databricks workspace URL and authentication method (see below for securing credentials).
b) Authentication: Where to Place Credentials
- DO NOT hardcode secrets or credentials in the bundle YAML.
- Use the workspace/profile mapping in the target to reference Databricks CLI profiles (.databrickscfg), which are stored securely in your deployment environment (e.g., your CI/CD system). The CLI then picks up the appropriate host, token, client_id, etc., from the matching profile.
- For sensitive values like personal access tokens or client secrets, use environment variables (e.g., DATABRICKS_TOKEN, DATABRICKS_CLIENT_SECRET) set in your CI/CD environment or local dev machine, not in code or bundle files (see the sketch after this list).
- You can define additional custom variables in your bundle for parameters (e.g., database names, resource paths), but never for secrets.
- For secret management within Databricks itself (e.g., API keys to call external services), define secret scopes in your bundle and reference them in your app configuration. Secret scopes can be workspace-native or, on Azure, backed by Key Vault.
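As an illustration, a minimal sketch of how code can stay free of credentials: the SDK resolves authentication from the environment or a CLI profile, and app-level secrets are read from a secret scope. The scope and key names are placeholders, and the exact way secrets reach your app may differ depending on how your app resources are configured:

# Minimal sketch: no credentials in code. WorkspaceClient resolves auth from
# environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN or
# DATABRICKS_CLIENT_ID/DATABRICKS_CLIENT_SECRET) or from a .databrickscfg profile.
import base64
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()                 # default: env vars or the default profile
# w = WorkspaceClient(profile="qa")   # or pin an explicit CLI profile per target

# Placeholder scope/key: an external API key kept in a secret scope
secret = w.secrets.get_secret(scope="my-app-scope", key="external-api-key")
api_key = base64.b64decode(secret.value).decode("utf-8")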
Example Target Configuration
targets:
  dev:
    workspace:
      host: https://<dev-workspace-url>
  qa:
    workspace:
      host: https://<qa-workspace-url>
  prod:
    workspace:
      host: https://<prod-workspace-url>
Your CI/CD pipeline (e.g., GitHub Actions, Azure DevOps) should set up the corresponding .databrickscfg files or environment variables for each environment.
3. Compute Selection: SQL Warehouse vs. All Purpose Compute
The right compute depends on your workload:
a) For Streamlit Apps Accessing Unity Catalog
- If your app primarily executes interactive SQL queries or serves BI-style workloads: Use a Databricks SQL Warehouse as the compute assigned in your asset bundle app resource. SQL Warehouses are optimized for concurrent, low-latency SQL workloads and include serverless options for instant startup and cost efficiency (a Python query sketch is shown at the end of this section).
- For workloads that require full Spark (PySpark, DataFrames, ML/AI workloads, or custom Python libraries): Use All Purpose Compute (a cluster). If using clusters, make sure Unity Catalog is enabled on the cluster with the proper access mode (Standard or Dedicated), as Unity Catalog tables require UC-enabled compute (see the Spark sketch after this list).
- All Purpose Compute is typically used for development and interactive analytic workloads.
- For production ETL/batch jobs, use Jobs Compute clusters.
- For accessing Unity Catalog volumes, models, or data lineage features: Both SQL Warehouses and UC-enabled clusters are supported as long as the relevant permissions and data governance modes are set.
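If you do need Spark DataFrame APIs rather than SQL, one option is Databricks Connect pointed at a UC-enabled all-purpose cluster. A minimal sketch, assuming the host, token, and cluster ID are supplied via environment variables and the table name is a placeholder:

# Minimal sketch: Spark DataFrame access via Databricks Connect against a
# UC-enabled all-purpose cluster. Host, token, and cluster ID come from the
# environment; the table name is a placeholder.
import os
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host=os.environ["DATABRICKS_HOST"],
    token=os.environ["DATABRICKS_TOKEN"],
    cluster_id=os.environ["DATABRICKS_CLUSTER_ID"],
).getOrCreate()

df = spark.table("main.analytics.orders").groupBy("status").count()
df.show()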
Example Asset Bundle App Resource (for SQL Warehouse)
apps:
  my_streamlit_app:
    source_code_path: ./app
    resources:
      - name: prod-sqlwh
        sql_warehouse:
          id: ${var.sql_warehouse_id}
          permission: CAN_USE
You can similarly define clusters (All Purpose Compute) and assign them via cluster_id.
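To tie this together, here is a minimal sketch of how the Streamlit app itself might query a UC table through the SQL warehouse using the databricks-sql-connector; the environment variable names and the table are placeholders for whatever your bundle and app configuration actually provide:

# Minimal sketch: query a Unity Catalog table from Streamlit through a SQL warehouse.
# Assumes databricks-sql-connector is installed; hostname, HTTP path, and token are
# read from environment variables (names and table are placeholders).
import os
import streamlit as st
from databricks import sql

@st.cache_data(ttl=600)
def load_orders():
    with sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(
                "SELECT status, count(*) AS n FROM main.analytics.orders GROUP BY status"
            )
            return cursor.fetchall_arrow().to_pandas()

st.title("Orders by status")
st.dataframe(load_orders())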
4. Best Practices Summary
- Separate environments: Use targets for dev, qa, prod; keep workspaces and catalogs isolated.
- Credential hygiene: Rely on CLI profiles and CI/CD environment variables for host/auth; never commit secrets. Use secret scopes for app-level secrets.
- Compute choice:
- SQL Warehouse: Best for BI, analytics, and concurrent SQL (especially serverless).
- All Purpose Compute: For interactive, Spark-native, or custom Python workloads.
- Streamlit-Unity Catalog integration: Reference Unity Catalog resources in your app bundle; ensure the compute is UC-enabled.
- Manage permissions strictly: Apply the principle of least privilege in Unity Catalog and assign only the necessary grants to the app or responsible groups.