Summary of Best Practices and Recommendations for App Whitelisting and Automated App Removal in Databricks
1. Overview and Whitelisting Strategy
To control costs and maintain governance over Databricks app usage in your workspace, the recommended approach is to implement a clear app whitelisting mechanism and automate removal of unapproved apps. The following best practices and guidance summarize the official recommendations and field experience for Databricks Apps.
What is "Whitelisting" in this Context?
A "whitelist" is a defined set of Databricks apps that have been reviewed and approved by administrators for use in the workspace. Only apps on this list should be allowed to exist; all others should be flagged and optionally removed.
2. Storage and Management of the Whitelist
There are several practical options for maintaining your workspace's app whitelist:
- Workspace Table or Delta Table: Store a table with app names, owners, approval status, and other metadata. This can be referenced by an administrative notebook for checks and reporting.
- Configuration File: Use a workspace file (YAML, JSON, etc.) with the approved app names.
- Secrets: If you need to store sensitive information (like app IDs tied to privileged resources), you could store whitelist details in Databricks Secrets.
- Unity Catalog Table: For larger environments, a Unity Catalog managed table shared with admins for central control is ideal.
Choose the storage method that best fits your operational security requirements and maintainability preferences.
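As a minimal sketch of the configuration-file option, the snippet below reads approved app names from a JSON file into a set. The file layout (an `approved_apps` list of objects with `name` and `owner` fields) and the function names `parse_whitelist` and `load_whitelist` are assumptions made for illustration, not a Databricks convention.

```python
import json
from pathlib import Path


def parse_whitelist(text):
    """Return the set of approved app names found in a JSON document.

    Assumed (hypothetical) layout:
    {"approved_apps": [{"name": "sales-dashboard", "owner": "team-a"}]}
    """
    data = json.loads(text)
    names = set()
    for entry in data.get("approved_apps", []):
        name = entry.get("name", "").strip()
        if name:  # skip blank or missing names instead of approving ""
            names.add(name)
    return names


def load_whitelist(path):
    """Read the JSON file at `path` and return the approved app names."""
    return parse_whitelist(Path(path).read_text(encoding="utf-8"))
```

Returning a set keeps the later membership check cheap and removes accidental duplicates. The same function shape would work if the names came from a table query instead of a file.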
3. Automated Enforcement Workflow
A robust administrative notebook should:
- Enumerate all current apps in the workspace: Use the official Databricks APIs or SDK to list all app resources.
- Compare with the whitelist: Cross-reference the list of existing apps with your approved whitelist.
- Flag or remove unapproved apps: Report unapproved apps or, if desired, remove them automatically.
- Build in logging/audit capabilities and a dry-run mode so that changes can be validated without disrupting users.
Here is a typical control flow:
```python
# Pseudocode outline (Python-based, can be adapted to Scala/Spark).
# load_whitelist and databricks_admin_api are placeholders, not real APIs.
approved_apps = load_whitelist()  # Load whitelist from table, secret, or file
current_apps = databricks_admin_api.list_apps()  # List all workspace apps

for app in current_apps:
    if app.name not in approved_apps:
        # Optional: log, notify, or tag before removal
        databricks_admin_api.remove_app(app.id)  # Remove unapproved app
```
Be sure to handle exceptions, permissions, and edge cases (apps in a transient state, or deployed by critical users) as needed.
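The control flow above can be extended with the dry-run mode, logging, and exception handling mentioned earlier. The sketch below is one possible shape, not a definitive implementation: `enforce_whitelist`, `EnforcementReport`, and the `delete_app` callable are hypothetical names, and it assumes apps are identified by name. The list and delete operations are passed in rather than hard-coded so the logic does not depend on any particular API client.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set

logger = logging.getLogger("app_whitelist")


@dataclass
class EnforcementReport:
    approved: List[str] = field(default_factory=list)
    flagged: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def enforce_whitelist(
    app_names: Iterable[str],
    approved: Set[str],
    delete_app: Callable[[str], None],  # hypothetical: wraps the real delete call
    dry_run: bool = True,
) -> EnforcementReport:
    """Flag apps missing from the whitelist; delete them only if dry_run is False."""
    report = EnforcementReport()
    for name in app_names:
        if name in approved:
            report.approved.append(name)
            continue
        report.flagged.append(name)
        if dry_run:
            logger.info("DRY RUN: would remove unapproved app %s", name)
            continue
        try:
            delete_app(name)
            report.removed.append(name)
            logger.warning("Removed unapproved app %s", name)
        except Exception:
            # Keep going so one failure does not block the rest of the sweep.
            report.failed.append(name)
            logger.exception("Failed to remove app %s", name)
    return report
```

Defaulting `dry_run` to `True` means a first run only reports what would be removed. In a real notebook, `app_names` and `delete_app` would be backed by the Databricks REST API or SDK; check the exact method names against the client version you have installed.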
4. Permissions and Governance Recommendations
To enforce governance and prevent unwanted apps from being created:
- Restrict the "CAN MANAGE" app permission: Only grant this to trusted administrators or peer-reviewed senior developers.
- Restrict the "CAN USE" permission to only those groups or users who need access to a given app.
- For OBO (on-behalf-of) apps, only enable this feature in trusted environments with peer-reviewed code and restrict additional scopes to the minimum needed.
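Once app permissions have been exported, the "CAN MANAGE" rule can be checked mechanically. The following is a rough sketch under stated assumptions: the input shape (app name mapped to `(principal, level)` pairs), the `CAN_MANAGE` level string, and the function name are hypothetical and would need to be adapted to however you retrieve permissions.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_unexpected_managers(
    acl_by_app: Dict[str, Iterable[Tuple[str, str]]],
    trusted_admins: Set[str],
    manage_level: str = "CAN_MANAGE",  # assumed spelling of the permission level
) -> Dict[str, List[str]]:
    """Map each app to principals holding manage rights who are not trusted admins."""
    findings: Dict[str, List[str]] = {}
    for app_name, entries in acl_by_app.items():
        offenders = sorted(
            {
                principal
                for principal, level in entries
                if level == manage_level and principal not in trusted_admins
            }
        )
        if offenders:
            findings[app_name] = offenders
    return findings
```

Apps with no findings are left out of the result, so an empty dictionary means every manage grant belongs to a trusted administrator.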
5. Security, Auditing, and Compliance
Make use of Databricks audit logs:
- Track permission changes on apps and who approved or made changes to the whitelist.
- Set up workflows to log all admin actions, app creation, sharing, and deletion for compliance audits.
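Alongside the platform audit logs, the enforcement notebook can write its own record of each admin action. The helper below is a small sketch of such a record as one JSON line. The field names and the `audit_record` function are assumptions for illustration, and it supplements the Databricks audit logs rather than replacing them.

```python
import json
from datetime import datetime, timezone


def audit_record(actor, action, app_name, dry_run, now=None):
    """Build one JSON line describing an admin action taken on an app."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "timestamp": timestamp,  # UTC, ISO 8601
            "actor": actor,          # who ran the enforcement job
            "action": action,        # e.g. "flag_app" or "remove_app"
            "app_name": app_name,
            "dry_run": dry_run,
        },
        sort_keys=True,
    )
```

Appending these lines to a file or table gives reviewers a simple trail of what the automation flagged or removed, and when.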
6. Environments and Promotion
- Maintain separate whitelists for dev, staging, and production environments.
- Use CI/CD and Databricks Asset Bundles (DABs) to promote only approved apps between environments.
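One way to picture per-environment whitelists is as a gate that a CI/CD step runs before deploying an app. This is only a sketch: the app names, the `WHITELISTS` mapping, and `can_promote` are hypothetical, and in practice the sets would be loaded from wherever each environment's whitelist is stored.

```python
from typing import Dict, Set

# Hypothetical per-environment whitelists; real ones would be loaded from storage.
WHITELISTS: Dict[str, Set[str]] = {
    "dev": {"sales-dashboard", "ml-playground", "etl-monitor"},
    "staging": {"sales-dashboard", "etl-monitor"},
    "prod": {"sales-dashboard"},
}


def can_promote(
    app_name: str,
    target_env: str,
    whitelists: Dict[str, Set[str]] = WHITELISTS,
) -> bool:
    """Allow deployment only if the target environment's whitelist contains the app."""
    if target_env not in whitelists:
        raise ValueError(f"unknown environment: {target_env}")
    return app_name in whitelists[target_env]
```

A pipeline could call `can_promote` and fail the job when it returns `False`, so an app approved only for dev never reaches production.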
7. Additional Best Practices
- Regularly review the whitelist and app logs to ensure consistency and compliance.
- Periodically audit installed apps to review cost and usage patterns.
- Isolate apps by workspaces/environments where appropriate to reduce risk surface.
- Document and peer-review all changes to app permissions and whitelist entries.
- Maintain least privilege on both the OAuth scopes requested by apps and the Databricks resource permissions granted to app service principals.
Table: Implementation Checklist
| Action | Recommended Practice |
| --- | --- |
| Whitelist Storage | Workspace table, UC table, config file, or secret |
| Enumerate Apps | Use Databricks REST API or SDK |
| Compare and Log Discrepancies | Cross-reference with whitelist and log/messaging |
| Remove Unapproved Apps | Automated via admin notebook or DABs |
| Governance Controls | Restrict CAN MANAGE and CAN USE rights |
| Audit and Review | Use Databricks audit logs and periodic reviews |
| Promotion Across Environments | Deploy approved apps via CI/CD and DABs |
| Documentation and Peer Review | Require for changes to whitelist or app access |
| Ongoing Security Assessment | Utilize Databricks security center best practices |
Example Policy Logic
- Allow only whitelisted apps: Only apps listed in your whitelist are allowed to run or be present in the workspace.
- Alert or auto-remove all others: For any app detected that's not on the whitelist, admins are alerted, optionally with automatic removal.
- Restrict app modifications: Only those with "CAN MANAGE" access may modify or approve changes to the whitelist.