cancel
Showing results for 
Search instead for 
Did you mean: 
Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.
cancel
Showing results for 
Search instead for 
Did you mean: 

Lakeventory: Automated Asset Discovery for Databricks Workspaces

felipemoz
New Contributor
 
Lakeventory is an open-source inventory tool that automatically discovers and catalogs everything in your Databricks workspaces in minutes.

 

What It Collects?
- Workspace: notebooks, files, directories
- Compute: jobs, clusters, instance pools, policies
- SQL: warehouses, dashboards, queries, alerts, pipelines
- ML: experiments, registered models, versions
- Data: Unity Catalog (catalogs, schemas, tables, volumes), external locations
- Security: secret scopes, tokens, IP access lists, identities
- Repos: Git repositories and credentials
- Delta Sharing: shares, recipients, providers
- Serving: endpoints, vector search, online tables
Exports to Excel or Markdown with workspace metadata auto-detected.

 

Quick Start
 
make inventory​
 
Use Cases
1. Compliance and Audit: Generate Excel reports in under 5 minutes for SOC 2, HIPAA, or internal audits. Share spreadsheets with auditors without manual collection.
2. Migration Planning: Run inventory on both source and target workspaces. Compare the files to create migration checklists and identify dependencies.
3. Cost Optimization: Export to Excel and filter by resource type to find unused warehouses, idle clusters, or orphaned resources. Teams often find 30-40% of resources can be decommissioned.
4. Change Detection: The report shows only added, removed, or modified resources since the last run. Perfect for weekly change reviews or security monitoring. (even having it on git)
5. Scheduled Monitoring: Run with Docker for automated daily snapshots. First run does full inventory, subsequent runs use incremental mode for speed.

 

Key Features
Speed: Inventory a typical workspace in 2-5 minutes
Security: Supports Service Principal (recommended), PAT, or Basic Auth. Auto-detects authentication from environment.
Zero Configuration: No config files required. Just set environment variables and run.
Production Ready: Docker support, Makefile automation, 42 passing tests, comprehensive documentation.

 

Benefits
After deploying Lakeventory, teams report:
- 80% faster audit preparation
- 30-40% cost reduction from identifying unused resources
- Saving 5-10 hours per week on manual inventory tasks
- 100% compliance with asset tracking requirements
- Faster migrations with comprehensive asset lists

 

Repository
License: MIT (free for commercial use)
Status: Alpha release (production-ready, actively maintained)

 

Questions for the Community:
1. What other Databricks assets would you like to see inventoried?
2. Would you use this for compliance, cost optimization, or both?
3. Any integration requests (Jira, ServiceNow, etc.)?

 

Happy to answer questions and incorporate feedback!
0 REPLIES 0