07-21-2025 02:27 PM
You're touching on a really interesting area! While Databricks hasn't open-sourced predictive optimization,
there have been some community efforts and approaches to build similar functionality:
Community Efforts:
Yes, some teams build DIY solutions using Spark query logs and custom listeners
Focus on liquid clustering column selection and automated stats collection
No full open-source clone exists yet
Common Approaches:
Parse Spark event logs (the files the History Server reads) for column usage patterns
Custom listeners (Spark's SparkListener or QueryExecutionListener interfaces) to capture query telemetry
Heuristic-based optimization scheduling
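To make the log-parsing idea concrete, here is a minimal sketch, not a definitive implementation. It assumes your event logs are JSON lines, that SQL executions appear as `SparkListenerSQLExecutionStart` events carrying a `physicalPlanDescription` string, and that scans in that string list their predicates as `PushedFilters: [...]`. The function name `count_filter_columns` is made up for illustration, and the regexes are a rough heuristic that will miss filters that are not pushed down.

```python
import json
import re
from collections import Counter

# Assumption: event name as it appears in Spark event logs for SQL executions.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"

# Assumption: scans in the plan text list predicates as "PushedFilters: [...]".
PUSHED = re.compile(r"PushedFilters: \[([^\]]*)\]")

# Matches the first argument of a leaf predicate such as EqualTo(col,1) or
# IsNotNull(col); wrappers like Or(...) are skipped because their first
# argument is followed by "(" rather than "," or ")".
FILTER_COL = re.compile(r"\b[A-Za-z]+\(([A-Za-z_][A-Za-z0-9_.]*)\s*[,)]")


def count_filter_columns(event_lines):
    """Count how many scans filter on each column (hypothetical helper)."""
    counts = Counter()
    for line in event_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict) or event.get("Event") != SQL_START:
            continue
        plan = event.get("physicalPlanDescription", "")
        for filters in PUSHED.findall(plan):
            # Count each column at most once per scan node.
            counts.update(set(FILTER_COL.findall(filters)))
    return counts
```

You would feed it the lines of one or more event log files, for example `count_filter_columns(open(path))`, and look at `most_common()` on the result. Columns that show up in a large share of scans are the ones worth considering as clustering keys.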
Reality Check:
Targeted solutions (clustering hints, stats automation) are feasible
Full predictive optimization replication is complex
Databricks hasn't indicated plans to open-source it
Bottom Line: Build incrementally. Start with query pattern analysis for liquid clustering decisions, then expand based on ROI.
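As a rough illustration of that first step, the sketch below turns per-column filter counts (however you collected them) into a suggested `ALTER TABLE ... CLUSTER BY` statement. The helper name `suggest_cluster_by`, the `min_share` threshold, and the table name in the usage note are all assumptions for illustration, not anything Databricks ships. The cap of four columns reflects the documented limit on clustering keys. It only builds the statement text, so you can review it before running anything.

```python
from collections import Counter

# Liquid clustering accepts at most four clustering columns.
MAX_CLUSTER_COLUMNS = 4


def suggest_cluster_by(table, filter_counts, total_queries, min_share=0.2):
    """Return an ALTER TABLE statement, or None if no column qualifies.

    min_share is a hypothetical heuristic: a column must appear in at least
    this fraction of the observed queries to be considered.
    """
    if total_queries <= 0:
        return None
    ranked = [
        col
        for col, n in Counter(filter_counts).most_common()
        if n / total_queries >= min_share
    ]
    chosen = ranked[:MAX_CLUSTER_COLUMNS]
    if not chosen:
        return None
    cols = ", ".join(f"`{c}`" for c in chosen)
    return f"ALTER TABLE {table} CLUSTER BY ({cols})"
```

For example, `suggest_cluster_by("main.sales.events", {"event_date": 80, "region": 30, "user_id": 5}, total_queries=100)` keeps `event_date` and `region` and drops `user_id`, which falls below the 20% threshold. Treat the output as a recommendation to review rather than something to apply automatically, at least until you have measured the effect on a few tables.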