lingareddy_Alva
Esteemed Contributor

Hi @noorbasha534 

You're touching on a really interesting area! While Databricks hasn't open-sourced predictive optimization,
there have been some community efforts and approaches to build similar functionality:

Community Efforts:

Yes, some teams build DIY solutions using Spark query logs and custom listeners
These mostly focus on liquid clustering column selection and automated stats collection
As far as I know, no full open-source clone exists yet

Common Approaches:
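Parse the Spark event logs (the ones the Spark History Server reads) for column usage patterns
Custom Spark listeners (e.g. SparkListener or QueryExecutionListener) to capture query telemetry
Heuristic-based scheduling of OPTIMIZE and stats refreshes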
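
To make the log-parsing idea concrete, here is a minimal sketch, not a definitive implementation. It assumes event logs in Spark's JSON-lines format, where SQL execution start events carry a `physicalPlanDescription` string with `PushedFilters: [...]` entries; check this against your Spark/DBR version. The function name `count_filter_columns` and the regexes are my own illustration.

```python
import json
import re
from collections import Counter

# Event name as it appears in Spark event logs (assumption: verify on your version).
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"

# Grabs the text inside "PushedFilters: [...]" in a physical plan description.
PUSHED_FILTERS = re.compile(r"PushedFilters: \[([^\]]*)\]")
# Grabs (operator, first argument) pairs such as EqualTo(customer_id,42).
FILTER_CALL = re.compile(r"([A-Za-z]+)\(([A-Za-z_][\w.]*)\s*[,)]")


def count_filter_columns(event_log_lines):
    """Count how often each column shows up in pushed-down filters."""
    counts = Counter()
    for line in event_log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if event.get("Event") != SQL_START:
            continue
        plan = event.get("physicalPlanDescription", "")
        for filters in PUSHED_FILTERS.findall(plan):
            for op, column in FILTER_CALL.findall(filters):
                # IsNotNull is usually inferred from another predicate on the
                # same column, so skipping it avoids double counting.
                if op != "IsNotNull":
                    counts[column] += 1
    return counts
```

You would feed it the lines of one or more event log files and look at `counts.most_common()`. This only sees filters that were pushed down to the scan, so treat the numbers as a rough signal rather than full query telemetry.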

Reality Check:
Targeted solutions (clustering hints, stats automation) are feasible
Full predictive optimization replication is complex
Databricks hasn't indicated plans to open-source it
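
As an example of the "targeted" kind of heuristic, here is a rough sketch of a rule for deciding when a table is worth compacting. It assumes you feed it `numFiles` and `sizeInBytes` from `DESCRIBE DETAIL`; the function name `should_optimize` and both thresholds are arbitrary assumptions of mine, not anything Databricks uses.

```python
def should_optimize(num_files, size_in_bytes,
                    target_file_bytes=128 * 1024 * 1024, min_files=50):
    """Return True when a table looks fragmented into many small files.

    The thresholds are illustrative defaults only; tune them per workload.
    """
    if num_files < min_files:
        return False  # too few files for compaction to matter
    avg_file_bytes = size_in_bytes / num_files
    # Flag the table when the average file is under a quarter of the target size.
    return avg_file_bytes < target_file_bytes / 4
```

A scheduled job could loop over your tables, call this with the `DESCRIBE DETAIL` values, and run `OPTIMIZE` only where it returns True.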

Bottom Line: Build incrementally - start with query pattern analysis for liquid clustering decisions, then expand based on ROI.
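
For that first step, a small sketch of turning filter-usage counts into a clustering suggestion, assuming you already have a per-column count of how often each column appears in query filters. `suggest_cluster_by` and the `min_count` threshold are hypothetical; the cap of four reflects the documented limit on liquid clustering keys.

```python
import re

MAX_CLUSTERING_COLUMNS = 4  # liquid clustering accepts up to four keys
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def suggest_cluster_by(table_name, column_counts, min_count=10):
    """Build an ALTER TABLE ... CLUSTER BY statement from usage counts.

    Returns None when no column is filtered on often enough.
    """
    # Most frequently filtered columns first; ties broken by name for stable output.
    ranked = sorted(column_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    chosen = [
        col for col, count in ranked
        if count >= min_count and IDENTIFIER.match(col)  # plain identifiers only
    ][:MAX_CLUSTERING_COLUMNS]
    if not chosen:
        return None
    return f"ALTER TABLE {table_name} CLUSTER BY ({', '.join(chosen)})"
```

For example, `suggest_cluster_by("sales.orders", {"customer_id": 120, "order_date": 80, "note": 2})` gives a statement clustering by `customer_id` and `order_date`. I would review the output by hand rather than execute it automatically, since frequency alone ignores things like column cardinality.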

LR