Hi @noorbasha534 ,

That’s a really cool idea and definitely shows initiative - but realistically, it might not be worth the effort. There’s a lot of engineering going on under the hood that would be tough to replicate in-house.

Collecting telemetry and using it for things like liquid clustering and stats gathering could work to some extent, but the effort required to build and maintain something similar would likely outweigh the benefits, especially given how deeply integrated and optimized the native solution is.
If you have external tables, I would just take care of regular table maintenance (e.g. running OPTIMIZE and VACUUM on a regular schedule).
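
In case it helps, here is a rough sketch of what that kind of scheduled maintenance could look like. It only builds the `OPTIMIZE` and `VACUUM` statements for a list of tables; the helper name `maintenance_statements`, the example table name, and the 168-hour retention value are my own assumptions for illustration, so adjust them to your setup.

```python
def maintenance_statements(tables, retain_hours=168):
    """Build Delta maintenance SQL for each table name.

    `tables` is a list of fully qualified table names (hypothetical here).
    `retain_hours` is an assumed retention window of 7 days; pick a value
    that matches your own time travel and recovery requirements.
    """
    statements = []
    for table in tables:
        # Compact small files into larger ones.
        statements.append(f"OPTIMIZE {table}")
        # Remove data files no longer referenced and older than the window.
        statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return statements


if __name__ == "__main__":
    # "main.sales.orders" is a placeholder table name.
    for stmt in maintenance_statements(["main.sales.orders"]):
        # In a Databricks notebook or job you would run: spark.sql(stmt)
        print(stmt)
```

You could put something like this in a scheduled job and pass each statement to `spark.sql`, which keeps the list of tables and the retention window in one place.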

Would be awesome if Databricks open-sourced it, though - totally agree with you there.