Databricks Community

Roxio · ‎08-16-2024

I have a query that calls different materialized views, anyway most of the time of the query is spent in "Optimizing query & pruning files" vs the execution.

The difference is like 2-3 secs for the optimization and 300-400ms for the execution

Similar if I run a query against tables, with the same data structure and data, but in the case of tables the difference is 700-800ms for optimization and 70-80ms for the execution

These are results when data is returned by the cache.

My question is:

- Why such a big difference between materialized views and table, when in theory it should be much similar results

- Is there a way that I can try to reduce the time used for "Optimizing query & pruning files"?

Brahmareddy · ‎08-16-2024

Hi Roxio, How are you doing today?

The difference in query times between materialized views and tables likely comes from the complexity of the views, as they often involve more steps in the background. To reduce the optimization time, you can try simplifying the materialized view, ensuring good data partitioning, and using techniques like Z-ordering or clustering. Also, make sure your caching is efficient and check the query execution plan for any bottlenecks. These steps should help speed things up!

Reagtds,

Brahma

View solution in original post

Brahmareddy · ‎08-16-2024

Hi Roxio, How are you doing today?

The difference in query times between materialized views and tables likely comes from the complexity of the views, as they often involve more steps in the background. To reduce the optimization time, you can try simplifying the materialized view, ensuring good data partitioning, and using techniques like Z-ordering or clustering. Also, make sure your caching is efficient and check the query execution plan for any bottlenecks. These steps should help speed things up!

Reagtds,

Brahma