Hi, we understand that the client performs a two-phase query planning process. In our case, the table has around 145,000 Parquet files, and we've observed that the first request becomes a significant bottleneck: the response body is large (655 MB) and takes ~10 seconds to return. We also tried gzip-compressing the response, but it still takes ~10 seconds before the second (actual query) request is sent to the server. The majority of those files are ultimately filtered out in the second request.
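For reference, this is roughly how we measured that first (planning/metadata) request. It is only a minimal Python sketch with the requests library; the endpoint path, token, and empty request body are placeholders standing in for our actual setup, not the real values:

import time

import requests

# Hypothetical endpoint and token, only to illustrate the measurement;
# the real URL and auth details are omitted.
PLANNING_URL = "https://<server>/shares/<share>/schemas/<schema>/tables/<table>/query"
TOKEN = "<bearer-token>"

start = time.monotonic()
resp = requests.post(
    PLANNING_URL,
    headers={
        "Authorization": f"Bearer {TOKEN}",
        # gzip is accepted on the wire, but after decompression the client
        # still has to download and parse a ~655 MB body (~145,000 file entries).
        "Accept-Encoding": "gzip",
    },
    json={},  # no predicates attached at this stage
    timeout=300,
)
elapsed = time.monotonic() - start

print(f"status={resp.status_code} bytes={len(resp.content):,} elapsed={elapsed:.1f}s")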
Is this response cached or reused by the client (e.g., Databricks Runtime), or is this metadata fetch expected to occur for every query?
For simple queries like:
SELECT COUNT(1) FROM table WHERE part_col1 = 'val1'
Even though such a query ultimately touches only a small number of files after partition pruning, the protocol requires returning all 145,000 file entries in the initial response just for the client to begin planning.
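To make the asymmetry concrete, here is a rough sketch of the client-side pruning we assume happens after the first response arrives. The newline-delimited JSON layout and the "file" / "partitionValues" field names are our assumptions about the wire format, not something we have confirmed:

import json

def prune_files(response_lines, part_col1_value):
    # Every one of the ~145,000 file entries is transferred and parsed,
    # only to be dropped here if its partition value does not match.
    kept = []
    for line in response_lines:
        entry = json.loads(line)
        file_info = entry.get("file")
        if file_info is None:
            continue  # protocol/metadata lines rather than file entries
        if file_info.get("partitionValues", {}).get("part_col1") == part_col1_value:
            kept.append(file_info)
    return kept

# e.g. prune_files(resp.iter_lines(decode_unicode=True), "val1") would keep
# only a handful of the 145,000 entries that were shipped over the wire.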
Is this the expected performance behavior in such scenarios, or are there best practices to mitigate the overhead?
We'd appreciate your insights and the design rationale here.