Authors: Andrey Mirskiy (@AndreyMirskiy) and Marco Scagliola (@MarcoScagliola)
Welcome to part III of our blog series on “Why Databricks SQL Serverless is the best fit for BI workloads”. In part I of the series we discussed Disk Caching (you can access it through this link: Part I: Disk Cache), focusing on how this method improves query performance by utilizing on-disk data storage, resulting in faster data retrieval. In part II we demonstrated different approaches for basic and advanced performance test plans against Databricks SQL Warehouse with Apache JMeter. In this post we will be discussing Query Result Cache (QRC), a feature which accelerates execution of repetitive queries by caching their results.
Caching in databases is a performance-enhancing technique that minimizes Input and Output (I/O) read and write operations. This approach not only speeds up queries but also has a positive impact on cost.
In the QRC architecture, query results are stored in-memory to maintain low latency access for cache hits, particularly for queries with high Queries Per Second (QPS). This approach ensures rapid data retrieval as results are fetched directly from the cache, eliminating the need to execute a query.
This section provides a repeatable demo scenario showcasing the capabilities of QRC.
For QRC testing we will be using Apache JMeter and the sample test plan based on the TPCH schema which can be found in any Databricks workspace under the samples catalog.
The test plan consists of the following JMeter objects:
SET use_cached_result = false;
4. On the Thread Group named “QRC is ON”, QRC is enabled by default.
In the code below, you can see the sample test query used in the JMeter test plan. The query remains consistent across both JMeter Thread Groups, namely “QRC is OFF'' and “QRC is ON”. As a result, the latter thread group should benefit from the cached query result when running the former thread group.
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '79' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
Based on the results of the run with QRC set to OFF, the wall-clock metrics in figure 2 display a total duration of one 1 second and 91 milliseconds. In addition, the IO metrics shown in figure 3 are not empty, indicating that data retrieval occurred to fulfill our query.
When comparing these results with those obtained from the run with QRC set to ON (default), the wall-clock metrics in figure 4 display an overall duration of two hundred and thirty (230) milliseconds. Additionally, the IO metrics in figure 5 are empty, signifying that no data retrieval was necessary to satisfy our query.
Therefore, the query result was retrieved from the Query Result Cache.
This test demonstrates that Databricks SQL Warehouse uses the Query Result Cache (QRC) feature, preserving query results in a cache. This feature enables faster data retrieval by prioritizing cache access over direct disk reads.
In this post we have covered the fundamental principles of Query Result Caching. As demonstrated, we observed in this sample test plan that QRC (Query Result Caching) can significantly improve the performance of repetitive queries, which are common in BI workloads. This improvement contributes to an enhanced overall user experience and leads to reduced cost for SQL BI workloads. For more information on Query Result Caching, refer to the official Databricks documentation (Azure | AWS | GCP).
In the next post we will discuss the additional benefits of Query Result Caching available in Databricks SQL Serverless.
The JMeter artifacts featured in this blog are available in the following Github repository
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.