cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
MarcoScagliola
Databricks Employee
Databricks Employee

Authors: Andrey Mirskiy (@AndreyMirskiy) and Marco Scagliola (@MarcoScagliola)

dbsql_part3.pngWelcome to part III of our blog series on “Why Databricks SQL Serverless is the best fit for BI workloads”. In part I of the series we discussed Disk Caching (you can access it through this link: Part I: Disk Cache), focusing on how this method improves query performance by utilizing on-disk data storage, resulting in faster data retrieval.  In part II we demonstrated different approaches for basic and advanced performance test plans against Databricks SQL Warehouse with Apache JMeter. In this post we will be discussing Query Result Cache (QRC), a feature which accelerates execution of repetitive queries by caching their results.

 

Query result cache overview

Caching in databases is a performance-enhancing technique that minimizes Input and Output (I/O) read and write operations. This approach not only speeds up queries but also has a positive impact on cost. 

In the QRC architecture, query results are stored in-memory to maintain low latency access for cache hits, particularly for queries with high Queries Per Second (QPS). This approach ensures rapid data retrieval as results are fetched directly from the cache, eliminating the need to execute a query.

Demo Scenario

This section provides a repeatable demo scenario showcasing the capabilities of QRC.

Setup

For QRC testing we will be using Apache JMeter and the sample test plan based on the TPCH schema which can be found in any Databricks workspace under the samples catalog.

The test plan consists of the following JMeter objects:

  1. JDBC Connection named “DBSQL connection” defining the connection to Databricks SQL Warehouse.
  2. Thread Group named “WarmUp cache” responsible to cache the lineitem table in Disk Cache.
  3. Thread Group named “QRC is OFF” where a sample SQL query is executed with a PreProcessor JMeter object that disables QRC by executing the statement shown on code 1.
  • Code 1: Command which disables QRC.
    SET use_cached_result = false;

4. On the Thread Group named “QRC is ON”,  QRC is enabled by default.

Fig. 1: JMeter configuration pane for the QRC test planFig. 1: JMeter configuration pane for the QRC test plan

In the code below, you can see the sample test query used in the JMeter test plan. The query remains consistent across both JMeter Thread Groups, namely “QRC is OFF'' and “QRC is ON”. As a result, the latter thread group should benefit from the cached query result when running the former thread group.

  • Sample query used for the QRC test plan:
    select
    	l_returnflag,
    	l_linestatus,
    	sum(l_quantity) as sum_qty,
    	sum(l_extendedprice) as sum_base_price,
    	sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    	sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    	avg(l_quantity) as avg_qty,
    	avg(l_extendedprice) as avg_price,
    	avg(l_discount) as avg_disc,
    	count(*) as count_order
    from
    	lineitem
    where
    	l_shipdate <= date '1998-12-01' - interval '79' day
    group by
    	l_returnflag,
    	l_linestatus
    order by
    	l_returnflag,
    	l_linestatus;

Query Result Cache disabled

Based on the results of the run with QRC set to OFF, the wall-clock metrics in figure 2 display a total duration of one 1 second and 91 milliseconds. In addition, the IO metrics shown in figure 3 are not empty, indicating that data retrieval occurred to fulfill our query.

Fig. 2: Wall-clock duration metrics with QRC disabledFig. 2: Wall-clock duration metrics with QRC disabled

 

 

Fig. 3: Query execution metrics with QRC disabledFig. 3: Query execution metrics with QRC disabled

 

 

Query Result Cache enabled

When comparing these results with those obtained from the run with QRC set to ON (default), the wall-clock metrics in figure 4 display an overall duration of two hundred and thirty (230) milliseconds. Additionally, the IO metrics in figure 5 are empty, signifying that no data retrieval was necessary to satisfy our query. 

Therefore, the query result was retrieved from the Query Result Cache.

Fig. 4: Wall-clock duration metrics with QRC enabledFig. 4: Wall-clock duration metrics with QRC enabled

 

Fig. 5: Query execution metrics with QRC enabledFig. 5: Query execution metrics with QRC enabled

This test demonstrates that Databricks SQL Warehouse uses the Query Result Cache (QRC) feature, preserving query results in a cache. This feature enables faster data retrieval by prioritizing cache access over direct disk reads.

 

Conclusion

In this post we have covered the fundamental principles of Query Result Caching. As demonstrated, we observed in this sample test plan that QRC (Query Result Caching) can significantly improve the performance of repetitive queries, which are common in BI workloads. This improvement contributes to an enhanced overall user experience and leads to reduced cost for SQL BI workloads. For more information on Query Result Caching, refer to the official Databricks documentation (Azure | AWS | GCP).

 In the next post we will discuss the additional benefits of Query Result Caching available in Databricks SQL Serverless.

Resources

The JMeter artifacts featured in this blog are available in the following Github repository