Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
MichelleD
Databricks Employee

Introduction 

In today's fast-paced world, the speed and volume of data inputs are skyrocketing, increasing the importance of swift decision-making. Fortunately, innovative technologies like Databricks AI/BI and all the other reporting tools available on the market have democratized access to vast datasets, empowering more individuals to extract valuable insights from data. As a result, there's a growing need for collaborative environments where multiple users can concurrently access massive datasets measured in terabytes. This shift has led the industry away from traditional databases, overwhelmed by sheer data size, toward lakehouse architectures that decouple storage and compute, allowing for scalable growth.

However, a common misconception persists: query engines like Databricks SQL Serverless are best suited for large-scale batch processing and struggle with high-concurrency and small-to-medium analytical queries. Many believe that traditional cloud data warehouses or specialized query engines outperform Databricks in BI-driven scenarios requiring rapid response times and high concurrency. But is this really the case? 

In this blog, we'll explore how Databricks SQL Serverless can support high-concurrency BI workloads, maintaining low query latencies and scalable performance across varied analytical scenarios.

Before even starting the discussion, let’s clarify the focus. This article concentrates exclusively on analytical scenarios, setting aside Online Transaction Processing (OLTP). Analytical processing covers scenarios where the technology is designed for performing high-speed complex queries and multidimensional analysis on large volumes of data, typically used in business intelligence, decision support, and forecasting applications. It differs from OLTP by prioritizing data reads and complex queries rather than transactional updates or frequent single-row lookups. 

We will explore how Databricks SQL Serverless can efficiently manage datasets ranging from a few megabytes to terabytes, encompassing both small-scale analytical projects and large-scale enterprise data warehouses.

The market expectations 

In modern business intelligence (BI) environments, systems must handle a wide range of users, from a few to hundreds, interacting with standard reporting tools or natural language interfaces like Databricks Genie. These systems need to manage thousands of queries per minute with a 95th-percentile latency of a few seconds, and often under one second.

The two key concepts driving the success of BI systems are:

  1. Scalability and Cost Efficiency: The system should enable users to execute a wide variety of queries on vast amounts of data while minimizing costs. This means providing the ability to handle diverse query types and large datasets efficiently, without breaking the bank.
  2. Speed: Users of BI systems are typically not looking for sub-millisecond latencies but generally expect response times of just a few seconds. Long delays are unacceptable, as fast insights are essential for effective decision-making.

This is the scale that the market demands. Now, let's explore how the Databricks SQL engine can effectively support such high-performance requirements.

The secret sauce

As engineers and scientists, we do not believe in magic, and this is why we are going to uncover the secret sauce behind Databricks SQL engine's performance.

The three pillars driving Databricks SQL's performance are:

  • Photon: A native vectorized query engine written in C++ that accelerates SQL and DataFrame workloads on the Databricks Data Intelligence Platform. By leveraging modern hardware and parallel execution, Photon processes data in batches using CPU vectorization, delivering up to 12x faster performance and significantly lowering total cost of ownership (TCO) compared to traditional engines.
  • Caching: Databricks SQL implements a multi-layered caching architecture to optimize performance and efficiency. The SQL UI cache stores per-user query and dashboard results. The query result cache includes a local in-memory cache, cleared when the cluster stops, and a remote cache that persists across all serverless warehouses. The disk cache stores data on local SSDs to accelerate reads and is automatically invalidated when source files are updated. Together, these layers keep frequently accessed data in memory or fast storage, significantly speeding up query execution, reducing resource usage, and lowering costs.
  • Intelligent Workload Management (IWM): A set of AI-powered features that dynamically manage resources to handle a high volume of queries efficiently. It monitors warehouse capacity in real time, prioritizes and queues queries as needed, and auto-scales compute to maintain performance and cost-efficiency under varying workload demands.
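To make the result-cache idea concrete, here is a deliberately simplified sketch of a cache keyed by normalized query text. It is an illustration of the concept only, not Databricks' implementation, which also handles invalidation on data change, per-user scoping, and remote persistence:

```python
import hashlib


class ResultCache:
    """Toy query-result cache keyed by normalized SQL text."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql: str) -> str:
        # Normalize whitespace and case so trivially different query
        # texts map to the same cache entry.
        normalized = " ".join(sql.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_compute(self, sql: str, compute):
        key = self._key(sql)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._store[key] = compute()
        return result


cache = ResultCache()
run = lambda: [("EMEA", 42)]  # stand-in for actual query execution
cache.get_or_compute("SELECT region, n FROM sales", run)   # miss: computed
cache.get_or_compute("select region, n  FROM  sales", run)  # hit: same normalized text
print(cache.hits, cache.misses)  # 1 1
```

Dashboards are a good fit for this pattern: many users open the same dashboard, which issues the same query texts, so the hit rate climbs quickly as concurrency grows.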

With the previous points, we covered the architectural and software components that enable Databricks SQL to be so efficient in handling the demands of high-concurrency BI workloads. In the next section, we would like to focus on the practical side of the problem, which is proving our statements with load testing and experimental runs.

Testing concurrency in practice

Testing SQL BI query concurrency can initially seem complex, as it requires simulating real-world scenarios to gather meaningful insights. Upon closer examination, there are primarily two approaches to consider:

  1. User-Driven Testing: Opening the application to users in a User Acceptance Testing (UAT) environment can provide real-world stress testing. The variability inherent in user-driven testing makes it challenging to conduct controlled and repeatable performance analysis.
  2. Simulated Stress Test: A more controlled and safer alternative is to use tools like Apache JMeter to simulate concurrent users.


The most effective and realistic way to perform a stress test is by onboarding users onto the platform, as described in the user-driven testing approach. However, there are situations where users may not be available for testing. In such cases, here are some suggestions for simulating a production-like workload. First, gather queries from a real production environment if possible. This allows you to collect hundreds of diverse queries that are different in terms of tables accessed or parameters used. Compile these queries into a CSV file to feed into a JMeter template, enabling JMeter to use real-world queries during the stress test.
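The query file can be prepared with a few lines of scripting. The sketch below (sample queries are hypothetical; in practice you would export them from your query history) writes one query per row so that JMeter's CSV Data Set Config can feed each row into a JDBC Request sampler as a variable:

```python
import csv

# Hypothetical sample of production-like queries. Parameter values are
# varied so the stress test does not hit only a single cache entry.
queries = [
    "SELECT region, SUM(amount) FROM sales "
    "WHERE order_date = '2025-01-{:02d}' GROUP BY region".format(d)
    for d in range(1, 6)
]

# One query per row; JMeter reads each row into a ${query} variable.
with open("queries.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for q in queries:
        writer.writerow([q])

with open("queries.csv") as f:
    print(sum(1 for _ in f))  # 5 rows
```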

Another crucial aspect is replicating the frequency patterns of the production workload. If available, analyze the current workload monitoring data or gather concurrency requirements from the project owner. This step helps configure maximum concurrency and ramp-up/down patterns for the stress test, aiming to validate system resilience under conditions similar to those expected in production.

During our work with clients, we've observed several behaviors of Databricks SQL warehouses during stress tests:

  • Databricks IWM can identify the best scheduling possible between the concurrent queries landing on the cluster. This feature significantly reduces query queue times, enhancing overall performance under high concurrency conditions.
  • Databricks Cache Layers act as a reservoir for the hot data requested by queries and, as concurrency increases, effectively turn Databricks SQL into a kind of in-memory database. It's common to see a 95% cache hit rate even with a strongly heterogeneous set of queries. This capability is particularly effective for high concurrency: as more queries are executed, the cache warms up, leading to shorter response times. Notably, performance remains stable across percentiles, lowering the risk of long-tail responses at the 95th or 99th percentile.
  • Filtering and Optimization techniques applied in Databricks like liquid clustering and predictive I/O enhance query pushdown and efficient data skipping, reducing the amount of data read from blob storage. The next section covers more best practices for handling high-concurrency workloads.
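Tail latencies are the metric to watch when judging these behaviors. Given per-query latencies from a stress run (the sample values below are illustrative; real ones would come from JMeter's results file or the query history), the standard library can compute the percentiles directly:

```python
from statistics import quantiles

# Hypothetical per-query latencies (seconds) collected from a stress run.
latencies = sorted([0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.8, 4.2])

# quantiles() with n=100 returns the 1st..99th percentile cut points;
# the "inclusive" method interpolates between observed data points.
cuts = quantiles(latencies, n=100, method="inclusive")
p95, p99 = cuts[94], cuts[98]
print(f"p95={p95:.2f}s p99={p99:.2f}s")
```

Comparing the median against p95 and p99 on such a sample is a quick way to spot the long-tail responses mentioned above: a healthy high-concurrency run keeps the gap between them small.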

We have observed these behaviors consistently during high-concurrency stress tests conducted in collaboration with multiple customers. While our association with Databricks may influence our perspective, we strongly encourage running similar tests to evaluate your BI solution and experience the performance and scalability benefits of Databricks SQL firsthand. To help you set up a comprehensive stress test, we suggest referencing this technical blog from one of our colleagues. It provides a step-by-step guide on how to set up a proper stress test using Apache JMeter.

The next step is making the most of Databricks capabilities, optimizing the data layout and the compute configuration to handle high-concurrency requests. Let's jump into it with the next section.

Optimizing High Concurrency Workloads on Databricks SQL

Supporting high-concurrency workloads on Databricks SQL goes beyond simply scaling compute. Some targeted configuration choices can help achieve consistent performance and cost-efficiency. By aligning data layout and compute settings with your workload patterns, you can minimize query planning and execution latency and maintain responsive performance even at peak load. This section outlines best practices across these two key areas to help you build efficient, scalable SQL environments.

Structuring Your Data for High-Concurrency Performance

How your data is organized has a big impact on performance, especially when many users are querying at once. If queries can quickly find just the data and metadata they need, the system does less work, cache hit rates go up, queries run faster, and more users can be served at the same time. That's where smart data layout comes in.

  • Use Liquid Clustering to Minimize Unnecessary Scanning

Many concurrent queries use similar filters like date, region, or customer_id. Without clustering, they scan unnecessary files, increasing I/O and slowing performance. Liquid Clustering dynamically organizes data layouts in Delta tables around the clustering keys, typically the frequently used filter columns or join keys, by leveraging multi-dimensional clustering techniques to enhance query performance and support high-concurrency workloads. It improves performance by enabling efficient data skipping and reducing write amplification. These capabilities allow for faster, more cost-effective query execution, adapting to changing access patterns and data distributions.
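The DDL involved is compact. The sketch below composes the Databricks SQL statements as strings so the example stays self-contained; in practice you would run them in a SQL editor or notebook, and the table and column names here are purely illustrative:

```python
# Most common filter/join columns observed in the workload (illustrative).
filter_columns = ["region", "order_date"]

# New tables can be created clustered from the start.
create_stmt = (
    "CREATE TABLE sales (order_id BIGINT, region STRING, "
    "order_date DATE, amount DECIMAL(10, 2)) "
    f"CLUSTER BY ({', '.join(filter_columns)})"
)

# Existing Delta tables can be switched over, and OPTIMIZE triggers
# clustering of the data.
alter_stmt = f"ALTER TABLE sales CLUSTER BY ({', '.join(filter_columns)})"
optimize_stmt = "OPTIMIZE sales"

print(create_stmt)
```

Choosing the clustering keys from the filters your concurrent queries actually use is the whole trick: it is what lets data skipping prune files before they are read.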

  • Use Unity Catalog Managed Tables

Unity Catalog (UC) managed tables are the foundation for unlocking the full performance potential of the Databricks Data Intelligence Platform. They are designed to be performant, with faster query execution and lower storage costs, while also being easy to use and automatically optimized through Predictive Optimization. Predictive Optimization continuously analyzes query patterns and table characteristics to improve performance behind the scenes. This includes automated execution of operations like OPTIMIZE to compact file sizes and incremental clustering of new data, VACUUM to remove obsolete files, and ANALYZE to incrementally update statistics, ensuring both speed and cost efficiency. UC managed tables are fully open and interoperable, allowing you to access and manage data using catalog APIs and external tools. This makes them a powerful and flexible choice for building scalable, high-concurrency analytics solutions on Databricks SQL.

  • Use Materialized Views to Precompute Complex Queries

Materialized views are an effective way to improve performance and reduce costs in high-concurrency environments where the same queries, typically driven by dashboards, reports, or application refreshes, are executed repeatedly and the underlying data changes infrequently. By pre-computing the results of complex operations like joins, filters, and aggregations, materialized views eliminate the need to recompute logic on every query, significantly reducing latency and compute usage. This offloads pressure from SQL warehouses and allows more concurrent queries to run efficiently. Materialized views also support common data processing tasks such as denormalization, enrichment, and transformations in a simple declarative way. In some cases, they can be incrementally refreshed with new data, further enhancing efficiency.
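As with the clustering example, the statement below is composed as a string for self-containment, and the names are illustrative. It pre-aggregates daily revenue so a dashboard refresh reads the small precomputed result instead of re-scanning the fact table:

```python
# Databricks SQL for a materialized view (illustrative names).
mv_stmt = """
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT order_date, region, SUM(amount) AS revenue
FROM sales
GROUP BY order_date, region
"""

# Dashboards then query the view instead of the raw table.
dashboard_query = "SELECT * FROM daily_revenue WHERE region = 'EMEA'"

print(mv_stmt.strip().splitlines()[0])
```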

Compute Configuration for High Concurrency

For high-concurrency environments, your SQL Warehouse must respond quickly to fluctuating demand: scaling up when queries spike, scaling down to save costs when idle, and staying responsive under pressure. These strategies ensure your compute layer can handle high concurrency effectively.

  • Use Databricks Serverless SQL warehouses

Databricks Serverless SQL Warehouses are designed to deliver fast, efficient performance. They support all key performance features of Databricks SQL, including Photon, Predictive I/O, and Intelligent Workload Management, ensuring low-latency query execution even under heavy load. Predictive I/O is a Photon-exclusive feature in Databricks that improves read and write performance by intelligently accessing only the necessary data and minimizing file rewrites.

With near-instant startup and rapid upscaling and downscaling, Serverless SQL warehouses automatically adjust compute resources to match real-time query demand. You can configure the minimum and maximum number of clusters to balance responsiveness with cost, and set an auto-stop interval to shut down idle compute and avoid unnecessary charges.

In addition to the local result cache, Databricks SQL Serverless offers a remote result cache, a persistent, serverless-only feature that stores results in shared cloud storage across all warehouses. This allows repeated queries to return instantly, even after warehouse restarts, minimizing redundant computation and ensuring faster, consistent performance in high-concurrency environments. This makes SQL Serverless a strong choice for handling a wide range of high-concurrency scenarios, from steady, scheduled workloads to sudden usage spikes, while keeping performance consistent and costs under control.

  • Right-Size Your SQL Warehouses

Properly sizing your SQL Warehouse is key to balancing performance and cost under high concurrency. Consider factors like query complexity, data volume, SLA targets, expected concurrency, and budget. A good approach is to start with a serverless warehouse slightly larger than what you think you need, then scale down based on testing. Serverless handles autoscaling and query queuing, adjusting resources as needed. If you notice disk spilling or slow queries, increase the warehouse size. To improve concurrency, add more clusters to run queries in parallel. If performance is solid and you're looking to reduce costs, consider stepping down in size, as long as it doesn't cause disk spilling or a noticeable slowdown. 
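The decision rules above can be distilled into a small heuristic. The thresholds in this sketch are illustrative, not Databricks recommendations; the point is the shape of the logic: spilling is a size problem, queueing is a cluster-count problem:

```python
def sizing_advice(peak_queued: int, bytes_spilled: int,
                  queued_threshold: int = 1) -> list:
    """Toy right-sizing heuristic; thresholds are illustrative."""
    advice = []
    if bytes_spilled > 0:
        # Spill to disk means working sets exceed memory: size up.
        advice.append("increase warehouse size")
    if peak_queued > queued_threshold:
        # Persistent queueing is a concurrency problem, not a size
        # problem: add clusters so more queries run in parallel.
        advice.append("raise max cluster count")
    if not advice:
        # Healthy warehouse: probe a smaller size to cut cost.
        advice.append("consider stepping down a size to cut cost")
    return advice


print(sizing_advice(peak_queued=4, bytes_spilled=0))
```

Feeding this kind of rule with the monitoring metrics described in the next subsection turns right-sizing from guesswork into a repeatable check.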

  • Monitor Concurrency Health

In high-concurrency environments, where many users and dashboards access data simultaneously, clear visibility into system performance is essential. Without proper monitoring, issues like query delays, disk spills, or resource saturation can go unnoticed, leading to a poor user experience and inefficient resource usage.

The tools below help you monitor warehouse health, optimize performance, and make informed decisions about right-sizing your SQL Warehouses.

- Monitoring Page

Use the Databricks UI monitoring tab to track real-time SQL Warehouse activity and check peak queued queries. If more than one query is frequently queued, consider increasing the number of clusters to support higher concurrency.

- Query History & Query Profiles

Use the Databricks UI Query History or system tables (e.g., system.query.history) to analyze query patterns, identify slow or frequently repeated queries, and spot optimization opportunities like materialized views or Liquid Clustering. For deeper insights, Query Profiles reveal metrics like bytes spilled to disk, a sign your warehouse size may be too small.
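A query like the following surfaces the slowest recent statements from the system table. It is composed as a string here so the sketch stays self-contained; column names follow the documented system.query.history schema at the time of writing, so verify them against your workspace before relying on this:

```python
# Find the 20 slowest statements from the last day (Databricks SQL,
# requires system tables to be enabled in the workspace).
slow_queries_sql = """
SELECT statement_text, total_duration_ms
FROM system.query.history
WHERE start_time > current_timestamp() - INTERVAL 1 DAY
ORDER BY total_duration_ms DESC
LIMIT 20
"""

print("LIMIT 20" in slow_queries_sql)
```

Statements that recur near the top of this list are the natural candidates for the materialized views and Liquid Clustering keys discussed earlier.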

- SQL Warehouse Advisor Dashboard

This open-source dashboard, developed by DBSQL SME members, offers a consolidated view of query concurrency, queueing, autoscaling behavior, and configuration insights. It helps platform teams quickly detect inefficiencies and fine-tune warehouses for performance and cost.

Conclusion

The true capabilities of Databricks SQL for high-concurrency workloads are best demonstrated through practical testing. While this blog has outlined the architectural components that enable scalable performance, such as Photon, Predictive I/O, caching, and Intelligent Workload Management, the most definitive validation comes from running real or simulated concurrency tests. In the section "Testing Concurrency in Practice," we illustrated how tools like Apache JMeter can be used to simulate hundreds of concurrent queries, allowing you to observe how the system behaves under load. This hands-on approach makes it possible to evaluate key behaviors: how quickly the cache warms up, how IWM maintains low query queue times, how Serverless SQL Warehouses scale in real time, and how overall query latency remains consistent under pressure. Alongside this, we've provided a set of best practices for optimizing data layout and compute configuration to support high-concurrency workloads efficiently.

With the tools, patterns, and recommendations shared in this blog, you are well-equipped to evaluate performance, fine-tune your architecture, and confidently scale your analytics workloads on Databricks SQL. If you're looking to future-proof your analytics platform and deliver fast, consistent performance at scale, now is the time to explore what's possible with Databricks SQL. You might be surprised just how far it can go.