Data Engineering

Spark SQL vs serverless SQL

camilo_s
Contributor

Are there any benchmarks showing performance and cost differences between running SQL workloads on Spark SQL vs Databricks SQL (especially serverless SQL)?

Our customer is hesitant about getting locked into Databricks SQL as opposed to being able to run their queries in Spark SQL.

Is there a performance difference between running a query on Spark SQL on a Photon-enabled cluster vs running it on serverless SQL?

5 REPLIES

raphaelblg
Databricks Employee

 

Hi @camilo_s ,
 
Spark SQL is the SQL API for Spark applications, while Databricks SQL is a product that follows data warehouse principles. You can expect performance differences mainly because Databricks SQL compute is based on SQL warehouses (multiple Spark clusters), while Spark SQL runs on the standard Spark cluster architecture.
 
Choosing a serverless option may not provide huge processing time enhancements, but it can offer near-instant startup times.
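
To make the startup point concrete, here is a minimal sketch of how startup time feeds into the total wait for a query. The function names and every number in it are illustrative assumptions, not measurements of any Databricks compute type.

```python
def end_to_end_seconds(startup_s: float, execution_s: float) -> float:
    """Total wall-clock wait: compute startup plus query execution."""
    return startup_s + execution_s


def startup_share(startup_s: float, execution_s: float) -> float:
    """Fraction of the total wait that is spent on startup."""
    return startup_s / end_to_end_seconds(startup_s, execution_s)


# Hypothetical placeholder values; substitute your own measurements.
slow_start = startup_share(startup_s=300.0, execution_s=30.0)
fast_start = startup_share(startup_s=10.0, execution_s=30.0)
print(f"slow-start compute: {slow_start:.0%} of the wait is startup")
print(f"fast-start compute: {fast_start:.0%} of the wait is startup")
```

With identical execution time, the shorter the query, the more the startup term dominates what the user experiences.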
 
I hope this information is helpful. If you have any further questions or need additional clarification, please don't hesitate to ask.
Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks

florence023
New Contributor III

@camilo_s wrote:

Are there any benchmarks showing performance and cost differences between running SQL workloads on Spark SQL vs Databricks SQL (especially serverless SQL)?

Our customer is hesitant about getting locked into Databricks SQL as opposed to being able to run their queries in Spark SQL.

Is there a performance difference between running a query on Spark SQL on a Photon-enabled cluster vs running it on serverless SQL?


Hello,

Yes, there are benchmarks and comparisons available that highlight the performance and cost differences between running SQL workloads on Spark SQL and Databricks SQL, particularly serverless SQL.

Performance and Cost Differences:
Databricks SQL Serverless is designed to provide instant and elastic compute, which can significantly reduce costs and improve performance by eliminating the need for manual tuning. It leverages AI to optimize performance, such as predictive I/O and automatic data layout, which can lead to substantial performance improvements.
Spark SQL on a Photon-enabled cluster also offers performance enhancements, particularly for compute-intensive operations. Photon is a native vectorized engine that can accelerate query execution, leading to faster performance compared to traditional Spark SQL.
Benchmarks:
Internal tests by Databricks have shown that Serverless SQL can be more cost-efficient and performant compared to traditional cloud data warehouses, considering factors like cluster startup time, query execution time, and overall cost.
Comparisons using TPC-DS benchmark data indicate that Databricks SQL Serverless can outperform other platforms in terms of both execution cost and performance.
Customer Concerns:
If your customer is concerned about vendor lock-in, it's worth noting that Databricks SQL is built on open standards and integrates well with existing Spark workloads. This means that while they can benefit from the optimizations and performance improvements of Databricks SQL, they still have the flexibility to run their queries on Spark SQL if needed.

Hope this will help you.
Best regards,
florence023

robinhood555
New Contributor II

@camilo_s wrote:

Are there any benchmarks showing performance and cost differences between running SQL workloads on Spark SQL vs Databricks SQL (especially serverless SQL)?

Our customer is hesitant about getting locked into Databricks SQL as opposed to being able to run their queries in Spark SQL.

Is there a performance difference between running a query on Spark SQL on a Photon-enabled cluster vs running it on serverless SQL?


Yes, benchmarks indicate performance and cost differences between Spark SQL and Databricks SQL, especially serverless SQL. Databricks SQL, particularly with Photon-enabled clusters, generally offers faster performance due to its optimized query engine. Serverless SQL further reduces management overhead and scales automatically, which can result in cost efficiency for bursty workloads. Spark SQL, however, provides more flexibility and avoids vendor lock-in, but it may require more tuning for optimal performance. The decision should balance performance needs, cost, and concerns about vendor lock-in.
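
The bursty-workload point can be sketched as a simple cost model. This is a rough illustration only: the function names are made up for this example, the hourly rates are hypothetical placeholders rather than real pricing, and the model ignores details such as idle time before auto-stop.

```python
def always_on_cost(window_hours: float, rate_per_hour: float) -> float:
    """Cost of compute that stays up for the whole window, busy or idle."""
    return window_hours * rate_per_hour


def on_demand_cost(active_hours: float, rate_per_hour: float) -> float:
    """Cost of compute that is billed only while queries are running."""
    return active_hours * rate_per_hour


def break_even_utilization(always_on_rate: float, on_demand_rate: float) -> float:
    """Busy fraction of the window above which always-on compute is cheaper."""
    return always_on_rate / on_demand_rate


# Hypothetical rates (currency units per hour); not real pricing.
ALWAYS_ON_RATE = 4.0
ON_DEMAND_RATE = 8.0

# An 8-hour window in which queries actually run for 2 hours.
print(always_on_cost(8.0, ALWAYS_ON_RATE))
print(on_demand_cost(2.0, ON_DEMAND_RATE))
print(break_even_utilization(ALWAYS_ON_RATE, ON_DEMAND_RATE))
```

Under these assumed rates, the on-demand option stays cheaper as long as the compute is busy for less than half of the window; plugging in your own rates and usage pattern moves that break-even point.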

maxwarior
New Contributor II

Performance: Spark SQL vs. Databricks SQL (Serverless with Photon)

Yes, there is a notable performance difference between standard Spark SQL and Databricks SQL on Photon, especially when using Serverless SQL:

  1. Databricks SQL with Photon (especially serverless) is heavily optimized for BI/analytical workloads. Photon is a vectorized query engine written in C++, designed to outperform JVM-based Spark SQL, particularly on complex joins, aggregations, and columnar scans.
  2. Benchmarks (including internal Databricks benchmarks and TPC-DS-style tests) often show 2x to 12x performance improvement for SQL workloads on Photon vs. traditional Spark SQL. These numbers, of course, vary by workload complexity, data size, and structure.
  3. Serverless SQL adds auto-scaling, caching, and query optimization enhancements, often resulting in lower latency and better concurrency out of the box, without cluster management overhead.
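
Since the numbers vary so much by workload, measuring your own queries is the safest route. Below is a minimal timing sketch, assuming you wrap each query submission in a plain Python callable. `median_runtime_seconds`, `speedup`, and `placeholder_query` are hypothetical helper names for this example, and the placeholder workload only stands in for a real query.

```python
import statistics
import time
from typing import Callable


def median_runtime_seconds(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Run the callable `repeats` times and return the median wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Stand-in workload so the sketch runs anywhere; replace it with a call
# that submits your query to the compute you want to measure.
def placeholder_query() -> int:
    return sum(range(100_000))


baseline = median_runtime_seconds(placeholder_query)
print(f"median runtime: {baseline:.6f} s")
print(f"example ratio: {speedup(12.0, 3.0):.1f}x")
```

Using the median over several runs reduces the influence of a cold first run or a one-off slow execution; run the same callable against each compute option and compare the medians.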

 

maxwarior
New Contributor II

Spark SQL serves as the SQL interface for Spark applications, whereas Databricks SQL is a more advanced, warehouse-optimized product built around SQL Warehouses, which utilize multiple Spark clusters. This architectural difference can lead to noticeable performance variations.

While the serverless option might not drastically reduce query processing times, it does offer the advantage of near-instant startup, which can improve overall responsiveness.

Hope this helps! If you have any more questions or need further clarification, feel free to ask.
