Data Engineering

Spark SQL vs serverless SQL

camilo_s
Contributor

Are there any benchmarks showing performance and cost differences between running SQL workloads on Spark SQL vs Databricks SQL (especially serverless SQL)?

Our customer is hesitant about getting locked into Databricks SQL as opposed to being able to run their queries in Spark SQL.

Is there a performance difference between running a query on Spark SQL on a Photon-enabled cluster vs running it on serverless SQL?

3 REPLIES

raphaelblg
Databricks Employee

 

Hi @camilo_s ,
 
Spark SQL is the SQL API for Spark applications, while Databricks SQL is a product built on data warehouse principles. You can expect performance differences mainly because Databricks SQL compute runs on SQL warehouses (multiple Spark clusters), while Spark SQL runs on the standard Spark cluster architecture.
 
Choosing a serverless option may not provide huge processing time enhancements, but it can offer near-instant startup times.
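
Since the difference depends on the workload, one way to check it is to time the same query on each compute type yourself. Below is a minimal sketch of such a timing harness, not an official benchmark tool. The names `time_query` and `fake_runner` are hypothetical: `run_query` stands for whatever client call you use to submit SQL (a notebook `spark.sql(...)` wrapper, a SQL warehouse connection, and so on), and the stub only exists so the example runs on its own.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(
    run_query: Callable[[str], object],
    sql: str,
    repeats: int = 5,
    warmup: int = 1,
) -> Dict[str, float]:
    """Time a query several times and summarize wall-clock seconds.

    run_query is a hypothetical callable that submits the SQL text and
    blocks until the result is available.
    """
    # Warm-up runs are executed but not measured, so caching and
    # startup effects do not dominate the measured samples.
    for _ in range(warmup):
        run_query(sql)

    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)
        samples.append(time.perf_counter() - start)

    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


def fake_runner(sql: str) -> int:
    """Stand-in for a real client call (assumption, for illustration only)."""
    return len(sql)


if __name__ == "__main__":
    print(time_query(fake_runner, "SELECT 1", repeats=3))
```

To compare the two options, you would pass one runner per compute type with the same SQL text and compare the medians. Startup time is deliberately excluded by the warm-up run, so measure it separately if near-instant startup matters for your workload.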
 
I hope this information is helpful. If you have any further questions or need additional clarification, please don't hesitate to ask.
Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks

florence023
New Contributor III

@camilo_s wrote:

Are there any benchmarks showing performance and cost differences between running SQL workloads on Spark SQL vs Databricks SQL (especially serverless SQL)?

Our customer is hesitant about getting locked into Databricks SQL as opposed to being able to run their queries in Spark SQL.

Is there a performance difference between running a query on Spark SQL on a Photon-enabled cluster vs running it on serverless SQL?


Hello,

Yes, there are benchmarks and comparisons available that highlight the performance and cost differences between running SQL workloads on Spark SQL and Databricks SQL, particularly serverless SQL.

Performance and Cost Differences:
Databricks SQL Serverless is designed to provide instant and elastic compute, which can significantly reduce costs and improve performance by eliminating the need for manual tuning. It leverages AI to optimize performance, such as predictive I/O and automatic data layout, which can lead to substantial performance improvements.
Spark SQL on a Photon-enabled cluster also offers performance enhancements, particularly for compute-intensive operations. Photon is a native vectorized engine that can accelerate query execution, leading to faster performance compared to traditional Spark SQL.
Benchmarks:
Internal tests by Databricks have shown that Serverless SQL can be more cost-efficient and performant compared to traditional cloud data warehouses, considering factors like cluster startup time, query execution time, and overall cost.
Comparisons using TPC-DS benchmark data indicate that Databricks SQL Serverless can outperform other platforms in terms of both execution cost and performance.
Customer Concerns:
If your customer is concerned about vendor lock-in, it’s worth noting that Databricks SQL is built on open standards and integrates well with existing Spark workloads. This means that while they can benefit from the optimizations and performance improvements of Databricks SQL, they still have the flexibility to run their queries on Spark SQL if needed.

Hope this will help you.
Best regards,
florence023

robinhood555
New Contributor II

@camilo_s wrote:

Are there any benchmarks showing performance and cost differences between running SQL workloads on Spark SQL vs Databricks SQL (especially serverless SQL)?

Our customer is hesitant about getting locked into Databricks SQL as opposed to being able to run their queries in Spark SQL.

Is there a performance difference between running a query on Spark SQL on a Photon-enabled cluster vs running it on serverless SQL?


Yes, benchmarks indicate performance and cost differences between Spark SQL and Databricks SQL, especially serverless SQL. Databricks SQL, particularly with Photon-enabled clusters, generally offers faster performance due to its optimized query engine. Serverless SQL further reduces management overhead and scales automatically, which can make it cost-efficient for bursty workloads. Spark SQL, on the other hand, provides more flexibility and avoids vendor lock-in, but it may require more tuning for optimal performance. The decision should balance performance needs, cost, and concerns about vendor lock-in.
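
To make the bursty-workload point concrete, here is a rough back-of-the-envelope sketch. It is only an illustration under stated assumptions: the `ComputeOption` fields, the `burst_cost` helper, and every number below are hypothetical placeholders, not real Databricks prices or billing rules. Check your own rate card and how startup and idle time are billed on your plan.

```python
from dataclasses import dataclass


@dataclass
class ComputeOption:
    name: str
    dbu_per_hour: float  # DBUs consumed per running hour (placeholder)
    usd_per_dbu: float   # price per DBU (placeholder)
    overhead_s: float    # assumed billed non-query seconds per burst
                         # (e.g. startup plus idle time before auto-stop)


def burst_cost(option: ComputeOption, query_s: float, bursts: int) -> float:
    """Estimated cost of `bursts` separate bursts of `query_s` seconds each."""
    billed_s = bursts * (query_s + option.overhead_s)
    return billed_s / 3600 * option.dbu_per_hour * option.usd_per_dbu


if __name__ == "__main__":
    # Placeholder numbers only: a higher-priced option with little
    # overhead versus a cheaper option with a long startup/idle tail.
    low_overhead = ComputeOption("low-overhead", 10.0, 0.70, overhead_s=10.0)
    high_overhead = ComputeOption("high-overhead", 10.0, 0.50, overhead_s=600.0)
    for opt in (low_overhead, high_overhead):
        print(opt.name, round(burst_cost(opt, query_s=60.0, bursts=20), 2))
```

With short, infrequent bursts, the overhead term dominates, which is why a higher per-DBU rate can still come out cheaper. For long, steady workloads the overhead is amortized and the per-DBU rate matters more.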
