Databricks Community

jwilliam · ‎05-06-2022

BilalAslamDbrx · ‎05-12-2022

Great question! There are similarities and differences:

Similarities

Photon is enabled on both
You have Databricks Runtime on both

Differences

The Databricks Runtime (DBR) version is managed and auto-upgraded in Databricks SQL. Because SQL is a narrower workload than, say, data science, we automatically manage the version of DBR that runs on Databricks SQL Endpoints. This is a good thing: you don't have to worry about upgrades.
DBR behaves slightly differently on SQL Endpoints than on Clusters. This is again a good thing: we mostly optimize for the SQL workload and set configs automatically so you don't have to.
SQL Endpoints sit behind a scalable gateway proxy. Among other things, this proxy can scale the clusters out or in as your SQL workload grows or shrinks, which brings elasticity to your workloads. Work such as caching and metadata processing also happens here to speed things up.
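
To make the elasticity point concrete, here is a toy sketch of how a gateway might decide how many clusters to run for a given number of concurrent queries. It is only an illustration of the idea: the `ScalingPolicy` class, the `clusters_needed` function, and every threshold in it are assumptions made up for this example, not Databricks defaults or the actual scaling logic.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are illustrative assumptions, not Databricks defaults.
    min_clusters: int = 1
    max_clusters: int = 4
    queries_per_cluster: int = 10


def clusters_needed(concurrent_queries: int, policy: ScalingPolicy) -> int:
    """Return how many clusters a toy gateway would run for the given load."""
    if concurrent_queries < 0:
        raise ValueError("concurrent_queries must be non-negative")
    # Round up so that a partially filled cluster still counts as one cluster.
    wanted = math.ceil(concurrent_queries / policy.queries_per_cluster)
    # Clamp the result to the configured minimum and maximum.
    return max(policy.min_clusters, min(policy.max_clusters, wanted))


if __name__ == "__main__":
    policy = ScalingPolicy()
    for load in (0, 5, 25, 100):
        print(load, "queries ->", clusters_needed(load, policy), "cluster(s)")
```

With these made-up numbers the cluster count follows the load but never drops below the minimum or exceeds the maximum, which is the scale-out and scale-in behavior described above in miniature.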

TL;DR: if you are doing SQL/BI, please consider using SQL Endpoints. They are generally the best choice for that workload.


Anonymous · ‎05-06-2022

They are very similar. Databricks SQL uses compute that has Photon enabled. A traditional cluster with Photon enabled allows a few more configurations to be set around the cluster architecture and settings. The traditional cluster also has more libraries installed, because it needs to run code in various languages, whereas the endpoints only need the SQL APIs.

https://docs.databricks.com/runtime/photon.html#limitations lists some limitations, although reads from additional data sources are in preview now.

jwilliam · ‎05-06-2022

Thank you. Will traditional clusters support serverless execution in the future, or will only SQL endpoints support that?

And are there any optimization tweaks in Databricks SQL that make it faster than a traditional Databricks cluster running only SQL queries?

Anonymous · ‎05-06-2022

Serverless for traditional compute is in preview for single-node machines, and serverless for multi-node clusters is on the roadmap.

I'm sure there are a few optimizations that make things faster. Simple things such as caching metadata in the metastore help.
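
As a rough illustration of why caching metadata helps, here is a generic sketch using Python's `functools.lru_cache`. The `fetch_table_metadata` function is a hypothetical stand-in for a slow metastore lookup, and none of this reflects how Databricks SQL implements its caching.

```python
from functools import lru_cache

# Counter for how many times the (simulated) metastore is actually hit.
calls = {"count": 0}


def fetch_table_metadata(table_name: str) -> dict:
    """Hypothetical stand-in for a slow metastore lookup."""
    calls["count"] += 1
    return {"table": table_name, "columns": ["id", "value"]}


@lru_cache(maxsize=128)
def cached_table_columns(table_name: str) -> tuple:
    """Return column names, going to the metastore only on a cache miss."""
    meta = fetch_table_metadata(table_name)
    # Return an immutable tuple so cached results cannot be modified by callers.
    return tuple(meta["columns"])


if __name__ == "__main__":
    for _ in range(3):
        cached_table_columns("sales")
    print("metastore lookups:", calls["count"])
```

Repeated queries against the same table pay for the lookup once and reuse the cached result afterwards.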

Hubert-Dudek · ‎05-06-2022

I wouldn't call them the same: the Databricks SQL runtime is a bit different (not everything is supported, for example UDFs), and its releases are separate from standard runtime updates: https://docs.databricks.com/sql/release-notes/index.html

A Databricks cluster can run notebooks. A SQL endpoint is only for SQL queries.

Both are available in Photon and non-Photon versions. Photon has a bunch of improvements, for example better handling of the small files problem.
