
Should I enable Photon on my SQL Endpoint?

User16826992666
Valued Contributor

I see the option to enable Photon when creating a new SQL Endpoint. The description says that enabling it helps speed up queries, which sounds good, but are there any downsides I need to be aware of?

Accepted Solutions

Ryan_Chynoweth
Esteemed Contributor

Generally, yes, you should enable Photon. Most functionality is supported and performs very well. There are some limitations, summarized below.

Limitations:

  • Works only on Delta and Parquet tables, for both reads and writes.
  • Does not support the following data types:
    • Map
    • Array
  • Does not support window and sort operators.
  • Does not support Spark Structured Streaming.
  • Does not support UDFs.
  • Not expected to improve operations bottlenecked by network or scan I/O.
  • Not expected to improve short-running queries (<2 seconds), for example, against small data.
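
To make the list above easier to apply, here is a minimal sketch of a checklist helper. It is purely illustrative: the `Workload` class, its fields, and `photon_caveats` are hypothetical names invented for this example, not part of any Databricks API, and the rules simply mirror the bullets above.

```python
from dataclasses import dataclass, field

# Hypothetical description of a workload. None of these names come from
# Databricks; they only encode the limitations listed in this post.
@dataclass
class Workload:
    table_format: str                                 # e.g. "delta", "parquet", "csv"
    column_types: set = field(default_factory=set)    # e.g. {"int", "string", "map"}
    uses_window_or_sort: bool = False
    uses_streaming: bool = False
    uses_udfs: bool = False
    typical_runtime_seconds: float = 60.0


def photon_caveats(w: Workload) -> list:
    """Return the limitations from the list above that apply to this workload."""
    caveats = []
    if w.table_format.lower() not in {"delta", "parquet"}:
        caveats.append("table format is not Delta or Parquet")
    unsupported = {t.lower() for t in w.column_types} & {"map", "array"}
    if unsupported:
        caveats.append("unsupported column types: " + ", ".join(sorted(unsupported)))
    if w.uses_window_or_sort:
        caveats.append("uses window or sort operators")
    if w.uses_streaming:
        caveats.append("uses Structured Streaming")
    if w.uses_udfs:
        caveats.append("uses UDFs")
    if w.typical_runtime_seconds < 2:
        caveats.append("short-running query (<2 seconds)")
    return caveats


# Example: a CSV-backed workload with a map column, UDFs, and very short queries.
example = Workload("csv", {"map", "int"}, uses_udfs=True, typical_runtime_seconds=1.0)
for caveat in photon_caveats(example):
    print(caveat)
```

An empty result does not guarantee a speedup; it only means none of the limitations listed here were flagged.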

Advantages:

  • Supports SQL and equivalent DataFrame operations against Delta and Parquet tables.
  • Expected to accelerate queries that process a significant amount of data (100GB+) and include aggregations and joins.
  • Faster performance when data is accessed repeatedly and is likely in the Delta Lake cache.
  • More robust scan performance on tables with many columns and many small files.
  • Faster Delta and Parquet writing using UPDATE, DELETE, MERGE INTO, and CREATE TABLE AS SELECT, especially for wide tables (hundreds to thousands of columns).
  • Photon replaces sort-merge joins with hash joins.
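
For readers unfamiliar with the two join strategies in the last bullet, the sketch below contrasts them in plain Python. It illustrates the general techniques only and is not Photon's implementation; the function names and sample rows are made up for the example.

```python
def hash_join(left, right, key):
    """Build a hash table on one side, then probe it with rows from the other."""
    table = {}
    for row in right:
        table.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def sort_merge_join(left, right, key):
    """Sort both sides on the key, then walk them in step to find matches."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Made-up sample data for illustration.
orders = [
    {"cust_id": 2, "amount": 30},
    {"cust_id": 1, "amount": 10},
    {"cust_id": 2, "amount": 5},
    {"cust_id": 3, "amount": 7},
]
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Lin"}]

print(hash_join(orders, customers, "cust_id"))
print(sort_merge_join(orders, customers, "cust_id"))
```

Both functions return the same matched rows, possibly in a different order. The hash join avoids sorting either input, at the cost of holding a hash table for one side in memory.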

