Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Why is Photon increasing DBUs used per hour?

564824
New Contributor II

I noticed that enabling Photon acceleration increases the number of DBUs consumed per hour, which in turn increases our cost.

In light of this, I would like some clarity on how Photon acceleration is priced, as I was led to believe that Photon acceleration optimizes and reduces cost.

Kindly provide insights regarding the pricing.

I have attached screenshots showing that enabling Photon acceleration increases DBU/hour.

1 ACCEPTED SOLUTION


-werners-
Esteemed Contributor III

Photon is more expensive DBU-wise.
The cost optimization/reduction comes from (possibly) faster runtimes.
So, as you already noticed, it can be a cost reduction, but not in all cases (as with you, apparently).
But it can also be worthwhile to serve data faster, even at a higher cost.


6 REPLIES 6

-werners-
Esteemed Contributor III

Photon is more expensive DBU-wise.
The cost optimization/reduction comes from (possibly) faster runtimes.
So, as you already noticed, it can be a cost reduction, but not in all cases (as with you, apparently).
But it can also be worthwhile to serve data faster, even at a higher cost.
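To make the trade-off concrete, here is a rough break-even sketch. The DBU rates, the 2x multiplier, and the runtimes below are hypothetical placeholders, not actual Databricks pricing; plug in your own measured numbers.

```python
# Rough break-even estimate: Photon emits more DBUs per hour, but if the job
# finishes enough faster, the total DBU spend can still go down.
# All numbers below are hypothetical placeholders, not real Databricks rates.

dbu_per_hour_standard = 10.0   # assumed DBU/hour without Photon
photon_dbu_multiplier = 2.0    # assumed: Photon roughly doubles DBU/hour
runtime_hours_standard = 4.0   # measured runtime without Photon
runtime_hours_photon = 1.5     # measured runtime with Photon

cost_standard = dbu_per_hour_standard * runtime_hours_standard
cost_photon = dbu_per_hour_standard * photon_dbu_multiplier * runtime_hours_photon

# Photon only saves DBUs if the speed-up exceeds the DBU multiplier.
break_even_speedup = photon_dbu_multiplier
actual_speedup = runtime_hours_standard / runtime_hours_photon

print(f"Standard: {cost_standard:.1f} DBUs, Photon: {cost_photon:.1f} DBUs")
print(f"Break-even speed-up: {break_even_speedup:.2f}x, actual: {actual_speedup:.2f}x")
```

With these placeholder numbers the Photon run is cheaper (30 DBUs vs. 40) because the 2.67x speed-up exceeds the 2x DBU multiplier; if your workload speeds up less than the multiplier, Photon costs more, which matches what the screenshots show on a per-hour basis.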

yj940525
New Contributor II

I don't understand why customers have to pay a higher cost for a faster runtime if it is a pure software solution that doesn't need any extra infrastructure. Isn't it Databricks' job to improve the runtime? I am not sure whether this is a Databricks marketing strategy, but it doesn't sound reasonable. For example, AWS Athena engine version 3 provides better performance than engine version 2, but AWS doesn't charge more for using the new Athena engine.

-werners-
Esteemed Contributor III

Well, the reason is probably that Photon is no longer standard Spark. Databricks has completely rewritten the Spark query engine in C++ (instead of Scala) and applied all kinds of optimizations to allow for faster processing.
So it is not merely a new version of Spark (which are all priced the same), to use your Athena example, but actually another product that is integrated seamlessly into the Databricks platform.
If you do not want to pay extra for Photon, it is no problem at all not to use it.
IMO it seems fair to charge extra, as developing Photon was surely not an easy or free task.
There is also the fact that if your job finishes faster you pay less, which makes the pricing difference smaller.

In the end it is a question of figuring out whether you want to use it for a certain workload or not.
I use it from time to time, but not always.
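Since Photon is a per-cluster choice, one practical approach is to keep it off by default and enable it only for workloads where the measured speed-up justifies the higher DBU rate. A minimal sketch using the Clusters REST API is below; the workspace URL, token, node type, Spark version, and cluster sizes are placeholders, and runtime_engine is (to my understanding) the field that toggles Photon.

```python
import requests

# Hypothetical workspace URL and token; replace with your own.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

def create_cluster(use_photon: bool) -> dict:
    """Create a test cluster, enabling Photon only when requested.

    Node type, Spark version, and sizes are placeholder values;
    runtime_engine is the Photon toggle.
    """
    spec = {
        "cluster_name": "etl-photon-test" if use_photon else "etl-standard-test",
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
        # "PHOTON" enables Photon acceleration; "STANDARD" keeps plain Spark.
        "runtime_engine": "PHOTON" if use_photon else "STANDARD",
    }
    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=spec,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

Running the same job once on each cluster and comparing total DBUs (as in the break-even sketch above) gives a per-workload answer rather than a blanket yes/no.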

yj940525
New Contributor II

Thanks for the response. The reason I asked is that I observed Photon performing better and better as the runtime was upgraded, while just upgrading the runtime without Photon showed hardly any performance improvement (I used the same test data). This pretty much forces customers to enable Photon. My concern is that in the future, if Databricks comes up with another improvement, they will increase the cost again.

yj940525
New Contributor II

Also, our company has three options for a data warehousing solution: Snowflake, AWS Athena/Glue, and Databricks. Based on my test results, Snowflake and Athena performance has improved a lot over the past few years, especially Athena, but with Databricks without Photon enabled I didn't see much performance improvement. This could really drive us away from the Databricks option if rising cost is a concern.

-werners-
Esteemed Contributor III

Well, that depends on what kinds of tests you run. In data warehousing there are different kinds of loads.
What have you tested? Data transformations or analytical queries? For the latter, Databricks SQL is a better choice than a common Spark cluster (with or without Photon).

So I suggest you look at the different Databricks products: managed Spark clusters (with or without Photon, ML/classic) and (serverless/classic) SQL warehouses, and run your tests and cost comparison against each.
It might also be interesting to check which language your data engineering work will use (SQL, Python, ...).
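For the cost-comparison side, one option (if Unity Catalog system tables are enabled in your workspace) is to pull DBU consumption per cluster from the billing usage table and compare your Photon and non-Photon test runs directly. This is only a rough sketch: the table and column names below are assumptions based on the system.billing.usage table and should be verified against your workspace, and the cluster IDs are hypothetical.

```python
# Sketch: compare DBU consumption of two test clusters (one Photon, one standard)
# from the Unity Catalog billing system table, run in a Databricks notebook
# where `spark` is already defined. Table/column names are assumptions.
from pyspark.sql import functions as F

# Hypothetical cluster IDs from the two test runs.
PHOTON_CLUSTER_ID = "0101-000000-photon01"
STANDARD_CLUSTER_ID = "0101-000000-standrd1"

usage = spark.table("system.billing.usage")

comparison = (
    usage
    .where(F.col("usage_metadata.cluster_id").isin(PHOTON_CLUSTER_ID, STANDARD_CLUSTER_ID))
    .groupBy("usage_metadata.cluster_id", "sku_name")
    .agg(F.sum("usage_quantity").alias("total_dbus"))
    .orderBy("cluster_id")
)
comparison.show(truncate=False)
```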
