Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

TCO calculator for Databricks Analytics

Raman_Unifeye
Contributor III

Similar to the cloud infra calculators, does a TCO calculator exist for Databricks?

Let's say we have inputs such as the number of source tables, the estimated number of data pipelines, data growth per day, transformation complexity, and the target reports and number of users for analytical usage. Is there a way to calculate a +/-50% or +/-100% cost estimate?

I totally get the factors around consumption and DBUs, but to give a fair idea, does any such calculator exist? Or is there any way we could derive a sensible estimate, rather than a 'finger in the air' guess? Assume it to be a complete ELT/Analytics use case.


RG #Driving Business Outcomes with Data Intelligence
4 REPLIES

Raman_Unifeye
Contributor III

[Cannot edit Q] So for simplicity, let's assume Serverless Jobs compute and a Serverless SQL Warehouse.


RG #Driving Business Outcomes with Data Intelligence

szymon_dybczak
Esteemed Contributor III

Hi @Raman_Unifeye ,

There's a pricing calculator that you can check:

Pricing Calculator Page | Databricks
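
Under the hood, the calculator's math is essentially DBUs consumed multiplied by the per-DBU rate for your compute type. A minimal sketch of that arithmetic in Python (the rates below are placeholders I made up for illustration; the real $/DBU depends on your cloud, region, and edition, so check the pricing page):

```python
# Back-of-envelope Databricks cost arithmetic.
# The rates below are hypothetical placeholders, NOT list prices.
DBU_RATES_USD = {
    "serverless_jobs": 0.35,  # hypothetical $/DBU
    "serverless_sql": 0.70,   # hypothetical $/DBU
}

def monthly_cost(dbus_per_hour: float, hours_per_day: float,
                 days_per_month: int, compute_type: str) -> float:
    """Cost = DBUs/hour x hours of uptime x $/DBU."""
    return dbus_per_hour * hours_per_day * days_per_month * DBU_RATES_USD[compute_type]

# Example: a SQL warehouse drawing ~24 DBUs/hour, 8 hours/day, 22 working days.
print(f"${monthly_cost(24, 8, 22, 'serverless_sql'):,.2f}")  # -> $2,956.80
```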

Raman_Unifeye
Contributor III

@szymon_dybczak - I am aware of that calculator. The challenge, however, is how to calculate the number of DBUs it will consume based on the volume of data processing, etc. The tool starts with infra and compute inputs, whereas my question is: if the input parameters are data volume, number of pipelines, and transformation complexity, how do you convert those into the DBU consumption that can then be fed into the calculator?


RG #Driving Business Outcomes with Data Intelligence

szymon_dybczak
Esteemed Contributor III

Hi @Raman_Unifeye ,

The thing is, Databricks pricing is based on your compute usage. Storage, networking, and related costs will vary depending on the services you choose and your cloud service provider.

I think you won't find such a tool, because every workload is different. For example, the cost of processing a table with hundreds of millions of rows can vary significantly between two data pipelines. In pipeline A, you may have very complex transformations, and the time spent computing them will greatly affect the DBU cost (compute usage).

Meanwhile, pipeline B may simply take the data and perform a straightforward insert without any transformations. The cost of such a pipeline will be much lower, even though the amount of data processed is similar.
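
To put rough, purely made-up numbers on that: assume both pipelines draw about the same DBUs per hour while running, and only the runtime differs:

```python
# Illustrative only: same data volume, very different DBU cost.
rate = 0.35          # hypothetical $/DBU for serverless jobs
dbus_per_hour = 20   # hypothetical consumption while the job runs

pipeline_a_hours = 4.0    # heavy transformations
pipeline_b_hours = 0.25   # plain insert, no transformations

print(f"A: ${pipeline_a_hours * dbus_per_hour * rate:.2f} per run")  # A: $28.00
print(f"B: ${pipeline_b_hours * dbus_per_hour * rate:.2f} per run")  # B: $1.75
```

Same table, same rate, a 16x cost difference - purely from runtime.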

What I'm trying to say is that you won't find a tool that can reliably estimate DBU cost based solely on data volume. By understanding your environments and transformations, you can try to estimate it yourself, but you won't find a generic solution that will accurately calculate it for you.
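
If you want something more structured than a pure guess, you could encode your own assumptions - for example, a baseline runtime per pipeline run scaled by a complexity multiplier - and calibrate the constants against a few representative pipelines. A rough sketch (every constant here is an assumption to calibrate, not a Databricks figure):

```python
# Rough starting point for DBU estimation: all multipliers and baselines
# are assumptions you would tune against a few real pipeline runs.
COMPLEXITY_MULTIPLIER = {"low": 1.0, "medium": 3.0, "high": 8.0}

def estimate_monthly_dbus(num_pipelines: int, complexity: str,
                          baseline_hours_per_run: float = 0.5,
                          runs_per_month: int = 30,
                          dbus_per_hour: float = 20.0) -> float:
    """DBUs = pipelines x runs x (baseline hours x complexity) x DBUs/hour."""
    hours_per_run = baseline_hours_per_run * COMPLEXITY_MULTIPLIER[complexity]
    return num_pipelines * runs_per_month * hours_per_run * dbus_per_hour

dbus = estimate_monthly_dbus(num_pipelines=15, complexity="medium")
print(f"~{dbus:,.0f} DBUs/month -> feed into the pricing calculator")  # ~13,500
```

Once the first few pipelines are live, you can replace the guessed constants with actuals from the system.billing.usage system table, and the estimate tightens quickly.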