Can someone help me understand how compute pricing works?

ClaudeR
New Contributor III

I'm looking at using Databricks internally for some Data Science projects. However, I am very confused about how the pricing works and would obviously like to avoid high spending right now. The internal documentation and Databricks itself reference All-Purpose Compute, Jobs Compute, and SQL Compute. But when I try 'estimate cost' online, there are additional compute types such as DLT Core, Pro, and Advanced. Where do I find these compute types, or is this something that Databricks handles internally? Much appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

Pricing is always tied to the compute used to run a cluster. It varies depending on the cloud. On Azure it's one cost, and on AWS it's a Databricks cost plus your cloud costs.

Core, Pro, and Advanced relate to additional enterprise features such as security and MLflow. You probably want Premium if you're doing Data Science projects.

The most important thing you can do to avoid high costs is to turn clusters off when you aren't using them. There is an auto-termination setting that you can use to turn the cluster off after 10 minutes of inactivity; the default is 120 minutes.
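As a rough illustration of the auto-termination advice, here is a minimal sketch that builds a cluster spec with a 10-minute idle timeout. It assumes the cluster is defined as JSON for the Clusters API, where the relevant field is `autotermination_minutes`. The cluster name, runtime version string, and worker count are placeholders, not recommendations.

```python
import json


def build_cluster_spec(name, idle_minutes=10):
    """Return a cluster definition that shuts down after idle_minutes of inactivity."""
    return {
        "cluster_name": name,                 # hypothetical name
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "c5.4xlarge",         # instance type borrowed from this thread
        "num_workers": 2,                     # placeholder size
        # Idle timeout in minutes; this is the setting discussed above.
        "autotermination_minutes": idle_minutes,
    }


if __name__ == "__main__":
    print(json.dumps(build_cluster_spec("ds-sandbox"), indent=2))
```

The same value can be set in the cluster creation UI; the JSON form is just easier to keep under version control.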

Jobs vs. All-Purpose relates to interactivity. Jobs compute is for scheduled/automated work, and All-Purpose compute is for more interactive notebook use.

DLT is for data engineering pipelines, so probably out of scope for your project.


4 REPLIES

-werners-
Esteemed Contributor III

This may not be an exact answer to your question, but when I get our Azure invoice, the cost is always split into 2 parts:

  1. the cost of the hardware/VMs etc. Basically the resources you use from the cloud provider. The cost of this is dependent on how much you use for how long (and what type of VM etc). This can be calculated using the cloud provider's cost calculator.
  2. Then there is the cost of using the Databricks Service, which is expressed in DBUs. One DBU has a certain cost, and depending on what you do with Databricks, the amount of DBUs increases or decreases.

Calculating the cost is pretty hard, to be honest. For example, using Photon is clearly more expensive than the classic engine, but it can turn out to be cheaper because your workloads finish much faster.
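To make the two-part split and the Photon trade-off concrete, here is a minimal sketch of the arithmetic. Every number in it (VM rate, DBU consumption per node-hour, DBU price, run times) is a made-up assumption for illustration, not a real Databricks or cloud price. Real rates depend on cloud, region, tier, and instance type.

```python
def run_cost(hours, nodes, vm_rate, dbu_per_node_hour, dbu_price):
    """Total cost of one run: cloud infrastructure plus Databricks DBUs."""
    infra_cost = hours * nodes * vm_rate            # part 1: VMs from the cloud provider
    dbus_used = hours * nodes * dbu_per_node_hour   # part 2: DBUs consumed
    return infra_cost + dbus_used * dbu_price


# Hypothetical workload on 4 nodes: the faster engine burns DBUs at twice
# the rate but finishes in 1.5 hours instead of 4.
classic = run_cost(hours=4.0, nodes=4, vm_rate=0.50, dbu_per_node_hour=2.0, dbu_price=0.20)
photon = run_cost(hours=1.5, nodes=4, vm_rate=0.50, dbu_per_node_hour=4.0, dbu_price=0.20)

if __name__ == "__main__":
    print(f"classic engine: {classic:.2f}")
    print(f"faster engine:  {photon:.2f}")
```

With these invented inputs the faster engine comes out cheaper despite the higher DBU rate, because both the VM hours and the DBU hours shrink. With a smaller speed-up the result can flip.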

PriyaAnanthram
Contributor III

I am not very clear on the question, but when you provision a DLT workflow you can tweak the cluster type, the servers, etc. For example:

    {
      "clusters": [
        {
          "label": "default",
          "node_type_id": "c5.4xlarge",
          "driver_node_type_id": "c5.4xlarge",
          "num_workers": 20,
          "spark_conf": {
            "spark.databricks.io.parquet.nativeReader.enabled": "false"
          },
          "aws_attributes": {
            "instance_profile_arn": "arn:aws:..."
          }
        },
        {
          "label": "maintenance",
          "aws_attributes": {
            "instance_profile_arn": "arn:aws:..."
          }
        }
      ]
    }

It will then provision those servers in your cloud provider.

So the total cost will be the DBU cost plus the cost of your EC2 instances.
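As a sketch of how the EC2 side could be estimated from settings like the ones above, the snippet below counts instances per type (one driver plus the workers) and multiplies by an hourly price. The helper names and the price are hypothetical, and clusters without an explicit `node_type_id` (such as the maintenance cluster here) are simply skipped, which is a simplification. The DBU charge would be added on top.

```python
import json
from collections import Counter

# Trimmed copy of the pipeline settings from this thread.
PIPELINE_SETTINGS = json.loads("""
{
  "clusters": [
    {"label": "default", "node_type_id": "c5.4xlarge",
     "driver_node_type_id": "c5.4xlarge", "num_workers": 20},
    {"label": "maintenance"}
  ]
}
""")


def instances_by_type(settings):
    """Count provisioned instances per node type: one driver plus the workers."""
    counts = Counter()
    for cluster in settings.get("clusters", []):
        node_type = cluster.get("node_type_id")
        if node_type is None:
            continue  # no explicit node type; skipped in this sketch
        # Assume the driver uses the worker type when no driver type is given.
        driver_type = cluster.get("driver_node_type_id", node_type)
        counts[driver_type] += 1
        counts[node_type] += cluster.get("num_workers", 0)
    return counts


def ec2_cost(counts, hourly_price, hours):
    """Instance cost only, given a hypothetical price table keyed by node type."""
    return sum(n * hourly_price[node_type] * hours for node_type, n in counts.items())


if __name__ == "__main__":
    counts = instances_by_type(PIPELINE_SETTINGS)
    # 1.00 per hour is a made-up price, not the real c5.4xlarge rate.
    print(dict(counts), ec2_cost(counts, {"c5.4xlarge": 1.00}, hours=2))
```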

Anonymous
Not applicable

Hi @Claude Repono​ 

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
