09-07-2022 08:06 AM
I'm looking at using Databricks internally for some data science projects. However, I'm very confused about how the pricing works, and I'd obviously like to avoid high spend right now. Internal documentation and the Databricks UI reference All-Purpose Compute, Jobs Compute, and SQL Compute. Yet when I try 'estimate cost' online, there are additional compute types like DLT Core, Pro, and Advanced. Where do I find these compute types, or is this something that Databricks handles internally? Much appreciated.
09-07-2022 10:47 AM
Pricing is always tied to the compute used to run a cluster. It varies depending on the cloud: on Azure it's a single cost, while on AWS it's a Databricks (DBU) cost plus your cloud infrastructure costs.
Core, Pro, and Advanced relate to additional enterprise features such as security and MLflow. You probably want Premium if you're doing data science projects.
The most important thing you can do to avoid high costs is to turn clusters off when you aren't using them. There is an auto-termination setting that will shut a cluster down after a period of inactivity, e.g. 10 minutes; the default is 120 minutes.
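As a minimal sketch of where that setting lives (the cluster name, Spark version, and node type below are illustrative placeholders), auto-termination is the autotermination_minutes field in the cluster spec you send to the Clusters API or paste into the UI's JSON editor:

{
  "cluster_name": "ds-exploration",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "autotermination_minutes": 10
}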
Jobs vs. All-Purpose relates to interactivity: Jobs compute is for scheduled/automated work, and All-Purpose is for interactive notebook use.
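For illustration only (the job name, notebook path, and cron schedule here are hypothetical), a scheduled job with its own short-lived job cluster might look like this in a Jobs API payload; the cluster exists only for the duration of the run, which is also why Jobs compute is billed at a lower DBU rate than All-Purpose:

{
  "name": "nightly-model-training",
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC"
  },
  "tasks": [
    {
      "task_key": "train",
      "notebook_task": { "notebook_path": "/Users/someone@example.com/train_model" },
      "new_cluster": {
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2
      }
    }
  ]
}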
DLT is for data engineering pipelines, so it's probably out of scope for your project.
09-08-2022 03:04 AM
This may not be an exact answer to your question, but when I get our Azure invoice, the cost is always split into two parts: the Databricks DBU charge and the charge for the underlying VMs.
Calculating the cost up front is pretty hard, to be honest. For example, using Photon is clearly more expensive per hour than the classic engine, BUT it can turn out to be cheaper overall because your workloads finish much faster.
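A quick worked example with made-up numbers (real DBU rates vary by cloud, tier, and instance type): if a classic cluster burns 10 DBUs/hour and a job takes 3 hours, that's 30 DBUs; the same job on Photon at twice the rate (20 DBUs/hour) finishing in 1 hour is only 20 DBUs, so the pricier engine ends up cheaper.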
09-11-2022 10:09 PM
I'm not very clear on the question, but when you provision a DLT workflow you can tweak the cluster type, the instance types, and so on, e.g.:
{
  "clusters": [
    {
      "label": "default",
      "node_type_id": "c5.4xlarge",
      "driver_node_type_id": "c5.4xlarge",
      "num_workers": 20,
      "spark_conf": {
        "spark.databricks.io.parquet.nativeReader.enabled": "false"
      },
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:..."
      }
    },
    {
      "label": "maintenance",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:..."
      }
    }
  ]
}
Databricks will then provision those servers in your cloud provider account.
So the total cost will be the DBU cost plus the cost of your EC2 instances.
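As a rough back-of-envelope for the config above (prices are illustrative and region-dependent, so check your own rate card): 21 c5.4xlarge nodes (20 workers plus the driver) at roughly $0.68/hour on-demand is about $14.28/hour in EC2 charges, and on top of that you pay node DBU consumption × hours × your contracted $/DBU for the DLT edition you chose.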
09-23-2022 11:07 PM
Hi @Claude Repono
Hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
05-30-2024 07:07 AM
Hello,
I was able to get a very precise cost for Azure Databricks clusters and compute jobs using the Microsoft API and the Databricks API.
Then I wrote a simple tool to extract and manipulate the API results and generate detailed cost reports that can be visualized or exported to Excel.
Check my tool here --> https://databrickscost.kopicloud.com
Thanks! Guillermo