Data Engineering

Does Databricks have an equivalent of Google Cloud BigQuery's --dry_run to estimate costs before executing?

zach
New Contributor III

Databricks uses DBUs as its costing unit whether it runs on top of AWS, Azure, or GCP. I want to know if Databricks has an equivalent of Google Cloud BigQuery's --dry_run for estimating costs: https://cloud.google.com/bigquery/docs/estimate-costs

5 REPLIES

-werners-
Esteemed Contributor III

Not that I know of.

Google uses number of bytes read to determine the cost.

Databricks uses DBUs. The number of DBUs spent depends not only on the amount of bytes read (the more you read, the longer the program will probably run), but also on the type of VM used.

Then there is also autoscaling which makes it harder to predict a price.

Also the total cost is not only DBU but also the provisioning cost of the VMs.

So that makes it pretty hard to predict a cost.

It would of course be very cool to have such a prediction.
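
The cost components listed in this reply (DBUs, VM type, autoscaling, VM provisioning) can be combined into a rough formula. Here is a minimal sketch in Python, not an official Databricks calculation. `ClusterRates`, `hourly_cost`, and `cost_range` are hypothetical names, and every rate must come from your own Databricks and cloud provider pricing. The sketch also assumes each node (driver included) has the same VM type.

```python
from dataclasses import dataclass


@dataclass
class ClusterRates:
    """Hypothetical per-node rates; fill in from your own pricing pages."""
    dbu_per_node_hour: float  # DBUs one node consumes per hour (depends on VM type)
    price_per_dbu: float      # price paid per DBU (depends on plan and workload type)
    vm_price_per_hour: float  # cloud provider's charge for one VM per hour


def hourly_cost(rates: ClusterRates, nodes: int) -> float:
    """Cost of running `nodes` nodes for one hour: DBU charge plus VM charge."""
    dbu_cost = nodes * rates.dbu_per_node_hour * rates.price_per_dbu
    vm_cost = nodes * rates.vm_price_per_hour
    return dbu_cost + vm_cost


def cost_range(rates: ClusterRates, min_nodes: int, max_nodes: int,
               hours: float) -> tuple[float, float]:
    """Lower and upper cost bounds for an autoscaling cluster.

    The true cost lies somewhere in between, depending on how long the
    cluster spends at each size.
    """
    low = hourly_cost(rates, min_nodes) * hours
    high = hourly_cost(rates, max_nodes) * hours
    return low, high
```

Because autoscaling moves the cluster between `min_nodes` and `max_nodes` during the run, the result is a range rather than a single number, which is the prediction difficulty described above.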

zach
New Contributor III

Hi @Werner Stinckens, thank you for taking the time to reply and for the thoughtful response. I find it hard to believe that so many companies are using this type of compute when the price is so hard to know in advance. I understand there is some ambiguity around the bytes read and the cluster type, but do you know of a way to get a rough estimate?

-werners-
Esteemed Contributor III

Databricks does give you a view of how many DBUs/hour a cluster consumes (shown as a from-to range in the case of autoscaling); see the cluster pane for this.

With that and a duration of the job, you can make an estimate. But the duration... for that you need to run the program (perhaps on small data and extrapolate).

This is a pretty rough estimate though. Maybe others have succeeded in doing this.
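
The approach described in this reply (run on a small sample, extrapolate the duration, then multiply by the cluster's DBUs/hour range) can be sketched as follows. This is only an illustration: `extrapolate_hours` and `estimate_dbus` are hypothetical helpers, and the linear scaling of runtime with data size is an assumption that shuffles, data skew, and fixed startup overhead can easily break.

```python
def extrapolate_hours(sample_hours: float, sample_fraction: float) -> float:
    """Scale a sample run's duration up to the full dataset.

    ASSUMPTION: runtime grows linearly with the fraction of data processed.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in the interval (0, 1]")
    return sample_hours / sample_fraction


def estimate_dbus(dbu_per_hour_min: float, dbu_per_hour_max: float,
                  hours: float) -> tuple[float, float]:
    """DBU range for a job, given the from-to DBUs/hour shown in the cluster pane."""
    return dbu_per_hour_min * hours, dbu_per_hour_max * hours


if __name__ == "__main__":
    # Placeholder numbers: a run on 12.5% of the data took 15 minutes.
    hours = extrapolate_hours(sample_hours=0.25, sample_fraction=0.125)
    low, high = estimate_dbus(dbu_per_hour_min=4.0, dbu_per_hour_max=8.0, hours=hours)
    print(f"Estimated duration: {hours} h, DBUs: {low} to {high}")
```

Multiplying the resulting DBU range by your price per DBU gives the Databricks part of the bill; the VM charges from the cloud provider come on top of that.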

Kaniz
Community Manager

Hi @zach welshman, we haven't heard from you since the last response from @Werner Stinckens, and I was checking back to see if you have a resolution yet. If you have a solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

zach
New Contributor III

Hi Kaniz, unfortunately there are no answers in the thread. It would be good to get a steer from someone at Databricks if possible.
