06-08-2022 04:16 AM
Databricks uses DBUs as its costing unit, whether it runs on top of AWS, Azure, or GCP. Does Databricks have an equivalent of Google Cloud BigQuery's --dry_run flag for estimating costs? https://cloud.google.com/bigquery/docs/estimate-costs
06-09-2022 03:35 AM
Not that I know of.
Google uses the number of bytes read to determine the cost.
Databricks uses DBUs. The number of DBUs spent depends not only on the amount of data read (the more you read, the longer the job will probably run) but also on the type of VM used.
Autoscaling makes the price even harder to predict.
Also, the total cost is not only the DBU charge but also the cloud provider's cost for provisioning the VMs.
So that makes it pretty hard to predict a cost.
It would of course be very cool to have such a prediction.
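To make the cost components above concrete, here is a minimal sketch of the arithmetic. All rates are hypothetical placeholders, not actual Databricks or cloud prices; real DBU prices vary by tier, workload type, cloud, and region, and VM rates vary by instance type.

```python
# Rough Databricks cost model: DBU charges plus cloud VM charges.
# All rates are hypothetical placeholders -- look up real prices for
# your cloud, region, VM type, and Databricks pricing tier.

def estimate_cluster_cost(hours, dbu_per_hour, dbu_price, vm_hourly_rate, num_vms):
    """Estimate total cost of running a cluster for `hours`:
    DBU consumption cost plus VM provisioning cost."""
    dbu_cost = hours * dbu_per_hour * dbu_price
    vm_cost = hours * vm_hourly_rate * num_vms
    return dbu_cost + vm_cost

# Example: a 3-hour job on a cluster consuming 6 DBU/hour,
# at a hypothetical $0.40/DBU, with 4 VMs at $0.50/hour each:
# 3*6*0.40 (DBU) + 3*0.50*4 (VMs) = 7.20 + 6.00
total = estimate_cluster_cost(3, 6, 0.40, 0.50, 4)
print(f"${total:.2f}")
```

With autoscaling, `dbu_per_hour` and `num_vms` become ranges rather than fixed numbers, which is exactly why prediction is hard.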
06-09-2022 03:49 AM
Hi @Werner Stinckens, thank you for taking the time to reply and for the thoughtful response. I find it hard to believe that so many companies use this type of compute when the price is so hard to know. I understand there is some ambiguity around bytes read and cluster type; do you know of a way to produce a rough estimate?
06-09-2022 04:00 AM
Databricks does show how many DBUs per hour a cluster consumes (as a from-to interval in the case of autoscaling); see the cluster pane for this.
With that and the duration of the job, you can make an estimate. But for the duration you need to actually run the program (perhaps on a small sample of the data and extrapolate).
It is a pretty rough estimate, though. Maybe others have succeeded in doing this more precisely.
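The extrapolation idea above could be sketched as follows. The helper functions and the $0.40/DBU price are hypothetical, and the linear-scaling assumption is a simplification: Spark jobs with heavy shuffles or joins rarely scale perfectly linearly with data volume, so treat the result as a lower bound.

```python
def extrapolate_duration(sample_hours, sample_fraction):
    """Linearly extrapolate full-data runtime from a run on a data sample.
    Assumes runtime scales linearly with data volume, which is optimistic
    for shuffle-heavy workloads."""
    return sample_hours / sample_fraction

def estimate_job_cost(sample_hours, sample_fraction, dbu_per_hour, dbu_price):
    """DBU cost estimate: extrapolated runtime * cluster DBU rate * DBU price.
    (VM provisioning cost would be added on top of this.)"""
    hours = extrapolate_duration(sample_hours, sample_fraction)
    return hours * dbu_per_hour * dbu_price

# Example: a run on 10% of the data took 0.5 h on a cluster rated at
# 6 DBU/hour (read from the cluster pane), at a hypothetical $0.40/DBU:
# ~5 h extrapolated * 6 * 0.40 = ~$12 in DBU charges.
print(round(estimate_job_cost(0.5, 0.10, 6, 0.40), 2))
```

For an autoscaling cluster you would run this twice with the low and high ends of the DBU/hour interval to get a cost range.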
06-13-2022 02:26 AM
Hi @zach welshman, we haven't heard from you on the last response from @Werner Stinckens, and I was checking back to see if you have a resolution yet. If you have found a solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.
06-17-2022 03:45 AM
Hi Kaniz, unfortunately there are no definitive answers in the thread yet. It would be good to get a steer from someone at Databricks if possible.