Does Databricks have a Google Cloud BigQuery equivalent of --dry_run to estimate costs before executing?
06-08-2022 04:16 AM
Databricks uses DBUs as its costing unit whether running on top of AWS, Azure, or GCP. I want to know if Databricks has an equivalent of Google Cloud BigQuery's --dry_run for estimating costs: https://cloud.google.com/bigquery/docs/estimate-costs
06-09-2022 03:35 AM
Not that I know of.
Google uses the number of bytes read to determine the cost.
Databricks uses DBUs. The number of DBUs spent depends not only on the amount of bytes read (the more you read, the longer the program will probably run), but also on the type of VM used.
Then there is also autoscaling, which makes it harder to predict a price.
Also, the total cost is not just the DBU charge but also the provisioning cost of the VMs.
So that makes it pretty hard to predict a cost.
It would of course be very cool to have such a prediction.
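To make the two cost components concrete, here is a minimal back-of-the-envelope sketch. Every rate in it is an assumed placeholder, not an actual Databricks or cloud price; real DBU and VM rates depend on your cloud, pricing tier, and instance type:

```python
# Hypothetical cost breakdown -- all rates below are made-up placeholders.
dbu_per_hour_per_node = 0.75   # assumed DBU consumption of one node per hour
dbu_price = 0.15               # assumed $ per DBU for the chosen workload/tier
vm_price_per_hour = 0.30       # assumed cloud provider charge per VM per hour

num_nodes = 4                  # nodes actually provisioned
runtime_hours = 2.0            # how long the job ran

databricks_cost = dbu_per_hour_per_node * num_nodes * runtime_hours * dbu_price
cloud_vm_cost = vm_price_per_hour * num_nodes * runtime_hours

print(f"DBU charge:      ${databricks_cost:.2f}")
print(f"VM provisioning: ${cloud_vm_cost:.2f}")
print(f"Total estimate:  ${databricks_cost + cloud_vm_cost:.2f}")
```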
06-09-2022 03:49 AM
Hi @Werner Stinckens, thank you for taking the time to reply and for the thoughtful response. I find it hard to believe that so many companies are using this type of compute when the price is hard to know. I understand there is some ambiguity with the bytes read and cluster type; do you know of a way to get a rough estimate?
06-09-2022 04:00 AM
Databricks does give you a view of how many DBUs per hour a cluster consumes (a from-to interval in the case of autoscaling); see the cluster pane for this.
With that and the duration of the job, you can make an estimate. But for the duration you need to actually run the program (perhaps on a small sample of data and extrapolate).
This is a pretty rough estimate though; a sketch of how that extrapolation could look follows below. Maybe others have succeeded in doing this.
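A minimal sketch of that extrapolation, assuming runtime scales roughly linearly with bytes processed (often not true for shuffles and joins) and using made-up placeholder rates; the real DBU/hour figure would come from the cluster pane mentioned above:

```python
# Rough extrapolation: time the job on a small sample, scale the duration
# linearly with data volume, and turn it into a cost range.
def estimate_cost(sample_runtime_hours, sample_bytes, full_bytes,
                  dbu_per_hour, dbu_price, vm_price_per_hour,
                  min_nodes, max_nodes):
    # Linear scaling of runtime with input size is an assumption.
    est_runtime = sample_runtime_hours * (full_bytes / sample_bytes)
    # With autoscaling you only know the node-count interval, so return a range.
    hourly_cost_per_node = dbu_per_hour * dbu_price + vm_price_per_hour
    low = est_runtime * min_nodes * hourly_cost_per_node
    high = est_runtime * max_nodes * hourly_cost_per_node
    return est_runtime, low, high

# All numbers below are hypothetical placeholders.
runtime, low, high = estimate_cost(
    sample_runtime_hours=0.1, sample_bytes=5e9, full_bytes=500e9,
    dbu_per_hour=0.75, dbu_price=0.15, vm_price_per_hour=0.30,
    min_nodes=2, max_nodes=8)
print(f"Estimated runtime: {runtime:.1f} h, cost between ${low:.2f} and ${high:.2f}")
```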
06-17-2022 03:45 AM
Hi Kaniz, unfortunately there are no answers in the thread. It would be good to get a steer from someone at Databricks if possible.

