โ10-11-2024 03:52 AM
Hi everybody,
I want to calculate the exact cost of single job execution. In all examples I can find on the internet it uses the tables system.billing.usage and system.billing.list_prices. It makes sense to calculate the sum of DBUs consumed and multiply it by the current price for that SKU. What confuses me is that the usage_unity used there is DBU but the prices are for DBUs used per hour. If every time window found in system.billing.usage would be 1 hour, it would all allign, but what I find are also time windows of 10 minutes. If it is like shown in the screenshot with a 10 minute time window, wouldn't I need to first divide the usage_quantity by 6 as it is only used one sixth of an hour? And then multiply that by the prices in the price list?
Also can someone explain what the single rows are? Are these different compute instances used for the job exceution?
a month ago
Hi @jreh ,
You are right about calculating the single job execution using system.billing.usage and system.billing.list_prices.
The DBU consumption reported in shorter windows (e.g., 10 minutes) is already reflective of the time window, so You dont need to divide it by 6.
So in general, what You want to do is to sum the calculation for eac job run.
sum(usage_quantity * list_prices.pricing.default)
You can find more useful examples here:
https://docs.databricks.com/en/admin/system-tables/jobs-cost.html
Regarding system.billing.usage system table, You can find out more here:
https://docs.databricks.com/en/admin/system-tables/billing.html
A single row in the system.billing.usage table represents a snapshot of DBU usage for a specific job run within time window and for specific SKU.
a month ago
Hi @radothede,
thank you for your answer.
But I just want to clarify so everything is correct:
Isn't the metric that is used for cost calculation DBU-hour? And the metric for (let's keep it simple) compute power is DBU. So yes, the DBU consumption is correct in the table but the unit for the cost is still DBU-hour. I pay a specific price if I use a cluster with that compute size for 1 hour.
Normally, all the time windows in the usage_quantity table are exactly 1 hour, except for serverless compute. Here we have 10 minutes. If I use the proposed formula without dividing it for the fraction of the used hour, my serverless jobs are 5-6 times more expensive, than running it on a regular job cluster. Which does not seem right.
In my mind that translates to the exact same way the cost calculation for electricity works. If I have a device that uses 1000W (1kW) and I use that device for 1 hour, I just used 1kWh and my price is fixed for kWh. I don't get charged for the lone reason of using a device with 1kW. I get charged for the time that device is used. But if I use that device for only 30 minutes, I used 1kW for half an hour, so 0.5kWh.
Please let me know if I'm wrong with that assumption
a month ago
@jreh which specific serverless sku are you willing to check?
Can You share the query please?
I've double-checked that, and I'm sure the usage quantity, at least in my case, reflects the time windowโso it's 10 minutes of consumption, and I do not need to divide it by 6.
I'm using that query to produce the output:
select sku_name,
usage_start_time,
usage_end_time,
usage_quantity,
usage_unit
from system.billing.usage
where sku_name like '%SERVERLESS%'
and USAGE_END_TIME - USAGE_START_TIME = INTERVAL '0 00:10:00' DAY TO SECOND
group by all;
a month ago - last edited a month ago
Actually, I've checked the table details based on one job run example, and this table looks messy...
This job was triggerd on serverless job cluster and succeded in 1 minute and 12 seconds.
In system.billing.usage I can see 3 different rows for this job run:
I think it is not possible that job used 0.159 + 0.392 DBU in 72 seconds, is it ?
Could anyone from databricks confirm how the logic actually works so we wont need to guess?
a month ago
Hi @radothede,
That's what I'm trying to get into. You used 0.159 + 0.392 = 0.551 DBU, but that is only the compute power. So technically it is just a snapshot of how much resources you used in the background. But the important information for calculating the cost is actually the time (e.g. 72 seconds) you used these DBUs for. So DBU is not a metric to calculate cost, it's DBU-hour
3 weeks ago
@radothede, I've clarified this with Databricks and my assumption was correct. The formula
sum(usage_quantity * list_prices.pricing.default)
is only right, if the time window in the usage table is 1 hour. For every window that is not 1 hour, the fraction of the used hour needs to be calculated and multiplied with the usage_quantity. So something like this might be the correct formula:
select ((unix_timestamp(usage_end_time) - unix_timestamp(usage_start_time))/3600) * usage_quantity as DBU_hours, * from system.billing.usage
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group