cancel
Showing results for 
Search instead for 
Did you mean: 
Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.
cancel
Showing results for 
Search instead for 
Did you mean: 

Tracing SQL costs

simenheg
Visitor

Hello, Databricks community!

In our Account Usage Dashboard, the biggest portion of our costs are labeled simply "SQL".

We want to drill deeper to see where the SQL costs are coming from.

By querying the `system.usage.billing` table we see that it's mostly serverless SQL queries driving the costs. However almost every field in both `product_features`, `usage_metadata`, and `identity_metadata` (even `run_as`) are `null`.

Is this an error or intended behavior? If intended, how can we attribute used DBU quantities to specific queries or users to find out what is driving our costs?

Best regards, Simen

3 REPLIES 3

Raman_Unifeye
Contributor III

@simenheg - first of all, It’s not an error as Serverls SQL often produces null metadata fields.

So you will need to follow below steps for the cost

  • Use SQL Warehouse Query History
  • join billing data with SQL query history - system.billing.usage.usage_date, system.billing.usage.workspace_id, system.billing.usage.product = "SQL Serverless" - You can correlate with system.query.history (or the UI equivalent)
  • Now you use SQL → Warehouses → (select warehouse) → Monitoring → Cost breakdown

Above is a manual way to comprehend the cost.

You could always do tagging and bundle it up with predefined query based on your needs

Offical doc link - https://docs.databricks.com/aws/en/admin/system-tables/serverless-billing?utm_source=copilot.com

 


RG #Driving Business Outcomes with Data Intelligence

Thanks for your reply, Raman!

I have a couple of follow up questions.

I looked into joining `system.billing.usage` and `system.query.history` as you suggested, but ran into a problem: There is no column in our `system.billing.usage` table named `product`. Also, which column(s) should the two tables be joined on?

From the official documentation you linked to, the following is stated:

The identity_metadata column includes the run_as field, which shows the user or service principal whose credentials were used to run the workload.

However in our case, the `run_as` field is almost always `null`, contrary to what the documentation says, so we don't know which user of service principal ran the query.

you are right, no column named product. instead you need to use sku_name containing 'SQL Serverless' OR product_features containing 'serverless'.

To join those 2 tables, you are right, no obvious single key available.

instead use time windows and warehouse-ID and join as below.

Billing table Query history table Remarks

workspace_idworkspace_idEnsure same workspace
sku_name(filter to SQL queries)Ensure SQL Serverless
usage_start_time / usage_end_timestart_time / end_timeTime‑window correlation
product_features.warehouse_id (if present)warehouse_idMatch warehouse

RG #Driving Business Outcomes with Data Intelligence