Require Information on SQL Analytics DBU Cluster
03-10-2025 04:56 AM
Hello Team,
We are seeking cost information because we have noticed fluctuations in the daily cost of the "SQL Analytics DBU" SKU. We would like to understand the reasons behind the day-to-day cost differences, even though the workload remains consistent.
We are also trying to understand how the "Premium SQL Analytics DBU" factor comes into play when our workflow runs on Job Compute. We use spark.sql to run our code in notebooks on Job Compute.
03-10-2025 05:11 AM
Hi @gauravmahajan,
Most of the cost and DBU usage can be retrieved from the system tables across the different workspaces in your Databricks account. Details on job compute types and their associated cost can be fetched with the queries in the following documentation: https://docs.databricks.com/aws/en/admin/system-tables/jobs-cost
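As a starting point, here is a minimal sketch (assuming the system.billing schema is enabled and accessible in your account; the 30-day window is just an example) that breaks the daily list cost down by SKU, so you can see which SKU is driving the day-to-day differences:
%sql
-- Daily list cost per SKU for the last 30 days (adjust the window as needed)
SELECT
  u.usage_date,
  u.sku_name,
  SUM(u.usage_quantity * lp.pricing.default) AS list_cost
FROM system.billing.usage u
INNER JOIN system.billing.list_prices lp
  ON u.cloud = lp.cloud
  AND u.sku_name = lp.sku_name
  AND u.usage_start_time >= lp.price_start_time
  AND (u.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
WHERE u.usage_date >= date_sub(current_date(), 30)
GROUP BY u.usage_date, u.sku_name
ORDER BY u.usage_date DESC, list_cost DESC
Comparing the daily totals per SKU should show whether the fluctuation comes from the SQL Analytics DBU line or from another SKU.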
Please let us know if the above link is helpful.
Best Regards,
Nivethan V
03-10-2025 06:14 AM
@Nivethan_Venkat Thanks for the quick reply.
I am looking for more specific information. If we run our pipeline as a workflow on Job Compute, where the SQL queries are executed via spark.sql, will the cost be incurred under SQL Analytics DBU? Is that the default behaviour of Databricks? If yes, how can we run jobs on Job Compute with a medallion architecture?
03-10-2025 04:00 PM
Hi @gauravmahajan,
The cost is associated with the compute type you choose when running Databricks workflows; wrapping your code in spark.sql("select * from table") does not incur a separate SQL Analytics DBU charge.
To choose the cluster configuration best suited to your workload, depending on the data volume and transformations involved, you can refer to the following documentation on the different compute type options: https://docs.databricks.com/aws/en/compute/cluster-config-best-practices
Additionally, you can use the query below to find which SKU your jobs are billed under when they are deployed as Databricks workflows:
%sql
with job_run_timeline_with_cost as (
  -- Join job usage records with list prices to compute the list cost per usage record
  SELECT
    t1.*,
    t1.usage_metadata.job_id as job_id,
    t1.identity_metadata.run_as as run_as,
    t1.usage_quantity * list_prices.pricing.default AS list_cost
  FROM system.billing.usage t1
  INNER JOIN system.billing.list_prices list_prices
    ON t1.cloud = list_prices.cloud
    AND t1.sku_name = list_prices.sku_name
    AND t1.usage_start_time >= list_prices.price_start_time
    AND (t1.usage_end_time <= list_prices.price_end_time OR list_prices.price_end_time IS NULL)
  WHERE
    t1.billing_origin_product = 'JOBS'
),
most_recent_jobs as (
  -- Keep only the latest definition of each job to resolve its current name
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY workspace_id, job_id ORDER BY change_time DESC) as rn
  FROM system.lakeflow.jobs
  QUALIFY rn = 1
)
SELECT
  t2.name,
  t1.workspace_id,
  t1.job_id,
  t1.sku_name,
  t1.run_as,
  t1.spend
FROM (
  -- Aggregate list cost per job and SKU
  SELECT
    workspace_id,
    job_id,
    run_as,
    sku_name,
    SUM(list_cost) AS spend
  FROM job_run_timeline_with_cost
  GROUP BY ALL
) t1
LEFT JOIN most_recent_jobs t2 USING (workspace_id, job_id)
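If a "SQL Analytics" SKU still shows up in your account, that usage typically comes from SQL warehouses (for example, queries run through a SQL warehouse or the SQL editor) rather than from your job clusters. The sketch below (again assuming system table access, and that your SQL warehouse usage is tagged with billing_origin_product = 'SQL') attributes that cost to the warehouses it originates from, which should help explain the daily fluctuations:
%sql
-- Daily list cost of SQL SKUs broken down by the SQL warehouse that produced the usage
SELECT
  u.usage_date,
  u.sku_name,
  u.usage_metadata.warehouse_id AS warehouse_id,
  SUM(u.usage_quantity * lp.pricing.default) AS list_cost
FROM system.billing.usage u
INNER JOIN system.billing.list_prices lp
  ON u.cloud = lp.cloud
  AND u.sku_name = lp.sku_name
  AND u.usage_start_time >= lp.price_start_time
  AND (u.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
WHERE u.billing_origin_product = 'SQL'
GROUP BY u.usage_date, u.sku_name, u.usage_metadata.warehouse_id
ORDER BY u.usage_date DESC, list_cost DESC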
Please let us know if this is helpful.
Best Regards,
Nivethan V