Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Unexpected SKU Names in Usage Table for Job Cost Calculation

vziog
New Contributor II

I'm trying to calculate the cost of a job using the usage and list_prices system tables, but I'm encountering some unexpected behavior that I can't explain.

When I run a job using a shared cluster, the sku_name in the usage table is PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST for the records matching the job's job_id and job_run_id in the usage_metadata column. I was expecting to see PREMIUM_ALL_PURPOSE_COMPUTE, or possibly NULL, since on those records both usage_metadata.job_id and usage_metadata.job_run_id are NULL.

Additionally, when I run the job on a Job Compute cluster, I see both the expected PREMIUM_JOBS_COMPUTE SKU and the PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST SKU in the usage records.

Could you please help me understand why the PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST SKU appears in each of these scenarios?
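For context, the calculation I'm attempting looks roughly like the sketch below. It's a simplified, self-contained Python stand-in for the join between system.billing.usage and system.billing.list_prices; the rows and prices are made up for illustration, and only the column names (sku_name, usage_quantity) follow the system-table schema:

```python
# Made-up rows standing in for system.billing.usage for one job run
usage = [
    {"sku_name": "PREMIUM_JOBS_COMPUTE", "usage_quantity": 4.0},
    {"sku_name": "PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST", "usage_quantity": 0.5},
]

# Made-up per-SKU unit prices standing in for system.billing.list_prices
list_prices = {
    "PREMIUM_JOBS_COMPUTE": 0.30,
    "PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST": 0.70,
}

# Join on sku_name and multiply usage quantity by the unit price
cost_by_sku = {
    row["sku_name"]: row["usage_quantity"] * list_prices[row["sku_name"]]
    for row in usage
}
total_cost = sum(cost_by_sku.values())
print(cost_by_sku)
print(total_cost)
```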

5 REPLIES

Walter_C
Databricks Employee

Just to confirm: none of the tasks on this job has been configured to run on serverless, right?

mnorland
Contributor

This could happen if at least one run of the job had a task set to serverless. Alas, the toggle to enable serverless is no more.

LRALVA
Honored Contributor

Hi @vziog,

When you run a job on a shared cluster, you're seeing PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST in the usage table because the system is tracking the serverless resources used to manage and coordinate your job, even when the actual compute happens on a shared cluster. This behavior occurs because:

  1. All Databricks jobs have an orchestration layer that runs on serverless compute
  2. This orchestration layer exists even when your job executes on a shared all-purpose cluster
  3. The job orchestration costs are tracked separately from the actual compute costs

For your second scenario, when running on Job Compute clusters, you see both SKUs because:

  • PREMIUM_JOBS_COMPUTE represents the actual cluster compute resources
  • PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST represents the orchestration layer

The NULL values for job_id and job_run_id in some records likely happen because those costs aren't directly tied to a specific job execution but rather to the overall orchestration system.

This dual-charging model ensures that both the orchestration overhead and the actual compute resources are properly accounted for. The serverless component is typically small compared to the cluster compute costs but is still necessary for job coordination.

To accurately calculate job costs, you should include both SKU types for a complete picture of the resources used.
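A minimal sketch of that combined calculation, assuming made-up job IDs, quantities, and prices (the dict shapes loosely mirror usage_metadata, sku_name, and usage_quantity from system.billing.usage, but this is an illustration, not the actual system-table API):

```python
def job_run_cost(usage_rows, prices, job_id, job_run_id):
    """Sum cost across ALL SKUs attributed to a given job run, so both
    cluster compute and serverless orchestration records are included."""
    total = 0.0
    for row in usage_rows:
        meta = row["usage_metadata"]
        if meta.get("job_id") == job_id and meta.get("job_run_id") == job_run_id:
            total += row["usage_quantity"] * prices[row["sku_name"]]
    return total

# Made-up rows mirroring system.billing.usage
usage_rows = [
    {"sku_name": "PREMIUM_JOBS_COMPUTE",
     "usage_quantity": 4.0,
     "usage_metadata": {"job_id": "123", "job_run_id": "run-1"}},
    {"sku_name": "PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST",
     "usage_quantity": 0.5,
     "usage_metadata": {"job_id": "123", "job_run_id": "run-1"}},
    # A record with NULL job metadata is NOT attributed to the run
    {"sku_name": "PREMIUM_ALL_PURPOSE_COMPUTE",
     "usage_quantity": 2.0,
     "usage_metadata": {"job_id": None, "job_run_id": None}},
]
prices = {  # made-up USD/DBU rates
    "PREMIUM_JOBS_COMPUTE": 0.30,
    "PREMIUM_JOBS_SERVERLESS_COMPUTE_EU_WEST": 0.70,
    "PREMIUM_ALL_PURPOSE_COMPUTE": 0.55,
}
print(job_run_cost(usage_rows, prices, "123", "run-1"))
```

Filtering only on the job's SKU (rather than on usage_metadata) would miss one of the two record types, which is why both SKUs need to be kept in the sum.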

LR

vziog
New Contributor II

Thank you all for your replies. @LRALVA, what about what @Walter_C and @mnorland mentioned about enabling serverless tasks? Is this possible, and how?

LRALVA
Honored Contributor

@vziog

They're right to check for a serverless task configuration.
The toggle to enable serverless is no more: Databricks has been transitioning away from the explicit serverless toggle in the UI.

LR
