How to get cost per job that runs on ALL_PURPOSE_COMPUTE?
10-24-2024 04:09 AM
With the system.billing.usage table I can get the cost per job for jobs that run on JOB_COMPUTE, but not for jobs that run on ALL_PURPOSE_COMPUTE.
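For reference, this is roughly the kind of query that works for JOB_COMPUTE today. It is only a sketch: it assumes the system.billing.usage and system.billing.list_prices system tables are enabled in the workspace, and SKU names and column layout may differ slightly by cloud and schema version.

```python
# Minimal sketch, run in a Databricks notebook/cluster context.
# Assumes the billing system tables are enabled; SKU patterns may vary per cloud/tier.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

job_costs = spark.sql("""
    SELECT
        u.usage_metadata.job_id                   AS job_id,
        SUM(u.usage_quantity)                     AS dbus,
        SUM(u.usage_quantity * p.pricing.default) AS approx_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    WHERE u.sku_name LIKE '%JOBS%'                -- job compute SKUs
      AND u.usage_metadata.job_id IS NOT NULL
    GROUP BY u.usage_metadata.job_id
    ORDER BY approx_cost DESC
""")
job_costs.show(truncate=False)
```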
10-24-2024 08:21 PM
Hi Kumar,
How are you? As per my understanding, the first thing to check is whether your jobs running on ALL_PURPOSE_COMPUTE are being tracked in the system.billing.usage table at all. For ALL_PURPOSE_COMPUTE workloads, billing is often aggregated under the interactive cluster rather than attributed to specific jobs, which makes a job-level breakdown harder to get. You can cross-reference cluster usage with job runs using the cluster usage metrics or the cluster event logs; that lets you map costs from ALL_PURPOSE_COMPUTE clusters to the jobs they are supporting. Alternatively, you can explore Databricks' cost management tools or integrate with external billing tools to get a more granular view of job-level costs on this compute type.
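For example, something along these lines gives you the cost per ALL_PURPOSE cluster per day from the billing tables, which you can then map to the job runs that used each cluster. This is only a minimal sketch: it assumes the system.billing schema is enabled in your workspace and that the SKU pattern below matches your cloud and tier.

```python
# Minimal sketch: daily cost per ALL_PURPOSE cluster from the billing system tables.
# Assumes a Databricks notebook/cluster context; SKU patterns may differ per cloud/tier.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

all_purpose_cluster_costs = spark.sql("""
    SELECT
        u.usage_metadata.cluster_id               AS cluster_id,
        u.usage_date,
        SUM(u.usage_quantity)                     AS dbus,
        SUM(u.usage_quantity * p.pricing.default) AS approx_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    WHERE u.sku_name LIKE '%ALL_PURPOSE%'
      AND u.usage_metadata.cluster_id IS NOT NULL
    GROUP BY u.usage_metadata.cluster_id, u.usage_date
""")
all_purpose_cluster_costs.show(truncate=False)
```

From there, the Jobs API runs list (GET /api/2.1/jobs/runs/list) or the cluster event logs can tell you which job runs used each cluster_id and when.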
Give it a try and let me know.
Regards,
Brahma
10-27-2024 10:08 PM
If DBU usage isn't captured anywhere at the job level for ALL_PURPOSE_COMPUTE, then a cost breakdown based on cluster events is very difficult, because two or more jobs can run in parallel on the same cluster. So mapping the cluster cost down to a specific job is very hard.
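Even with the cluster-level cost, the best I can see is an approximate runtime-weighted split across the parallel runs, which ignores idle time and how heavy each job actually is. A rough sketch with made-up run times, just to show what I mean:

```python
# Rough sketch: prorate an all-purpose cluster's cost across parallel job runs
# by each run's share of total run-seconds in the billing window.
# Job names, times, and the cost figure below are made up for illustration.
from datetime import datetime

cluster_cost = 12.0  # cluster cost for the window, e.g. taken from system.billing.usage

runs = [  # (job_name, start, end) -- hypothetical runs on the same cluster
    ("job_a", datetime(2024, 10, 24, 9, 0),  datetime(2024, 10, 24, 10, 0)),
    ("job_b", datetime(2024, 10, 24, 9, 30), datetime(2024, 10, 24, 10, 30)),
    ("job_c", datetime(2024, 10, 24, 9, 45), datetime(2024, 10, 24, 10, 0)),
]

durations = {name: (end - start).total_seconds() for name, start, end in runs}
total = sum(durations.values())

for name, secs in durations.items():
    share = secs / total
    print(f"{name}: {share:.0%} of run-seconds -> ~${cluster_cost * share:.2f}")
```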
Let me know if I am missing anything.
10-28-2024 05:44 AM
You’re right @KUMAR__111—tracking costs for jobs on ALL_PURPOSE_COMPUTE clusters can be tricky since DBU usage isn’t directly tied to specific jobs. When multiple jobs run in parallel on the same cluster, it’s challenging to allocate costs accurately. Consider using cluster tags to label clusters by job, which can help with grouping costs even when jobs share clusters. Running job-specific clusters for key workloads could provide clearer cost attribution. You could also cross-reference job logs with cluster usage metrics, though this can be manual. Leveraging the Databricks REST API can help gather more detailed metrics to better estimate costs per job.
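As an illustration of the tagging idea: once clusters carry a custom tag (the key 'job_name' below is hypothetical, use whatever tag you actually apply), the tag shows up in the custom_tags map on the billing records and you can group cost by it. A minimal sketch, again assuming the billing system tables are enabled:

```python
# Minimal sketch: group ALL_PURPOSE cost by a custom cluster tag.
# The tag key "job_name" is hypothetical -- substitute the tag you apply to your clusters.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

cost_by_tag = spark.sql("""
    SELECT
        u.custom_tags['job_name']                 AS job_tag,
        SUM(u.usage_quantity * p.pricing.default) AS approx_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    WHERE u.sku_name LIKE '%ALL_PURPOSE%'
    GROUP BY u.custom_tags['job_name']
    ORDER BY approx_cost DESC
""")
cost_by_tag.show(truncate=False)
```

For the run-level piece, the Jobs API runs list (GET /api/2.1/jobs/runs/list) returns start and end times per run, which you can combine with the cluster-level cost for a runtime-weighted split like the rough one sketched above.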
Just a thought. Give it a try and let me know.
Regards,
Brahma

