Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Per-table/flow DBU cost attribution within a multi-table DLT pipeline — is it possible?

toothless
New Contributor

We are building a cost-monitoring dashboard and want to drill down from pipeline-level cost to individual table or flow cost inside a multi-table DLT pipeline.

Our setup is a serverless DLT pipeline with multiple tables, including more than ten streaming tables and materialized views. Cost monitoring is based on system.billing.usage.

What we have found so far is that system.billing.usage tracks cost at the dlt_pipeline_id and dlt_update_id level, with no per-table breakdown. The uc_table_catalog, schema, and name fields are null for records from multi-table pipelines.

The DLT event log's flow_progress events include per-flow metrics such as executor_time_ms, num_output_rows, and num_output_bytes, but no DBU or cost field.

The Databricks documentation for querying pipeline cost only covers pipeline-level aggregation.

Standalone materialized views and streaming tables outside a pipeline do have table-level information in billing, but this does not apply to tables inside a multi-table pipeline.

One approach we found is to use executor_time_ms from the event log for proportional cost allocation, where table cost equals update cost multiplied by that table's executor time divided by total executor time. This is an approximation, not exact billing.
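For concreteness, the proportional split can be sketched like this (a minimal sketch, assuming the per-flow executor times have already been pulled from flow_progress events and the update's DBU total from system.billing.usage; the function name and input shape are ours, not a Databricks API):

```python
def allocate_update_cost(update_cost_dbu, executor_time_ms_by_flow):
    """Split one update's DBU cost across flows, proportional to executor time.

    This is an approximation, not exact billing: executor time is a proxy
    for work, and flows with zero recorded time receive zero cost.
    """
    total_ms = sum(executor_time_ms_by_flow.values())
    if total_ms == 0:
        return {flow: 0.0 for flow in executor_time_ms_by_flow}
    return {
        flow: update_cost_dbu * ms / total_ms
        for flow, ms in executor_time_ms_by_flow.items()
    }

# Example: a 10-DBU update split across three flows
shares = allocate_update_cost(
    10.0, {"bronze": 6000, "silver": 3000, "gold": 1000}
)
# bronze gets 6.0 DBU, silver 3.0, gold 1.0
```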

Our questions are:

  • Is there a way to get actual per-table or per-flow DBU usage within a DLT pipeline?

  • Is per-flow DBU metering planned?

  • Has anyone found a better approach than time-based proportional allocation?

Environment: Azure Databricks, serverless DLT, Unity Catalog.

1 ACCEPTED SOLUTION


Louis_Frolio
Databricks Employee

Hi @toothless , You’ve already mapped the landscape pretty accurately, so I’ll confirm what you found and layer in a bit of context.

Short answer: there’s no clean way today to get exact per-table or per-flow DBU usage for a multi-table serverless DLT pipeline. The system.billing.usage table only surfaces cost at the pipeline and update level — it doesn’t break things down further within a multi-table pipeline.

Here’s what you actually get for DLT in billing:

  • usage_metadata.dlt_pipeline_id

  • usage_metadata.dlt_update_id

  • usage_metadata.dlt_maintenance_id

You’ll notice Unity Catalog table identifiers do show up for standalone materialized views and streaming tables, but for tables inside a multi-table pipeline, those fields come back null. That lines up with what you saw — and with the public examples, which all stop at pipeline-level attribution.
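As a starting point for the pipeline/update-level rollup, a query along these lines is typical (sketch only, intended for spark.sql(...) in a notebook; the usage_metadata fields are the ones listed above, but verify column names such as usage_quantity against the system.billing.usage schema in your workspace):

```python
# Sketch: aggregate DLT usage per pipeline and update from the billing
# system table. Column names follow the documented system.billing.usage
# schema -- confirm them in your workspace before relying on this.
PIPELINE_COST_SQL = """
SELECT
  usage_metadata.dlt_pipeline_id AS pipeline_id,
  usage_metadata.dlt_update_id   AS update_id,
  SUM(usage_quantity)            AS dbus
FROM system.billing.usage
WHERE usage_metadata.dlt_pipeline_id IS NOT NULL
GROUP BY 1, 2
"""
# In a Databricks notebook: spark.sql(PIPELINE_COST_SQL)
```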

On your executor_time_ms approach

This is the right instinct, and it’s more grounded than it might feel at first glance. The proportional model you landed on — allocating DBUs based on each table’s share of total executor time — is directionally aligned with how Databricks handles similar attribution problems internally.

A good mental model here: the total is known exactly, but the internal split is inferred. Systems like predictive optimization expose estimated DBU splits under the same principle — not because they’re guessing blindly, but because there’s no native billing signal at that level of granularity. Your approach is operating in that same space: structured, explainable estimation.

If you want to experiment, num_output_bytes and num_output_rows from the event log are reasonable alternative weights. Just keep in mind they’re all heuristics — you’re choosing a proxy for “work,” not measuring cost directly.

One thing I would validate before baking this into a dashboard: look at your flow_progress events and filter down to completed flows. You want to make sure executor time isn’t being inflated by planning or idle phases. The exact phase labels can vary a bit, so it’s worth confirming what’s actually being counted in your environment.
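That filtering step can be sketched as follows (a minimal sketch over event-log rows parsed into dicts; the event_type value matches flow_progress above, but the status labels and flat field layout here are illustrative assumptions, so confirm what your event log actually emits):

```python
def executor_time_by_flow(events, completed_statuses=("COMPLETED",)):
    """Sum executor_time_ms per flow, counting only completed flow_progress events.

    `events` is an iterable of dicts shaped like parsed DLT event-log rows.
    The status labels are assumptions -- check your environment's event log.
    """
    totals = {}
    for e in events:
        if e.get("event_type") != "flow_progress":
            continue
        if e.get("status") not in completed_statuses:
            continue  # skip planning/idle/queued phases to avoid inflating time
        flow = e["flow_name"]
        totals[flow] = totals.get(flow, 0) + e.get("executor_time_ms", 0)
    return totals

events = [
    {"event_type": "flow_progress", "flow_name": "bronze",
     "status": "COMPLETED", "executor_time_ms": 5000},
    {"event_type": "flow_progress", "flow_name": "bronze",
     "status": "PLANNING", "executor_time_ms": 900},
    {"event_type": "flow_progress", "flow_name": "silver",
     "status": "COMPLETED", "executor_time_ms": 2500},
]
totals = executor_time_by_flow(events)
# Only the COMPLETED events are counted: bronze 5000 ms, silver 2500 ms
```

The resulting per-flow totals are exactly the weights the proportional allocation needs.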

 

Direct answers to your questions

Is there a way to get actual per-table or per-flow DBU usage within a DLT pipeline?

No — billing stops at pipeline/update granularity for multi-table pipelines today.

 

Is per-flow DBU metering planned?

I can’t speak to roadmap specifics. If this is a hard requirement, it’s worth pushing through your account team or official feedback channels so it’s tracked as a product need.

 

Has anyone found a better approach than time-based proportional allocation?

Nothing documented that’s meaningfully better. The only way to get exact attribution today is architectural: isolate heavier or business-critical flows into their own pipelines. That gives you clean, exact DBU numbers at the pipeline level without needing to estimate internally. It’s a tradeoff, but it’s the only way to move from estimated to exact.

 

Hope this helps.

 

Cheers, Louis


2 REPLIES 2


toothless
New Contributor

Hello @Louis_Frolio ,

 

Thank you for your detailed response and for validating my approach; it has been very insightful, and I believe I have what I need for now.

 

Regards,

Reema