Root Cause / Why executionTimeMs isn't ideal
executionTimeMs includes everything the job did:
Waiting for resources
Shuffle, GC, or network latency
Contention with other concurrent jobs
Allocating costs on this basis can misattribute them, especially if some tables sat idle or blocked while others were actively processing.
So executionTime is noisy for cost attribution; it doesn't reflect actual data volume processed or work done.
Solution thinking:
Calculate cost per unit of work (note the unit must match the metric, so bytes, not MB, if summing numTargetBytesAdded directly):
cost_per_byte = total_job_cost / sum(numTargetBytesAdded for all tables)
Attribute per-domain cost:
cost_per_domain = sum(cost_per_byte * numTargetBytesAdded for tables in that domain)
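The two steps above can be sketched as a small helper. The table names, domain mapping, and cost figure below are hypothetical; in practice numTargetBytesAdded would come from each table's Delta operation metrics:

```python
from collections import defaultdict

def attribute_cost(total_job_cost, table_metrics, table_to_domain):
    """Split a job's cost across domains in proportion to bytes written.

    table_metrics: {table_name: numTargetBytesAdded}
    table_to_domain: {table_name: domain_name}
    Returns {domain_name: attributed_cost}.
    """
    total_bytes = sum(table_metrics.values())
    if total_bytes == 0:
        return {}  # nothing written; no basis for attribution
    cost_per_byte = total_job_cost / total_bytes
    domain_cost = defaultdict(float)
    for table, bytes_added in table_metrics.items():
        domain_cost[table_to_domain[table]] += cost_per_byte * bytes_added
    return dict(domain_cost)

# Illustrative inputs: a $10 job writing to three tables in two domains.
metrics = {"orders": 600_000_000, "customers": 300_000_000, "audit": 100_000_000}
domains = {"orders": "sales", "customers": "sales", "audit": "compliance"}
print(attribute_cost(10.0, metrics, domains))
```

By construction the attributed amounts sum back to the total job cost, which is the main sanity check worth keeping when wiring this into a real pipeline.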
Optional refinement:
If row sizes vary widely across tables, numTargetBytesAdded is the more accurate basis.
If row sizes are roughly uniform, numOutputRows is simpler.
You could also combine both metrics (weighted by output bytes and output rows) for a hybrid approach.
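The hybrid approach can be sketched as a convex blend of a table's byte share and row share. The function name and the alpha weighting knob are assumptions for illustration, not from the source:

```python
def hybrid_share(bytes_added, rows_out, total_bytes, total_rows, alpha=0.7):
    """Blend a table's byte share and row share into one attribution weight.

    alpha controls the lean toward bytes (alpha=1.0 is bytes-only,
    alpha=0.0 is rows-only); the 0.7 default is an arbitrary assumption.
    """
    byte_share = bytes_added / total_bytes if total_bytes else 0.0
    row_share = rows_out / total_rows if total_rows else 0.0
    return alpha * byte_share + (1 - alpha) * row_share

# Example: a table that wrote 60% of the bytes but only 20% of the rows.
w = hybrid_share(600, 20, 1000, 100, alpha=0.5)
print(w)  # halfway between the byte share (0.6) and the row share (0.2)
```

Multiplying each table's hybrid share by total_job_cost then gives the blended attribution; since the shares sum to 1 across tables, the attributed costs still sum to the job total.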