Greetings @pargit,
Why Your Approach Isn't Working
Cluster usage tags cannot be modified at runtime from within a notebook. The `spark.databricks.clusterUsageTags.*` configurations are read-only properties set when the cluster is created or edited, and `spark.conf.set()` cannot modify them during execution.
When you use `spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags")`, you can read the current tags, but attempting to set them with `spark.conf.set()` has no effect because these are cluster-level configurations that are immutable once the cluster is running.
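A quick check in a notebook cell illustrates this read-only behavior (a minimal sketch; depending on the runtime, the `set` call may raise an error or only change the session-level value, never the tags the cluster reports to billing):

```python
# Reading cluster usage tags works -- this reflects what was set at cluster creation:
all_tags = spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags")
print(all_tags)  # JSON string of key/value tag pairs

# Attempting to set them does not update the cluster's actual tags;
# billing and usage records keep the tags defined on the cluster itself.
spark.conf.set(
    "spark.databricks.clusterUsageTags.clusterAllTags",
    '[{"key": "project", "value": "my_project"}]',
)
```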
Alternative Solutions for Cost Tracking
Use Job Clusters with Different Tags
Instead of a single all-purpose cluster, create job-specific clusters where each job's cluster carries its own `custom_tags` for `project` and `department`, set in the job's cluster specification (see the sketch below). This gives you granular cost attribution per job.
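Here is a rough sketch of creating such a job via the Jobs API 2.1; the workspace URL, token, notebook path, node type, and tag values are placeholders to adapt to your environment:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

job_spec = {
    "name": "nightly-etl-project-x",
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Repos/etl/run_pipeline"},
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
                # Tags set here flow through to usage/billing records for this job's cluster
                "custom_tags": {"project": "project_x", "department": "finance"},
            },
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id
```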
Serverless Budget Policies
If you're using serverless compute (Public Preview), you can use serverless budget policies to automatically tag usage at the user or group level. When users are assigned different policies, their usage is automatically tagged with the policy's custom tags.
API-Based Cluster Management
Programmatically update cluster tags using the Databricks Clusters API before running workloads (a sketch follows this list):
- Call the API to update cluster configuration with new tags
- Restart the cluster (if needed)
- Run your notebook with the updated tags
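A rough sketch of that flow using the Clusters API; host, token, cluster ID, and tag values are placeholders, and autoscaling clusters would pass `autoscale` instead of `num_workers`:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder
CLUSTER_ID = "<cluster-id>"                              # placeholder
headers = {"Authorization": f"Bearer {TOKEN}"}

# 1. Fetch the current cluster spec (clusters/edit expects the full specification).
cluster = requests.get(
    f"{HOST}/api/2.0/clusters/get",
    headers=headers,
    params={"cluster_id": CLUSTER_ID},
).json()

# 2. Merge the new cost-attribution tags into the spec and apply the edit.
new_spec = {
    "cluster_id": CLUSTER_ID,
    "cluster_name": cluster["cluster_name"],
    "spark_version": cluster["spark_version"],
    "node_type_id": cluster["node_type_id"],
    "num_workers": cluster.get("num_workers", 0),
    "custom_tags": {
        **cluster.get("custom_tags", {}),
        "project": "project_x",
        "department": "finance",
    },
}
requests.post(
    f"{HOST}/api/2.0/clusters/edit", headers=headers, json=new_spec
).raise_for_status()

# 3. Editing a running cluster restarts it; once it is back up, run your notebook.
```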
System Tables for Cost Analysis
Use the `system.billing.usage` table to track costs. While you can't change cluster tags dynamically, you can add metadata tracking within your notebooks (logging project/department info to a table) and join this with billing data for cost attribution.
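For example, a notebook can record which project/department a run belongs to, keyed by cluster ID, and you can later join that against `system.billing.usage`. This is a sketch only: `main.cost_tracking.run_metadata` is an illustrative table name and is assumed to already exist with a matching schema.

```python
# Inside the notebook: log the run's cost-attribution metadata.
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

spark.sql(f"""
  INSERT INTO main.cost_tracking.run_metadata
  VALUES ('{cluster_id}', 'project_x', 'finance', current_timestamp())
""")

# Later, attribute cost by joining the metadata with billing usage.
cost_by_project = spark.sql("""
  SELECT m.project, m.department, SUM(u.usage_quantity) AS dbus
  FROM system.billing.usage u
  JOIN main.cost_tracking.run_metadata m
    ON u.usage_metadata.cluster_id = m.cluster_id
  GROUP BY m.project, m.department
""")
display(cost_by_project)
```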
The fundamental limitation is that cluster tags are designed to be set at the infrastructure level, not modified during runtime execution.
Hope this helps, Louis.