Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

How to tag and track costs for Databricks Data Profiling?

Charuvil
New Contributor III

We recently started using the Data Profiling/Lakehouse Monitoring feature from Databricks (https://learn.microsoft.com/en-us/azure/databricks/data-quality-monitoring/data-profiling/). Data Profiling uses serverless compute to run the profiling job.

Is there any way to tag this serverless compute or the data profiling (monitoring) job? Cost tracking is essential for our use case, and as far as I understand, the only way to achieve it is by tagging the jobs, compute, etc.

I know we can get the overall cost of Data Profiling for a workspace from system tables. But we run multiple use cases in our workspace, and cost needs to be tracked at the use-case level. We usually achieve this with tagging, but for data profiling it seems impossible.

2 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @Charuvil ,

To attribute serverless compute usage to users, groups, or projects, you can use serverless budget policies. When a user is assigned a serverless budget policy, their serverless usage is automatically tagged with their policy's custom tags. Serverless budget policies can be applied to serverless notebooks, jobs, pipelines, and model serving endpoints.
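As a rough illustration of what tag-based attribution amounts to once usage rows carry custom tags, here is a minimal sketch in plain Python. It works on in-memory records rather than a real billing table. The field names `custom_tags` and `usage_quantity` are assumed to mirror the billing usage system table, and the tag key `use_case`, the function name, and the sample values are hypothetical.

```python
from collections import defaultdict


def usage_by_tag(records, tag_key, untagged_label="(untagged)"):
    """Sum usage_quantity per value of one custom tag.

    Records without the tag (or without any tags) are grouped under
    untagged_label, so unattributed usage stays visible.
    """
    totals = defaultdict(float)
    for record in records:
        tags = record.get("custom_tags") or {}  # tolerate missing/None tags
        totals[tags.get(tag_key, untagged_label)] += record["usage_quantity"]
    return dict(totals)


# Hypothetical sample rows; real rows would come from your billing data.
sample_records = [
    {"usage_quantity": 2.0, "custom_tags": {"use_case": "fraud"}},
    {"usage_quantity": 1.5, "custom_tags": {"use_case": "churn"}},
    {"usage_quantity": 0.5, "custom_tags": {"use_case": "fraud"}},
    {"usage_quantity": 3.0, "custom_tags": None},  # usage with no tags
]

print(usage_by_tag(sample_records, "use_case"))
```

Any usage that arrives without the tag ends up in the `(untagged)` bucket, which is why per-use-case tracking depends on the tag actually being applied to the workload.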

Attribute usage with serverless budget policies - Azure Databricks | Microsoft Learn

Use tags to attribute and track usage | Databricks on AWS

Keep in mind the following limitation: if you have existing assets, you need to assign the policy to them manually:

szymon_dybczak_0-1762941471833.png

 

 

Charuvil
New Contributor III

Hi @szymon_dybczak, thanks for the quick reply.
But it seems serverless budget policies cannot be applied to data profiling/monitoring jobs. https://learn.microsoft.com/en-us/azure/databricks/data-quality-monitoring/data-profiling/

Serverless budget policies can only be applied to notebooks, jobs, pipelines, or model serving endpoints.

Reference: https://learn.microsoft.com/en-us/azure/databricks/admin/usage/budget-policies#where-to-select-the-s...