Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Log notebook activities

wellington
New Contributor III

Hi friends,

I'm working on a project with a team of four programmers. We all work in a single environment, using only the "Workspaces" folder. Each of us has their own user account, managed by Azure AD.
We had a consumption peak on 5 February. I can see in the Azure portal that Databricks consumed an exorbitant share of the budget, but I can't tell who ran the notebook, which notebook it was, or what code it contained. Can anyone help?

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @wellington, To trace the high consumption in your Azure Databricks environment, you can use tags to attribute usage and accurately identify the culprits.

Here’s how you can proceed:

  1. Tagging Workspaces, Clusters, and Pools:

    • Azure Databricks allows you to tag workspaces (resource groups), clusters, and pools. These tags propagate to detailed cost analysis reports accessible in the Azure portal.
    • You can associate consumption with specific business units or teams by tagging your resources, aiding in chargebacks and accountability.
  2. Default Tags Applied by Azure Databricks:

    • When you create clusters or pools, Azure Databricks automatically assigns default tags:
      • For clusters, tags include:
        • ClusterId: The internal ID of the cluster.
        • ClusterName: The name of the cluster.
        • Creator: The username (email address) of the user who created the cluster.
        • Additionally, on job clusters, tags include RunName, JobId, and other relevant details.
      • For pools, tags include:
        • DatabricksInstancePoolId: The internal ID of the pool.
        • DatabricksInstancePoolCreatorId: The internal ID of the user who created the pool.
        • The constant tag Vendor with the value “Databricks.”
  3. Tag Propagation and Cost Analysis:

    • Workspace and pool tags aggregate and are assigned as resource tags to the Azure VMs hosting the pools.
    • Workspace and cluster tags aggregate and are assigned as resource tags to the Azure VMs hosting the clusters.
    • These tags enable precise cost attribution and help you identify which user or team caused the peak consumption.
  4. Accessing Cost Analysis Reports:

    • In the Azure portal, navigate to the cost analysis section to view detailed reports based on the tags.
    • You’ll find insights into consumption by cluster, pool, and workspace, allowing you to pinpoint the resource responsible for the budget spike.
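To make the tag-based attribution above concrete, here is a minimal Python sketch that totals cost per tag value for one day. Everything in it is an assumption for illustration: the record layout (`date`, `cost`, `tags`), the sample rows, the dates, and the helper name `cost_by_tag` are hypothetical and do not reflect the actual schema of an Azure cost export. Only the tag names (ClusterName, Creator, Vendor) come from the list above.

```python
from collections import defaultdict

# Hypothetical cost rows; a real cost export will use different column
# names and will need its tag column parsed into a dict first.
rows = [
    {"date": "2024-02-05", "cost": 120.0,
     "tags": {"ClusterName": "shared-dev", "Creator": "alice@example.com", "Vendor": "Databricks"}},
    {"date": "2024-02-05", "cost": 80.0,
     "tags": {"ClusterName": "shared-dev", "Creator": "alice@example.com", "Vendor": "Databricks"}},
    {"date": "2024-02-05", "cost": 15.5,
     "tags": {"ClusterName": "etl-nightly", "Creator": "bob@example.com", "Vendor": "Databricks"}},
    {"date": "2024-02-04", "cost": 10.0,
     "tags": {"ClusterName": "shared-dev", "Creator": "alice@example.com", "Vendor": "Databricks"}},
    {"date": "2024-02-05", "cost": 4.0, "tags": {}},  # resource without tags
]

def cost_by_tag(rows, tag_key, date=None):
    """Sum cost per value of tag_key, optionally restricted to a single date.

    Rows missing the tag are grouped under "(untagged)".
    The result is ordered from most to least expensive.
    """
    totals = defaultdict(float)
    for row in rows:
        if date is not None and row["date"] != date:
            continue
        value = row["tags"].get(tag_key, "(untagged)")
        totals[value] += row["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# Which cluster accounted for the spend on the day of the spike?
for name, total in cost_by_tag(rows, "ClusterName", date="2024-02-05").items():
    print(f"{name}: {total:.2f}")
```

Grouping by `Creator` instead of `ClusterName` works the same way, but note that this tag names the user who created the cluster, not the user who ran a notebook on it.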

Remember to avoid customizing the Name tag for clusters, as it may interfere with tracking by Azure Databricks. With proper tagging, you’ll gain visibility into usage patterns and be better equipped to manage cos....

Happy debugging, and may your budget stay intact! 🚀

 

wellington
New Contributor III

Hi @Kaniz_Fatma , thanks for your quick answer.
Is there no other way to monitor notebook runs? I ask because adding tags to the cluster and workspace does not solve my problem: everyone uses the same cluster and the same workspace.
