Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
anindita_mahapa
Databricks Employee

In Databricks, tags (simple key/value metadata) have long been used to organize and manage resources. With Unity Catalog, Databricks introduced a new type of tag, governed tags, which are controlled by Tag Policies that enable scalable governance through Attribute-Based Access Control (ABAC) and automated data classification.

Moving forward, Tag Policies will unify governed and ungoverned tags within a single framework, delivering a consistent experience for searching and discovering assets. To fully leverage this, we recommend establishing a clear tag management strategy. This allows data stewards to maintain a standardized governance taxonomy, while giving data practitioners the flexibility to tag assets for discovery and analytics. A well-designed tagging strategy fosters a common language across your organization, improving governance, usability, and collaboration.

The Evolution of Tags in Databricks: From Cost Attribution to Governance

[Figure: The evolution of tags in Databricks]

Tagging in Databricks has evolved from a basic tool for cost attribution to a core component of governance and discoverability of assets. Initially used to track compute resources and manage infrastructure costs, tagging expanded with Cluster Policies and Serverless Budget Policies to support more granular control. As the platform grew, tags also became essential for asset discovery. To address governance challenges, Databricks has introduced Attribute-Based Access Control (ABAC) using tags to simplify permissions management.

Looking ahead, Databricks aims to unify its tagging system, creating a consistent, scalable framework for cost tracking, discovery, and access control. This unified tagging approach will enable organizations to maintain consistent metadata across diverse assets like models, dashboards, and datasets, streamlining both operational oversight and compliance. Tag Policies at the account level will further standardize governance, reducing administrative overhead and risk. As tagging is fundamental to Databricks administration, it empowers teams to implement dynamic, scalable access controls and cost management strategies. Ultimately, this evolution reflects a broader shift toward metadata-driven operations across modern data platforms. 
First, let us establish the types of tags that exist on Unity Catalog data+AI assets.

| Parent Feature | Permission Set | What does it do? | Scope it is assigned to |
|---|---|---|---|
| Regular Tag | APPLY TAG | Allows the principal to add tags to Unity Catalog securable objects. | The securable object (the principal must be its owner or have APPLY TAG on it) |
| Tag Policies | CREATE | Allows the principal to create tag policies. | Account |
| Tag Policies | MANAGE | Allows the principal to edit and delete tag policies, and to apply the tags they govern. | Individual tag policy or account |
| Tag Policies | ASSIGN | Allows the principal to apply tags governed by tag policies. | Individual tag policy or account |

Tag Policies (governed tags) and regular tags will coexist. Tags governed by Tag Policies can be used to set tag standards, enforce permissions, and define the tags referenced in ABAC. Ungoverned tags, by contrast, give data and AI teams the freedom to introduce domain-specific tags that help with search and discovery across all data and AI assets.

[Figure: All types of tags]

Use Cases

An effective tag strategy helps both data practitioners and data stewards.

[Figure: Tag use cases by persona]

Use Cases for Data Practitioners

Capability: Discovery, Search, Filter

Scenario: As a Portfolio Manager in domain:portfolio-management, I want to easily browse datasets, dashboards, and models related to financial assets so I can quickly find what I need for my research. Users can apply ungoverned tags for personal or team-based categorization while relying on governed tags for official company-wide tags such as domain:portfolio-management. Ideally, admins and stewards mark trusted assets such as tables and dashboards as Certified so that they are easy to identify at a glance.

How To: BROWSE is automatically granted at the catalog level to all users so that assets are discoverable.
Global Search lets users search all assets by name, tags, and other metadata, using tag keys and values.
E.g., tag:tag_key or tag:tag_key:tag_value. (Today, UC managed tables, views, and models are discoverable by tag key only; search by value will be supported later.)
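To make the two query forms concrete, here is a minimal sketch of their matching semantics over an in-memory list of assets. It is a hypothetical illustration only, not how Databricks Global Search is implemented; the `Asset` type, the `matches_tag_query` and `search` helpers, and the sample tags are all assumptions. It models the key:value form even though, as noted above, value search is not yet available for every asset type.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for a searchable table, dashboard, or model."""
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def matches_tag_query(asset: Asset, query: str) -> bool:
    """Match 'tag:<key>' (key only) or 'tag:<key>:<value>' against an asset's tags."""
    if not query.startswith("tag:"):
        raise ValueError(f"not a tag query: {query!r}")
    # Split at the first ':' after the prefix; whatever follows is the value.
    key, sep, value = query[len("tag:"):].partition(":")
    if key not in asset.tags:
        return False
    # A key-only query matches any value; key:value requires an exact match.
    return not sep or asset.tags[key] == value


def search(assets: list[Asset], query: str) -> list[str]:
    """Hypothetical helper: names of assets matching a tag query, in input order."""
    return [a.name for a in assets if matches_tag_query(a, query)]


assets = [
    Asset("positions_daily", {"domain": "portfolio-management", "team": "quant"}),
    Asset("risk_dashboard", {"domain": "portfolio-management"}),
    Asset("churn_model", {"domain": "retail"}),
]
print(search(assets, "tag:domain"))
# ['positions_daily', 'risk_dashboard', 'churn_model']
print(search(assets, "tag:domain:portfolio-management"))
# ['positions_daily', 'risk_dashboard']
```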

Capability: Automation, Monitoring, Resource Organization

Scenario: As a Quant Engineer, I need to build reliable, scheduled pipelines that transform raw data into curated, tagged tables for use by ML and business teams. Tags like domain and team power observability dashboards and targeted alerts, making it easy to monitor pipeline health and trace failures to impacted areas. Additional attributes for data freshness and completeness ensure datasets meet quality standards for trusted analysis.

How To: In addition to discovery tags, apply the Databricks system tag Certified to Gold-level tables to signal that they are ready for consumption. Examples of discovery tags include Domain: wealth management, Team1: Investment specialists, and Team2: Tax Advisors.

Anomaly detection metadata (freshness & completeness) can be extracted from tables and displayed in dashboards. In the future, data quality rules and health indicators will provide additional automated insights.
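As a rough illustration of what the two signals mean, the sketch below derives simple freshness and completeness flags from per-table metadata. The `TableStats` fields, the fixed 24-hour staleness limit, and the 95% completeness threshold are hypothetical assumptions; this is not the logic Databricks anomaly detection uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableStats:
    """Hypothetical metadata collected for one table."""
    name: str
    last_updated: datetime
    rows_last_load: int
    expected_rows: int


def health(stats: TableStats, now: datetime,
           max_staleness: timedelta = timedelta(hours=24),
           min_completeness: float = 0.95) -> dict[str, bool]:
    """Return fresh/complete flags using fixed, assumed thresholds."""
    fresh = now - stats.last_updated <= max_staleness
    ratio = stats.rows_last_load / stats.expected_rows if stats.expected_rows else 0.0
    return {"fresh": fresh, "complete": ratio >= min_completeness}


now = datetime(2025, 6, 12, 9, 0, tzinfo=timezone.utc)
gold = TableStats("gold.positions", now - timedelta(hours=3), 98_000, 100_000)
stale = TableStats("gold.trades", now - timedelta(days=2), 40_000, 100_000)
print(health(gold, now))   # {'fresh': True, 'complete': True}
print(health(stale, now))  # {'fresh': False, 'complete': False}
```

Flags like these could feed the observability dashboards and alerts described above; fixed thresholds are only the simplest possible choice.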

Capability: Data Quality, Data Classification

Scenario: As an Actuary building predictive models, I need to ensure the data I use is both high quality and free from sensitive personal information. Classification tags such as class.name are applied automatically to columns detected as sensitive, flagging them for exclusion. This supports compliance, reduces bias, and promotes ethical model development.

How To: Apply ABAC policies that reference tags like class.name to ensure Data Scientists can safely access appropriate data, while restricting or redacting sensitive information such as columns tagged with class.name.
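The effect of such a policy can be sketched in plain Python: any column carrying a classification tag is redacted unless the reader belongs to an exempt group. This only illustrates the semantics. Real ABAC policies are defined and enforced by Unity Catalog. The `apply_masking` helper, the assumption that classification keys start with `class.`, and the group names are hypothetical.

```python
Row = dict[str, object]


def sensitive_columns(column_tags: dict[str, dict[str, str]], prefix: str = "class.") -> set[str]:
    """Columns with any tag key starting with the (assumed) classification prefix."""
    return {col for col, tags in column_tags.items()
            if any(key.startswith(prefix) for key in tags)}


def apply_masking(rows: list[Row], column_tags: dict[str, dict[str, str]],
                  user_groups: set[str], exempt_groups: set[str]) -> list[Row]:
    """Hypothetical helper: redact sensitive columns unless the user is in an exempt group."""
    if user_groups & exempt_groups:
        return rows
    masked = sensitive_columns(column_tags)
    return [{col: ("***" if col in masked else val) for col, val in row.items()}
            for row in rows]


column_tags = {"policy_id": {}, "holder_name": {"class.name": ""}, "premium": {}}
rows = [{"policy_id": 1, "holder_name": "Ada Lovelace", "premium": 120.5}]

print(apply_masking(rows, column_tags, {"actuaries"}, {"pii_readers"}))
# [{'policy_id': 1, 'holder_name': '***', 'premium': 120.5}]
print(apply_masking(rows, column_tags, {"pii_readers"}, {"pii_readers"}))
# [{'policy_id': 1, 'holder_name': 'Ada Lovelace', 'premium': 120.5}]
```

Because the rule keys off tags rather than column names, a newly classified column is covered without editing the policy.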

Capability: Attribution, Usage/Cost

Scenario: As a Chief Architect, I need visibility into how teams—like actuaries, underwriters, and investment managers—collaborate in a shared environment and which resources they consume. 

Tags on pipelines, clusters, and jobs enable cost attribution by user and team, helping optimize resource usage. Lineage and audit logs help assess trust by revealing usage patterns, while Certified and Deprecated tags distinguish reliable datasets from outdated ones. Together, this tagging and observability framework supports governance, accountability, and efficient platform management at scale.
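Cost attribution by tag amounts to grouping usage records by a tag key. The sketch below does this over in-memory records shaped loosely like rows of the `system.billing.usage` system table, which exposes a `custom_tags` map and a `usage_quantity` column. The sample records, the `team` key, and the `usage_by_tag` helper are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

UNTAGGED = "<untagged>"


def usage_by_tag(records: list[dict], tag_key: str) -> dict[str, Decimal]:
    """Sum usage_quantity per value of tag_key; records without the tag go to UNTAGGED."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for rec in records:
        bucket = (rec.get("custom_tags") or {}).get(tag_key, UNTAGGED)
        totals[bucket] += Decimal(str(rec["usage_quantity"]))
    return dict(totals)


records = [  # illustrative rows, not real billing data
    {"custom_tags": {"team": "actuarial"}, "usage_quantity": "12.5"},
    {"custom_tags": {"team": "underwriting"}, "usage_quantity": "4.0"},
    {"custom_tags": {"team": "actuarial"}, "usage_quantity": "3.5"},
    {"custom_tags": {}, "usage_quantity": "1.25"},
]
print(usage_by_tag(records, "team"))
```

Keeping an explicit untagged bucket, rather than dropping those rows, makes gaps in tagging coverage visible as spend that nobody owns.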

How To: 

  • Use user-defined tags to track job/team attribution
  • Apply budget policies for Serverless workloads
  • Use the Certified and Deprecated system tags to signal lifecycle status
  • Audit access via System Tables over time (e.g. last 90 days) before removing datasets
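The last step, checking recent access before retiring a dataset, reduces to a time-window filter over audit events. The sketch applies it to hypothetical (table, timestamp) pairs. In practice the events would come from audit data in system tables (for example `system.access.audit`). The `safe_to_retire` helper is illustrative, and the 90-day default simply mirrors the example above.

```python
from datetime import datetime, timedelta, timezone


def safe_to_retire(table: str, events: list[tuple[str, datetime]],
                   now: datetime, window: timedelta = timedelta(days=90)) -> bool:
    """Hypothetical check: True if no access to `table` was recorded within `window`."""
    cutoff = now - window
    return not any(name == table and ts >= cutoff for name, ts in events)


now = datetime(2025, 6, 30, tzinfo=timezone.utc)
events = [
    ("gold.legacy_rates", now - timedelta(days=200)),
    ("gold.positions", now - timedelta(days=2)),
]
print(safe_to_retire("gold.legacy_rates", events, now))  # True
print(safe_to_retire("gold.positions", events, now))     # False
```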

Use Cases for Data Stewards

Capability: Tag Management

  • Creator: Defines new tag policies with associated key & allowed values. The Creator can manage permissions associated with all tag policies.
  • Manager: Owns the lifecycle of a tag policy, including updating the values and managing permissions associated with a given tag policy. 
  • Assignee: Can apply governed tags to UC assets when granted ASSIGN on the tag policy.

To assign ungoverned tags, users must be the asset owner or have APPLY TAG permission. 
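The rules above can be summarized as a small decision function. This is a simplified, hypothetical model for reasoning about who may apply which tag, not Unity Catalog's authorization code. It assumes that governed tags need both object-level rights (ownership or APPLY TAG) and ASSIGN or MANAGE on the tag policy, and it ignores inheritance, admin roles, and other details.

```python
from dataclasses import dataclass, field


@dataclass
class Principal:
    """Hypothetical view of one user's or group's relevant grants."""
    name: str
    owned_objects: set[str] = field(default_factory=set)
    # Object-level privileges, e.g. {"main.risk.var": {"APPLY TAG"}}
    object_privs: dict[str, set[str]] = field(default_factory=dict)
    # Tag-policy permissions keyed by governed tag key, e.g. {"domain": {"ASSIGN"}}
    policy_perms: dict[str, set[str]] = field(default_factory=dict)


def can_apply_tag(p: Principal, obj: str, tag_key: str, governed_keys: set[str]) -> bool:
    """Simplified model: object rights for any tag, plus ASSIGN/MANAGE for governed keys."""
    has_object_rights = obj in p.owned_objects or "APPLY TAG" in p.object_privs.get(obj, set())
    if not has_object_rights:
        return False
    if tag_key not in governed_keys:
        return True  # ungoverned tag: ownership or APPLY TAG is enough
    return bool(p.policy_perms.get(tag_key, set()) & {"ASSIGN", "MANAGE"})


governed = {"domain", "sensitivity_level"}
steward = Principal("steward", object_privs={"main.risk.var": {"APPLY TAG"}},
                    policy_perms={"domain": {"ASSIGN"}})

print(can_apply_tag(steward, "main.risk.var", "team_notes", governed))         # True
print(can_apply_tag(steward, "main.risk.var", "domain", governed))             # True
print(can_apply_tag(steward, "main.risk.var", "sensitivity_level", governed))  # False
print(can_apply_tag(steward, "main.other.tbl", "team_notes", governed))        # False
```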

How To: Data Stewards and Admins collaboratively define tag policies to ensure consistency across the platform.

Examples:

  • Domain: retail, investment, asset
  • UseCase: fraud, risk, churn, CLV
  • sensitivity_level: public, restricted, confidential

When new domains or use cases are introduced, the Manager updates the allowed values in the policy.
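The core behavior of a tag policy (restricting a governed key to an allowed set of values that a Manager can extend) can be sketched as follows. `TagPolicy` and `check_tags` are hypothetical illustrations, not a Databricks API; in Databricks, tag policies are created and maintained in the platform itself. The keys and values are the examples listed above.

```python
from dataclasses import dataclass, field


@dataclass
class TagPolicy:
    """Hypothetical model of a governed tag key and its allowed values."""
    key: str
    allowed_values: set[str] = field(default_factory=set)

    def validate(self, value: str) -> None:
        if value not in self.allowed_values:
            raise ValueError(f"{value!r} is not allowed for {self.key!r}; "
                             f"allowed: {sorted(self.allowed_values)}")

    def add_values(self, *values: str) -> None:
        """What a Manager does when a new domain or use case is introduced."""
        self.allowed_values.update(values)


policies = {
    "Domain": TagPolicy("Domain", {"retail", "investment", "asset"}),
    "UseCase": TagPolicy("UseCase", {"fraud", "risk", "churn", "CLV"}),
    "sensitivity_level": TagPolicy("sensitivity_level", {"public", "restricted", "confidential"}),
}


def check_tags(tags: dict[str, str]) -> None:
    """Validate governed keys; keys without a policy are treated as ungoverned and pass."""
    for key, value in tags.items():
        if key in policies:
            policies[key].validate(value)


check_tags({"Domain": "retail", "team_label": "anything goes"})  # passes
policies["Domain"].add_values("insurance")                        # Manager extends the policy
check_tags({"Domain": "insurance"})                              # now passes as well
```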

Capability: Governance via ABAC + Data Classification

Scenario: When Data Classification is enabled on a catalog, Databricks automatically detects and classifies sensitive data with classification tags. When a steward or admin has applied an ABAC policy referencing one of these tags, UC automatically ensures that the data is protected by default as additional tables are ingested into this catalog. 

How To: A Data Steward enables Data Classification for a given catalog and creates an ABAC policy at the catalog level to ensure sensitive data will be filtered or masked accordingly.
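Why a catalog-level policy protects tables ingested later can be shown with a toy resolver. The policy is attached to a scope and matched against column tags at access time, so any table created under that scope afterwards is picked up without writing a new policy. The `Policy` type, the name-prefix scoping, and the tag key are hypothetical simplifications, not how Unity Catalog evaluates ABAC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical ABAC-style rule: within `scope`, mask columns tagged with `tag_key`."""
    scope: str       # catalog or catalog.schema, e.g. "finance"
    tag_key: str
    action: str = "mask"


def in_scope(table: str, scope: str) -> bool:
    """'finance.claims.new_claims' falls under 'finance' and 'finance.claims'."""
    return table == scope or table.startswith(scope + ".")


def columns_to_mask(table: str, column_tags: dict[str, set[str]],
                    policies: list[Policy]) -> set[str]:
    """Evaluated at access time, so tables added after a policy was written are covered."""
    return {col
            for p in policies if p.action == "mask" and in_scope(table, p.scope)
            for col, keys in column_tags.items() if p.tag_key in keys}


policies = [Policy(scope="finance", tag_key="class.name")]

# A table ingested after the policy was created, already tagged by classification.
new_table_tags = {"holder_name": {"class.name"}, "premium": set()}
print(columns_to_mask("finance.claims.new_claims", new_table_tags, policies))  # {'holder_name'}
print(columns_to_mask("marketing.web.visits", new_table_tags, policies))       # set()
```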

Critical user journey (CUJ) of a tag on Unity Catalog objects

Cloud providers (e.g., AWS, Azure, GCP) offer tags, but with coarse-grained permissions that make them insufficient for fine-grained governance of Unity Catalog assets. Enterprise catalogs (e.g., Collibra, Alation) support tags but are limited to structured data. In contrast, Databricks enables tagging across most assets (compute, workflows, Unity Catalog securables, dashboards, etc.). Unity Catalog also supports federated data sources, allowing tags to extend governance and attribution to data and workloads beyond the Databricks platform.

1. Define a tag: Organizations should centralize tag policies and definitions at the account level, particularly for governed tags. Lines of business (LoBs) can dictate additional tagging policies for workspace assets such as clusters, jobs, and workflows.
   Note: With UC, data and AI assets span workspaces, so give careful thought to naming.
2. Grant tag policy permissions: Designate domain-, BU-, or LoB-specific data stewards who can create tag policies relevant to their areas and then delegate to power users.
3. Assign a tag to a UC object: Unity Catalog supports assigning tags, both governed (enforced by tag policies) and ungoverned, on securables such as catalogs, schemas, tables, views, columns, models, and volumes. Users need the APPLY TAG permission (or ownership) on the object and, for governed tags, the ASSIGN permission on the governing tag policy.
4. Search for a tag or value (UC Explorer): All users should be able to search all objects by tag, value, and metadata using Global Search.
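For the assignment step, tags on catalogs, schemas, tables, and table columns can be set with SQL of the form `ALTER TABLE ... SET TAGS ('key' = 'value')`. The hypothetical helper below only builds those statement strings, so that a list of assignments can be reviewed and then run in bulk (for example with `spark.sql(...)` in a notebook). The object names and tag values are illustrative. The helper rejects characters that would need escaping instead of guessing at quoting rules. Governed tag values must still satisfy their tag policy.

```python
import re
from typing import Optional

# Conservative allow-list (an assumption): no quotes or backslashes, so nothing needs escaping.
_SAFE_TEXT = re.compile(r"[A-Za-z0-9 _.:/@-]*")


def _quote_name(name: str) -> str:
    """Backtick-quote each part of a dotted name, e.g. main.sales.orders."""
    parts = name.split(".")
    if any(not p or "`" in p for p in parts):
        raise ValueError(f"unsupported name: {name!r}")
    return ".".join(f"`{p}`" for p in parts)


def _tag_list(tags: dict[str, str]) -> str:
    if not tags:
        raise ValueError("at least one tag is required")
    for key, value in tags.items():
        if not key or not _SAFE_TEXT.fullmatch(key) or not _SAFE_TEXT.fullmatch(value):
            raise ValueError(f"refusing to embed tag {key!r}={value!r}; set it manually")
    return ", ".join(f"'{k}' = '{v}'" for k, v in tags.items())


def set_tags_sql(securable: str, name: str, tags: dict[str, str],
                 column: Optional[str] = None) -> str:
    """Hypothetical builder for ALTER CATALOG|SCHEMA|TABLE ... [ALTER COLUMN ...] SET TAGS (...)."""
    kind = securable.upper()
    if kind not in {"CATALOG", "SCHEMA", "TABLE"}:
        raise ValueError(f"unsupported securable: {securable!r}")
    if column is not None and kind != "TABLE":
        raise ValueError("column tags are set through ALTER TABLE ... ALTER COLUMN")
    stmt = f"ALTER {kind} {_quote_name(name)}"
    if column is not None:
        stmt += f" ALTER COLUMN {_quote_name(column)}"
    return f"{stmt} SET TAGS ({_tag_list(tags)})"


print(set_tags_sql("table", "main.wealth.positions",
                   {"domain": "wealth management", "team": "investment specialists"}))
# ALTER TABLE `main`.`wealth`.`positions` SET TAGS ('domain' = 'wealth management', 'team' = 'investment specialists')
print(set_tags_sql("table", "main.wealth.positions",
                   {"sensitivity_level": "confidential"}, column="holder_name"))
# ALTER TABLE `main`.`wealth`.`positions` ALTER COLUMN `holder_name` SET TAGS ('sensitivity_level' = 'confidential')
```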

What’s coming

Previews

| Capability | Description | Documentation / Onboarding |
|---|---|---|
| Data Classification (System Tags) | PII detection | Beta (AWS, Azure, GCP) |
| Anomaly Detection | Data freshness; data completeness | Beta (AWS, Azure, GCP) |
| Request For Access | Allows users to discover assets and request specific access from approved stewards who can control access | Private Preview. Contact your Databricks Account Team |
| Tag Policy | Control which users can create/manage/assign tag policies; control which values can be used when assigning certain tags; control which tags can be referenced as attributes in ABAC; Certification / Deprecation system tags | Private Preview. Contact your Databricks Account Team. Beta coming soon! |
| ABAC Policy | ABAC enables data governance administrators to define access policies once that are applied broadly across the data lake | Private Preview. Contact your Databricks Account Team. Beta coming soon! |
| Tags on AI/BI Dashboards | Allows for dashboard certification, organization, and discovery | Private Preview. Contact your Databricks Account Team |
| UC Governance Insights Dashboards | Designed to give enterprise CDOs and admin teams key insights into the health of their data estate by providing out-of-the-box dashboards based on our system tables | Private Preview. Contact your Databricks Account Team |

ABAC

Attribute-Based Access Control (ABAC) allows data governance administrators to define scalable access policies that are automatically enforced across the data lake. ABAC policies can be defined at the catalog, schema, or table level and apply broadly based on tags. This allows administrators and data stewards to write one policy at the catalog level that governs access across many tables matching specific tag conditions.

[Figure: Securables, privileges, rules, and tags]

ABAC policies work in conjunction with tags, and the tags they reference are governed by tag policies. Enforcement happens when someone tries to access a tagged data asset. All operations on the data asset are immediately captured and available in real time in the Databricks audit log. The diagram below illustrates how this is intended to work:

[Figure: ABAC example]

Best Practices Summary 

1. Standardize tagging roles and responsibilities 

Collaborate with your business users, business heads, and data stewards to develop an organization-wide approach that defines who is responsible for creating and managing different parts of the tag taxonomy. Then map these responsibilities to existing Databricks roles and permissions. When applicable, workspace admins should enforce tags using compute policies and budget policies. This ensures clarity, accountability, and consistency in how tags are applied and governed across the platform.

[Figure: Tagging personas and responsibilities]

2. Standardize the Nomenclature

Using Tag Policies, Databricks unifies how users interact with governed and ungoverned tags. To avoid confusion, establish clear naming conventions for governed tags while allowing flexibility for ungoverned tags used for discovery. 

A few suggestions:

  • Avoid using reserved system tag names (e.g., certified)
  • Use unique, descriptive tag names
  • Do not include sensitive information such as project code names or confidential data in tag keys or values to protect resource security
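Conventions like these are easy to check automatically before tags are created. The linter below is a hypothetical example. The reserved names come from the Certified and Deprecated system tags mentioned in this post. The lower snake_case rule and the blocked-term list are assumptions that you would replace with your organization's own conventions.

```python
import re

RESERVED = {"certified", "deprecated"}  # system tag names mentioned in this post
BLOCKED_TERMS = {"project_falcon"}      # hypothetical confidential code names
KEY_PATTERN = re.compile(r"[a-z][a-z0-9_]{1,49}")  # assumed convention: lower snake_case


def lint_tag(key: str, value: str = "") -> list[str]:
    """Return convention violations for one tag (an empty list means it passes)."""
    problems = []
    if key.lower() in RESERVED:
        problems.append(f"{key!r} collides with a reserved system tag name")
    if not KEY_PATTERN.fullmatch(key):
        problems.append(f"{key!r} is not lower snake_case (2-50 chars)")
    text = f"{key} {value}".lower()
    problems += [f"contains blocked term {t!r}" for t in sorted(BLOCKED_TERMS) if t in text]
    return problems


print(lint_tag("domain", "retail"))          # []
print(lint_tag("Certified"))                 # two problems: reserved name, not snake_case
print(lint_tag("launch", "project_falcon"))  # one problem: blocked term
```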

3. Change Management for Tags

Tag changes (create, update, delete) can have significant downstream impacts. Establish an enterprise review process, such as a governance review board, to oversee and control modifications of tags used for data governance.

Recommended Controls:

  • Do not delete tag policies if the tags they govern are referenced in ABAC rules, to avoid breaking access controls
  • Changes to tags can affect cost attribution retroactively and should be managed carefully
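The first control can be automated as a pre-delete check: before a tag policy is removed, look for ABAC rules that still reference its key. The representation below (rule name mapped to the tag keys it references) and both helpers are hypothetical simplifications; in practice you would gather the references from wherever your ABAC policies are defined.

```python
def blocking_rules(tag_key: str, abac_rules: dict[str, set[str]]) -> list[str]:
    """Names of ABAC rules that reference tag_key and would break if its policy were deleted."""
    return sorted(name for name, keys in abac_rules.items() if tag_key in keys)


def delete_tag_policy(tag_key: str, tag_policies: dict[str, set[str]],
                      abac_rules: dict[str, set[str]]) -> None:
    """Hypothetical guarded delete: refuse while any ABAC rule still uses the key."""
    blockers = blocking_rules(tag_key, abac_rules)
    if blockers:
        raise RuntimeError(f"tag policy {tag_key!r} is still used by: {', '.join(blockers)}")
    del tag_policies[tag_key]


tag_policies = {"sensitivity_level": {"public", "restricted", "confidential"},
                "UseCase": {"fraud", "risk"}}
abac_rules = {"mask_confidential": {"sensitivity_level"}, "filter_by_domain": {"Domain"}}

delete_tag_policy("UseCase", tag_policies, abac_rules)  # succeeds: no rule references it
# delete_tag_policy("sensitivity_level", tag_policies, abac_rules) would raise RuntimeError
```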

4. Tag Observability

Leverage Tag Policies and ABAC to manage access using tags. Using the information schema and audit logs, you can monitor tag application and usage, supporting data quality and compliance. You can also use the Governance Insights dashboard to visualize and set alerts for privileged actions such as tag deletions or modifications.
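As one example of tag observability, the sketch below finds tables that are missing required tag keys. It works on rows shaped like those of the Unity Catalog `information_schema.table_tags` view (`catalog_name`, `schema_name`, `table_name`, `tag_name`, `tag_value`), which you might load in a notebook with a query such as `SELECT * FROM system.information_schema.table_tags`. The rows, the table inventory, the required keys, and the `missing_required_tags` helper are hypothetical.

```python
from collections import defaultdict


def missing_required_tags(tag_rows: list[dict], tables: list[str],
                          required: set[str]) -> dict[str, set[str]]:
    """Map each table (catalog.schema.table) to the required tag keys it lacks."""
    present: dict[str, set[str]] = defaultdict(set)
    for r in tag_rows:
        full = f"{r['catalog_name']}.{r['schema_name']}.{r['table_name']}"
        present[full].add(r["tag_name"])
    gaps = {t: required - present[t] for t in tables}
    return {t: keys for t, keys in gaps.items() if keys}


tag_rows = [  # illustrative rows
    {"catalog_name": "main", "schema_name": "risk", "table_name": "var",
     "tag_name": "domain", "tag_value": "investment"},
    {"catalog_name": "main", "schema_name": "risk", "table_name": "var",
     "tag_name": "sensitivity_level", "tag_value": "restricted"},
    {"catalog_name": "main", "schema_name": "risk", "table_name": "limits",
     "tag_name": "domain", "tag_value": "investment"},
]
tables = ["main.risk.var", "main.risk.limits", "main.risk.scratch"]
print(missing_required_tags(tag_rows, tables, {"domain", "sensitivity_level"}))
```

A separate table inventory is passed in because tables with no tags at all produce no rows in the tag view, so they would otherwise be invisible to the check.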

Several related features were announced at DAIS 2025, so watch for updates as they become available.