cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Tags Field Doesn't propagate in Delta Share

souravroy1990
New Contributor II

Hi,

My current work requires it to add tags in Databricks tables & views. I see that there is a tag column associated to tables and views to which we can set the TAG using SET TAG command.

My requirement is that once we are creating delta shares out of these tables and views the tags should be visible to the consumers. Currently Databricks doesn’t propagate this information in delta share and as I’m exploring this the reason for not propagating this is this is considered as metadata.

My question is, if we have a requirement of propagating the column level tags to the views associated to a delta share what is the mechanism in Databricks which would enable this?

Databricks community suggests using column comments field which propagates in share, but we are using the comment field for column description. Since this is a string field it would be difficult for the receiver to parse this field to exact 2 different information (column description and metadata), so we want to avoid using comment for metadata.

So is there any alternative to share tags in the delta share to the recipient of the share?

3 REPLIES 3

balajij8
Contributor

You can use comments for it. Its easy to parse this field

COMMENT '{"des": "key", "meta": {"UPI": "true"}}'

pradeep_singh
Contributor

The catalog that you are sharing should have information_schema.table_tags and information_schema.column_tags tables . You can share these two tables as well if you are not sharing it today . 

You can then ask your share recipient to join these tables to the tables you have shared to get details on the tags for each table and column . You can also get creative with using the tags from table_tags and column_tags to define your comments value when creating share and see if that help propogate the tags.

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

SteveOstrowski
Databricks Employee
Databricks Employee

Hi @souravroy1990,

You are correct that Unity Catalog tags (set via ALTER TABLE SET TAGS or ALTER TABLE ALTER COLUMN SET TAGS) are not propagated to recipients through Delta Sharing. Tags are treated as Unity Catalog governance metadata that lives in the provider's metastore and is not included in the share payload. What IS included in Databricks-to-Databricks shares (as of mid-2024) are table comments, column comments, and primary key constraints.

Here are a few approaches to get tag-like information to your share recipients:


OPTION 1: MATERIALIZE TAG DATA INTO A SHARED DELTA TABLE

This is the most robust approach. You can query the information_schema views for tags in your provider catalog and write the results into a regular Delta table, then include that Delta table in your share.

On the provider side, create and populate the tag table:

CREATE TABLE my_catalog.my_schema.shared_tag_metadata AS
SELECT * FROM my_catalog.information_schema.column_tags
WHERE table_name IN ('table1', 'table2');

You can also combine table-level and column-level tags:

CREATE OR REPLACE TABLE my_catalog.my_schema.shared_tag_metadata AS
SELECT
catalog_name, schema_name, table_name,
NULL AS column_name,
tag_name, tag_value,
'TABLE' AS tag_level
FROM my_catalog.information_schema.table_tags
WHERE table_name IN ('table1', 'table2')
UNION ALL
SELECT
catalog_name, schema_name, table_name,
column_name,
tag_name, tag_value,
'COLUMN' AS tag_level
FROM my_catalog.information_schema.column_tags
WHERE table_name IN ('table1', 'table2');

Then add it to your share:

ALTER SHARE my_share ADD TABLE my_catalog.my_schema.shared_tag_metadata;

To keep it current, schedule a job that periodically refreshes this table (e.g., with CREATE OR REPLACE TABLE or a MERGE). Recipients can then join this metadata table with the shared data tables on catalog_name, schema_name, table_name, and column_name to look up the tags.

Note: you cannot directly share information_schema views through Delta Sharing because shareable views must be defined on Delta tables, other shareable views, or materialized views. That is why materializing into a Delta table is necessary.


OPTION 2: STRUCTURED JSON IN COLUMN COMMENTS

As you mentioned, column comments do propagate through Delta Sharing. If you want to keep both the description and tag metadata in the comment field, you can use a structured JSON format to make parsing straightforward for recipients:

ALTER TABLE my_catalog.my_schema.my_table
ALTER COLUMN my_column
COMMENT '{"description": "Customer unique identifier", "tags": {"pii": "true", "classification": "confidential"}}';

Recipients can then parse the JSON comment on their side:

SELECT
column_name,
get_json_object(comment, '$.description') AS description,
get_json_object(comment, '$.tags.pii') AS pii_tag,
get_json_object(comment, '$.tags.classification') AS classification_tag
FROM recipient_catalog.information_schema.columns
WHERE table_name = 'my_table';

This keeps everything in one field but with a clear, parseable structure. The downside is that it requires coordination with your recipients on the JSON schema.


OPTION 3: SHARE A VIEW THAT EMBEDS TAGS AS COLUMNS

If your use case is focused on a small number of well-known tags, you could create a view that adds tag values as extra columns, then share the view:

CREATE VIEW my_catalog.my_schema.my_table_with_tags AS
SELECT
*,
'true' AS tag_pii,
'confidential' AS tag_classification
FROM my_catalog.my_schema.my_table;

ALTER SHARE my_share ADD VIEW my_catalog.my_schema.my_table_with_tags;

This is simple but does not scale well if you have many tags or if tags change frequently.


RECOMMENDATION

Option 1 (materialized tag metadata table) is the most flexible and scalable approach. It keeps your actual column comments free for descriptions, gives recipients a clean relational way to look up tags, and can be automated with a scheduled job. This is similar to what @pradeep_singh suggested, and the key detail is that you need to materialize the information_schema data into a Delta table first since information_schema views themselves cannot be shared directly.

For reference:
https://docs.databricks.com/en/data-sharing/create-share.html
https://docs.databricks.com/en/data-governance/unity-catalog/tags.html
https://docs.databricks.com/en/sql/language-manual/information-schema/column_tags.html
https://docs.databricks.com/en/sql/language-manual/information-schema/table_tags.html

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.