Monday - last edited Monday
Hi all,
In order to achieve data stability in our Workspace, our IT team has given us access to the data through VIEWs on top of an ingestion schema.
Now I want to provide metadata for the VIEWs in the form of TAGs (IT does not want to cover this use case). The tags are stored in a table, and I planned to loop over it and dynamically create the SQL statements. I have confirmed the SQL syntax as SET TAG ON COLUMN <catalog>.<schema>.<view>.<column> <tag_key> = <tag_value>.
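As a minimal sketch of the loop I have in mind (tag_metadata and its columns full_column_name, tag_key and tag_value are placeholder names, the real table looks different):
# Sketch only: tag_metadata is a placeholder for the table holding the tags,
# with columns full_column_name (<catalog>.<schema>.<view>.<column>),
# tag_key and tag_value.
for row in spark.table("tag_metadata").collect():
    spark.sql(
        f"SET TAG ON COLUMN {row.full_column_name} "
        f"`{row.tag_key}` = `{row.tag_value}`"
    )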
Monday - last edited Monday
Hi @stefan_erste ,
I'm not in front of my PC right now to check this, but maybe try the syntax below:
ALTER VIEW catalog.schema.view_name ALTER COLUMN column_name SET TAGS ('key' = 'value')
Monday
Hi @szymon_dybczak,
unfortunately IT won't grant us ALTER rights, so that we don't accidentally drop the VIEWs. All they have given us is the SET TAG right.
Stefan
Monday
Thanks for the info. When I get back home I'll try to find a workaround using the SET TAG command.
Monday
Hi @stefan_erste ,
Ok, so in my case I was able to set a tag in PySpark in the following way:
spark.sql(f"""
SET TAG ON COLUMN default.employees.Id test_tag = test_value
""")
Of course, you can also do it dynamically with a loop and f-strings:
tags = [
    "my_tag3 = my_tag_value",
    "my_tag4 = my_tag_other_value"
]

for tag in tags:
    query_template = f"SET TAG ON COLUMN default.employees.Id {tag}"
    # print(query_template)
    spark.sql(query_template)
Monday
I tried to execute your example, but it failed with this error:
"[INVALID_SET_SYNTAX] Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, or include semicolon in value, please use backquotes, e.g., SET `key`=`value`. SQLSTATE: 42000".
Could the cluster setup be an issue? Below is an extract of the cluster's JSON:
"spark_version": "15.4.x-scala2.12",
"spark_conf": {
"spark.databricks.delta.preview.enabled": "true",
"spark.databricks.conda.condaMagic.enabled": "true",
"spark.master": "local[*, 4]",
"spark.databricks.driver.disableDefaultMavenCentralResolver": "true",
"spark.databricks.driver.disableSparkPackagesResolver": "true",
"spark.databricks.cluster.profile": "singleNode"
},
"node_type_id": "Standard_DS3_v2",
"driver_node_type_id": "Standard_DS3_v2"
Thanks,
Stefan
Monday - last edited Monday
Hi @stefan_erste ,
I'll check with the exact same cluster settings tomorrow morning.
In the meantime - do you use any special characters in your tag's key or value?
Because with unusual tags - in my case a dollar sign in the key and a space in the value - I get the same error.
To mitigate that you need to escape them using backticks, like below:
tags = [
    "`my_tag$` = `my tag value`"
]

for tag in tags:
    query_template = f"SET TAG ON COLUMN default.employees.Id {tag}"
    # print(query_template)
    spark.sql(query_template)
With backticks it works as expected, and I can use non-standard characters in the tag's key and value.
And as proof that it worked, the tags show up in information_schema.column_tags.
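If you want to check it on your side, a query along these lines should do (schema and table names are from my example above, adjust to yours):
spark.sql("""
    SELECT column_name, tag_name, tag_value
    FROM information_schema.column_tags
    WHERE schema_name = 'default' AND table_name = 'employees'
""").show(truncate=False)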
Monday
The loop fails on the first SET TAG, so I just printed it out. Here it is:
SET TAG ON COLUMN prod_aut_crp00_lab_catalog.crisp_db.prod_parsed_crisp_crrp_vw_fct_cust.DCO_AMG_AGG_INCOME reporting_field_name = DCO_Aggregated_Income
Thanks,
Stefan
Monday
Hi @stefan_erste, I ran into the same situation and managed to get it working from a notebook. The main issue isn’t really the cluster but the way the SQL parser handles the command. When you’re inside PySpark you need to fully qualify the column path with <catalog>.<schema>.<view>.<column>, and also make sure to put backticks around both the tag key and the value, otherwise it throws that syntax error you saw.
For example, what worked for me was something like:
spark.sql("""
SET TAG ON COLUMN workspace.wiliam_lab.vw_sales.customerID
`reporting_field_name` = `Customer Identifier Test`
""")
Once I used this format, the tag showed up in information_schema.column_tags and in the UI as expected. If you plan to loop over a mapping table and apply multiple tags dynamically, the same pattern works fine with f-strings. The only catch is that if you run it more than once with the same key/value, Unity Catalog complains with a duplicate assignment error. To get around that, I either skip when the tag already exists with the same value, or unset it first and then set it again if I need to update the value.
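A rough sketch of that skip-or-reset logic, reusing the example names from above (this is just one way to do it, and the information_schema lookup assumes the current catalog):
# Sketch: skip when the tag already holds the desired value; otherwise
# unset it and set it again. Names reuse the vw_sales example above.
col_path = "workspace.wiliam_lab.vw_sales.customerID"
key, value = "reporting_field_name", "Customer Identifier Test"

existing = spark.sql(f"""
    SELECT tag_value
    FROM information_schema.column_tags
    WHERE schema_name = 'wiliam_lab'
      AND table_name = 'vw_sales'
      AND column_name = 'customerID'
      AND tag_name = '{key}'
""").collect()

if not existing:
    spark.sql(f"SET TAG ON COLUMN {col_path} `{key}` = `{value}`")
elif existing[0].tag_value != value:
    spark.sql(f"UNSET TAG ON COLUMN {col_path} `{key}`")
    spark.sql(f"SET TAG ON COLUMN {col_path} `{key}` = `{value}`")
# else: the same key/value is already assigned, so skip it to avoid
# the duplicate assignment error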
After doing this, I was able to tag my views programmatically without issues. So as long as you backtick the key and value and fully qualify the column name, you should be good to go.
Monday
I've already suggested that and even provided code with a for loop, backticks and f-strings...
Re: Programmatically setting TAGs on VIEWs - Databricks Community - 131240
And @stefan_erste is already using a fully qualified name. I think he's only missing the backticks.
Tuesday
Hi @szymon_dybczak and @WiliamRosa,
I have used backticks from the very start (you'll see it if you re-check my original post).
It is definitely a cluster issue, as I am able to assign tags using a serverless cluster. The reason I was using a dedicated one is that I was pulling data from an Excel file on a SharePoint location, which does not work with serverless.
So in the end I will build a two-step job: the first step, on a dedicated cluster, will store all the tags in a table, and the second, on serverless, will do the assignment.
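For the first step, something as small as this sketch should be enough (df_tags stands for the pandas DataFrame read from the Excel file, and tags_to_apply is a placeholder name for the staging table):
# Dedicated-cluster step (sketch): persist the tags read from the
# SharePoint Excel file so the serverless job can apply them later.
# df_tags is assumed to be a pandas DataFrame with the tag metadata;
# tags_to_apply is a placeholder table name.
spark.createDataFrame(df_tags).write.mode("overwrite").saveAsTable("tags_to_apply")
The serverless step can then loop over tags_to_apply and run SET TAG, like in the snippets earlier in the thread.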
Thank you both for the time and effort!
KR,
Stefan
Tuesday
Thanks for the info, @stefan_erste!