Programmatically setting TAGs on VIEWs

stefan_erste
New Contributor

Hi all,

In order to achieve data stability in our Workspace, our IT team has given us access to the data through VIEWs on top of an ingestion schema.

Now I want to provide metadata to the VIEWs in the form of TAGs (IT does not want to cover this use case). The tags are stored in a table, and I planned to loop over it and dynamically create the SQL statements. I have confirmed the SQL syntax by running it in the SQL editor:

SET TAG ON COLUMN
catalog_name.schema_name.table_name.column_name `tag_key` = `tag_value`;
 
However, it does not work in a notebook using PySpark; instead it returns the following error:
"[INVALID_PROPERTY_KEY] "TAG ON COLUMN prod_aut_crp00_lab_catalog.access_uc045_crisp_db.prod_parsed_crisp_crrp_vw_fct_deal_sas.CD_AGG_UAMRT_MDF_EST_LCOA `reporting_field_name`" is an invalid property key, please use quotes, e.g. SET "TAG ON COLUMN prod_aut_crp00_lab_catalog.access_uc045_crisp_db.prod_parsed_crisp_crrp_vw_fct_deal_sas.CD_AGG_UAMRT_MDF_EST_LCOA `reporting_field_name`"="DCO - Aggregated Unamortized Contractual Modifications and Changes in Estimates Local GAAP LCOA". SQLSTATE: 42602".
 
I tried adjusting it per the suggestion, but then it fails saying that quotes after SET are incorrect syntax (which is logical).
 
Any help is appreciated!
Stefan

szymon_dybczak
Esteemed Contributor III

Hi @stefan_erste ,

I'm not in front of my PC right now to check this, but maybe try the syntax below:

ALTER VIEW catalog.schema.view_name ALTER COLUMN column_name SET TAGS ('key' = 'value')

stefan_erste
New Contributor

Hi @szymon_dybczak,

Unfortunately, IT won't give us ALTER rights, so that we don't accidentally drop the views. All they have given us is the SET TAG right.

Stefan

szymon_dybczak
Esteemed Contributor III

Thanks for the info. When I get back home, I'll try to find a workaround using the SET TAG command.

szymon_dybczak
Esteemed Contributor III

Hi @stefan_erste ,

OK, so in my case I was able to set a tag in PySpark the following way:

spark.sql("""
    SET TAG ON COLUMN default.employees.Id test_tag = test_value
""")

Of course, you can do it dynamically using a loop and an f-string:

# Tag assignments as "key = value" strings
tags = [
    "my_tag3 = my_tag_value",
    "my_tag4 = my_tag_other_value"
]
for tag in tags:
    # Build and run one SET TAG statement per assignment
    query_template = f"SET TAG ON COLUMN default.employees.Id {tag}"
    # print(query_template)  # uncomment to inspect the generated SQL
    spark.sql(query_template)

 

stefan_erste
New Contributor

Hi @szymon_dybczak,

I tried to execute your example, but it failed with this error:

"[INVALID_SET_SYNTAX] Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, or include semicolon in value, please use backquotes, e.g., SET `key`=`value`. SQLSTATE: 42000".

Could the cluster setup be an issue? Below is an extract of the cluster's JSON:

    "spark_version": "15.4.x-scala2.12",
    "spark_conf": {
        "spark.databricks.delta.preview.enabled": "true",
        "spark.databricks.conda.condaMagic.enabled": "true",
        "spark.master": "local[*, 4]",
        "spark.databricks.driver.disableDefaultMavenCentralResolver": "true",
        "spark.databricks.driver.disableSparkPackagesResolver": "true",
        "spark.databricks.cluster.profile": "singleNode"
    },
    "node_type_id": "Standard_DS3_v2",
    "driver_node_type_id": "Standard_DS3_v2"

Thanks,

Stefan 

szymon_dybczak
Esteemed Contributor III

Hi @stefan_erste ,

I'll check with the exact same cluster settings tomorrow morning.

In the meantime, do you use any special characters in your tag's key or value? With an unusual tag like the one below (in this case, a dollar sign in the key and a space in the value), I get the same error:

[screenshot: the same INVALID_SET_SYNTAX error]

To mitigate that, you need to escape them with backticks, like below:

tags = [
    "`my_tag$` = `my tag value`"
]
for tag in tags:
    # Backticks let the key and value contain special characters and spaces
    query_template = f"SET TAG ON COLUMN default.employees.Id {tag}"
    # print(query_template)  # uncomment to inspect the generated SQL
    spark.sql(query_template)

With backticks it works as expected, and I can use non-standard characters in the tag's key and value:

[screenshot: the tag set successfully]

And as proof that it worked, here is the output from information_schema.column_tags:

[screenshot: information_schema.column_tags showing the new tag]

 

stefan_erste
New Contributor

Hi @szymon_dybczak 

The loop fails on the first SET TAG, so I just printed that statement out. Here it is:

SET TAG ON COLUMN prod_aut_crp00_lab_catalog.crisp_db.prod_parsed_crisp_crrp_vw_fct_cust.DCO_AMG_AGG_INCOME reporting_field_name = DCO_Aggregated_Income

Thanks,

Stefan 

WiliamRosa
New Contributor III

Hi @stefan_erste, I ran into the same situation and managed to get it working from a notebook. The main issue isn’t really the cluster but the way the SQL parser handles the command. When you’re inside PySpark you need to fully qualify the column path with <catalog>.<schema>.<view>.<column>, and also make sure to put backticks around both the tag key and the value, otherwise it throws that syntax error you saw.

For example, what worked for me was something like:

spark.sql("""
SET TAG ON COLUMN workspace.wiliam_lab.vw_sales.customerID
`reporting_field_name` = `Customer Identifier Test`
""")

Once I used this format, the tag showed up in information_schema.column_tags and in the UI as expected. If you plan to loop over a mapping table and apply multiple tags dynamically, the same pattern works fine with f-strings. The only catch is that if you run it more than once with the same key/value, Unity Catalog complains with a duplicate assignment error. To get around that, I either skip when the tag already exists with the same value, or unset it first and then set it again if I need to update the value.
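
In case it helps, here is a rough sketch of that skip-or-unset logic in PySpark, reusing the example names from above (adjust the catalog, schema, view and column to your own, and treat it as a sketch rather than production code):

# Sketch of the skip-or-unset pattern; names come from the example above
catalog, schema, view, column = "workspace", "wiliam_lab", "vw_sales", "customerID"
key, value = "reporting_field_name", "Customer Identifier Test"

# Look up the current value of the tag (if any) in information_schema.column_tags
existing = spark.sql(f"""
    SELECT tag_value
    FROM {catalog}.information_schema.column_tags
    WHERE schema_name = '{schema}' AND table_name = '{view}'
      AND column_name = '{column}' AND tag_name = '{key}'
""").collect()

fq_column = f"{catalog}.{schema}.{view}.{column}"
if existing and existing[0]["tag_value"] == value:
    print("Tag already set with the same value, skipping")  # avoids the duplicate assignment error
else:
    if existing:
        # Different value: unset first, then set it again with the new value
        spark.sql(f"UNSET TAG ON COLUMN {fq_column} `{key}`")
    spark.sql(f"SET TAG ON COLUMN {fq_column} `{key}` = `{value}`")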

After doing this, I was able to tag my views programmatically without issues. So as long as you backtick the key and value and fully qualify the column name, you should be good to go.

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa

szymon_dybczak
Esteemed Contributor III

I've already suggested that, and even provided code with a for loop, backticks, and an f-string...

Re: Programmatically setting TAGs on VIEWs - Databricks Community - 131240

[screenshot: the earlier reply with the loop code]

And @stefan_erste is already using a fully qualified name. I think he's only missing the backticks.

[screenshot: Stefan's generated SET TAG statement]

 

stefan_erste
New Contributor | Accepted Solution

Hi @szymon_dybczak and @WiliamRosa,

I have used backticks from the very start (you'll see them if you re-check my original post).

It is definitely a cluster issue, as I am able to assign tags using a serverless cluster. The reason I was using a dedicated one is that I was pulling data from an Excel file on a SharePoint location, which does not work with serverless.

So in the end I will make a two-step job: the first task, on a dedicated cluster, will store all the tags in a table, and the second, on serverless, will do the assignment.
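
A rough sketch of what that second (serverless) task could look like; the tag table name and its columns are placeholders for whatever the first task actually writes:

# Sketch of the serverless assignment task.
# "my_catalog.my_schema.tags_to_apply" and its columns are placeholders
# for the table that the first (dedicated-cluster) task writes.
tag_rows = spark.table("my_catalog.my_schema.tags_to_apply").collect()

for row in tag_rows:
    # Backtick-quote the key and value so spaces and special characters parse correctly
    stmt = (
        f"SET TAG ON COLUMN {row.full_column_name} "
        f"`{row.tag_key}` = `{row.tag_value}`"
    )
    spark.sql(stmt)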

Thank you both for the time and effort!

KR,
Stefan

szymon_dybczak
Esteemed Contributor III

Thanks for the info, @stefan_erste!
