Monday - last edited Monday
Hi all,
In order to achieve data stability in our Workspace, our IT team has given us access to the data through VIEWs on top of an ingestion schema.
Now I want to provide metadata for the VIEWs in the form of TAGs (IT does not want to cover this use case). The tags are stored in a table, and I planned to loop over it and dynamically create the SQL statements. I have confirmed the SQL syntax as SET TAG ON COLUMN <catalog>.<schema>.<view>.<column> <tag_key> = <tag_value>.
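As a minimal sketch of the loop I have in mind (tag_metadata and its columns full_column_name, tag_key and tag_value are placeholder names, the real table looks different):
# Sketch only: tag_metadata is a placeholder for the table holding the tags,
# with columns full_column_name (<catalog>.<schema>.<view>.<column>),
# tag_key and tag_value.
for row in spark.table("tag_metadata").collect():
    spark.sql(
        f"SET TAG ON COLUMN {row.full_column_name} "
        f"`{row.tag_key}` = `{row.tag_value}`"
    )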
Monday - last edited Monday
Hi @stefan_erste ,
I'm not in front of my PC right now to check this, but maybe try the syntax below:
ALTER VIEW catalog.schema.view_name ALTER COLUMN column_name SET TAGS ('key' = 'value')
Monday
Hi @szymon_dybczak,
unfortunately IT won't grant us ALTER rights, so that we don't accidentally drop the VIEWs. All they have given us is the SET TAG right.
Stefan
Monday
Thanks for the info. When I get back home I'll try to find a workaround using the SET TAG command.
Monday
Hi @stefan_erste ,
Ok, so in my case I was able to set a tag in PySpark in the following way:
spark.sql(f"""
SET TAG ON COLUMN default.employees.Id test_tag = test_value
""")
Of course, you can also do it dynamically with a loop and f-strings:
tags = [
    "my_tag3 = my_tag_value",
    "my_tag4 = my_tag_other_value"
]

for tag in tags:
    query_template = f"SET TAG ON COLUMN default.employees.Id {tag}"
    # print(query_template)
    spark.sql(query_template)
Monday
I tried to execute your example, but it failed with this error:
"[INVALID_SET_SYNTAX] Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, or include semicolon in value, please use backquotes, e.g., SET `key`=`value`. SQLSTATE: 42000".
Could the cluster setup be an issue? Below is an extract of the cluster's JSON:
"spark_version": "15.4.x-scala2.12",
"spark_conf": {
"spark.databricks.delta.preview.enabled": "true",
"spark.databricks.conda.condaMagic.enabled": "true",
"spark.master": "local[*, 4]",
"spark.databricks.driver.disableDefaultMavenCentralResolver": "true",
"spark.databricks.driver.disableSparkPackagesResolver": "true",
"spark.databricks.cluster.profile": "singleNode"
},
"node_type_id": "Standard_DS3_v2",
"driver_node_type_id": "Standard_DS3_v2"
Thanks,
Stefan
Monday - last edited Monday
Hi @stefan_erste ,
I'll check with the exact same cluster settings tomorrow morning.
In the meantime - do you use any special characters in your tag's key or value?
Because with unusual tags - in my case a dollar sign in the key and a space in the value - I get the same error.
To mitigate that you need to escape them using backticks, like below:
tags = [
    "`my_tag$` = `my tag value`"
]

for tag in tags:
    query_template = f"SET TAG ON COLUMN default.employees.Id {tag}"
    # print(query_template)
    spark.sql(query_template)
With backticks it works as expected, and I can use non-standard characters in the tag's key and value.
And as proof that it worked, the tags show up in information_schema.column_tags.
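If you want to check it on your side, a query along these lines should do (schema and table names are from my example above, adjust to yours):
spark.sql("""
    SELECT column_name, tag_name, tag_value
    FROM information_schema.column_tags
    WHERE schema_name = 'default' AND table_name = 'employees'
""").show(truncate=False)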
Monday
The loop fails on the first SET TAG, so I just printed it out. Here it is:
SET TAG ON COLUMN prod_aut_crp00_lab_catalog.crisp_db.prod_parsed_crisp_crrp_vw_fct_cust.DCO_AMG_AGG_INCOME reporting_field_name = DCO_Aggregated_Income
Thanks,
Stefan
Monday
Hi @stefan_erste, I ran into the same situation and managed to get it working from a notebook. The main issue isn’t really the cluster but the way the SQL parser handles the command. When you’re inside PySpark you need to fully qualify the column path with <catalog>.<schema>.<view>.<column>, and also make sure to put backticks around both the tag key and the value, otherwise it throws that syntax error you saw.
For example, what worked for me was something like:
spark.sql("""
SET TAG ON COLUMN workspace.wiliam_lab.vw_sales.customerID
`reporting_field_name` = `Customer Identifier Test`
""")
Once I used this format, the tag showed up in information_schema.column_tags and in the UI as expected. If you plan to loop over a mapping table and apply multiple tags dynamically, the same pattern works fine with f-strings. The only catch is that if you run it more than once with the same key/value, Unity Catalog complains with a duplicate assignment error. To get around that, I either skip when the tag already exists with the same value, or unset it first and then set it again if I need to update the value.
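A rough sketch of that skip-or-reset logic, reusing the example names from above (this is just one way to do it, and the information_schema lookup assumes the current catalog):
# Sketch: skip when the tag already holds the desired value; otherwise
# unset it and set it again. Names reuse the vw_sales example above.
col_path = "workspace.wiliam_lab.vw_sales.customerID"
key, value = "reporting_field_name", "Customer Identifier Test"

existing = spark.sql(f"""
    SELECT tag_value
    FROM information_schema.column_tags
    WHERE schema_name = 'wiliam_lab'
      AND table_name = 'vw_sales'
      AND column_name = 'customerID'
      AND tag_name = '{key}'
""").collect()

if not existing:
    spark.sql(f"SET TAG ON COLUMN {col_path} `{key}` = `{value}`")
elif existing[0].tag_value != value:
    spark.sql(f"UNSET TAG ON COLUMN {col_path} `{key}`")
    spark.sql(f"SET TAG ON COLUMN {col_path} `{key}` = `{value}`")
# else: the same key/value is already assigned, so skip it to avoid
# the duplicate assignment error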
After doing this, I was able to tag my views programmatically without issues. So as long as you backtick the key and value and fully qualify the column name, you should be good to go.
Monday
I've already suggested that and even provided code with a for loop, backticks and f-strings...
Re: Programmatically setting TAGs on VIEWs - Databricks Community - 131240
And @stefan_erste is already using a fully qualified name. I think he's only missing the backticks.
Tuesday
Hi @szymon_dybczak and @WiliamRosa,
I have used backticks from the very start (you'll see it if you re-check my original post).
It is definitely a cluster issue, as I am able to assign tags using a serverless cluster. The reason I was using a dedicated one is that I was pulling data from an Excel file on a SharePoint location, which does not work with serverless.
So in the end I will build a two-step job: the first step, on a dedicated cluster, will store all the tags in a table, and the second, on serverless, will do the assignment.
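For the first step, something as small as this sketch should be enough (df_tags stands for the pandas DataFrame read from the Excel file, and tags_to_apply is a placeholder name for the staging table):
# Dedicated-cluster step (sketch): persist the tags read from the
# SharePoint Excel file so the serverless job can apply them later.
# df_tags is assumed to be a pandas DataFrame with the tag metadata;
# tags_to_apply is a placeholder table name.
spark.createDataFrame(df_tags).write.mode("overwrite").saveAsTable("tags_to_apply")
The serverless step can then loop over tags_to_apply and run SET TAG, like in the snippets earlier in the thread.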
Thank you both for the time and effort!
KR,
Stefan
Tuesday
Thanks for the info, @stefan_erste!