parameterized ALTER TABLE SET TAGS
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
2 weeks ago - last edited 2 weeks ago
I would like to use parametrized sql queries to run SET TAGs commands on tables, but cannot figure out how to parameterize the query to prevent SQL injection. Both the `?` and `:key` parameter syntaxes throw a syntax error
Basically, I'd like to do this, but without string formatting so I can protect against SQL injection. Is there a way to do this?
tags = ", ".join(list_of_tags)
spark.sql(
f"ALTER TABLE IDENTIFIER(:table) SET TAGS {tags}",
args={"table": table_name},
)
Note: the above isn't even valid syntax - once I add the IDENTIFIER clause, ANY FORM of SET TAGS following it will fail.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a week ago
Hello @JameDavi_51481!
PySpark does not fully support parameterised queries for ALTER TABLE SET TAGS (a DDL Command). You can use IDENTIFIER(:table) to safely pass the table name, but for the tags, placeholders like ? or :key doesnโt seem to work. From what I understand, youโd need to manually format the tags before including them in the query. A safer way might be to validate them against a predefined list and then format them properly as a comma separated string. While itโs not fully parameterised, this should help mitigate SQL injection risks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a week ago - last edited a week ago
> You can use IDENTIFIER(:table) to safely pass the table name, but for the tags, placeholders like ? or :key doesnโt seem to work.
This does not seem correct in my testing. This fails on a cluster running DBR 16.1:
spark.sql("alter table IDENTIFIER(:table) SET TAGS ('foo')",
args={"table": <table_name>})
File "/databricks/python/lib/python3.12/site-packages/grpc/_channel.py", line 966, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INTERNAL
details = "
[PARSE_SYNTAX_ERROR] Syntax error at or near 'TAGS'. SQLSTATE: 42601 (line 1, pos 35)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a week ago
Hello @JameDavi_51481 and @Advika
1st thing we need to understand what is SQL Injection and why its not possible ?
A security hole known as SQL Injection occurs when a hacker inserts malicious SQL code into a query. Unauthorized access to databases, data alternation, ore even the removal of entire tables may result from this. Lets consider this example: SELECT * FROM users WHERE username = 'admin' or '1'='1'
Now condition 1=1 is always true, so i can easily bypass authentication and return all user data.
@Advika mentioned the reason is exactly correct but you can try following 2 options:
1st approach : Using python you can prevent SQL Injection by sanitizing inputs and safely craft the query.
example:
def set_table_tags(spark, table_name, tags_dict):
"""
Safely sets tags on a given table in Databricks.
:param spark: The Spark session
:param table_name: The name of the table (string)
:param tags_dict: A dictionary of tag key-value pairs
"""
# Sanitize table name (remove any backticks that might be used maliciously)
safe_table_name = f"`{table_name.replace('`', '')}`"
# Format tags safely as key = 'value'
tag_assignments = ", ".join([f"'{key}' = '{value}'" for key, value in tags_dict.items()])
# Construct the SQL query
query = f"ALTER TABLE {safe_table_name} SET TAGS {tag_assignments}"
# Execute the query
spark.sql(query)
print(f"Tags successfully set for table {table_name}")
# Example usage:
tags = {"environment": "production", "owner": "data_engineer"}
set_table_tags(spark, "my_catalog.my_schema.my_table", tags)
2nd approch is more simple by creating the widgets like this
-- Create input widgets
CREATE WIDGET TEXT table_name DEFAULT "my_catalog.my_schema.my_table";
CREATE WIDGET TEXT tag_key DEFAULT "owner";
CREATE WIDGET TEXT tag_value DEFAULT "data_engineer";
-- Construct and execute the query
SET tag_query = CONCAT(
'ALTER TABLE ', getArgument('table_name'),
' SET TAGS ', getArgument('tag_key'), ' = ', "'", getArgument('tag_value'), "'"
);
EXECUTE IMMEDIATE tag_query;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a week ago
Homegrown input text sanitization within Python is notoriously failure prone and typically easily circumvented, and not an approach I would ever recommend.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a week ago
@JameDavi_51481 i agree with your point so creating dynamic safe queries in python can i think still if i am not wrong? i think can use EXECUTE IMMEDIATE this can prevent risk of exposing direct string formatting.

