<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: parameterized ALTER TABLE SET TAGS in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/120072#M46051</link>
    <description>&lt;P&gt;I already said I was running 16.1 for my tests.&lt;/P&gt;</description>
    <pubDate>Fri, 23 May 2025 13:31:51 GMT</pubDate>
    <dc:creator>JameDavi_51481</dc:creator>
    <dc:date>2025-05-23T13:31:51Z</dc:date>
    <item>
      <title>parameterized ALTER TABLE SET TAGS</title>
      <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/112975#M44379</link>
      <description>&lt;P&gt;I would like to use parametrized sql queries to run SET TAGs commands on tables, but cannot figure out how to parameterize the query to prevent SQL injection. Both the `?` and `:key` parameter syntaxes throw a syntax error&lt;/P&gt;&lt;P&gt;Basically, I'd like to do this, but without string formatting so I can protect against SQL injection. Is there a way to do this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;tags = ", ".join(list_of_tags)
spark.sql(
    f"ALTER TABLE IDENTIFIER(:table) SET TAGS {tags}",
    args={"table": table_name},
)&lt;/LI-CODE&gt;&lt;P&gt;Note: the above isn't even valid syntax - once I add the IDENTIFIER clause, ANY FORM of SET TAGS following it will fail.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Mar 2025 20:14:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/112975#M44379</guid>
      <dc:creator>JameDavi_51481</dc:creator>
      <dc:date>2025-03-18T20:14:24Z</dc:date>
    </item>
    <item>
      <title>Re: parameterized ALTER TABLE SET TAGS</title>
      <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113138#M44433</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/37890"&gt;@JameDavi_51481&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;PySpark does not fully support parameterised queries for ALTER TABLE SET TAGS (a DDL Command). You can use IDENTIFIER(:table) to safely pass the table name, but for the tags, placeholders like ? or :key doesn’t seem to work. From what I understand, you’d need to manually format the tags before including them in the query. A safer way might be to validate them against a predefined list and then format them properly as a comma separated string. While it’s not fully parameterised, this should help mitigate SQL injection risks.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 12:03:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113138#M44433</guid>
      <dc:creator>Advika</dc:creator>
      <dc:date>2025-03-20T12:03:17Z</dc:date>
    </item>
    <item>
      <title>Re: parameterized ALTER TABLE SET TAGS</title>
      <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113149#M44435</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&amp;gt; You can use IDENTIFIER(:table) to safely pass the table name, but for the tags, placeholders like ? or :key doesn’t seem to work.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;This does not seem correct in my testing. This fails on a cluster running DBR 16.1:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.sql("alter table IDENTIFIER(:table) SET TAGS ('foo')",
args={"table": &amp;lt;table_name&amp;gt;})

  File "/databricks/python/lib/python3.12/site-packages/grpc/_channel.py", line 966, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: &amp;lt;_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "
[PARSE_SYNTAX_ERROR] Syntax error at or near 'TAGS'. SQLSTATE: 42601 (line 1, pos 35)&lt;/LI-CODE&gt;&lt;DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 20 Mar 2025 13:49:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113149#M44435</guid>
      <dc:creator>JameDavi_51481</dc:creator>
      <dc:date>2025-03-20T13:49:37Z</dc:date>
    </item>
    <item>
      <title>Re: parameterized ALTER TABLE SET TAGS</title>
      <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113153#M44439</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/37890"&gt;@JameDavi_51481&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152834"&gt;@Advika&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1st thing we need to understand what is SQL Injection and why its not possible ?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;A security hole known as SQL Injection occurs when a hacker inserts malicious SQL code into a query. Unauthorized access to databases, data alternation, ore even the removal of entire tables may result from this. Lets consider this example: &lt;STRONG&gt;SELECT * FROM users WHERE username = 'admin' or '1'='1'&lt;/STRONG&gt;&lt;BR /&gt;Now condition &lt;STRONG&gt;1=1 is always true,&amp;nbsp;&lt;/STRONG&gt; so i can easily bypass authentication and return all user data.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152834"&gt;@Advika&lt;/a&gt;&amp;nbsp; mentioned the reason is exactly correct but you can try following 2 options:&lt;BR /&gt;1st approach : Using python you can prevent SQL Injection by sanitizing inputs and safely craft the query.&amp;nbsp;&lt;BR /&gt;example:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;def set_table_tags(spark, table_name, tags_dict):&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;"""&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Safely sets tags on a given table in Databricks.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;:param spark: The Spark session&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;:param table_name: The name of the table (string)&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;:param tags_dict: A dictionary of tag key-value pairs&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;"""&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;# Sanitize table name (remove any backticks that might be used maliciously)&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;safe_table_name = f"`{table_name.replace('`', '')}`"&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;# Format tags safely as key = 'value'&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;tag_assignments = ", ".join([f"'{key}' = '{value}'" for key, value in tags_dict.items()])&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;# Construct the SQL query&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;query = f"ALTER TABLE {safe_table_name} SET TAGS {tag_assignments}"&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;# Execute the query&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;spark.sql(query)&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;print(f"Tags successfully set for table {table_name}")&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;# Example usage:&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;tags = {"environment": "production", "owner": "data_engineer"}&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;set_table_tags(spark, "my_catalog.my_schema.my_table", tags)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;2nd approch is more simple by creating the widgets like this&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;-- Create input widgets&lt;/STRONG&gt;&lt;BR /&gt;CREATE WIDGET TEXT table_name DEFAULT "my_catalog.my_schema.my_table";&lt;BR /&gt;CREATE WIDGET TEXT tag_key DEFAULT "owner";&lt;BR /&gt;CREATE WIDGET TEXT tag_value DEFAULT "data_engineer";&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;-- Construct and execute the query&lt;/STRONG&gt;&lt;BR /&gt;SET tag_query = CONCAT(&lt;BR /&gt;'ALTER TABLE ', getArgument('table_name'),&lt;BR /&gt;' SET TAGS ', getArgument('tag_key'), ' = ', "'", getArgument('tag_value'), "'"&lt;BR /&gt;);&lt;BR /&gt;EXECUTE IMMEDIATE tag_query;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 14:01:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113153#M44439</guid>
      <dc:creator>Data_Mavericks</dc:creator>
      <dc:date>2025-03-20T14:01:23Z</dc:date>
    </item>
    <item>
      <title>Re: parameterized ALTER TABLE SET TAGS</title>
      <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113155#M44441</link>
      <description>&lt;P&gt;Homegrown input text sanitization within Python is notoriously failure prone and typically easily circumvented, and not an approach I would ever recommend.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 14:05:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113155#M44441</guid>
      <dc:creator>JameDavi_51481</dc:creator>
      <dc:date>2025-03-20T14:05:52Z</dc:date>
    </item>
    <item>
      <title>Re: parameterized ALTER TABLE SET TAGS</title>
      <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113157#M44443</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/37890"&gt;@JameDavi_51481&lt;/a&gt;&amp;nbsp; i agree with your point so creating dynamic safe queries in python can i think still if i am not wrong? i think can use EXECUTE IMMEDIATE this can prevent risk of exposing&amp;nbsp; direct string formatting.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 14:15:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/113157#M44443</guid>
      <dc:creator>Data_Mavericks</dc:creator>
      <dc:date>2025-03-20T14:15:42Z</dc:date>
    </item>
    <item>
      <title>Re: parameterized ALTER TABLE SET TAGS</title>
      <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/120063#M46046</link>
      <description>&lt;P&gt;Make sure the prerequisite is your cluster should be running on DBR version greater than 13.3 LTS.&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 11:45:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/120063#M46046</guid>
      <dc:creator>NikunjKakadiya</dc:creator>
      <dc:date>2025-05-23T11:45:58Z</dc:date>
    </item>
    <item>
      <title>Re: parameterized ALTER TABLE SET TAGS</title>
      <link>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/120072#M46051</link>
      <description>&lt;P&gt;I already said I was running 16.1 for my tests.&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 13:31:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/parameterized-alter-table-set-tags/m-p/120072#M46051</guid>
      <dc:creator>JameDavi_51481</dc:creator>
      <dc:date>2025-05-23T13:31:51Z</dc:date>
    </item>
  </channel>
</rss>

