
PII tags in Spark Declarative Pipelines

bi_123
New Contributor III

I need to add PII tags at both the table and column levels for a streaming table created using Spark Declarative Pipelines.

I tried applying Unity Catalog tags with the following code inside the SDP Python pipeline:

spark.sql(f"""
ALTER TABLE {table_name}
SET TAGS ({tags_sql})
""")

However, this fails with the following error:

UNSUPPORTED_SPARK_SQL_COMMAND
'${command}' is not supported in spark.sql("...") API in SDP Python.
Supported command: ${supportedCommands}.

What is the correct way to define or apply PII tags for tables and columns created by Spark Declarative Pipelines?

1 REPLY

amirabedhiafi
New Contributor III

Hi @bi_123!

You need to apply UC tags outside the SDP definition, not inside the SDP Python function.

@dp.table(table_properties=...) can set table properties, but those are not the same as UC tags. Calling spark.sql("ALTER TABLE ...") inside SDP Python is not supported because pipeline code is evaluated as a declarative graph, and dataset functions should only define or return DataFrames.

For your streaming table, use ALTER STREAMING TABLE rather than ALTER TABLE:

-- table level tag
ALTER STREAMING TABLE catalog.schema.my_streaming_table
SET TAGS ('pii' = 'true');

-- column level tags
ALTER STREAMING TABLE catalog.schema.my_streaming_table
ALTER COLUMN email SET TAGS ('pii' = 'email');

ALTER STREAMING TABLE catalog.schema.my_streaming_table
ALTER COLUMN ssn SET TAGS ('pii' = 'ssn');

Run these statements from a Databricks SQL environment, or as a post-deployment CI/CD step after the pipeline creates or refreshes the table.
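If you would rather generate those statements than hand-write them, here is a minimal sketch of a helper for the post-deployment step. The function name build_tag_statements, its parameters, and the validation rules are my own assumptions for illustration, not a Databricks API. It only builds the SQL strings; you would execute them outside the pipeline (for example with spark.sql in a separate notebook or job), never inside an SDP dataset function.

```python
import re

# Hypothetical helper, not part of any Databricks library.
_SIMPLE_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _format_tags(tags):
    """Render a dict of tags as 'key' = 'value' pairs for SET TAGS (...)."""
    parts = []
    for key, value in tags.items():
        for text in (key, value):
            # Conservative assumption: reject quotes and backslashes
            # instead of trying to escape them.
            if "'" in text or "\\" in text:
                raise ValueError(f"unsupported character in tag text: {text!r}")
        parts.append(f"'{key}' = '{value}'")
    return ", ".join(parts)


def build_tag_statements(table_name, table_tags=None, column_tags=None):
    """Build ALTER STREAMING TABLE statements for table and column tags.

    table_name  -- fully qualified name, e.g. catalog.schema.my_streaming_table
    table_tags  -- dict of tag key -> tag value for the table itself
    column_tags -- dict of column name -> dict of tag key -> tag value
    """
    statements = []
    if table_tags:
        statements.append(
            f"ALTER STREAMING TABLE {table_name} "
            f"SET TAGS ({_format_tags(table_tags)})"
        )
    for column, tags in (column_tags or {}).items():
        # Assumption: only plain, unquoted column names are accepted here.
        if not _SIMPLE_IDENTIFIER.match(column):
            raise ValueError(f"unsupported column name: {column!r}")
        statements.append(
            f"ALTER STREAMING TABLE {table_name} "
            f"ALTER COLUMN {column} SET TAGS ({_format_tags(tags)})"
        )
    return statements


statements = build_tag_statements(
    "catalog.schema.my_streaming_table",
    table_tags={"pii": "true"},
    column_tags={"email": {"pii": "email"}, "ssn": {"pii": "ssn"}},
)
for statement in statements:
    # In a notebook or job that runs after the pipeline update,
    # this is where you would call spark.sql(statement).
    print(statement)
```

Keeping the tag definitions in a plain dict like this makes it easy to store them next to the pipeline code and apply them from a CI/CD job once the table exists.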

If this answer resolves your question, could you please mark it as “Accept as Solution”? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP