
StructField Metadata Dictionary - What are the possible keys?

chaosBEE
New Contributor II
I have a Delta Live Table which is being deposited to Unity Catalog. In the Python notebook, I am defining the schema with a series of StructFields, for example:
 
StructField(
    "columnName",
    StringType(),
    True,
    metadata = {
        'comment': "This is a comment"
    }
)
 
This works. However, I am now trying to add tags. I have tried the following, but it doesn't seem to work:
 
metadata = {
    'comment': "This is a comment",
    'tags': {
        'tag1': 'x',
        'tag2': 'y',
    }
}
 
Does anyone know how to add tags to columns programmatically using the Python API?
 
Also, does anyone know how to add tags to the table itself? I am using dlt.create_streaming_table(), but there doesn't seem to be an option for tags in the reference: Delta Live Tables Python language reference | Databricks on AWS
 
Any help would be greatly appreciated!
 
Thanks,
Riqo
 
5 REPLIES

Thank you Kaniz!

A few follow-up questions:

Could you give an example of each?

For 1., you said "define your schema with metadata". How do you do this? What is the syntax for defining metadata with tags? As I said before, the following does not work:

StructField(
    "columnName",
    StringType(),
    True,
    metadata = {
        'comment': "This is a comment",
        'tags': {
            'tag1': 'x',
            'tag2': 'y',
        }
    }
)
 
For 2., again, what is the syntax? Could you provide an example? The signature in the link you shared doesn't seem to have a tags option:
 
@dlt.table(
    name="<name>",
    comment="<comment>",
    spark_conf={"<key>" : "<value>", "<key>" : "<value>"},
    table_properties={"<key>" : "<value>", "<key>" : "<value>"},
    path="<storage-location-path>",
    partition_cols=["<partition-column>", "<partition-column>"],
    schema="schema-definition",
    temporary=False)
 
Also, is this available with @dlt.append_flow too?
 
Thanks

ChrisLawford_n1
New Contributor III

Did you ever get an answer to this?

Panda
Valued Contributor

@chaosBEE Have you tried the following?

Use the following SQL command to add tags programmatically:

spark.sql("""ALTER TABLE your_catalog.your_schema.your_table SET TAGS ('key1' = 'value1');""")
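
When several tags are needed, the same statement can be built from a dict. This is only a minimal sketch: build_table_tags_sql is a hypothetical helper name (not part of any Databricks API), the table name is the placeholder from the command above, and the quote escaping is a simplification.

```python
def build_table_tags_sql(full_table_name: str, tags: dict) -> str:
    """Build one ALTER TABLE ... SET TAGS statement from a dict of tags.

    Hypothetical helper for illustration; not a Databricks API.
    """
    if not tags:
        raise ValueError("at least one tag is required")

    def lit(value) -> str:
        # Backslash-escape backslashes and single quotes for a SQL string literal.
        return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"

    pairs = ", ".join(f"{lit(k)} = {lit(v)}" for k, v in tags.items())
    return f"ALTER TABLE {full_table_name} SET TAGS ({pairs})"


stmt = build_table_tags_sql(
    "your_catalog.your_schema.your_table",
    {"key1": "value1", "key2": "value2"},
)
print(stmt)
# In a notebook, the statement would then be passed to spark.sql(stmt).
```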

ChrisLawford_n1
New Contributor III

Hey @Panda ,

That will work, but when you want to do this for every column in your table it becomes very unclean compared to using something like the StructField metadata attribute.

At the moment you would end up doing something like:

    def _create_tags_script(self, column_name, field_name, value):
        return f"ALTER COLUMN {column_name} SET TAGS ('{field_name}' = '{value}')"

    def get_sql_script(self, catalog_name, schema_name, table_name):
        alter_table_script = f"ALTER TABLE {catalog_name}.{schema_name}.{table_name}"
        output_sql_script = []
        for schema_field in self.schema_fields:
            if schema_field.source_field:
                output_sql_script.append(
                    f"{alter_table_script} {self._create_tags_script(schema_field.column_name, 'source_field', schema_field.source_field)}"
                )
            if schema_field.legacy_field:
                output_sql_script.append(
                    f"{alter_table_script} {self._create_tags_script(schema_field.column_name, 'legacy_field', schema_field.legacy_field)}"
                )
            if schema_field.comment:
                output_sql_script.append(
                    f"{alter_table_script} ALTER COLUMN {schema_field.column_name} COMMENT '{schema_field.comment}'"
                )
        return output_sql_script

which is just a bit messier than:

StructType(
    [
        StructField(
            "columnName1",
            StringType(),
            True,
            metadata = {
                'comment': "This is a comment"
            }
        ),
        StructField(
            "columnName2",
            StringType(),
            True,
            metadata = {
                'comment': "This is a comment"
            }
        ),
        StructField(
            "columnName3",
            StringType(),
            True,
            metadata = {
                'comment': "This is a comment"
            }
        )
    ]
)
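
One possible middle ground is to keep the tags next to each column definition and generate the ALTER statements from that metadata. The following is a rough sketch under stated assumptions: the 'tags' metadata key is a convention of this sketch only (nothing in this thread indicates Unity Catalog reads it), column_metadata_statements and its helpers are hypothetical names, and it assumes the table already exists and that the ALTER statements are permitted on it.

```python
def _lit(value) -> str:
    # Backslash-escape backslashes and single quotes (a simplification).
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def _ident(name: str) -> str:
    # Backtick-quote a column name, doubling any embedded backticks.
    return "`" + name.replace("`", "``") + "`"


def column_metadata_statements(full_table_name, columns):
    """Return ALTER TABLE statements for column comments and tags.

    `columns` is an iterable of (column_name, metadata) pairs, where metadata
    looks like {"comment": "...", "tags": {"tag1": "x"}}. The "tags" key is
    an assumption of this sketch, not something Spark or Databricks interprets.
    """
    statements = []
    for name, metadata in columns:
        prefix = f"ALTER TABLE {full_table_name} ALTER COLUMN {_ident(name)}"
        comment = metadata.get("comment")
        if comment:
            statements.append(f"{prefix} COMMENT {_lit(comment)}")
        tags = metadata.get("tags") or {}
        if tags:
            pairs = ", ".join(f"{_lit(k)} = {_lit(v)}" for k, v in tags.items())
            statements.append(f"{prefix} SET TAGS ({pairs})")
    return statements


columns = [
    ("columnName1", {"comment": "This is a comment", "tags": {"tag1": "x", "tag2": "y"}}),
    ("columnName2", {}),
]
statements = column_metadata_statements("your_catalog.your_schema.your_table", columns)
for statement in statements:
    print(statement)
```

With a real StructType, the pairs could be taken from the schema as [(f.name, f.metadata) for f in schema.fields], and each statement passed to spark.sql().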

ipreston
New Contributor III

Bump,

I've got the same issue. Looks like there was a partial reply from Kaniz but I can't see it in this thread.
