cancel
Showing results for 
Search instead for 
Did you mean: 
Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results for 
Search instead for 
Did you mean: 

Alter table to add/update multiple column comments

bharathjs
New Contributor II

I was wondering if there's a way to alter table and add/update comments for multiple columns at once using SQL or API calls. For instance - 

 

ALTER TABLE <table_name> CHANGE COLUMN <col1> COMMENT '<comment1>',
CHANGE COLUMN <col2> COMMENT 'comment2' ; 
 
I would like to update column comments for multiple tables for over 200 fields and flexible using sql, pyspark, apis. Would be great if there is a way to achieve this. Thanks in advance...   
 
4 REPLIES 4

filipniziol
Contributor

Hi, assuming you have a dictionary with column name to comment mapping, you can do this using pyspark like this:

columns_comments = {
    "col1": "comment1",
    "col2": "comment2",
    # Add all your columns and comments here
}

for col, comment in columns_comments.items():
    spark.sql(f"ALTER TABLE your_table_name CHANGE COLUMN {col} SET COMMENT '{comment}'")

Yeah I did consider this approach but one drawback I observed with this approach is it takes a while to updated comments for ~200 columns. I'm looking for more reliable solution to update column's comments concurrently.  

Hi @bharathjs ,

If you want to run it faster then there is a better way. Databricks supports the execution of multiple ALTER COLUMN operations within a single ALTER TABLE statement. 

 

ALTER TABLE your_table_name
  ALTER COLUMN col1 SET COMMENT 'comment1',
  ALTER COLUMN col2 SET COMMENT 'comment2',
  ALTER COLUMN col3 SET COMMENT 'comment3';

 

You can prepare your SQL programatically and then to run it. I have prepared a test table with 200 columns, random comments and added the comments in one shot - the execution took just 0.18 second.

Here is the sample:

 

from pyspark.sql.types import StructType, StructField, IntegerType

# Define the number of columns
num_columns = 200

# Dynamically create a list of StructFields
fields = [StructField(f"col{i}", IntegerType(), True) for i in range(1, num_columns + 1)]

# Create the schema
schema = StructType(fields)

# Create an empty DataFrame with the schema
df = spark.createDataFrame([], schema)

# Define the database and table name
database_name = "default"  # You can change this as needed
table_name = "test_comments"

# Create the test table
df.write.format("delta").mode("overwrite").saveAsTable(f"{database_name}.{table_name}")

print(f"Test table '{database_name}.{table_name}' with {num_columns} columns created successfully.")
import random
import string

# Function to generate a random comment
def generate_random_comment(length=30):
    return ''.join(random.choices(string.ascii_letters + string.digits + ' ', k=length))

# Generate comments for each column
columns_comments = {f"col{i}": generate_random_comment() for i in range(1, num_columns + 1)}

# Display a sample of the generated comments
for col in list(columns_comments.keys())[:5]:  # Show first 5 for brevity
    print(f"{col}: {columns_comments[col]}")
import time
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Function to escape single quotes in comments
def escape_single_quotes(comment):
    return comment.replace("'", "\\'")

# Start the timer
start_time = time.time()

# Generate ALTER COLUMN statements for all columns
alter_statements = ",\n  ".join([
    f"ALTER COLUMN `{col}` COMMENT '{escape_single_quotes(comment)}'"
    for col, comment in columns_comments.items()
])

# Construct the full ALTER TABLE SQL command
alter_table_sql = f"""
ALTER TABLE hive_metastore.{database_name}.{table_name}
  {alter_statements};
"""

# Execute the ALTER TABLE statement
try:
    spark.sql(alter_table_sql)
    logger.info(f"Successfully updated comments for {len(columns_comments)} columns in '{database_name}.{table_name}'.")
except Exception as e:
    logger.error(f"Failed to update comments for table '{database_name}.{table_name}'. Error: {e}")

# End the timer
end_time = time.time()
batched_duration = end_time - start_time

print(f"Batched ALTER TABLE duration: {batched_duration:.2f} seconds.")

 

The output:

filipniziol_0-1729840763995.png

The comments:

filipniziol_0-1729840826365.png

 






 

This approach would give me a syntax error. Essentially, error at the comma after each alter column.  

[PARSE_SYNTAX_ERROR] Syntax error at or near ','. 

 

ALTER TABLE hive_metastore.xxxx.yyyy.table ALTER COLUMN `col1` COMMENT 'Dhmsps4w3Lfgy emzbociW0szmpU3d', --------------------------------------------------------------^^^ ALTER COLUMN `col2` COMMENT '2z5lwE5MGWzAPgR46icOzHhFnBZHCh',

 

 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group