
How to update a value in a column in a delta table with Map of Struct datatype?

Mado
Valued Contributor II

I have a delta table in Databricks named "prod.silver.control_table". It has a few columns including "table_name" with string data type and "transform_options" with the below structure:

 

 |-- transform_options: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- col_name_mappings: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- type_mappings: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- partition_duplicates_by: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- order_duplicates_by: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)

For example, when "table_name" is "prod.silver.weather", the "transform_options" is:

 

{
  "prod.bronze.weather_source_a": {
    "col_name_mappings": {"col_a_old": "col_a_new", "col_b_old": "col_b_new"},
    "type_mappings": {"col_a_new": "INT", "col_b_new": "TIMESTAMP"},
    "partition_duplicates_by": ["col_a_new"],
    "order_duplicates_by": ["_commit_version"]
  },
  "prod.bronze.weather_source_b": {
    "col_name_mappings": {"col_c_old": "col_c_new", "col_d_old": "col_d_new"},
    "type_mappings": {"col_c_new": "INT", "col_d_new": "TIMESTAMP"},
    "partition_duplicates_by": ["col_c_new"],
    "order_duplicates_by": ["ingestion_timestamp", "_commit_version"]
  }
}

I need to update the values in "order_duplicates_by": every occurrence of "_commit_version" should become "commit_version", i.e. the leading underscore should be removed.

Any idea how to update table values? 

2 REPLIES

Kaniz
Community Manager

Hi @Mado, yes, you can update the values in the “order_duplicates_by” field of your Delta table in PySpark by rebuilding the “transform_options” column with the withColumn function, using the transform_values higher-order function to rewrite each map value.

 

Also, please be aware that overwriting a Delta table will replace all existing data in the table. If you want to preserve the existing data, you should use a different write mode, such as “append” or “ignore”. However, these modes may not apply the desired updates to existing rows. For updating specific rows in a Delta table, Delta Lake provides an UPDATE SQL statement, but it requires a condition to identify which rows to update. Since your use case involves updating nested fields, it might be more complex to achieve with the UPDATE statement.

 

I hope this helps! Let me know if you have any other questions.

Mado
Valued Contributor II

@Kaniz 

Could you give me more hints on how to use the UPDATE command for this specific case?
