cancel
Showing results for 
Search instead for 
Did you mean: 
Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to update a value in a column in a delta table with Map of Struct datatype?

Mado
Valued Contributor II

I have a delta table in Databricks named "prod.silver.control_table". It has a few columns including "table_name" with string data type and "transform_options" with the below structure:

 

 |-- transform_options: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- col_name_mappings: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- type_mappings: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- partition_duplicates_by: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- order_duplicates_by: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)

For example, when "table_name" is "prod.silver.weather", the "transform_options" is:

 

{
"prod.bronze.weather_source_a":{"col_name_mappings":{"col_a_old":"col_a_new","col_b_old":"col_b_new"},"type_mappings":{"col_a_new":"INT","col_b_new":"TIMESTAMP"},"partition_duplicates_by":["col_a_new"],"order_duplicates_by":["_commit_version"]},
"prod.bronze.weather_source_b":{"col_name_mappings":{"col_c_old":"col_c_new","col_d_old":"col_d_new"},"type_mappings":{"col_c_new":"INT","col_d_new":"TIMESTAMP"},"partition_duplicates_by":["col_c_new"],"order_duplicates_by":["ingestion_timestamp","_commit_version"]}
}

I need to update values in "order_duplicates_by". I need to change "_commit_version" into "commit_version" by removing the initial underscore.

Any idea how to update table values? 

2 REPLIES 2

Kaniz_Fatma
Community Manager
Community Manager

Hi @Mado, Yes, you can update the values in the “order_duplicates_by” field of your Delta table using the withColumn function in PySpark.

 

Also, please be aware that overwriting a Delta table will replace all existing data in the table. If you want to preserve the existing data, you should use a different write mode, such as “append” or “ignore”. However, these modes may not apply the desired updates to existing rows. For updating specific rows in a Delta table, Delta Lake provides an UPDATE SQL statement, but it requires a condition to identify which rows to update. Since your use case involves updating nested fields, it might be more complex to achieve with the UPDATE statement.

 

I hope this helps! Let me know if you have any other questions.

Mado
Valued Contributor II

@Kaniz_Fatma 

Could you give me more hints on how to use UPDATE command for this specific case? 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group