Update DeltaTable on column type ArrayType(): add element to array
01-05-2024 04:31 AM
Hi all,
I need to perform an Update on a Delta Table adding elements to a column of ArrayType(StringType()) which is initialized empty.
Before Update
| Col_1 StringType() | Col_2 StringType() | Col_3 ArrayType() |
| Val | Val | [ ] |
After Update
| Col_1 StringType() | Col_2 StringType() | Col_3 ArrayType() |
| Val | Val | [ 'append value' ] |
I'm trying the update syntax, but I receive errors in the "set" clause because the type of the updated value (StringType) is not consistent with the target type, ArrayType(StringType()):
schema = StructType([
    StructField("Load_id", StringType(), True),
    StructField("Task_id", StringType(), True),
    StructField("Task_output", StringType(), True),
    StructField("Task_output_detail", ArrayType(StringType()), True),
    StructField("Execution_ts", TimestampType(), True),
    StructField("Task_status", StringType(), True)])
# some code to initialize and materialize the Delta table
Task_output_detail = "Invalid value"
Log_table = DeltaTable.forPath(spark, path)
Log_table.update(
    condition = (col("Load_id") == Load_id) & (col("Task_id") == Task_id),
    set = {
        "Task_output": lit(Task_output),
        "Task_output_detail": Task_output_detail,  # plain string assigned to an array column: this is where the error occurs
        "Execution_ts": lit(Execution_ts),
        "Task_status": lit('Closed')})
Does anyone know a "smart" solution or the correct syntax to achieve that? I want to avoid deleting the row and creating a new one, since I have to perform multiple updates / appends.
Thanks!
04-18-2024 11:21 AM
Does this mean that, to add an element to an array, we first have to read all the elements of the array, then add the new one, and then save the new array?
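Not necessarily on the client side. One possible approach, sketched here under assumptions rather than as a confirmed fix, is to express the append as a column expression, so the new array is computed from the existing one inside the update itself. The helper name append_to_array_column is hypothetical; it relies on the standard PySpark functions concat, array, col and lit, and on DeltaTable.update accepting Column expressions in "set".

```python
def append_to_array_column(delta_table, condition, column, value):
    """Append `value` to the array stored in `column` for rows matching `condition`.

    Hypothetical helper (not part of the Delta Lake API); `delta_table` is
    assumed to be a delta.tables.DeltaTable, `condition` a Column or SQL string.
    """
    # Imported inside the function so the helper can be defined before
    # PySpark is importable in the current environment.
    from pyspark.sql.functions import array, col, concat, lit

    # array(lit(value)) wraps the new string in a one-element array, and
    # concat() joins it to the existing array, so the assigned expression
    # has an array type instead of a plain string.
    delta_table.update(
        condition=condition,
        set={column: concat(col(column), array(lit(value)))},
    )


# Usage with the names from the original post (assumes an active Spark session):
# Log_table = DeltaTable.forPath(spark, path)
# append_to_array_column(
#     Log_table,
#     (col("Load_id") == Load_id) & (col("Task_id") == Task_id),
#     "Task_output_detail",
#     "Invalid value",
# )
```

Note that concat returns NULL when the existing array is NULL, so this sketch assumes the column was initialized as an empty array, as described in the question. The other fields (Task_output, Execution_ts, Task_status) could be set in the same update call by adding them to the "set" dictionary.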