Hi all,
I need to update a Delta table by appending elements to a column of type ArrayType(StringType()), which is initialized as an empty array.
Before update:
Col_1 StringType() | Col_2 StringType() | Col_3 ArrayType() |
Val                | Val                | [ ]               |
After update:
Col_1 StringType() | Col_2 StringType() | Col_3 ArrayType()  |
Val                | Val                | [ 'append value' ] |
I'm trying the update syntax, but I receive errors in the "set" clause because the type of the new value (StringType) does not match the type of the target column, ArrayType(StringType()):
from pyspark.sql.types import (
    ArrayType, StringType, StructField, StructType, TimestampType)

schema = StructType([
    StructField("Load_id", StringType(), True),
    StructField("Task_id", StringType(), True),
    StructField("Task_output", StringType(), True),
    StructField("Task_output_detail", ArrayType(StringType()), True),
    StructField("Execution_ts", TimestampType(), True),
    StructField("Task_status", StringType(), True)])
# some code to initialize and materialize the Delta table

from delta.tables import DeltaTable
from pyspark.sql.functions import col, lit

Task_output_detail = "Invalid value"
Log_table = DeltaTable.forPath(spark, path)
Log_table.update(
    condition = (col("Load_id") == Load_id) & (col("Task_id") == Task_id),
    set = {
        "Task_output": lit(Task_output),
        # this entry fails: a string value, but the column is ArrayType(StringType())
        "Task_output_detail": Task_output_detail,
        "Execution_ts": lit(Execution_ts),
        "Task_status": lit('Closed')})
Does anyone know a "smart" solution or the correct syntax to achieve that? I want to avoid deleting the row and creating a new one, since I have to perform multiple updates / appends.
Thanks!
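
For reference, one possible approach, as a minimal sketch rather than a tested solution: wrap the new string in array(...) and append it to the existing column with concat(...), so the expression in "set" has the same array type as the target column. The helper names sql_str and append_detail are hypothetical, and log_table is assumed to be the object returned by DeltaTable.forPath(spark, path). The sketch relies on Delta parsing string values in "set" (and in "condition") as SQL expressions, and on Spark SQL's default backslash escaping in string literals.

```python
def sql_str(value: str) -> str:
    """Render a Python string as a Spark SQL string literal (backslash-escaped)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def append_detail(log_table, load_id: str, task_id: str, detail: str) -> None:
    """Append `detail` to the Task_output_detail array of the matching row.

    `log_table` is assumed to be a delta.tables.DeltaTable (hypothetical
    helper; only its update(condition=..., set=...) method is used).
    """
    log_table.update(
        condition=f"Load_id = {sql_str(load_id)} AND Task_id = {sql_str(task_id)}",
        set={
            # array('x') wraps the string; concat() appends it to the existing array.
            "Task_output_detail": f"concat(Task_output_detail, array({sql_str(detail)}))",
            # A quoted SQL literal, equivalent to lit('Closed').
            "Task_status": "'Closed'",
        },
    )
```

It would be called as append_detail(Log_table, Load_id, Task_id, "Invalid value"). The same idea can be written with Column objects instead of SQL strings: concat(col("Task_output_detail"), array(lit(Task_output_detail))), with concat, array, col and lit imported from pyspark.sql.functions. Note that concat returns NULL if the existing array is NULL, so this assumes the column really is initialized to an empty array.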