Update DeltaTable on column type ArrayType(): add element to array

carlosancassani
New Contributor III

Hi all,

I need to perform an Update on a Delta Table adding elements to a column of ArrayType(StringType()) which is initialized empty.

Before Update

Col_1 StringType() | Col_2 StringType() | Col_3 ArrayType()
Val                | Val                | [ ]

After Update

Col_1 StringType() | Col_2 StringType() | Col_3 ArrayType()
Val                | Val                | [ 'append value' ]

I'm trying the Update syntax, but I receive errors in the "set" statement because the type of the updated value (StringType) is not consistent with the target type, ArrayType(StringType()):

 

from delta.tables import DeltaTable
from pyspark.sql.functions import col, lit
from pyspark.sql.types import ArrayType, StringType, StructField, StructType, TimestampType

schema = StructType([
    StructField("Load_id", StringType(), True),
    StructField("Task_id", StringType(), True),
    StructField("Task_output", StringType(), True),
    StructField("Task_output_detail", ArrayType(StringType()), True),
    StructField("Execution_ts", TimestampType(), True),
    StructField("Task_status", StringType(), True),
])

# some code to initialize and materialize the Delta table

Task_output_detail = "Invalid value"
Log_table = DeltaTable.forPath(spark, path)
Log_table.update(
    condition=(col("Load_id") == Load_id) & (col("Task_id") == Task_id),
    set={
        "Task_output": lit(Task_output),
        # This entry triggers the error: a plain string for an ArrayType(StringType()) column
        "Task_output_detail": Task_output_detail,
        "Execution_ts": lit(Execution_ts),
        "Task_status": lit("Closed"),
    },
)

 


Does anyone know a "smart" solution or the correct syntax to achieve that? I want to avoid deleting the row and creating a new one, since I have to perform multiple updates / appends.

Thanks! 

Does this mean that to add an element to an array we first have to read all the elements of the array, add the new one, and then save the new array?
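
One possible approach, sketched under a few assumptions: the "set" values of DeltaTable.update can be SQL expression strings as well as Columns, and Spark's concat function works on arrays, so the append can be written as concat(Task_output_detail, array('new value')). The helper names below (sql_str, append_to_array_expr, close_task) are hypothetical and only for illustration. The escaping assumes Spark's default string-literal parsing, and only two of the columns from the question are updated to keep the sketch short.

```python
def sql_str(value: str) -> str:
    """Render a Python string as a Spark SQL string literal.

    Assumes Spark's default parser settings, where backslash escapes
    a single quote or a backslash inside a quoted literal.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def append_to_array_expr(column: str, value: str) -> str:
    """Build a SQL expression: the current array plus one new element."""
    return f"concat({column}, array({sql_str(value)}))"


def close_task(log_table, load_id: str, task_id: str, detail: str) -> None:
    """Append `detail` to Task_output_detail on the matching row.

    `log_table` is expected to be a delta.tables.DeltaTable, for example
    the result of DeltaTable.forPath(spark, path) from the question.
    """
    log_table.update(
        condition=f"Load_id = {sql_str(load_id)} AND Task_id = {sql_str(task_id)}",
        set={
            "Task_output_detail": append_to_array_expr("Task_output_detail", detail),
            "Task_status": sql_str("Closed"),
        },
    )
```

With this, the array is never collected or rebuilt by hand in the notebook: the expression is evaluated by Spark for each matching row, which probably answers the follow-up question too. Two caveats: concat returns NULL if the existing array is NULL (an empty array is fine), and array_union could replace concat if duplicate entries should be dropped.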