Hi, I want to be clear about 'replaceWhere' clause in spark.write.
Here is the scenario:
I would like to add a column to few existing records.
The table is already partitioned on "PickupMonth" column.
Here is example: Without 'replaceWhere'
spark.read \
.format("delta") \
.load(destination_path) \
.where("PickupMonth == '12' and PaymentType == '3' ") \
.withColumn("PaymentType", lit(4).cast(LongType())) \
.write \
.format("delta") \
.mode("overwrite") \
.save(destination_path)
In the above code we are reading a few records "where("PickupMonth == '12' and PaymentType == '3' ")"
and then adding a new column and writing back to the same table. Does the above code work?
If we do not explicitly mention ".option("replaceWhere", "PickupMonth = '12'") \" during write,
should it not write just the "PickupMonth == '12'" partition automatically, because the table is a partitioned table?
Why should we write as follows with ".option("replaceWhere", "PickupMonth = '12'") \"
spark.read \
.format("delta") \
.load(destination_path) \
.where("PickupMonth == '12' and PaymentType == '3' ") \
.withColumn("PaymentType", lit(4).cast(LongType())) \
.write \
.format("delta") \
.option("replaceWhere", "PickupMonth = '12'") \
.mode("overwrite") \
.save(destination_path)
in other words what is the difference between whether we mention ".option("replaceWhere", "PickupMonth = '12'") \" or not.
Thank you in advance.