I have a Delta table that is updated nightly using Auto Loader. After the merge, the job kicks off a second notebook that cleans and rewrites certain values using a series of UPDATE statements, e.g.:
```sql
-- one of several such statements; some_value and some_condition_is_met are placeholders
UPDATE foo
SET field1 = some_value
WHERE some_condition_is_met;
```
As the table grows, this step takes longer and longer. I suspect each UPDATE scans the entire table every time it runs.
Is there a way to make this step more efficient, i.e., to scan only the rows that the latest merge updated or appended?
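For illustration, here is a minimal sketch of what scoping one cleanup statement to recently written rows could look like. It assumes a hypothetical `_ingested_at` timestamp column that the merge sets on every inserted or updated row, plus a watermark `since` (for example, the start time of the previous run) that the job stores somewhere. Neither exists in the pipeline described above, and the helper name `scoped_update` is made up. Whether Delta then skips the older files depends on file-level statistics being collected for that column, so treat this as an unverified idea rather than a known fix.

```python
from datetime import datetime, timezone


def scoped_update(table: str, set_clause: str, condition: str,
                  since: datetime, ts_column: str = "_ingested_at") -> str:
    """Hypothetical helper: build an UPDATE limited to rows written at or after `since`.

    `ts_column` is an assumed ingestion-timestamp column set by the merge.
    """
    # Refuse naive datetimes, since their meaning depends on the local time zone.
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    # Render the watermark in UTC; assumes the Spark session time zone is also UTC.
    since_utc = since.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    # Parenthesize the original condition so an OR inside it cannot
    # bypass the extra timestamp filter.
    return (
        f"UPDATE {table}\n"
        f"SET {set_clause}\n"
        f"WHERE ({condition})\n"
        f"  AND {ts_column} >= TIMESTAMP '{since_utc}'"
    )
```

In the cleanup notebook, each statement would then be run with something like `spark.sql(scoped_update("foo", "field1 = some_value", "some_condition_is_met", since))`.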