Behavior of Vector Index Sync with Delta Tables When Using OVERWRITE vs MERGE in Databricks
Friday
I'm working with vector search in Databricks using vector index sync with Delta tables, and I'm a bit unclear on how updates to the source table affect the vector index, specifically when using different write operations.
If I overwrite the source Delta table that is synced to the vector index (using the overwrite mode), will all the embeddings be recalculated and the vector index fully refreshed?
On the other hand, if I use a MERGE operation to upsert data into the source table, does the sync behave differently? For instance, are only the updated or inserted rows recalculated and synced?
Since we use Azure OpenAI's embedding models for a large number of documents, fully recalculating the embeddings would be costly. Source Delta tables must also have Change Data Feed enabled, so I assume embedding updates can be driven by the table's row-level changes.
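To make my assumption concrete, here is a toy Python model of what I expect the row-level change sets to look like. This is not Databricks code and says nothing definitive about how the sync actually works. The function names are hypothetical, the change type labels are borrowed from Change Data Feed's `_change_type` values, and the merge is assumed to skip matched rows whose text is unchanged.

```python
from typing import Dict, List, Tuple

Table = Dict[str, str]     # hypothetical shape: primary key -> document text
Change = Tuple[str, str]   # (change_type, primary key)


def overwrite_changes(old: Table, new: Table) -> List[Change]:
    """Toy model: an overwrite deletes every old row and inserts every new row."""
    return [("delete", key) for key in old] + [("insert", key) for key in new]


def merge_changes(old: Table, updates: Table) -> List[Change]:
    """Toy model: an upsert that only touches new rows or rows whose text differs."""
    changes: List[Change] = []
    for key, text in updates.items():
        if key not in old:
            changes.append(("insert", key))
        elif old[key] != text:
            changes.append(("update_postimage", key))
    return changes


def rows_to_embed(changes: List[Change]) -> List[str]:
    """Keys that would need a new embedding if sync followed the change set."""
    return sorted({key for kind, key in changes
                   if kind in ("insert", "update_postimage")})


old = {"1": "a", "2": "b", "3": "c"}
snapshot = {"1": "a", "2": "B", "3": "c", "4": "d"}  # one edit, one new row

overwrite_keys = rows_to_embed(overwrite_changes(old, snapshot))
merge_keys = rows_to_embed(merge_changes(old, snapshot))
print(overwrite_keys)  # all four keys
print(merge_keys)      # only the edited and the new key
```

If the sync follows the change feed like this, an overwrite would look like "everything changed" even when most rows are identical, while a conditional MERGE would only surface the rows that really differ. That is the behavior I would like to confirm.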
Thanks in advance!

