Ashwin_DSA
Databricks Employee

Hi @ittzzmalind,

This is expected behaviour and is mainly due to how Delta Sharing handles materialized views for open (non-Databricks) recipients versus Databricks-to-Databricks recipients.

For Databricks-to-Databricks recipients, the shared materialized view is read almost directly from its backing table. After you run REFRESH MATERIALIZED VIEW, those recipients see the new data right away.

However, for open recipients using the Python delta_sharing client, Databricks uses provider-side materialization. The first query for that MV builds a hidden, cached table on the provider side, and subsequent queries from the same recipient reuse that cached result for a configurable time-to-live (TTL, default 8 hours); see the snapshot from the documentation below. During that TTL, the open recipient can still see stale data even after you refresh the MV.

[Screenshot from the documentation: delta share TTL.png]

That’s why your original open recipient continued to see the old data, while a new open recipient (new token) immediately saw the updated data: its first query triggered a fresh materialization.
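To make the behaviour concrete, here is a toy simulation of the caching described above. It is only a sketch of the idea (a per-recipient cached copy with a TTL), not Databricks' actual implementation; `ToyProvider`, `refresh_mv`, `query` and the recipient names are all made up for illustration.

```python
from dataclasses import dataclass, field

TTL_SECONDS = 8 * 60 * 60  # the default TTL mentioned above (8 hours)


@dataclass
class ToyProvider:
    """Toy model of provider-side materialization; not Databricks code."""
    mv_data: str = "v1"                        # current contents of the MV
    cache: dict = field(default_factory=dict)  # recipient -> (data, built_at)

    def refresh_mv(self, new_data: str) -> None:
        # Stands in for REFRESH MATERIALIZED VIEW: only the MV changes,
        # existing per-recipient cached copies are left untouched.
        self.mv_data = new_data

    def query(self, recipient: str, now: float) -> str:
        entry = self.cache.get(recipient)
        if entry is not None and now - entry[1] < TTL_SECONDS:
            return entry[0]                    # cached copy still within TTL
        # No cached copy, or it has expired: materialize from the current MV.
        self.cache[recipient] = (self.mv_data, now)
        return self.mv_data


provider = ToyProvider()
first = provider.query("recipient_a", now=0)                  # builds the cache
provider.refresh_mv("v2")
stale = provider.query("recipient_a", now=3600)               # 1 hour later
fresh = provider.query("recipient_b", now=3600)               # new recipient
expired = provider.query("recipient_a", now=TTL_SECONDS + 1)  # TTL elapsed
print(first, stale, fresh, expired)  # v1 v1 v2 v2
```

The original recipient keeps getting `v1` until its cached copy expires, while the new recipient gets `v2` on its first query.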

To mitigate this, you can reduce the data materialization TTL in the Delta Sharing settings at the metastore level, so cached results expire sooner (at the cost of more frequent recomputation and higher provider cost). Check this link for the steps.

Where possible, use Databricks-to-Databricks sharing for consumers that need near-real-time MV freshness, or share a Delta table instead of an MV for open recipients and let them do the aggregation on their side.
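If you go with the second option, the recipient side could look roughly like the sketch below. It uses `delta_sharing.load_as_pandas` from the open-source Python client; the share, schema, table and column names (`sales_share`, `sales`, `orders`, `region`, `amount`) and the profile file name are placeholders I made up, so replace them with your own.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    # delta_sharing addresses a table as "<profile-file>#<share>.<schema>.<table>".
    return f"{profile_path}#{share}.{schema}.{table}"


def load_and_aggregate(profile_path: str):
    # Imported lazily so table_url() works even without the package installed.
    import delta_sharing

    # Hypothetical share/schema/table names.
    url = table_url(profile_path, "sales_share", "sales", "orders")
    df = delta_sharing.load_as_pandas(url)

    # Hypothetical columns: redo on the recipient side the aggregation
    # that the MV was doing on the provider side.
    return df.groupby("region", as_index=False)["amount"].sum()


print(table_url("config.share", "sales_share", "sales", "orders"))
```

Calling `load_and_aggregate("config.share")` with the recipient's credential file would then return the aggregated pandas DataFrame.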

Hope this helps.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***
