11-26-2024 07:34 AM
Hello,
Currently the docs state that asynchronous progress tracking is available only for the Kafka sink:
https://docs.databricks.com/en/structured-streaming/async-progress-checking.html
I would like to know whether it would work for any sink that provides exactly-once semantics.
Let me explain: in many workflows we read streaming data and merge each processed micro-batch (the increment) into an external database (Azure SQL, Snowflake, etc.) using a MERGE statement to ensure idempotency. While the merge runs, however, the Spark cluster sits idle even though it could already start processing the next batch. Asynchronous progress tracking seems like it could address this, since the MERGE statement itself guarantees exactly-once semantics. I don't see any impediment to this use case, unless the feature is simply disallowed for sinks other than Kafka.
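For context, the workflow described above is typically implemented with `foreachBatch`. A minimal sketch follows; the table names, key column, and source options are hypothetical placeholders, and the MERGE shown targets a Delta table for illustration only (for Azure SQL or Snowflake you would issue the equivalent upsert through that database's connector instead):

```python
# Sketch only -- not a definitive implementation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def merge_batch(batch_df, batch_id):
    # Idempotent upsert: keyed on a business key, so replaying the same
    # micro-batch after a failure does not produce duplicate rows.
    batch_df.createOrReplaceTempView("increment")
    batch_df.sparkSession.sql("""
        MERGE INTO target_table AS t          -- hypothetical target
        USING increment AS s
        ON t.id = s.id                        -- hypothetical key column
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

(spark.readStream
    .format("kafka")                          # or any streaming source
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .writeStream
    .foreachBatch(merge_batch)
    .option("checkpointLocation", "/checkpoints/merge-job")
    .start())
```

The cluster idles between the end of the MERGE and the start of the next micro-batch, which is the gap the question is asking about.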
Accepted Solutions
11-26-2024 12:32 PM
Asynchronous progress tracking is a feature designed for ultra-low-latency use cases. You can read more in the open-source SPIP doc here, but the expected time savings are on the order of hundreds of milliseconds, which is insignificant next to the cost of merge operations against external systems.
Once Delta Live Tables (DLT) releases functionality for writing to external databases, I would recommend trying it. DLT should give you a substantial efficiency gain for this use case.
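For completeness, this is roughly how the feature is enabled where it is supported (Kafka sink, Spark 3.4+ / a recent DBR). A sketch with placeholder broker, topic, and checkpoint values:

```python
# Sketch, assuming a Spark version with asynchronous progress tracking
# and a Kafka sink -- the only sink the linked docs list as supported.
(df.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
    .option("topic", "out-events")                      # placeholder
    .option("asyncProgressTrackingEnabled", "true")
    # How often progress is checkpointed asynchronously (milliseconds):
    .option("asyncProgressTrackingCheckpointIntervalMs", "1000")
    .option("checkpointLocation", "/checkpoints/async-job")
    .start())
```

Since the saving is the per-batch offset-commit latency rather than the sink write itself, it would not hide the time spent inside a MERGE against an external database.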

