11-26-2024 07:34 AM
Hello,
Currently the docs state that asynchronous progress tracking is available only for the Kafka sink:
https://docs.databricks.com/en/structured-streaming/async-progress-checking.html
I would like to know whether it would work for any sink that provides exactly-once semantics.
Let me explain: in many workflows we read streaming data and merge each processed micro-batch (the increment) into an external database (Azure SQL, Snowflake, etc.) using a MERGE statement to ensure idempotency. While the merge runs, however, the Spark cluster sits idle even though it could already start processing the next batch. Asynchronous progress tracking seems like it could address this, since the MERGE statement itself guarantees exactly-once semantics. I don't see any impediment to this use case, unless the feature is simply disallowed for sinks other than Kafka.
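For context, the workflow described above is typically implemented with `foreachBatch`. A minimal sketch follows; the table names, key column, and source options are hypothetical placeholders, and the MERGE shown targets a Delta table for illustration only (for Azure SQL or Snowflake you would issue the equivalent upsert through that database's connector instead):

```python
# Sketch only -- not a definitive implementation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def merge_batch(batch_df, batch_id):
    # Idempotent upsert: keyed on a business key, so replaying the same
    # micro-batch after a failure does not produce duplicate rows.
    batch_df.createOrReplaceTempView("increment")
    batch_df.sparkSession.sql("""
        MERGE INTO target_table AS t          -- hypothetical target
        USING increment AS s
        ON t.id = s.id                        -- hypothetical key column
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

(spark.readStream
    .format("kafka")                          # or any streaming source
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .writeStream
    .foreachBatch(merge_batch)
    .option("checkpointLocation", "/checkpoints/merge-job")
    .start())
```

The cluster idles between the end of the MERGE and the start of the next micro-batch, which is the gap the question is asking about.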
Accepted Solutions
11-26-2024 12:32 PM
Asynchronous progress tracking is a feature designed for ultra-low-latency use cases. You can read more in the open-source SPIP doc here, but the expected time savings are on the order of hundreds of milliseconds, which is insignificant next to the cost of merge operations against external systems.
Once Delta Live Tables (DLT) releases functionality for writing to external databases, I would recommend trying it. DLT should give you a substantial efficiency gain for this use case.
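For completeness, this is roughly how the feature is enabled where it is supported (Kafka sink, Spark 3.4+ / a recent DBR). A sketch with placeholder broker, topic, and checkpoint values:

```python
# Sketch, assuming a Spark version with asynchronous progress tracking
# and a Kafka sink -- the only sink the linked docs list as supported.
(df.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
    .option("topic", "out-events")                      # placeholder
    .option("asyncProgressTrackingEnabled", "true")
    # How often progress is checkpointed asynchronously (milliseconds):
    .option("asyncProgressTrackingCheckpointIntervalMs", "1000")
    .option("checkpointLocation", "/checkpoints/async-job")
    .start())
```

Since the saving is the per-batch offset-commit latency rather than the sink write itself, it would not hide the time spent inside a MERGE against an external database.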

