Apache Spark™ Structured Streaming has long powered mission-critical pipelines at scale, from streaming ETL to near real-time analytics and machine learning. Now, we’re expanding that capability to an entirely new class of workloads with real-time mode, a new trigger type that processes events as they arrive, with latency in the tens of milliseconds.
Unlike existing micro-batch triggers, which either process data on a fixed schedule (ProcessingTime trigger) or process all available data and then shut down (AvailableNow trigger), real-time mode processes data continuously and emits results as soon as they're ready. This enables ultra-low-latency use cases such as fraud detection, live personalization, and real-time machine learning feature serving, all without rewriting your existing query logic (beyond selecting the new trigger) or replatforming.
This new mode is being contributed to open source Apache Spark and is now available in Public Preview on Databricks.
In this article, we’ll cover:
- What real-time mode is and how it works
- The types of applications it enables
- How you can start using it today