Since 2022, Databricks Engineering has been on a mission to simplify streaming workloads through Project Lightspeed. We’ve democratized stateful processing with features like transformWithState and the State Reader API.
Now, with Databricks Runtime 18, we are addressing one of the most persistent headaches in streaming optimization: Shuffle tuning.
We are excited to announce that Adaptive Query Execution (AQE) and Auto Optimized Shuffle (AOS) are now fully supported for stateless Structured Streaming queries. This update brings the "set it and forget it" simplicity of batch processing to your streaming pipelines.
Shuffle is the redistribution (repartitioning) of data across executors and partitions, usually inserted as an Exchange between stages; it moves rows over the network and is one of the costliest operators. If you are unsure whether your streaming job includes a shuffle stage, the Spark UI can help you determine whether it is involved in the Spark plan, as not all streaming workloads require shuffle. A Spark query plan may include a shuffle stage for various reasons, including:
If you’ve optimized streaming pipelines before DBR 18, you know the struggle.
In DBR 18, the engine is finally smart enough to handle this for you. For stateless queries (like filters, projections, and stream-static joins), Spark now leverages the same adaptive intelligence used in batch jobs.
When you set spark.sql.shuffle.partitions to "auto", it actually behaves as "auto".
Once the shuffle happens, Adaptive Query Execution (AQE) steps in to optimize the plan on the fly.
Upgrading is simple. Move your workspace to DBR 18+, and these optimizations are enabled by default for stateless queries.
Note: These improvements currently apply to stateless operators. Complex stateful repartitioning requires different heuristics and remains a focus for future Project Lightspeed efforts.
With DBR 18, we are one step closer to a world where you simply write business logic, and the engine handles the rest. By bringing AQE and AOS to stateless streaming, we’re removing the "tuning tax" from data engineers, letting you focus on building value rather than managing partitions.
Stay tuned for more updates from Project Lightspeed!
Q: Does this update apply to stateful streaming operations like aggregations or mapGroupsWithState? A: No. Currently, these AQE and AOS improvements apply specifically to stateless queries (e.g., filters, projections, stream-static joins). Stateful repartitioning requires strict state store compatibility and is not yet supported by these specific features.
Q: Do I need to change my code to enable AQE for streaming? A: In DBR 18+, AQE for stateless streaming is enabled by default via spark.sql.adaptive.streaming.stateless.enabled. You do not need to change code, though you should ensure spark.sql.shuffle.partitions is set to "auto" to leverage AOS.
Q: How does AOS determine the partition count for a micro-batch? A: AOS estimates the size of the incoming micro-batch data and calculates an appropriate number of pre-shuffle partitions, ensuring the cluster isn't over-provisioned for small batches or choked by large ones.
Q: Will this fix "small file" problems in my downstream Delta tables? A: Yes, indirectly. By using AQE to coalesce small partitions during the write phase, the engine writes fewer, larger files to storage, reducing the need for aggressive downstream OPTIMIZE or auto-compaction jobs.
Q: Can I remove my existing foreachBatch logic? A: If you were using foreachBatch solely to trigger batch-style AQE optimizations for a stateless write, you can likely revert to a standard streaming write in DBR 18. If the foreachBatch contains other custom logic (such as merging to external systems), you must keep it.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.