Apache Spark 4.0
07-16-2025 04:18 PM
Missed the Apache Spark 4.0 release? It is not just a version bump; it is a whole new level for big data processing.
Some of the highlights that really stood out to me:
1. SQL just got way more powerful: reusable UDFs, scripting, session variables, and the new PIPE syntax make it feel like you're working in a real programming language.
2. Python users: Native Plotly-based plotting on PySpark DataFrames is finally here, and writing custom data sources in pure Python is groundbreaking.
3. Spark Connect is maturing fast: you can now build apps in Go, Rust, or Swift and plug them straight into Spark.
4. Structured Streaming just got way more flexible with transformWithState and full visibility into the state store.
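To make the SQL point a bit more concrete, here is a minimal sketch of a session variable feeding a PIPE-syntax query. The variable name `min_id`, the helper names `build_statements` and `run_statements`, and the use of the built-in `range` table function are my own illustration, not something from the release notes, and the second helper assumes a local Spark 4.0 install with PySpark available.

```python
from typing import List, Optional


def build_statements(threshold: int = 3) -> List[str]:
    """Return two Spark SQL statements: a session variable and a pipe query.

    `min_id` is a hypothetical variable name chosen for this example.
    """
    return [
        # Session variable: lives for the duration of the Spark session.
        f"DECLARE OR REPLACE VARIABLE min_id INT DEFAULT {int(threshold)}",
        # Pipe syntax: each |> step consumes the result of the previous one.
        "FROM range(10) |> WHERE id >= min_id |> SELECT id, id * 2 AS doubled",
    ]


def run_statements(statements: List[str]) -> Optional[list]:
    """Run the statements on a local session; return rows of the last one.

    Returns None when PySpark is not installed, so the sketch degrades
    gracefully on machines without Spark.
    """
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        return None

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("spark4-sql-sketch")
        .getOrCreate()
    )
    try:
        result = None
        for statement in statements:
            result = spark.sql(statement)
        return result.collect() if result is not None else None
    finally:
        spark.stop()
```

Calling `run_statements(build_statements())` on a machine with Spark 4.0 should return the rows of the final pipe query; I kept the SQL as plain strings so the same statements can be pasted into a SQL editor as well.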
Whether you're doing data engineering, analytics, or ML workflows, Spark 4.0 feels more modern, stricter about data integrity (hello ANSI mode), and more developer-friendly.
P.S. Huge kudos to the 400+ contributors who made this release happen!
07-16-2025 04:19 PM
If you want to dive deeper into the release, check the release notes: https://spark.apache.org/releases/spark-release-4-0-0.html
Best, Ilir
07-17-2025 02:24 AM
Yeah, Spark 4.0 brings powerful enhancements while staying compatible with existing workloads.
Thank you for putting this together and highlighting the key updates, @ilir_nuredini.