Apache Spark 4.0

ilir_nuredini
Honored Contributor

Missed the Apache Spark 4.0 release? It's not just a version bump; it's a whole new level for big data processing.


Some of the highlights that really stood out to me:

1. SQL just got way more powerful: reusable SQL UDFs, SQL scripting, session variables, and the new pipe syntax (|>) make it feel like you're working in a real programming language.

2. Python users: native Plotly-based plotting on PySpark DataFrames is finally here, and being able to write custom data sources in pure Python is groundbreaking.

3. Spark Connect is maturing fast: you can now build apps in Go, Rust, or Swift and plug them straight into Spark.

4. Structured Streaming just got way more flexible with the new transformWithState API and full visibility into the state store.

Whether you're doing data engineering, analytics, or ML workflows, Spark 4.0 feels more modern and developer-friendly, and it enforces stricter data integrity by default (hello, ANSI mode).

P.S. Huge kudos to the 400+ contributors who made this release happen!!!


2 replies

ilir_nuredini
Honored Contributor

If you want to take a deeper dive into the release, check out the release notes: https://spark.apache.org/releases/spark-release-4-0-0.html

Best, Ilir

Advika
Databricks Employee

Yeah, Spark 4.0 brings powerful enhancements while staying compatible with existing workloads.
Thank you for putting this together and highlighting the key updates, @ilir_nuredini.