The Apache Spark 4.0 release delivers powerful enhancements across SQL, Python, streaming, and connectivity, all aimed at making big data workloads more efficient, reliable, and developer-friendly.
With Databricks Runtime 17.0, these capabilities are available out of the box.
🔍 What’s New in Spark 4.0?
💡 SQL & Workflow Enhancements
✅ SQL scripting & session variables — Build complex, maintainable workflows
✅ Reusable SQL UDFs & intuitive |> pipe syntax — Streamline your analytics
✅ ANSI SQL mode enabled by default — Ensures stricter data integrity & standards compliance
🧱 Data Types & Logging
✅ New VARIANT data type — Seamless handling of JSON & semi-structured data
✅ Structured JSON logging — Improved observability & debugging
🐍 Python & PySpark Upgrades
✅ Native plotting in PySpark — call .plot() directly on DataFrames in your notebooks
✅ New Python DataSource API — Build custom connectors using pure Python
✅ Polymorphic Python UDTFs with dynamic schema support
🔄 Streaming Improvements
✅ New transformWithState API — Power advanced stateful streaming applications
🌐 Connectivity & Ecosystem
✅ Spark Connect nearly at full parity with Spark Classic
✅ New client support: Go, Rust, Swift
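A connection-configuration sketch for Spark Connect: the client speaks gRPC to a remote Spark server, which is what makes thin clients in other languages (Go, Rust, Swift) possible. The endpoint URL below is a placeholder for your own server, so this fragment only runs where one is reachable.

```python
from pyspark.sql import SparkSession

# Placeholder Spark Connect endpoint; substitute your server's host and port.
spark = (
    SparkSession.builder
    .remote("sc://localhost:15002")
    .getOrCreate()
)

# From here, DataFrame code looks identical to Spark Classic.
df = spark.range(5)
print(df.count())
```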
📦 Bonus: You can try all of this today by selecting Databricks Runtime 17.0 when spinning up a cluster!