Databricks Runtime, Pyspark and Spark Versions

loujiang — Fri, 22 May 2026 09:50:18 GMT

Hello, dear community,

I was going through the documentation for the function from_xml here: pyspark.sql.functions.from_xml — PySpark 4.1.2 documentation. It states that the function is available in PySpark 4.0.0 and later.

Meanwhile, the Databricks documentation for from_xml on Azure/AWS (from_xml function - Azure Databricks - Databricks SQL | Microsoft Learn) says it is supported in Databricks Runtime 14.1 and above.

But Databricks Runtime 14.1 uses Apache Spark 3.5.0, which should not have a from_xml implementation. How should we understand this difference?

Thanks

Best wishes

loujiang

Re: Databricks Runtime, Pyspark and Spark Versions

szymon_dybczak — Fri, 22 May 2026 10:24:13 GMT

Hi @loujiang ,

Databricks Runtime is not a vanilla Apache Spark distribution. DBR is built on top of a highly optimized version of Apache Spark, but also adds enhancements and additional components that substantially improve usability, performance, and security beyond what's in the open-source release. This means Databricks can - and regularly does - ship Spark features ahead of their upstream release.
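As a rough illustration of why the Spark version number alone can't settle this, here is a plain-Python sketch (no Spark required) that applies each doc's stated minimum to the versions DBR 14.1 reports. `parse_version` and `at_least` are hypothetical helpers written for this example, and the version strings are simply the ones quoted in the question.

```python
# Minimum versions quoted in the two docs from the question.
OSS_PYSPARK_MIN = "4.0.0"  # pyspark.sql.functions.from_xml docs
DBR_MIN = "14.1"           # Databricks from_xml docs


def parse_version(version: str, width: int = 3) -> tuple:
    """Turn '3.5.0', '14.1' or '14.1.x-scala2.12' into a zero-padded int tuple."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component, e.g. 'x-scala2'.
            break
        parts.append(int(digits))
    # Pad with zeros so that '4.0' and '4.0.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def at_least(version: str, minimum: str) -> bool:
    """True if `version` is greater than or equal to `minimum`."""
    return parse_version(version) >= parse_version(minimum)


# DBR 14.1 ships Apache Spark 3.5.0 (per the question).
spark_version, dbr_version = "3.5.0", "14.1"
print("OSS rule says from_xml available:", at_least(spark_version, OSS_PYSPARK_MIN))  # False
print("DBR rule says from_xml available:", at_least(dbr_version, DBR_MIN))            # True
```

Each check is consistent with its own documentation. They disagree only because the DBR 14.1 feature set is not fully described by its Apache Spark version.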

The Spark changelog section of the DBR 14.1 release notes (Databricks Runtime 14.1 (EoS) | Databricks on Google Cloud) lists:

[SPARK-44788] [SC-142980][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

This JIRA ticket was cherry-picked into DBR 14.1, even though DBR 14.1 runs on Spark 3.5.0. Databricks applied this patch internally before it landed in an official Apache Spark release.
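In practice, a safer test than comparing version strings is to ask the installed library directly. This is a minimal sketch, assuming the DBR 14.1 backport includes the Python wrapper (the SPARK-44788 title says it covers pyspark). `has_sql_function` is a hypothetical helper name, and it only inspects the Python API, not whether the SQL function is registered.

```python
import importlib


def has_sql_function(name: str) -> bool:
    """Return True if the installed PySpark exposes pyspark.sql.functions.<name>.

    Because it inspects the library that is actually installed, it also
    reflects features backported by a vendor runtime such as DBR, which a
    plain Spark version comparison would miss.
    """
    try:
        functions = importlib.import_module("pyspark.sql.functions")
    except ImportError:
        # PySpark (or one of its dependencies) is not installed here.
        return False
    return hasattr(functions, name)


print("from_xml available:", has_sql_function("from_xml"))
```

On a cluster this would be expected to print True on DBR 14.1+ and on open-source PySpark 4.0.0+, but False on open-source PySpark 3.5.x.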

If my answer was helpful, please consider marking it as the accepted solution.
