<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks Runtime, Pyspark and Spark Versions in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-runtime-pyspark-and-spark-versions/m-p/157482#M54570</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/229070"&gt;@loujiang&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Databricks Runtime is not a vanilla Apache Spark distribution. DBR is built on top of a highly optimized version of Apache Spark, and it adds enhancements and components that improve usability, performance, and security beyond what is in the open-source release. This means Databricks can, and regularly does, ship Spark features ahead of their upstream release.&lt;/P&gt;&lt;P&gt;The Spark changelog section of the DBR 14.1 release notes (&lt;A href="https://docs.databricks.com/gcp/en/release-notes/runtime/14.1" target="_blank"&gt;Databricks Runtime 14.1 (EoS) | Databricks on Google Cloud&lt;/A&gt;) lists:&lt;/P&gt;&lt;P&gt;[SPARK-44788] [SC-142980][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function&lt;/P&gt;&lt;P&gt;The change tracked by this JIRA ticket was cherry-picked into DBR 14.1, even though DBR 14.1 is based on Spark 3.5.0. Databricks applied the patch internally before it landed in an official Apache Spark release, which is why the open-source PySpark documentation lists from_xml as new in 4.0.0.&lt;/P&gt;&lt;P&gt;If my answer was helpful, please consider marking it as the accepted solution.&lt;/P&gt;</description>
    <pubDate>Fri, 22 May 2026 10:24:13 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2026-05-22T10:24:13Z</dc:date>
    <item>
      <title>Databricks Runtime, Pyspark and Spark Versions</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-runtime-pyspark-and-spark-versions/m-p/157477#M54569</link>
      <description>&lt;P&gt;Hello, dear community,&lt;/P&gt;&lt;P&gt;I was going through the documentation for the from_xml function here:&amp;nbsp;&lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.from_xml.html" target="_blank"&gt;pyspark.sql.functions.from_xml — PySpark 4.1.2 documentation&lt;/A&gt;. It states that the function is available in PySpark 4.0.0 and later.&lt;/P&gt;&lt;P&gt;Meanwhile, the Databricks documentation for from_xml on Azure/AWS,&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/from_xml" target="_blank"&gt;from_xml function - Azure Databricks - Databricks SQL | Microsoft Learn&lt;/A&gt;, says it is supported on Databricks Runtime&amp;nbsp;&lt;SPAN&gt;14.1 and above.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But Databricks Runtime 14.1 uses Apache Spark&amp;nbsp;version 3.5.0, which should have no from_xml implementation. How should we understand this difference?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Best wishes,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;loujiang&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 22 May 2026 09:50:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-runtime-pyspark-and-spark-versions/m-p/157477#M54569</guid>
      <dc:creator>loujiang</dc:creator>
      <dc:date>2026-05-22T09:50:18Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Runtime, Pyspark and Spark Versions</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-runtime-pyspark-and-spark-versions/m-p/157482#M54570</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/229070"&gt;@loujiang&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Databricks Runtime is not a vanilla Apache Spark distribution. DBR is built on top of a highly optimized version of Apache Spark, and it adds enhancements and components that improve usability, performance, and security beyond what is in the open-source release. This means Databricks can, and regularly does, ship Spark features ahead of their upstream release.&lt;/P&gt;&lt;P&gt;The Spark changelog section of the DBR 14.1 release notes (&lt;A href="https://docs.databricks.com/gcp/en/release-notes/runtime/14.1" target="_blank"&gt;Databricks Runtime 14.1 (EoS) | Databricks on Google Cloud&lt;/A&gt;) lists:&lt;/P&gt;&lt;P&gt;[SPARK-44788] [SC-142980][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function&lt;/P&gt;&lt;P&gt;The change tracked by this JIRA ticket was cherry-picked into DBR 14.1, even though DBR 14.1 is based on Spark 3.5.0. Databricks applied the patch internally before it landed in an official Apache Spark release, which is why the open-source PySpark documentation lists from_xml as new in 4.0.0.&lt;/P&gt;&lt;P&gt;If my answer was helpful, please consider marking it as the accepted solution.&lt;/P&gt;</description>
      <pubDate>Fri, 22 May 2026 10:24:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-runtime-pyspark-and-spark-versions/m-p/157482#M54570</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-05-22T10:24:13Z</dc:date>
    </item>
  </channel>
</rss>

