<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Unable to load org.apache.spark.sql.delta classes from JVM pyspark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/133776#M49922</link>
    <description>Re: Unable to load org.apache.spark.sql.delta classes from JVM pyspark in Data Engineering</description>
    <pubDate>Sat, 04 Oct 2025 07:11:38 GMT</pubDate>
    <dc:creator>NandiniN</dc:creator>
    <dc:date>2025-10-04T07:11:38Z</dc:date>
    <item>
      <title>Unable to load org.apache.spark.sql.delta classes from JVM pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/122196#M46693</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I’m working on Databricks with a cluster running Runtime 16.4, which includes Spark 3.5.2 and Scala 2.12.&lt;/P&gt;&lt;P&gt;For a specific need, I want to implement my own custom way of writing to Delta tables by manually managing Delta transactions from PySpark. To do this, I want to access the Delta Lake transactional engine via the JVM embedded in the Spark session, specifically by using the class:&lt;/P&gt;&lt;PRE&gt;org.apache.spark.sql.delta.DeltaLog&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Issue&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;When I try to use classes from the package org.apache.spark.sql.delta directly from PySpark (through spark._jvm), the classes are not found unless the Delta Core package is explicitly installed on the cluster.&lt;/P&gt;&lt;P&gt;When I install the Delta Core Python package to gain access, I encounter the following Python import error:&lt;/P&gt;&lt;PRE&gt;ModuleNotFoundError: No module named 'delta.exceptions.captured'; 'delta.exceptions' is not a package&lt;/PRE&gt;
&lt;P&gt;Without the Delta Core package installed, accessing DeltaLog simply returns a generic JavaPackage object that is unusable.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What I want to do&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Access the Delta transaction log API (DeltaLog) from PySpark via the JVM.&lt;/P&gt;&lt;P&gt;Be able to start transactions and commit manually to implement custom write behavior.&lt;/P&gt;&lt;P&gt;Work within the Databricks Runtime 16.4 environment without conflicts or missing dependencies.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Questions&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;How can I correctly access and use org.apache.spark.sql.delta.DeltaLog from PySpark on Databricks Runtime 16.4?&lt;/P&gt;&lt;P&gt;Is there a supported way to manually manage Delta transactions through the JVM in this environment?&lt;/P&gt;&lt;P&gt;What is the correct setup or package dependency to avoid the ModuleNotFoundError when installing the Delta Core Python package?&lt;/P&gt;&lt;P&gt;Are there any alternatives or recommended patterns to achieve manual Delta commits programmatically on Databricks?&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jun 2025 00:05:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/122196#M46693</guid>
      <dc:creator>Nasd_</dc:creator>
      <dc:date>2025-06-19T00:05:07Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to load org.apache.spark.sql.delta classes from JVM pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/133776#M49922</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/169877"&gt;@Nasd_&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I believe you are trying to use the OSS jars on DBR (inferred from the class package):&lt;/P&gt;
&lt;PRE&gt;org.apache.spark.sql.delta.DeltaLog&lt;/PRE&gt;
&lt;P&gt;The error &lt;CODE&gt;ModuleNotFoundError: No module named 'delta.exceptions.captured'; 'delta.exceptions' is not a package&lt;/CODE&gt; that appears after installing the open-source &lt;CODE&gt;delta-spark&lt;/CODE&gt; (or Delta Core) Python package is the result of a &lt;STRONG&gt;package conflict&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Databricks Runtime includes a &lt;I&gt;native&lt;/I&gt; version of the Delta Lake Python libraries that are tightly coupled with the binaries on the cluster. When you install the open-source &lt;CODE&gt;delta-spark&lt;/CODE&gt; package via &lt;CODE&gt;%pip&lt;/CODE&gt; or as a cluster library, it often overwrites or conflicts with the native Databricks-provided modules, leading to the Python import error because the structure or contents of the installed package do not match what the Databricks environment expects.&lt;/P&gt;
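&lt;P&gt;One quick way to confirm this (a generic Python check, not Databricks-specific) is to look at where Python resolves the &lt;CODE&gt;delta&lt;/CODE&gt; module from; a path under &lt;CODE&gt;site-packages&lt;/CODE&gt; indicates that a pip-installed copy has shadowed the runtime's bundled one:&lt;/P&gt;
&lt;PRE&gt;# Sketch: check which 'delta' package Python is actually importing.
# A pip-installed copy lives under site-packages; the DBR-bundled copy
# lives under the runtime's own Python library path.
import delta
print(delta.__file__)&lt;/PRE&gt;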
&lt;P&gt;I also see that you have already received and accepted an answer on this thread -&amp;nbsp;&lt;A href="https://community.databricks.com/t5/data-engineering/accessing-deltalog-and-optimistictransaction-from-pyspark/td-p/121886" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/accessing-deltalog-and-optimistictransaction-from-pyspark/td-p/121886&lt;/A&gt;&amp;nbsp;- so I believe your questions are answered.&lt;/P&gt;
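&lt;P&gt;For future readers, a minimal illustrative sketch of direct JVM access on DBR without installing the OSS package. This is an assumption-laden example: the shaded namespace &lt;CODE&gt;com.databricks.sql.transaction.tahoe&lt;/CODE&gt; is an internal, unsupported detail of the runtime that can change between DBR releases, and the table path is a placeholder:&lt;/P&gt;
&lt;PRE&gt;# Hypothetical sketch: on DBR the OSS org.apache.spark.sql.delta classes
# are repackaged under an internal namespace, which is why the OSS name
# resolves to an empty JavaPackage. Internal API; subject to change.
DeltaLog = spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog
print(DeltaLog)  # a JavaClass if resolution worked, not an empty JavaPackage
delta_log = DeltaLog.forTable(
    spark._jsparkSession,      # the underlying Java SparkSession
    "/path/to/delta/table"     # placeholder table path
)
# delta_log.startTransaction()  # next step for manual commits; signature is
#                               # version-dependent, so verify on your DBR&lt;/PRE&gt;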
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 04 Oct 2025 07:11:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/133776#M49922</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-10-04T07:11:38Z</dc:date>
    </item>
  </channel>
</rss>