<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Unable to load org.apache.spark.sql.delta classes from JVM pyspark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/133776#M49922</link>
    <description>Re: Unable to load org.apache.spark.sql.delta classes from JVM pyspark in Data Engineering</description>
    <pubDate>Sat, 04 Oct 2025 07:11:38 GMT</pubDate>
    <dc:creator>NandiniN</dc:creator>
    <dc:date>2025-10-04T07:11:38Z</dc:date>
    <item>
      <title>Unable to load org.apache.spark.sql.delta classes from JVM pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/122196#M46693</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I’m working on Databricks with a cluster running Runtime 16.4, which includes Spark 3.5.2 and Scala 2.12.&lt;/P&gt;&lt;P&gt;For a specific need, I want to implement my own custom way of writing to Delta tables by manually managing Delta transactions from PySpark. To do this, I want to access the Delta Lake transactional engine via the JVM embedded in the Spark session, specifically by using the class:&lt;/P&gt;&lt;PRE&gt;org.apache.spark.sql.delta.DeltaLog&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Issue&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;When I try to use classes from the package org.apache.spark.sql.delta directly from PySpark (through spark._jvm), the classes are not found unless the Delta Core package is explicitly installed on the cluster.&lt;/P&gt;&lt;P&gt;When I install the Delta Core Python package to gain access, I encounter the following Python import error:&lt;/P&gt;&lt;PRE&gt;ModuleNotFoundError: No module named 'delta.exceptions.captured'; 'delta.exceptions' is not a package&lt;/PRE&gt;
&lt;P&gt;Without the Delta Core package installed, accessing DeltaLog simply returns a generic JavaPackage object that is unusable.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What I want to do&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Access the Delta transaction log API (DeltaLog) from PySpark via the JVM.&lt;/P&gt;&lt;P&gt;Be able to start transactions and commit manually to implement custom write behavior.&lt;/P&gt;&lt;P&gt;Work within the Databricks Runtime 16.4 environment without conflicts or missing dependencies.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Questions&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;How can I correctly access and use org.apache.spark.sql.delta.DeltaLog from PySpark on Databricks Runtime 16.4?&lt;/P&gt;&lt;P&gt;Is there a supported way to manually manage Delta transactions through the JVM in this environment?&lt;/P&gt;&lt;P&gt;What is the correct setup or package dependency to avoid the ModuleNotFoundError when installing the Delta Core Python package?&lt;/P&gt;&lt;P&gt;Are there any alternatives or recommended patterns to achieve manual Delta commits programmatically on Databricks?&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jun 2025 00:05:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/122196#M46693</guid>
      <dc:creator>Nasd_</dc:creator>
      <dc:date>2025-06-19T00:05:07Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to load org.apache.spark.sql.delta classes from JVM pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/133776#M49922</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/169877"&gt;@Nasd_&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I believe you are trying to use the OSS jars on DBR (inferred from the class package):&lt;/P&gt;
&lt;PRE&gt;org.apache.spark.sql.delta.DeltaLog&lt;/PRE&gt;
&lt;P&gt;The error &lt;CODE&gt;ModuleNotFoundError: No module named 'delta.exceptions.captured'; 'delta.exceptions' is not a package&lt;/CODE&gt; that appears after installing the open-source &lt;CODE&gt;delta-spark&lt;/CODE&gt; (or Delta Core) Python package is the result of a &lt;STRONG&gt;package conflict&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Databricks Runtime includes a &lt;I&gt;native&lt;/I&gt; version of the Delta Lake Python libraries that are tightly coupled with the binaries on the cluster. When you install the open-source &lt;CODE&gt;delta-spark&lt;/CODE&gt; package via &lt;CODE&gt;%pip&lt;/CODE&gt; or as a cluster library, it often overwrites or conflicts with the native Databricks-provided modules, leading to the Python import error because the structure or contents of the installed package do not match what the Databricks environment expects.&lt;/P&gt;
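&lt;P&gt;One quick way to confirm this (a generic Python check, not Databricks-specific) is to look at where Python resolves the &lt;CODE&gt;delta&lt;/CODE&gt; module from; a path under &lt;CODE&gt;site-packages&lt;/CODE&gt; indicates that a pip-installed copy has shadowed the runtime's bundled one:&lt;/P&gt;
&lt;PRE&gt;# Sketch: check which 'delta' package Python is actually importing.
# A pip-installed copy lives under site-packages; the DBR-bundled copy
# lives under the runtime's own Python library path.
import delta
print(delta.__file__)&lt;/PRE&gt;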
&lt;P&gt;I also see that you have already received and accepted an answer on this thread -&amp;nbsp;&lt;A href="https://community.databricks.com/t5/data-engineering/accessing-deltalog-and-optimistictransaction-from-pyspark/td-p/121886" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/accessing-deltalog-and-optimistictransaction-from-pyspark/td-p/121886&lt;/A&gt;&amp;nbsp;- so I believe your questions are answered.&lt;/P&gt;
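&lt;P&gt;For future readers, a minimal illustrative sketch of direct JVM access on DBR without installing the OSS package. This is an assumption-laden example: the shaded namespace &lt;CODE&gt;com.databricks.sql.transaction.tahoe&lt;/CODE&gt; is an internal, unsupported detail of the runtime that can change between DBR releases, and the table path is a placeholder:&lt;/P&gt;
&lt;PRE&gt;# Hypothetical sketch: on DBR the OSS org.apache.spark.sql.delta classes
# are repackaged under an internal namespace, which is why the OSS name
# resolves to an empty JavaPackage. Internal API; subject to change.
DeltaLog = spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog
print(DeltaLog)  # a JavaClass if resolution worked, not an empty JavaPackage
delta_log = DeltaLog.forTable(
    spark._jsparkSession,      # the underlying Java SparkSession
    "/path/to/delta/table"     # placeholder table path
)
# delta_log.startTransaction()  # next step for manual commits; signature is
#                               # version-dependent, so verify on your DBR&lt;/PRE&gt;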
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 04 Oct 2025 07:11:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-load-org-apache-spark-sql-delta-classes-from-jvm/m-p/133776#M49922</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-10-04T07:11:38Z</dc:date>
    </item>
  </channel>
</rss>