Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unable to load org.apache.spark.sql.delta classes from JVM pyspark

Nasd_
New Contributor II

Hello,

I’m working on Databricks with a cluster running Runtime 16.4, which includes Spark 3.5.2 and Scala 2.12.

For a specific need, I want to implement my own custom way of writing to Delta tables by manually managing Delta transactions from PySpark. To do this, I want to access the Delta Lake transactional engine via the JVM embedded in the Spark session, specifically by using the class:

org.apache.spark.sql.delta.DeltaLog

Issue

When I try to use classes from the package org.apache.spark.sql.delta directly from PySpark (through spark._jvm), the classes are not found if I don’t have the Delta Core package installed explicitly on the cluster.

When I install the Delta Core Python package to gain access, I encounter the following Python import error:

ModuleNotFoundError: No module named 'delta.exceptions.captured'; 'delta.exceptions' is not a package

Without the Delta Core package installed, accessing DeltaLog simply returns a generic JavaPackage object that is unusable.
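
For illustration, a minimal probe of this behaviour (assuming only a standard PySpark session in a notebook; py4j returns a JavaPackage placeholder for any dotted path it cannot resolve to a loaded class):

# Attribute access on spark._jvm never fails by itself: py4j hands back a
# JavaClass when the name resolves and a JavaPackage placeholder otherwise.
handle = spark._jvm.org.apache.spark.sql.delta.DeltaLog
print(type(handle))  # py4j JavaClass if resolvable, JavaPackage if not

# A stricter check: ask the JVM to load the class by name.
try:
    spark._jvm.java.lang.Class.forName("org.apache.spark.sql.delta.DeltaLog")
    print("org.apache.spark.sql.delta.DeltaLog is on the driver classpath")
except Exception as err:
    print("Class not found in this runtime:", err)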

What I want to do

Access the Delta transaction log API (DeltaLog) from PySpark via the JVM.

Be able to start transactions and commit manually to implement custom write behavior.

Work within the Databricks Runtime 16.4 environment without conflicts or missing dependencies.

Questions

How can I correctly access and use org.apache.spark.sql.delta.DeltaLog from PySpark on Databricks Runtime 16.4?

Is there a supported way to manually manage Delta transactions through the JVM in this environment?

What is the correct setup or package dependency to avoid the ModuleNotFoundError when installing the Delta Core Python package?

Are there any alternatives or recommended patterns to achieve manual Delta commits programmatically on Databricks?

1 ACCEPTED SOLUTION

NandiniN
Databricks Employee

Hi @Nasd_

I believe you are trying to use the open-source (OSS) Delta Lake jars on Databricks Runtime (DBR); I can infer this from the class package:

org.apache.spark.sql.delta.DeltaLog

The error ModuleNotFoundError: No module named 'delta.exceptions.captured'; 'delta.exceptions' is not a package is typically seen after installing the open-source delta-spark (or Delta Core) Python package, and it indicates a package conflict.

Databricks Runtime includes a native version of the Delta Lake Python libraries that are tightly coupled with the binaries on the cluster. When you install the open-source delta-spark package via %pip or as a cluster library, it often overwrites or conflicts with the native Databricks-provided modules, leading to the Python import error because the structure or contents of the installed package do not match what the Databricks environment expects.
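
As a hedged sketch of how to check which module is winning and to get back to the built-in one (assuming delta-spark was installed with %pip in the notebook; the table path below is made up for illustration):

# In one notebook cell: remove the conflicting OSS package. A cluster-scoped
# library would instead be removed from the cluster's Libraries UI.
%pip uninstall -y delta-spark

# In a following cell: restart the Python process so the Databricks-provided
# delta module is picked up again.
dbutils.library.restartPython()

# Verify which delta installation resolves, then use the bundled Python API;
# no extra install is needed on Databricks Runtime.
import delta
print(delta.__file__)  # should point at the runtime-provided package

from delta.tables import DeltaTable
dt = DeltaTable.forPath(spark, "/tmp/events_delta")  # hypothetical table path
print(dt.history(1).collect())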

Okay, I see you already have an answer on this thread - https://community.databricks.com/t5/data-engineering/accessing-deltalog-and-optimistictransaction-fr... - and have accepted it, so I believe your questions are answered.

Thanks!
