Data Engineering

Module not found, despite it being installed on job cluster?

mvmiller
New Contributor III

We observed the following error in a notebook that was running as part of a Databricks workflow:

ModuleNotFoundError: No module named '<python package>'

The error message speaks for itself: it couldn't find the Python package.  What is peculiar is that this is a library we had manually specified for installation at the job cluster level.  And indeed, when we checked the job cluster settings of the failed job (via the "Edit Details" button under "Compute", then the "Libraries" tab), we verified that the Python package (type "PyPI", for whatever it's worth) is indeed listed there.
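
For illustration, that library entry corresponds to the "libraries" field of the job cluster definition. The fragment below is only a sketch written as a Python dict in the Jobs API shape; the package name and version are placeholders, not the actual library from our job:

# Hypothetical sketch only; "some-package==1.2.3" is a placeholder, not the real library.
# This mirrors what the job cluster's "Libraries" tab populates in the job definition.
job_cluster_libraries = [
    {"pypi": {"package": "some-package==1.2.3"}},
]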

We are using Databricks Runtime 14.2 (Apache Spark 3.5.0, Scala 2.12).

Our job runs daily and normally completes without issue, and it has run fine since this error occurred.  The error appears to have been a one-off.

Has anyone else run into this issue?  Is this a known issue in Databricks, or with distributed computing in general?  Is there any way to prevent it?
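
For anyone debugging something similar, a quick run-time check at the top of the notebook can confirm whether the package actually resolves on the cluster. This is just a sketch; "some_package" is a placeholder module name, not the real library:

import importlib.util

MODULE_NAME = "some_package"  # placeholder; substitute the actual module name

# find_spec returns None if the module cannot be located on the current cluster
spec = importlib.util.find_spec(MODULE_NAME)
if spec is None:
    print(f"{MODULE_NAME} is NOT importable on this cluster")
else:
    print(f"{MODULE_NAME} resolves from: {spec.origin}")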

2 REPLIES

Walter_C
Databricks Employee

Here are a few possible explanations and solutions:

  1. Transient Issue: Considering that the error was a one-off and the job has been running fine since then, it is possible that this was a transient issue. Transient issues can occur due to temporary network glitches, problems with the PyPI index at the time of the job run, or other temporary problems.

  2. Cluster Initialization Timing: Sometimes, if a job starts running before all the libraries have been fully installed on the cluster, it can lead to a ModuleNotFoundError. This is more likely to happen when the cluster is just starting up and the job begins running immediately (see the sketch after this list for one way to guard against it).

  3. Package Installation Failure: There might have been an issue with the installation of the package for that particular run. You can check the cluster logs for any errors or warnings related to the package installation.

  4. Package Compatibility Issue: Ensure that the package is compatible with the Python version and the Databricks runtime version you're using.
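
For the timing scenario in point 2, one rough mitigation is to poll for the library at the top of the notebook instead of importing it directly. The sketch below is only illustrative: the module name is a placeholder and the timeout values are arbitrary.

import importlib
import time

def wait_for_module(module_name, timeout_s=300, poll_s=15):
    # Retry the import until it succeeds or until timeout_s elapses.
    deadline = time.time() + timeout_s
    while True:
        try:
            importlib.invalidate_caches()  # pick up libraries installed after interpreter start
            return importlib.import_module(module_name)
        except ModuleNotFoundError:
            if time.time() >= deadline:
                raise
            time.sleep(poll_s)

# Example usage (placeholder module name):
# pkg = wait_for_module("some_package")

Note that this only papers over the installation delay; if the installation failed outright, the import will still fail once the timeout elapses, and the cluster logs remain the place to look (point 3).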

mvmiller
New Contributor III

Thanks, @Walter_C.  Supposing that your second possible explanation, Cluster Initialization Timing, could be a factor, are there any best practices or recommendations for preventing this from becoming a recurring issue down the road?
