Data Engineering

Module not found, despite it being installed on job cluster?

mvmiller
New Contributor III

We observed the following error in a notebook which was running from a Databricks workflow:

ModuleNotFoundError: No module named '<python package>'

The error message speaks for itself: the notebook could not find the Python package. What is peculiar is that this is a library we had manually specified for installation at the job cluster level. Indeed, when we checked the job cluster settings of the failed job (via the "Edit Details" button under "Compute", then the "Libraries" tab), we verified that the Python package (Type "PyPI", for whatever it's worth) is listed there.

We are using Databricks runtime 14.2 (Apache Spark 3.5.0, Scala 2.12)

Our job runs daily and normally runs fine, and it has run fine since this error occurred. The error appears to have been a one-off.

Has anyone else run into this issue? Is this a known issue in Databricks, or with distributed computing in general? Is there any way to prevent it?

2 REPLIES

Walter_C
Valued Contributor II

Here are a few possible explanations and solutions:

  1. Transient Issue: Since the error was a one-off and the job has been running fine since then, it may have been a transient issue. These can be caused by temporary network glitches, problems with the PyPI server at the time of the job run, or other short-lived failures.

  2. Cluster Initialization Timing: Sometimes, if a job starts running before all the libraries have been fully installed on the cluster, it can lead to a ModuleNotFoundError. This is more likely to happen if the cluster is just starting up and the job starts running immediately.

  3. Package Installation Failure: There might have been an issue with the installation of the package for that particular run. You can check the cluster logs for any errors or warnings related to the package installation.

  4. Package Compatibility Issue: Ensure that the package is compatible with the Python version and the Databricks runtime version you're using.
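
If explanation 2 or 3 is what happened, one defensive option is to have the notebook confirm that the package can be imported before the real work starts. The following is only a minimal sketch using the Python standard library, not a Databricks API: `wait_for_module`, its parameter names, and the default timings are all hypothetical and would need tuning for your job.

```python
import importlib
import time


def wait_for_module(module_name, timeout_s=300.0, poll_s=10.0):
    """Try to import `module_name`, retrying until `timeout_s` seconds have passed.

    Hypothetical helper: the name and the default timings are assumptions.
    Returns the imported module, or raises ModuleNotFoundError on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Clear the import system's cached directory listings so that a
        # package installed after the previous attempt can be discovered.
        importlib.invalidate_caches()
        try:
            return importlib.import_module(module_name)
        except ModuleNotFoundError:
            if time.monotonic() >= deadline:
                raise ModuleNotFoundError(
                    f"{module_name!r} was still not importable after "
                    f"{timeout_s} seconds; check the cluster's library "
                    "installation status and logs",
                    name=module_name,
                )
            time.sleep(poll_s)
```

Calling something like `wait_for_module("<python package>")` in the first cell of the notebook would tolerate a late install, and would fail early with a clearer message if the install never completes. It does not fix a failed installation, so task-level retries in the job settings may be a useful complement.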

mvmiller
New Contributor III

Thanks, @Walter_C.  Supposing that your second possible explanation, Cluster Initialization Timing, could be a factor, are there any best practices or recommendations for preventing this from being a recurring issue, down the road?
