<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Module not found, despite it being installed on job cluster? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/module-not-found-despite-it-being-installed-on-job-cluster/m-p/64336#M32544</link>
    <description>&lt;P&gt;&lt;SPAN&gt;We observed the following error in a notebook that was running from a Databricks workflow: &lt;/SPAN&gt;&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;EM&gt;ModuleNotFoundError: No module named '&amp;lt;python package&amp;gt;'&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;The error message speaks for itself: Python could not find the package.&amp;nbsp; What is peculiar is that this is a library that we had manually specified for installation at the job cluster level.&amp;nbsp; When we checked the job cluster settings of the failed job (via the "Edit Details" button under "Compute", then the "Libraries" tab), we verified that the Python package (type "PyPI", for whatever it's worth) is indeed listed there.&lt;/P&gt;&lt;P&gt;We are using Databricks Runtime 14.2 (Apache Spark 3.5.0, Scala 2.12).&lt;/P&gt;&lt;P&gt;Our job runs daily and normally runs fine, and it has run fine since this error.&amp;nbsp; The error appears to have been a one-off.&lt;/P&gt;&lt;P&gt;Has anyone else run into this issue?&amp;nbsp; Is this a known issue in Databricks, or with distributed computing in general? Is there any way to prevent it?&lt;/P&gt;</description>
    <pubDate>Thu, 21 Mar 2024 20:36:10 GMT</pubDate>
    <dc:creator>mvmiller</dc:creator>
    <dc:date>2024-03-21T20:36:10Z</dc:date>
    <item>
      <title>Module not found, despite it being installed on job cluster?</title>
      <link>https://community.databricks.com/t5/data-engineering/module-not-found-despite-it-being-installed-on-job-cluster/m-p/64336#M32544</link>
      <description>&lt;P&gt;&lt;SPAN&gt;We observed the following error in a notebook that was running from a Databricks workflow: &lt;/SPAN&gt;&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;EM&gt;ModuleNotFoundError: No module named '&amp;lt;python package&amp;gt;'&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;The error message speaks for itself: Python could not find the package.&amp;nbsp; What is peculiar is that this is a library that we had manually specified for installation at the job cluster level.&amp;nbsp; When we checked the job cluster settings of the failed job (via the "Edit Details" button under "Compute", then the "Libraries" tab), we verified that the Python package (type "PyPI", for whatever it's worth) is indeed listed there.&lt;/P&gt;&lt;P&gt;We are using Databricks Runtime 14.2 (Apache Spark 3.5.0, Scala 2.12).&lt;/P&gt;&lt;P&gt;Our job runs daily and normally runs fine, and it has run fine since this error.&amp;nbsp; The error appears to have been a one-off.&lt;/P&gt;&lt;P&gt;Has anyone else run into this issue?&amp;nbsp; Is this a known issue in Databricks, or with distributed computing in general? Is there any way to prevent it?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 20:36:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/module-not-found-despite-it-being-installed-on-job-cluster/m-p/64336#M32544</guid>
      <dc:creator>mvmiller</dc:creator>
      <dc:date>2024-03-21T20:36:10Z</dc:date>
    </item>
    <item>
      <title>Re: Module not found, despite it being installed on job cluster?</title>
      <link>https://community.databricks.com/t5/data-engineering/module-not-found-despite-it-being-installed-on-job-cluster/m-p/64437#M32570</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;Here are a few possible explanations and solutions:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Transient Issue:&lt;/STRONG&gt; Because the error was a one-off and the job has run fine since then, a transient issue is plausible. Transient issues can stem from temporary network glitches, problems with the PyPI server at the time of the job run, or other short-lived failures.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Cluster Initialization Timing:&lt;/STRONG&gt; Sometimes, if a job starts running before all the libraries have been fully installed on the cluster, it can lead to a &lt;CODE&gt;ModuleNotFoundError&lt;/CODE&gt;. This is more likely to happen if the cluster is just starting up and the job starts running immediately.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Package Installation Failure:&lt;/STRONG&gt; There might have been an issue with the installation of the package for that particular run. You can check the cluster logs for any errors or warnings related to the package installation.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Package Compatibility Issue:&lt;/STRONG&gt; Ensure that the package is compatible with the Python version and the Databricks runtime version you're using.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
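If the timing explanation in the list above applies, one defensive option is to have the notebook confirm that the package is importable before doing real work, and to retry a few times if it is not. The following is only a minimal sketch using the Python standard library. The helper name wait_for_module and its parameters are hypothetical, and whether a short wait actually helps on a job cluster is an assumption that this thread does not confirm.

```python
import importlib
import time


def wait_for_module(module_name, attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Import and return module_name, retrying while it is not importable.

    Hypothetical helper: raises ModuleNotFoundError if the module is still
    missing after the final attempt.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        # Drop cached finder results so a package installed after the
        # previous attempt becomes visible to the import system.
        importlib.invalidate_caches()
        try:
            return importlib.import_module(module_name)
        except ModuleNotFoundError as error:
            last_error = error
            # Pause between attempts, but not after the last one.
            if attempt != attempts:
                sleep(delay_seconds)
    raise ModuleNotFoundError(
        f"No module named {module_name!r} after {attempts} attempts"
    ) from last_error


# Example: the standard-library json module is always present, so this
# returns on the first attempt without sleeping.
json_module = wait_for_module("json", attempts=1)
```

Calling the helper in the first notebook cell makes the job fail early with a clear message rather than partway through the run. It does not replace checking the cluster logs for an installation failure.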
      <pubDate>Sat, 23 Mar 2024 14:09:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/module-not-found-despite-it-being-installed-on-job-cluster/m-p/64437#M32570</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-03-23T14:09:14Z</dc:date>
    </item>
    <item>
      <title>Re: Module not found, despite it being installed on job cluster?</title>
      <link>https://community.databricks.com/t5/data-engineering/module-not-found-despite-it-being-installed-on-job-cluster/m-p/64512#M32593</link>
      <description>&lt;P&gt;Thanks, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/88823"&gt;@Walter_C&lt;/a&gt;.&amp;nbsp; Supposing that your second possible explanation, Cluster Initialization Timing, could be a factor, are there any best practices or recommendations for preventing this from being a recurring issue, down the road?&lt;/P&gt;</description>
      <pubDate>Mon, 25 Mar 2024 13:03:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/module-not-found-despite-it-being-installed-on-job-cluster/m-p/64512#M32593</guid>
      <dc:creator>mvmiller</dc:creator>
      <dc:date>2024-03-25T13:03:14Z</dc:date>
    </item>
  </channel>
</rss>

