<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Python library not installed when compute is resized in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/python-library-not-installed-when-compute-is-resized/m-p/91014#M38060</link>
    <description>&lt;DIV class=""&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a Python notebook workflow that runs on a job cluster with 4 workers. The cluster lost at least one node (due to a Spot instance termination) and then scaled back up. After that, my job failed with a "Module not found" error, even though the Python module had been used successfully before the node was lost. So I suspect the library was not installed on the replacement node. This is the first time this has happened in this workflow when a node was replaced.&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;P data-unlink="true"&gt;&lt;SPAN&gt;Any idea what might be going wrong? Thanks!&lt;/SPAN&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Thu, 19 Sep 2024 08:34:29 GMT</pubDate>
    <dc:creator>Bilel</dc:creator>
    <dc:date>2024-09-19T08:34:29Z</dc:date>
    <item>
      <title>Python library not installed when compute is resized</title>
      <link>https://community.databricks.com/t5/data-engineering/python-library-not-installed-when-compute-is-resized/m-p/91014#M38060</link>
      <description>&lt;DIV class=""&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a Python notebook workflow that runs on a job cluster with 4 workers. The cluster lost at least one node (due to a Spot instance termination) and then scaled back up. After that, my job failed with a "Module not found" error, even though the Python module had been used successfully before the node was lost. So I suspect the library was not installed on the replacement node. This is the first time this has happened in this workflow when a node was replaced.&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;P data-unlink="true"&gt;&lt;SPAN&gt;Any idea what might be going wrong? Thanks!&lt;/SPAN&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
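One way to test the guess that the library is missing on the replacement node is to ask every executor whether the module can be found. This is a minimal sketch, assuming a PySpark notebook on a cluster whose access mode allows RDD APIs (on Databricks, `spark.sparkContext`). `module_status` and `check_on_executors` are hypothetical helper names, and the module name you pass is your own.

```python
import importlib.util
import socket


def module_status(module_name):
    """Return (hostname, importable?) for the current Python process.

    Pass a top-level module name: find_spec returns None when it cannot be
    found, but raises ModuleNotFoundError for a dotted name whose parent is missing.
    """
    return socket.gethostname(), importlib.util.find_spec(module_name) is not None


def check_on_executors(sc, module_name, num_tasks=64):
    """Run module_status in many small Spark tasks and return the distinct results.

    `sc` is an existing SparkContext (hypothetical usage: spark.sparkContext).
    Spreading many one-element partitions makes it likely, but not guaranteed,
    that every executor runs at least one task.
    """
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    results = rdd.map(lambda _: module_status(module_name)).collect()
    return sorted(set(results))
```

A host reported with `False` would support the theory that the new node came up without the library; if every host reports `True`, the cause probably lies elsewhere.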
      <pubDate>Thu, 19 Sep 2024 08:34:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-library-not-installed-when-compute-is-resized/m-p/91014#M38060</guid>
      <dc:creator>Bilel</dc:creator>
      <dc:date>2024-09-19T08:34:29Z</dc:date>
    </item>
    <item>
      <title>Re: Python library not installed when compute is resized</title>
      <link>https://community.databricks.com/t5/data-engineering/python-library-not-installed-when-compute-is-resized/m-p/91343#M38153</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/121758"&gt;@Bilel&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;How are you doing today?&lt;/P&gt;&lt;P&gt;As I understand it, consider &lt;STRONG&gt;installing the library at the cluster level&lt;/STRONG&gt; (for a job cluster, as a library attached to the task) so that it is installed automatically on every node, including nodes added when the cluster scales back up. You could also use &lt;STRONG&gt;init scripts&lt;/STRONG&gt; to make sure the required libraries are installed on every node when the cluster starts or scales up. It's also worth reviewing your &lt;STRONG&gt;Spot instance and autoscaling settings&lt;/STRONG&gt; with stability in mind. If you install libraries with notebook commands (for example, %pip), consider rerunning those installs after a node is replaced. Finally, if node loss happens often, &lt;STRONG&gt;using on-demand instances&lt;/STRONG&gt; instead of Spot may help avoid these issues.&lt;/P&gt;&lt;P&gt;Please let me know if it works.&lt;/P&gt;&lt;P&gt;Have a good day.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
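Several of these suggestions correspond to fields in a Databricks Jobs API task definition. This is a minimal sketch, assuming an AWS workspace (the post mentions Spot Instance Termination) and a task that creates its own job cluster. `build_notebook_task`, the task key, paths, package name, runtime version and node type are all placeholders. The sketch places a task-level PyPI library, a workspace init script, and `SPOT_WITH_FALLBACK` with `first_on_demand` side by side. In practice you would usually choose either the library or the init script for a given package, not both.

```python
import json


def build_notebook_task(notebook_path, pypi_package, init_script_path,
                        num_workers=4, on_demand_nodes=1):
    """Build one task entry for a Jobs API create/reset payload (hypothetical helper).

    Assumption: AWS workspace; all literal values below are placeholders.
    """
    return {
        "task_key": "main_notebook",  # hypothetical task key
        "notebook_task": {"notebook_path": notebook_path},
        "new_cluster": {
            "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
            "node_type_id": "i3.xlarge",  # placeholder instance type
            "num_workers": num_workers,
            "aws_attributes": {
                # Use Spot, but fall back to on-demand when Spot capacity is unavailable.
                "availability": "SPOT_WITH_FALLBACK",
                # The first N nodes (the driver first) are placed on on-demand instances.
                "first_on_demand": on_demand_nodes,
            },
            # Cluster-scoped init scripts run on each node as it starts,
            # which includes nodes that replace lost Spot workers.
            "init_scripts": [{"workspace": {"destination": init_script_path}}],
        },
        # Task-level libraries are installed on the job cluster by Databricks.
        "libraries": [{"pypi": {"package": pypi_package}}],
    }


if __name__ == "__main__":
    task = build_notebook_task(
        notebook_path="/Workspace/Users/someone/my_notebook",  # hypothetical path
        pypi_package="my-package==1.2.3",  # hypothetical package pin
        init_script_path="/Workspace/Shared/init/install_deps.sh",  # hypothetical path
    )
    print(json.dumps({"tasks": [task]}, indent=2))
```

Printing the payload is only a way to inspect it. Submitting it (through the Jobs API, the Databricks CLI, or an SDK) is not covered here.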
      <pubDate>Sun, 22 Sep 2024 19:52:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-library-not-installed-when-compute-is-resized/m-p/91343#M38153</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2024-09-22T19:52:01Z</dc:date>
    </item>
  </channel>
</rss>

