Python library not installed when compute is resized
09-19-2024 01:34 AM
Hi,
I have a Python notebook workflow that runs on a job cluster. The cluster lost at least one node (due to a Spot instance termination) and scaled back up. After that, my job failed with a "Module not found" error, even though the Python module had been in use before the cluster lost the node. So I suspect the library was not installed on the new node. This is the first time this has happened in this workflow after a node restart. The cluster has 4 workers.
Any idea what might be going wrong? Thanks !
09-22-2024 12:52 PM
Hi @Bilel,
How are you doing today?
As per my understanding, you should consider installing the library at the cluster level so that it is applied automatically to every node, including nodes added later. You could also use an init script to guarantee the required libraries are installed on every node at cluster start or scale-up. It's worth checking your Spot instance and autoscaling settings to make sure they favor stability. If you install libraries via notebook commands, consider re-running them when a new node is added. Lastly, if node loss happens often, using on-demand instances instead of Spot might help avoid these issues.
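To make the first two suggestions more concrete, here is a minimal sketch of how a job task definition could declare the library and an init script on the job cluster itself, rather than relying on a notebook command. It only builds the settings as a Python dict in the shape I believe the Jobs API expects. The package name, notebook path, init script path, Spark version and node type are all placeholders I made up, and the `aws_attributes` part assumes you are on AWS, so please adapt it to your setup.

```python
import json


def build_task_settings(pypi_package, notebook_path, init_script_path, num_workers=4):
    """Build a Jobs API-style task dict (hypothetical helper, not part of any SDK).

    The library is declared on the task, so it is attached to the job cluster
    instead of being installed from inside the notebook.
    """
    return {
        "task_key": "main",  # placeholder task name
        "notebook_task": {"notebook_path": notebook_path},
        # Library declared at the task/cluster level.
        "libraries": [{"pypi": {"package": pypi_package}}],
        "new_cluster": {
            "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
            "node_type_id": "i3.xlarge",  # placeholder node type (AWS assumed)
            "num_workers": num_workers,
            # Init script that runs on each node when the node starts.
            "init_scripts": [{"workspace": {"destination": init_script_path}}],
            # Assumption: AWS. Keep the first node on-demand and fall back to
            # on-demand capacity when Spot capacity is not available.
            "aws_attributes": {
                "first_on_demand": 1,
                "availability": "SPOT_WITH_FALLBACK",
            },
        },
    }


task = build_task_settings(
    pypi_package="my-package==1.0.0",  # placeholder package
    notebook_path="/Workspace/Users/someone@example.com/my_notebook",  # placeholder
    init_script_path="/Workspace/Shared/init/install_libs.sh",  # placeholder
)
print(json.dumps(task, indent=2))
```

The init script itself would just be a small shell script that installs the package with pip. You would normally pick either the task-level library or the init script, not both; I show them together only to illustrate where each one goes.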
Please let me know if it works.
Have a good day.
Regards,
Brahma