Hi @Bilel,
How are you doing today?
As I understand it, you could install the library at the cluster level so that it is applied automatically to every node, including nodes added later. You could also use an init script to guarantee the required libraries are installed on each node at cluster start or scale-up. It's worth checking your Spot instance and autoscaling settings to make sure they are tuned for stability. If you install libraries via notebook commands, consider re-running them after a new node is added. Lastly, if node loss happens often, using on-demand instances instead of Spot might help avoid these issues.
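To make these suggestions a bit more concrete, here is a rough Python sketch that only builds the settings as JSON; it does not call any API. It assumes the cluster runs on AWS, and the field names (autoscale, aws_attributes, first_on_demand, availability, init_scripts, libraries) are how I remember them from the Databricks Clusters and Libraries REST APIs, so please verify them against the docs for your cloud. The library name, script path, cluster ID and worker counts are placeholders I made up.

```python
import json


def build_cluster_settings(library: str, init_script_path: str) -> dict:
    """Build example payloads for a cluster spec and a cluster-level library install."""
    cluster_spec = {
        # Placeholder worker counts; tune them for your workload.
        "autoscale": {"min_workers": 2, "max_workers": 8},
        "aws_attributes": {
            # Keep the first node (the driver) on an on-demand instance.
            "first_on_demand": 1,
            # Fall back to on-demand when Spot capacity is not available.
            "availability": "SPOT_WITH_FALLBACK",
        },
        # Init scripts run on every node, including nodes added by autoscaling.
        "init_scripts": [{"workspace": {"destination": init_script_path}}],
    }
    # Cluster-level library install request (placeholder cluster ID).
    library_request = {
        "cluster_id": "<your-cluster-id>",
        "libraries": [{"pypi": {"package": library}}],
    }
    return {"cluster_spec": cluster_spec, "library_request": library_request}


# Hypothetical library name and script path, for illustration only.
settings = build_cluster_settings("my-library==1.0.0", "/Shared/init/install_libs.sh")
print(json.dumps(settings, indent=2))
```

If node loss is still frequent with the fallback option, switching availability to ON_DEMAND is the more expensive but more stable choice.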
Please let me know if it works.
Have a good day.
Regards,
Brahma