init_script breaks Notebooks
12-17-2024 05:29 AM
Hi everyone,
We would like to use our private company Python repository for installing Python libraries with pip install.
To achieve this, I created a simple script that sets pip's index-url configuration to our private repo.
I set this script as an init_script for my personal compute cluster.
When I start the cluster, the script runs successfully, and I can see that the two lines are correctly added to the pip.conf file.
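For context, the script boils down to something like this (the repo URL is a placeholder here, and I use the user-level config path as a root-free default; a cluster-wide init script would typically write /etc/pip.conf instead):

```shell
#!/bin/bash
# Point pip at a private package index by writing an index-url entry.
# The URL below is a placeholder -- substitute your company repository.
# A cluster-wide init script would usually target /etc/pip.conf;
# the user-level path is used here as a root-free default.
PIP_CONF="${PIP_CONF:-$HOME/.config/pip/pip.conf}"

mkdir -p "$(dirname "$PIP_CONF")"
cat > "$PIP_CONF" <<'EOF'
[global]
index-url = https://repo.example.com/api/pypi/simple
EOF
```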
However, here's the issue: I am unable to execute any Python/Spark/SQL/R commands from a notebook attached to this cluster. As a result, pip install <library> also doesn't work in the notebook.
Interestingly, when I run pip install directly in the cluster's web terminal, it works perfectly as intended.
Has anyone encountered similar issues?
PS: Just tested: any init_script causes the same issue, even if the init script is empty! I'm not able to execute any language commands in an attached notebook.
Labels: Spark
12-17-2024 06:05 AM
Does it throw any error message, or does it just hang? Have you tried other DBR versions on the cluster to see whether this is an issue with the specific DBR version being run or a general issue?
12-17-2024 06:49 AM
I had a similar issue: I installed a private library and could not use the cluster in a notebook for any commands.
I fixed it by cloning the cluster; it behaved as if the cluster was saturated. The error I got was something like this:
Failure starting repl. Try detaching and re-attaching the notebook.
Another alternative is to install the new libraries in the first cells of the notebook instead of using an init_script.
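In case that helps, a first notebook cell along these lines should do it (package name and index URL are placeholders; %pip is the Databricks notebook magic that runs pip against the notebook's environment):

```
%pip install --index-url https://repo.example.com/api/pypi/simple <package-name>
```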
Hope it works!
12-18-2024 06:58 AM
Hi everyone, thanks for your answers!
@Walter_C I checked and saw that after about 5 minutes the Event Log shows "Metastore is down". I see in the driver logs that the connection setup timed out. I wonder how an init_script with only the following line can cause this:
#!/bin/bash
When I check the Spark logs, I see some lines like "locked java.net.SocksSocketImp"
@Palo Yes, in the end this is the error message I get when I try to execute a command from a notebook.
On your alternative idea: I know this works perfectly. But we want our company repo to be enforced, so users can just type "pip install <package-name>" without thinking about where their packages are downloaded from.
I did further tests:
- Same behaviour on different DBR versions
- Same behaviour, if I put the file in a volume or directly in the workspace
- Same behaviour, even after checking with "dos2unix" that the init_script doesn't contain any non-Unix line endings
I will also try configuring it directly with Terraform.
Any input on this matter is welcome, thanks everybody.
12-18-2024 08:16 AM
Did you also try cloning the cluster or using another cluster for the testing? "Metastore is down" is normally a Hive Metastore issue and should not be relevant here, but you could check the log4j output under Driver logs for more details on the error.

