Adding to PYTHONPATH in interactive Notebooks

uzadude
New Contributor III

I'm trying to set the PYTHONPATH environment variable in the cluster configuration: `PYTHONPATH=/dbfs/user/blah`. But in the driver and executor environments it is apparently being overridden, and I don't see it.

`%sh echo $PYTHONPATH` outputs:

`PYTHONPATH=/databricks/spark/python:/databricks/spark/python/lib/py4j-0.10.9.5-src.zip:/databricks/jars/spark--driver--driver-spark_3.3_2.12_deploy.jar:/WSFS_NOTEBOOK_DIR:/databricks/spark/python:/databricks/python_shell`

and `import sys; print(sys.path)`:

```
'/databricks/python_shell/scripts', '/local_disk0/spark-c87ff3f0-1b67-4ec4-9054-079bba1860a1/userFiles-ea2f1344-51c6-4363-9112-a0dcdff663d0', '/databricks/spark/python', '/databricks/spark/python/lib/py4j-0.10.9.5-src.zip', '/databricks/jars/spark--driver--driver-spark_3.3_2.12_deploy.jar', '/databricks/python_shell', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload', '', '/local_disk0/.ephemeral_nfs/envs/pythonEnv-267a0576-e6bd-4505-b257-37a4560e4756/lib/python3.9/site-packages', '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages', '/databricks/python/lib/python3.9/site-packages', '/usr/local/lib/python3.9/dist-packages', '/usr/lib/python3/dist-packages', '/databricks/python/lib/python3.9/site-packages/IPython/extensions', '/root/.ipython'
```

If I work from Repos, it does add the repo path (`/Workspace/Repos/user@domain.com/my_repo`) everywhere, but then I need all my modules to live directly under it, which is not convenient.

Please let me know if there's a workaround to set a `/dbfs/` path on all nodes without the ugly ***** UDF trick: ideally straight from the cluster init script, or best of all via a dynamic `spark.conf` property.

5 REPLIES

Harun
Honored Contributor

Hi @Ohad Raviv, can you try init scripts? They might help you: https://docs.databricks.com/clusters/init-scripts.html

Cintendo
New Contributor III

An init script won't work if you mean exporting the PYTHONPATH env variable: the Databricks shell overwrites it when it starts the Python interpreter. One approach we got working is, if the code is under /dbfs, to do an editable install in the init script, e.g.

`pip install -e /dbfs/some_repos_code`

This creates an easy-install.pth under the /databricks/python3 site-packages at cluster initialization, which appends the path to sys.path on both the driver and the workers.

This approach avoids appending to sys.path all over the code, which hurts code integrity, and it is easier to enforce at the cluster level.
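
For reference, a minimal sketch of this approach, written from a notebook with `dbutils.fs.put`; the script location, the `/dbfs/some_repos_code` path, and the `/databricks/python3/bin/pip` interpreter path are assumptions for illustration, not something confirmed in this thread:

```
# Hypothetical cluster-scoped init script that editable-installs code living
# under /dbfs, so its path ends up on sys.path for both driver and workers.
dbutils.fs.put(
    "dbfs:/init-scripts/editable-install.sh",  # placeholder script location
    """#!/bin/bash
# Runs on every node at cluster start.
/databricks/python3/bin/pip install -e /dbfs/some_repos_code
""",
    overwrite=True,
)
```

The script still has to be registered in the cluster's init-script settings; after a restart, the generated easy-install.pth makes the package importable without any per-notebook sys.path edits.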

We also tried the same editable install for Repos under /Workspace but failed; apparently the /Workspace partition is not mounted during cluster initialization. We are going to ask Databricks to look into this.

uzadude
New Contributor III

Do you have any suggestions as to what I should run in the init script?

Setting an env variable there has no effect, as it cannot change the main process's environment.

How would I add a library to the Python path?

And even if I could, the library would be hard-coded, and I would then need a dedicated cluster configuration for every developer/library.

uzadude
New Contributor III

Update:

At last I found a (hacky) solution!

In the driver, I can dynamically set sys.path on the workers with:

`spark._sc._python_includes.append("/dbfs/user/blah/")`

Combine that with, in the driver:

```
%load_ext autoreload
%autoreload 2
```

and setting `spark.conf.set("spark.python.worker.reuse", "false")`,

and we have a fully interactive Spark session with the ability to change Python module code without needing to restart the Spark session or the cluster.
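
Putting the pieces together, a minimal driver-side sketch of this workaround, assuming the modules live under `/dbfs/user/blah/` (the path from this thread) and that `my_module` is a hypothetical module there; note that `_python_includes` is a private PySpark attribute:

```
import sys

module_root = "/dbfs/user/blah/"  # path from the thread

# Make the modules importable on the driver.
sys.path.append(module_root)

# Private API: asks PySpark to put the same path on sys.path in the workers.
spark._sc._python_includes.append(module_root)

# Avoid reusing Python workers so tasks re-import fresh module code.
# (If this runtime setting does not take effect, it can go in the cluster's Spark config instead.)
spark.conf.set("spark.python.worker.reuse", "false")

# In a separate notebook cell, enable IPython autoreload on the driver:
#   %load_ext autoreload
#   %autoreload 2

import my_module  # hypothetical module under /dbfs/user/blah/
```

Edits to the module files under /dbfs are then picked up on the driver via autoreload and on the executors via the freshly started worker processes.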

Harun
Honored Contributor

That's great, thanks for sharing the solution.
