Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Adding to PYTHONPATH in interactive Notebooks

uzadude
New Contributor III

I'm trying to set the PYTHONPATH env variable in the cluster configuration: `PYTHONPATH=/dbfs/user/blah`. But in the driver and executor environments it is apparently getting overridden, and I don't see it.

`%sh echo $PYTHONPATH` outputs:

`PYTHONPATH=/databricks/spark/python:/databricks/spark/python/lib/py4j-0.10.9.5-src.zip:/databricks/jars/spark--driver--driver-spark_3.3_2.12_deploy.jar:/WSFS_NOTEBOOK_DIR:/databricks/spark/python:/databricks/python_shell`

and `import sys; print(sys.path)`:

```
'/databricks/python_shell/scripts', '/local_disk0/spark-c87ff3f0-1b67-4ec4-9054-079bba1860a1/userFiles-ea2f1344-51c6-4363-9112-a0dcdff663d0', '/databricks/spark/python', '/databricks/spark/python/lib/py4j-0.10.9.5-src.zip', '/databricks/jars/spark--driver--driver-spark_3.3_2.12_deploy.jar', '/databricks/python_shell', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload', '', '/local_disk0/.ephemeral_nfs/envs/pythonEnv-267a0576-e6bd-4505-b257-37a4560e4756/lib/python3.9/site-packages', '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages', '/databricks/python/lib/python3.9/site-packages', '/usr/local/lib/python3.9/dist-packages', '/usr/lib/python3/dist-packages', '/databricks/python/lib/python3.9/site-packages/IPython/extensions', '/root/.ipython'
```

If I work from Repos, it does add the repo path `/Workspace/Repos/user@domain.com/my_repo` everywhere, but then I need all my modules to be directly in there, which is not convenient.

Please let me know if there's a workaround to set a `/dbfs/` path on all nodes without the ugly trick of a ***** UDF, but straight from the cluster init script, or, best of all, via a dynamic `spark.conf` property.
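For reference, the obvious per-notebook trick only touches the driver, which is exactly the limitation here; a minimal sketch, assuming the same `/dbfs/user/blah` path and a hypothetical module under it:

```python
# Driver-only workaround (sketch): extend sys.path inside the notebook.
import sys

sys.path.append("/dbfs/user/blah")  # the path from the cluster configuration above
import my_module                    # hypothetical module located under /dbfs/user/blah

# Executors run their own Python workers, so code imported inside a UDF does not
# see this path unless it is propagated to the workers as well.
```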

5 REPLIES

Harun
Honored Contributor

Hi @Ohad Raviv, can you try init scripts? They might help you: https://docs.databricks.com/clusters/init-scripts.html

Cintendo
New Contributor III

An init script won't work if you mean exporting the PYTHONPATH env setting; the Databricks shell overwrites it when it starts the Python interpreter. One approach we use to make it work: if the code is under /dbfs, we do an editable install in the init script, e.g.

`pip install -e /dbfs/some_repos_code`

This creates an easy-install.pth under the /databricks/python3 site-packages at cluster initialization, which appends the path to sys.path on both the driver and the workers.

This approach avoids appending to sys.path everywhere in the code, which would break code integrity, and it is easier to enforce at the cluster level.
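As a sketch (the init script location below is hypothetical, and `/dbfs/some_repos_code` is just the example path from above), the script can be created once from a notebook and then attached to the cluster:

```python
# Hypothetical sketch: write a cluster-scoped init script that performs the
# editable install described above. Adjust both paths to your environment.
script = """#!/bin/bash
set -e
# Editable install of the code living under /dbfs; this drops an
# easy-install.pth into the cluster's site-packages, putting the path on
# sys.path for the driver and the workers.
pip install -e /dbfs/some_repos_code
"""

# dbutils is available as a global in Databricks notebooks; True = overwrite.
dbutils.fs.put("dbfs:/databricks/init-scripts/editable-install.sh", script, True)

# Then add dbfs:/databricks/init-scripts/editable-install.sh under the cluster's
# Advanced options -> Init Scripts and restart the cluster.
```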

We also tried to do the same editable install for Repos under /Workspace but failed. Apparently the /Workspace partition is not mounted during cluster initialization. We are going to ask Databricks to look into this.

uzadude
New Contributor III

Do you have any suggestions as to what I should run in the init script?

Setting an env variable there has no effect, as it cannot change the main process's environment.

How would I add a library to the Python path?

And even if I could, it would be a hard-coded library, and I would then need a dedicated cluster configuration for every developer/library.

uzadude
New Contributor III

Update:

At last, I found a (hacky) solution!

In the driver I can dynamically set the sys.path on the workers with:

`spark._sc._python_includes.append("/dbfs/user/blah/")`

Combine that with, in the driver:

```
%load_ext autoreload
%autoreload 2
```

and setting `spark.conf.set("spark.python.worker.reuse", "false")`,

and we have a fully interactive Spark session with the ability to change Python module code without needing to restart the Spark session/cluster.
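Putting the pieces together, a single driver-side setup cell could look like the sketch below (paths and settings are the ones from this thread; the magics are written via `get_ipython()` so the cell is plain Python):

```python
# Consolidated sketch of the workaround above; /dbfs/user/blah/ is the example
# module directory from this thread.
from IPython import get_ipython

# 1. Have the executors' Python workers add the /dbfs directory to sys.path.
spark._sc._python_includes.append("/dbfs/user/blah/")

# 2. Disable Python worker reuse, so edited modules are re-imported by fresh workers.
spark.conf.set("spark.python.worker.reuse", "false")

# 3. On the driver, auto-reload edited modules before each notebook command.
ipy = get_ipython()
ipy.run_line_magic("load_ext", "autoreload")
ipy.run_line_magic("autoreload", "2")
```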

Harun
Honored Contributor

That's great, thanks for sharing the solution.
