<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Adding to PYTHONPATH in interactive Notebooks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17115#M11165</link>
    <description>&lt;P&gt;Hi @Ohad Raviv, can you try init scripts? They might help: &lt;A href="https://docs.databricks.com/clusters/init-scripts.html" target="_blank"&gt;https://docs.databricks.com/clusters/init-scripts.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 13 Dec 2022 11:17:30 GMT</pubDate>
    <dc:creator>Harun</dc:creator>
    <dc:date>2022-12-13T11:17:30Z</dc:date>
    <item>
      <title>Adding to PYTHONPATH in interactive Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17114#M11164</link>
      <description>&lt;P&gt;I'm trying to set the PYTHONPATH env variable in the cluster configuration: `PYTHONPATH=/dbfs/user/blah`. But in the driver and executor environments it is apparently overridden, and I don't see it.&lt;/P&gt;&lt;P&gt;`%sh echo $PYTHONPATH` outputs:&lt;/P&gt;&lt;P&gt;`PYTHONPATH=/databricks/spark/python:/databricks/spark/python/lib/py4j-0.10.9.5-src.zip:/databricks/jars/spark--driver--driver-spark_3.3_2.12_deploy.jar:/WSFS_NOTEBOOK_DIR:/databricks/spark/python:/databricks/python_shell`&lt;/P&gt;&lt;P&gt;and `import sys; print(sys.path)`:&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;'/databricks/python_shell/scripts', '/local_disk0/spark-c87ff3f0-1b67-4ec4-9054-079bba1860a1/userFiles-ea2f1344-51c6-4363-9112-a0dcdff663d0', '/databricks/spark/python', '/databricks/spark/python/lib/py4j-0.10.9.5-src.zip', '/databricks/jars/spark--driver--driver-spark_3.3_2.12_deploy.jar', '/databricks/python_shell', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload', '', '/local_disk0/.ephemeral_nfs/envs/pythonEnv-267a0576-e6bd-4505-b257-37a4560e4756/lib/python3.9/site-packages', '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages', '/databricks/python/lib/python3.9/site-packages', '/usr/local/lib/python3.9/dist-packages', '/usr/lib/python3/dist-packages', '/databricks/python/lib/python3.9/site-packages/IPython/extensions', '/root/.ipython'&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;If I work from Repos, it does add the repo path `/Workspace/Repos/user@domain.com/my_repo` everywhere, but then all my modules need to live directly there, which is not convenient.&lt;/P&gt;&lt;P&gt;Please let me know if there's a workaround to set a `/dbfs/` path on all nodes without the ugly ***** UDF trick, but straight from the cluster init script, or, best of all, via a dynamic `spark.conf` property.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 11:11:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17114#M11164</guid>
      <dc:creator>uzadude</dc:creator>
      <dc:date>2022-12-13T11:11:20Z</dc:date>
    </item>
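The symptom described above can be checked outside Databricks: in a plain interpreter, entries from PYTHONPATH do land in the sys.path of a fresh process, so the behavior the poster expected is the standard one. A minimal sketch (the /dbfs/user/blah path is just the example from the post):

```python
import os
import subprocess
import sys

# Spawn a fresh interpreter with PYTHONPATH set, the way the cluster
# environment variable was expected to work, and check whether the
# entry shows up in that interpreter's sys.path.
env = dict(os.environ, PYTHONPATH="/dbfs/user/blah")
out = subprocess.run(
    [sys.executable, "-c", "import sys; print('/dbfs/user/blah' in sys.path)"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # True in a plain interpreter
```

On a Databricks cluster the launcher rewrites PYTHONPATH before starting the notebook interpreter, which is why the same check fails there.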
    <item>
      <title>Re: Adding to PYTHONPATH in interactive Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17115#M11165</link>
      <description>&lt;P&gt;Hi @Ohad Raviv​&amp;nbsp;can you try init-scripts, it might help you.  &lt;A href="https://docs.databricks.com/clusters/init-scripts.html?_ga=2.226279322.1472438208.1670843912-2025526080.1630492414&amp;amp;_gac=1.15030340.1668783414.Cj0KCQiA99ybBhD9ARIsALvZavVTlGVm5K1jVwBdB3TXKakyYq93IuaymoQ3XPTdvzrmP_oUXKk6Cn4aAqyAEALw_wcB" alt="https://docs.databricks.com/clusters/init-scripts.html?_ga=2.226279322.1472438208.1670843912-2025526080.1630492414&amp;amp;_gac=1.15030340.1668783414.Cj0KCQiA99ybBhD9ARIsALvZavVTlGVm5K1jVwBdB3TXKakyYq93IuaymoQ3XPTdvzrmP_oUXKk6Cn4aAqyAEALw_wcB" target="_blank"&gt;https://docs.databricks.com/clusters/init-scripts.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 11:17:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17115#M11165</guid>
      <dc:creator>Harun</dc:creator>
      <dc:date>2022-12-13T11:17:30Z</dc:date>
    </item>
    <item>
      <title>Re: Adding to PYTHONPATH in interactive Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17116#M11166</link>
      <description>&lt;P&gt;Do you have any suggestions as to what I should run in the init script?&lt;/P&gt;&lt;P&gt;Setting an env variable there has no effect, as it cannot change the main process's environment.&lt;/P&gt;&lt;P&gt;How would I add a library to the Python path?&lt;/P&gt;&lt;P&gt;And even if I could, it would be a hard-coded library, and I would then need a dedicated cluster configuration for every developer/library.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 16:38:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17116#M11166</guid>
      <dc:creator>uzadude</dc:creator>
      <dc:date>2022-12-13T16:38:53Z</dc:date>
    </item>
    <item>
      <title>Re: Adding to PYTHONPATH in interactive Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17117#M11167</link>
      <description>&lt;P&gt;Update:&lt;/P&gt;&lt;P&gt;At last I found a (hacky) solution!&lt;/P&gt;&lt;P&gt;In the driver I can dynamically extend the workers' sys.path with:&lt;/P&gt;&lt;P&gt;`spark._sc._python_includes.append("/dbfs/user/blah/")`&lt;/P&gt;&lt;P&gt;Combine that with, in the driver:&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;%load_ext autoreload&lt;/P&gt;&lt;P&gt;%autoreload 2&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;and setting `spark.conf.set("spark.python.worker.reuse", "false")`, and we have a fully interactive Spark session with the ability to change Python module code without restarting the Spark session/cluster.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2022 07:50:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17117#M11167</guid>
      <dc:creator>uzadude</dc:creator>
      <dc:date>2022-12-14T07:50:44Z</dc:date>
    </item>
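The driver-side half of this recipe is ordinary sys.path manipulation; the sketch below demonstrates it locally, with a throwaway directory standing in for /dbfs/user/blah and a hypothetical module name. The worker-side, Databricks-specific calls from the post need a live cluster, so they appear only as comments:

```python
import os
import sys
import tempfile

# Stand-in for /dbfs/user/blah: a throwaway directory with one module in it.
mod_dir = tempfile.mkdtemp()
with open(os.path.join(mod_dir, "blahmod.py"), "w") as f:
    f.write("VALUE = 42\n")

# Driver side: a plain sys.path append makes the module importable here.
sys.path.append(mod_dir)

# Worker side (Databricks-specific, per the post above; not runnable locally):
# spark._sc._python_includes.append("/dbfs/user/blah/")
# spark.conf.set("spark.python.worker.reuse", "false")

import blahmod
print(blahmod.VALUE)  # 42
```

Disabling worker reuse forces each task to start a fresh Python worker, which is what lets edited module code take effect without a cluster restart, at the cost of per-task startup overhead.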
    <item>
      <title>Re: Adding to PYTHONPATH in interactive Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17118#M11168</link>
      <description>&lt;P&gt;That's great, thanks for sharing the solution.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2022 12:19:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17118#M11168</guid>
      <dc:creator>Harun</dc:creator>
      <dc:date>2022-12-14T12:19:06Z</dc:date>
    </item>
    <item>
      <title>Re: Adding to PYTHONPATH in interactive Notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17119#M11169</link>
      <description>&lt;P&gt;An init script won't work if you meant exporting the PYTHONPATH env variable: the Databricks shell overwrites it when it starts the Python interpreter. One approach that works for us, when the code is under /dbfs, is to do an editable install in the init script, e.g.&lt;/P&gt;&lt;P&gt;&lt;B&gt;pip install -e /dbfs/some_repos_code&lt;/B&gt;&lt;/P&gt;&lt;P&gt;This creates an easy-install.pth under the /databricks/python3 site-packages at cluster initialization, which appends the path to sys.path on both the driver and the workers.&lt;/P&gt;&lt;P&gt;This approach avoids appending to sys.path everywhere in the code, which breaks code integrity; it is easier to enforce at the cluster level.&lt;/P&gt;&lt;P&gt;We also tried to do the same editable install for Repos under /Workspace but failed. Apparently the /Workspace partition is not mounted during cluster initialization. We are going to ask Databricks to look into this.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Dec 2022 14:06:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/adding-to-pythonpath-in-interactive-notebooks/m-p/17119#M11169</guid>
      <dc:creator>Cintendo</dc:creator>
      <dc:date>2022-12-26T14:06:30Z</dc:date>
    </item>
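A minimal sketch of such a cluster-scoped init script, assuming /dbfs/some_repos_code (the hypothetical path from the post) contains a setup.py or pyproject.toml, and that the cluster's pip lives at /databricks/python/bin/pip:

```shell
#!/bin/bash
# Cluster-scoped init script (sketch). An editable install drops a .pth
# entry into the cluster's site-packages at initialization time, so the
# repo path reaches sys.path on the driver and workers without any
# per-notebook sys.path.append calls.
/databricks/python/bin/pip install -e /dbfs/some_repos_code
```

This is a config fragment, not a drop-in script: the install path and pip location must match the cluster's actual layout and Databricks Runtime version.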
  </channel>
</rss>

