<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Permanently add python file path to sys.path in Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17029#M11107</link>
    <description>&lt;P&gt;I've been successfully using this in Delta Live Table pipelines with many nodes. Seems to work for my use case.&lt;/P&gt;</description>
    <pubDate>Thu, 29 Dec 2022 21:14:33 GMT</pubDate>
    <dc:creator>Jfoxyyc</dc:creator>
    <dc:date>2022-12-29T21:14:33Z</dc:date>
    <item>
      <title>Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17018#M11096</link>
      <description>&lt;P&gt;If your notebook is in a different directory or subdirectory than the Python module, you cannot import the module until you add its directory to the Python path.&lt;/P&gt;&lt;P&gt;That means that even though all users are using the same module, they cannot import it until they add the path, because they are all working from different repos.&lt;/P&gt;&lt;P&gt;Is it possible to add the module's file path to sys.path in Databricks permanently, or until the file is deleted?&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jun 2022 21:56:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17018#M11096</guid>
      <dc:creator>Direo</dc:creator>
      <dc:date>2022-06-21T21:56:11Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17019#M11097</link>
      <description>&lt;P&gt;Inside the same repo, imports work without any path changes: in any notebook in the repo, give the full module path starting from the repo root. As you mentioned, if the file is in another repo, you need to use sys.path.append. To make that permanent, you can try editing the global init scripts.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1751i596CDB88E7A20B17/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Repo layout:
#
# Repo
#   directory/
#     sub_directory/
#       my_file.py
#
# Import using the full path from the repo root.
from directory.sub_directory.my_file import MyClass&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 22 Jun 2022 17:53:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17019#M11097</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-06-22T17:53:06Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17021#M11099</link>
      <description>&lt;P&gt;@Direo Direo you can refer to &lt;A href="https://docs.databricks.com/repos/work-with-notebooks-other-files.html#work-with-python-and-r-modules" alt="https://docs.databricks.com/repos/work-with-notebooks-other-files.html#work-with-python-and-r-modules" target="_blank"&gt;this&lt;/A&gt;. The feature is now in public preview.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Jul 2022 00:15:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17021#M11099</guid>
      <dc:creator>Prabakar</dc:creator>
      <dc:date>2022-07-16T00:15:55Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17022#M11100</link>
      <description>&lt;P&gt;Hi, the init script doesn't work for me (the workers' PYTHONPATH is not affected), and the suggested options in the link above don't help either.&lt;/P&gt;&lt;P&gt;Is there a way to add another folder to the PYTHONPATH of the workers?&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 12:20:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17022#M11100</guid>
      <dc:creator>uzadude</dc:creator>
      <dc:date>2022-12-13T12:20:28Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17023#M11101</link>
      <description>&lt;P&gt;For worker nodes, you can set a Spark config in the cluster settings: spark.executorEnv.PYTHONPATH&lt;/P&gt;&lt;P&gt;However, make sure you append your Workspace path at the end, because the worker nodes still need the other system Python paths.&lt;/P&gt;&lt;P&gt;This seems like a hack to me. I hope Databricks can respond with a more solid solution.&lt;/P&gt;</description>
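The "append at the end" advice above can be sketched as a small string-building step. This is only an illustration: the helper append_to_pythonpath, the system paths, and the workspace path are all made-up values, and the printed line assumes the space-separated key/value form used for Spark config entries in the cluster settings.

```python
import os


def append_to_pythonpath(existing: str, extra: str) -> str:
    """Return existing with extra appended as the last entry."""
    entries = [p for p in existing.split(os.pathsep) if p]
    if extra not in entries:
        entries.append(extra)
    return os.pathsep.join(entries)


# Both values below are made up for illustration.
system_paths = os.pathsep.join(["/databricks/spark/python", "/databricks/jars"])
workspace_path = "/Workspace/Repos/some_user/my_repo"

value = append_to_pythonpath(system_paths, workspace_path)
print("spark.executorEnv.PYTHONPATH " + value)
```

Keeping the existing entries first means the workers still find their system packages before anything in the workspace folder.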
      <pubDate>Sun, 25 Dec 2022 22:21:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17023#M11101</guid>
      <dc:creator>Cintendo</dc:creator>
      <dc:date>2022-12-25T22:21:53Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17024#M11102</link>
      <description>&lt;P&gt;Setting `spark.executorEnv.PYTHONPATH` did not work for me. It looked like Spark/Databricks overwrites it somewhere. I used a simple Python UDF to print properties such as `sys.path` and `os.environ`, and the path I added did not appear.&lt;/P&gt;&lt;P&gt;Finally, I found a hacky way using `spark._sc._python_includes`.&lt;/P&gt;&lt;P&gt;You can see my answer to myself &lt;A href="https://community.databricks.com/s/feed/0D58Y00009bzRHXSA2" alt="https://community.databricks.com/s/feed/0D58Y00009bzRHXSA2" target="_blank"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Dec 2022 07:12:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17024#M11102</guid>
      <dc:creator>uzadude</dc:creator>
      <dc:date>2022-12-26T07:12:48Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17025#M11103</link>
      <description>&lt;P&gt;Thanks @Ohad Raviv. I will try your approach.&lt;/P&gt;&lt;P&gt;spark.executorEnv.PYTHONPATH works only for worker nodes, not the driver node. It also has to be set at cluster initialization (under the Spark tab). Once the cluster is initialized, Databricks overwrites it even if you set it manually with spark.conf.set.&lt;/P&gt;&lt;P&gt;I prefer not to set the environment through code, since hard-coding it breaks code integrity and is hard to enforce when multiple people work on the same cluster. I wish the Databricks cluster screen offered a better way, such as letting users append to sys.path after the defaults, or allowing an editable install (pip install -e) during development.&lt;/P&gt;&lt;P&gt;I checked the worker node PYTHONPATH with the following to make sure the path gets appended.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def getworkerenv():
    # Runs on a worker; os is imported here so the function is self-contained when shipped to executors.
    import os
    return os.getenv('PYTHONPATH')

# spark is the SparkSession provided by the Databricks notebook.
sc = spark.sparkContext
sc.parallelize([1]).map(lambda x: getworkerenv()).collect()&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 26 Dec 2022 13:55:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17025#M11103</guid>
      <dc:creator>Cintendo</dc:creator>
      <dc:date>2022-12-26T13:55:27Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17026#M11104</link>
      <description>&lt;P&gt;The hacky solution above is meant to be used only while developing my own Python module. This way I can avoid packaging a whl, deploying it to the cluster, restarting the cluster, and even restarting the notebook interpreter.&lt;/P&gt;&lt;P&gt;I agree that it is not suited for production. For that I would use either a whl reference in the workflow file or a prepared Docker image.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Dec 2022 17:21:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17026#M11104</guid>
      <dc:creator>uzadude</dc:creator>
      <dc:date>2022-12-26T17:21:21Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17027#M11105</link>
      <description>&lt;P&gt;To be honest, I'm just inspecting which repo folder I'm running from (dev/test/prod) and calling sys.path.append with the appropriate path before importing my packages. It seems to work, and it's covered by the Terraform provider.&lt;/P&gt;</description>
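The approach described above might look roughly like this. It is a sketch under assumptions, not the poster's actual code: the folder layout in PACKAGE_ROOTS, the function names, and the example notebook path are all hypothetical, and in a real notebook the path would come from the notebook context rather than a literal.

```python
import sys

# Hypothetical mapping from environment name to the folder holding the packages.
PACKAGE_ROOTS = {
    "dev": "/Workspace/Repos/dev/my_repo",
    "test": "/Workspace/Repos/test/my_repo",
    "prod": "/Workspace/Repos/prod/my_repo",
}


def detect_environment(notebook_path: str) -> str:
    """Return the first of dev/test/prod that appears as a folder in the path."""
    parts = notebook_path.strip("/").split("/")
    for env in PACKAGE_ROOTS:
        if env in parts:
            return env
    raise ValueError(f"No known environment folder in {notebook_path!r}")


def add_package_root(notebook_path: str) -> str:
    """Append the package folder for the detected environment to sys.path."""
    root = PACKAGE_ROOTS[detect_environment(notebook_path)]
    if root not in sys.path:
        sys.path.append(root)
    return root


# A literal path stands in for the value a notebook would look up at run time.
chosen = add_package_root("/Repos/test/my_repo/notebooks/etl")
```

Matching on whole folder names rather than substrings avoids picking "dev" out of an unrelated name such as "devices".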
      <pubDate>Thu, 29 Dec 2022 07:25:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17027#M11105</guid>
      <dc:creator>Jfoxyyc</dc:creator>
      <dc:date>2022-12-29T07:25:39Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17028#M11106</link>
      <description>&lt;P&gt;The issue with that is that the driver's sys.path is not added to the executors' sys.path, so you could get a "module not found" error if code running on the executors tries to import one of your modules.&lt;/P&gt;&lt;P&gt;It will work fine, though, for simple code that is self-contained.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Dec 2022 16:29:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17028#M11106</guid>
      <dc:creator>uzadude</dc:creator>
      <dc:date>2022-12-29T16:29:21Z</dc:date>
    </item>
    <item>
      <title>Re: Permanently add python file path to sys.path in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17029#M11107</link>
      <description>&lt;P&gt;I've been successfully using this in Delta Live Table pipelines with many nodes. Seems to work for my use case.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Dec 2022 21:14:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/permanently-add-python-file-path-to-sys-path-in-databricks/m-p/17029#M11107</guid>
      <dc:creator>Jfoxyyc</dc:creator>
      <dc:date>2022-12-29T21:14:33Z</dc:date>
    </item>
  </channel>
</rss>

