Hello,
I've looked around, but cannot find an answer. In my Azure Databricks workspace, users have Python notebooks that all make use of the same helper functions and classes. Instead of keeping the helper code in notebooks and pulling it in with %run magics, I want to organize it in .py files so that users can import them as modules.
This would work if foo.py were, say, in the same folder as my notebook (then I could import foo) or in a subfolder of it (then I could from subfolder import foo). But the notebooks are spread across different folders, and foo.py lives somewhere else entirely, so its folder will not always be on a notebook's sys.path.
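For concreteness, these are the two cases that do work today (the names foo and subfolder are just placeholders, and foo.py is assumed to exist in the stated location):

```python
# Case 1: foo.py sits right next to the notebook, so the notebook's own
# folder (which Databricks puts on sys.path) makes it importable directly.
import foo

# Case 2: foo.py sits in a subfolder of the notebook's folder.
from subfolder import foo
```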
What can I do to make foo.py available for import in any notebook in any folder?
I know I could import sys and sys.path.append(<path to foo>) at the start of every notebook, but I don't want to do that either, as I'm trying to make things simple for people writing notebooks.
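In other words, I'd rather not have every notebook start with boilerplate like this (the path shown assumes foo.py lives under /Workspace/Shared/foo/, as mentioned below):

```python
# Per-notebook boilerplate I'm trying to avoid
import sys

sys.path.append("/Workspace/Shared/foo")  # folder containing foo.py

import foo  # now resolvable, because its folder is on sys.path
```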
I tried using a cluster init script that would 1) create a folder for foo.py under /databricks/python_scripts/, which happens to be on my sys.path, and 2) copy foo.py there from /Workspace/Shared/foo/. The script created the folder fine, but it could not copy the Workspace file there. (I also tried placing foo.py in Azure Data Lake Storage and having the init script copy it from there. That worked, but it's clearly not a good solution.)
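To spell out what the init script attempts, here are its two steps sketched in Python for readability (the real init script is a shell script; the paths are the ones described above):

```python
# Python sketch of the init script's logic
import os
import shutil

target_dir = "/databricks/python_scripts"        # already on sys.path in my clusters

os.makedirs(target_dir, exist_ok=True)           # step 1: create the folder -- this works

shutil.copy("/Workspace/Shared/foo/foo.py",      # step 2: copy foo.py from the Workspace
            os.path.join(target_dir, "foo.py"))  #         -- this is the part that fails
```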
Perhaps I could package foo.py as a library and install it on the cluster, but that seems pretty convoluted compared to simply putting foo.py somewhere the import statement will find it. How can I accomplish that?
Thanks,
JS