Import Python files as modules in workspace
08-20-2023 11:50 PM
I'm deploying a new workspace for testing the deployed notebooks. But when I try to import the Python files as modules in the newly deployed workspace, I get an error saying "function not found".
Two points to note here:
1. If I append the absolute path of the Python file to sys.path, it runs fine. But when I append a relative path, it throws a "not found" error (the relative path doesn't resolve against the workspace path correctly).
2. If the Python file and the notebook are in the same directory, it works fine.
I know I can use Files in Repos to fix this, but is it possible to do this using the workspace only?
NOTE:
In the deployed workspace, the cwd of the file I want to import has this format: /home/spark-9851f...-....-....-....-.. (not aligned with the folder structure).
But when working with Repos, the cwd is correct and aligned with the folder structure.
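To make the difference concrete, here is roughly what I'm seeing (paths and module names below are placeholders, not my real ones):

import sys

# Works: appending the absolute workspace path of the folder that holds the module
sys.path.append("/Workspace/Users/someone@example.com/my_project/utils")  # placeholder path
from my_module import my_func  # placeholder names

# Fails: a relative path, because in a plain workspace folder the notebook's cwd is an
# ephemeral driver directory like /home/spark-..., not the notebook's own folder, so the
# relative path resolves against the wrong location
# sys.path.append("../utils")   # -> ModuleNotFoundError on import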
09-28-2023 09:06 AM
Hi @Retired_mod, I see your suggestion to append the necessary path to sys.path. I'm curious whether this is also the recommendation for projects deployed via Databricks Asset Bundles. I want to maintain a project structure that looks something like this:
project/
├── app/
│   ├── nb1.py
│   └── nb2.py
└── src/
    ├── foo.py
    └── bar.py
I want to do the following import in nb1:
from src.foo import foo_func
If this were a Databricks Repo, that would work fine, since I think Databricks Repos add the repo root to sys.path. However, I'm deploying via Databricks Asset Bundles, which deploy to a workspace directory, not a repo. I'm curious if there are any better recommendations for Databricks Asset Bundles deployments, e.g. could it be deployed directly to a repo?
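For what it's worth, the only thing I've gotten working so far is hard-coding the deployed path in nb1.py, which I'd like to avoid (the path below is only a placeholder based on the default bundle deployment location):

import sys

# Works, but hard-codes the bundle's deployment path inside the notebook
# (illustrative path; the exact location depends on the bundle's workspace root)
project_root = "/Workspace/Users/someone@example.com/.bundle/project/dev/files"
if project_root not in sys.path:
    sys.path.append(project_root)

from src.foo import foo_func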
11-15-2023 08:59 PM
@TimW did you ever solve this? I haven't found a successful way to achieve what you've depicted (which we can easily do when using Repos).
11-16-2023 05:25 AM
Hi @JeremyFord, after some research I think that, in my example, appending the 'project' directory to sys.path is in fact the recommended way to do this. While studying for the Data Engineering Professional exam, I came across this resource, which gives some pretty clear examples of how Databricks recommends importing .py modules from outside your current working directory: https://github.com/databricks-academy/cli-demo/blob/published/notebooks/00_refactoring_to_relative_i....
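A sketch of what that could look like in nb1.py, deriving the project root from the notebook's own path via dbutils instead of relying on the cwd (untested sketch; the number of dirname calls depends on where nb1 sits in the tree):

import os
import sys

# notebookPath() returns the notebook's workspace path (without the /Workspace prefix),
# e.g. /Users/someone@example.com/project/app/nb1, so two dirname calls give the project root.
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
project_root = os.path.dirname(os.path.dirname("/Workspace" + notebook_path))

if project_root not in sys.path:
    sys.path.append(project_root)

from src.foo import foo_func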
11-17-2023 02:58 AM
@JeremyFord, I found this recommendation, which is probably the best way to handle this. They suggest passing the path of your bundle root as a parameter to your notebook and then appending it to sys.path. I haven't tried it myself, but it looks like a good approach.
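For completeness, a sketch of how I read that recommendation (the ${workspace.file_path} substitution and the parameter name are my assumptions from the linked approach, not something I've tested):

import sys

# In the bundle's job definition, the deployed root can be passed as a notebook parameter,
# e.g. base_parameters: { bundle_root: ${workspace.file_path} }  (assumption, untested).
# The notebook then reads it via a widget and puts it on sys.path:
dbutils.widgets.text("bundle_root", "")
bundle_root = dbutils.widgets.get("bundle_root")

if bundle_root and bundle_root not in sys.path:
    sys.path.append(bundle_root)

from src.foo import foo_func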