Warehousing & Analytics
Import python files as modules in workspace

Avin_Kohale
New Contributor

I'm deploying a new workspace for testing the deployed notebooks. But when I try to import the Python files as modules in the newly deployed workspace, I get a "function not found" error.

Two points to note here:

1. If I append the absolute path of the Python file to sys.path, it runs fine. But when I append a relative path, it throws a "not found" error (the relative path does not resolve against the workspace's path correctly)

2. If the Python file and the notebook are in the same directory, it works fine

I know I can use Files in Repos to fix this, but is it possible to do this using the workspace only?

NOTE:

In the deployed workspace, the cwd of the file I want to import has this format: /home/spark-9851f...-....-....-....-.. (not aligned with the folder structure)

But when working with Repos, the cwd is correct and aligned with the folder structure
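A minimal illustration of the symptom described above (the paths here are hypothetical): a relative entry in sys.path is resolved against the current working directory at import time, so when the cwd is the /home/spark-... directory rather than the workspace folder, the module is not found.

```python
import sys

# Fragile: a relative sys.path entry depends on os.getcwd() at import
# time. If the cwd is /home/spark-..., "my_utils" will not be found there.
sys.path.append("my_utils")

# An absolute entry resolves the same way regardless of the cwd.
sys.path.append("/Workspace/project/my_utils")  # hypothetical path
```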

5 REPLIES

Kaniz
Community Manager

Hi @Avin_Kohale , 

An error was encountered when importing Python files as modules in a newly deployed workspace:
- Error message: "function not found"

- Two points to note:
 • A relative path does not resolve against the workspace's path correctly
 • The Python file and the notebook must be in the same directory for the import to work

- Steps to resolve the issue using only the workspace:
 1. Ensure the Python file and the notebook are in the same directory
 2. If they are not, use the __file__ attribute to get the absolute path of the notebook and construct the path to the Python file from it
 3. Append that path to sys.path and import the Python file as a module
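The steps above can be sketched roughly as follows. Note this is a sketch, not a verified solution: the fallback to os.getcwd() is an assumption (in a Databricks notebook __file__ is not always defined), and the sibling "utils" directory name is hypothetical.

```python
import os
import sys

# Step 2: derive the notebook's directory. __file__ may be undefined in a
# notebook context, so fall back to the current working directory.
if "__file__" in globals():
    notebook_dir = os.path.dirname(os.path.abspath(__file__))
else:
    notebook_dir = os.getcwd()

# Hypothetical layout: the .py module lives in a sibling "utils" directory.
module_dir = os.path.normpath(os.path.join(notebook_dir, "..", "utils"))

# Step 3: append the derived (absolute) path and import.
if module_dir not in sys.path:
    sys.path.append(module_dir)

# import my_module  # would now resolve, assuming utils/my_module.py exists
```

Because module_dir is built from an absolute base, the import no longer depends on what the cwd happens to be in the deployed workspace.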

TimReddick
New Contributor III

Hi @Kaniz, I see your suggestion to append the necessary path to the sys.path. I'm curious if this is the recommendation for projects deployed via Databricks Asset Bundles. I want to maintain a project structure that looks something like this:

project/
├── app/
│   ├── nb1.py
│   ├── nb2.py
└── src/
    ├── foo.py
    └── bar.py

I want to do the following import in nb1:

from src.foo import foo_func

If this were a Databricks Repo, that would work fine since I think Databricks repos add the root to sys.path. However, I'm deploying via Databricks Asset Bundles, which deploy to a workspace directory, not a repo. I'm curious if there are any better recommendations for Databricks Asset Bundles deployments, e.g. could it be deployed directly to a repo?
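Under the layout above, one way to sketch the sys.path workaround from nb1.py is to append the project root (the directory containing src/). This assumes the notebook's working directory is app/, which may not hold in every deployment:

```python
import os
import sys

# Assuming the notebook runs with app/ as its working directory,
# its parent is the project root that contains src/.
project_root = os.path.dirname(os.getcwd())

# Prepend so src/ shadows any same-named packages elsewhere on the path.
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# from src.foo import foo_func  # would now resolve
```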

@TimW did you ever solve this? I haven't found a successful way to achieve the same as you've depicted (which we can easily do when using Repos).

Hi @JeremyFord, after some research I think that in my example appending the 'project' directory to sys.path is in fact the recommended way to do this. In studying for the Data Engineering Professional Exam, I came across this resource, which gives some pretty clear examples on how Databricks recommends importing .py modules from outside of your current working directory: https://github.com/databricks-academy/cli-demo/blob/published/notebooks/00_refactoring_to_relative_i....

@JeremyFord, I found this recommendation, which is probably the best way to handle this: they suggest passing the path of your bundle root as a parameter to your notebook and then appending it to sys.path. I haven't tried it myself, but it looks like a good approach.
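A rough sketch of that parameter-passing idea. The widget name bundle_root is hypothetical, and dbutils is only available inside Databricks, so this sketch falls back to an environment variable for illustration:

```python
import os
import sys

# Hypothetical: the asset bundle passes its deployed root path to the
# notebook as a parameter, which in a real notebook might be read via
#     bundle_root = dbutils.widgets.get("bundle_root")
# Outside Databricks, fall back to an env var / cwd for illustration.
bundle_root = os.environ.get("BUNDLE_ROOT", os.getcwd())

if bundle_root not in sys.path:
    sys.path.append(bundle_root)

# from src.foo import foo_func  # would now resolve relative to the root
```

The appeal of this approach is that the deployment, not the notebook, knows where the bundle actually lives, so the notebook never has to guess its own location.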
