Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

Import python files as modules in workspace

Avin_Kohale
New Contributor

I'm deploying a new workspace for testing the deployed notebooks. But when I try to import the Python files as modules in the newly deployed workspace, I get a "function not found" error.

Two points to note here:

1. If I append the absolute path of the Python file to sys.path, it runs fine. But when I append a relative path, it throws a "not found" error (the relative path does not resolve against the workspace's path correctly).

2. If the Python file and the notebook are in the same directory, it works fine.

I know I can use Files in Repos to fix this, but is it possible to do this using the workspace only?

NOTE:

In the deployed workspace, the cwd of the file I want to import is in this format: /home/spark-9851f...-....-....-....-.. (not aligned with the folder structure).

But when working with repos, the cwd is correct and is aligned with the folder structure
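The "absolute works, relative doesn't" behavior described above follows from how Python resolves sys.path entries: a relative entry is resolved against the process's current working directory, which in this case is an unrelated /home/spark-... directory. A minimal stand-alone sketch (not Databricks-specific; paths are temporary directories created for illustration):

```python
import os
import sys
import tempfile

# Create a module in a known location.
workdir = tempfile.mkdtemp()
module_dir = os.path.join(workdir, "lib")
os.makedirs(module_dir)
with open(os.path.join(module_dir, "helper.py"), "w") as f:
    f.write("def greet():\n    return 'hello'\n")

# Move the cwd somewhere unrelated, mimicking the /home/spark-... cwd.
os.chdir(tempfile.mkdtemp())

sys.path.append("lib")  # relative entry: resolved against the cwd, so it fails
try:
    import helper
except ModuleNotFoundError:
    print("relative path failed")

sys.path.append(module_dir)  # absolute entry: works regardless of the cwd
import helper
print(helper.greet())
```

This is why appending the absolute workspace path succeeds while the relative one does not.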

4 REPLIES

TimReddick
Contributor

Hi @Retired_mod, I see your suggestion to append the necessary path to the sys.path. I'm curious if this is the recommendation for projects deployed via Databricks Asset Bundles. I want to maintain a project structure that looks something like this:

project/
├── app/
│   ├── nb1.py
│   ├── nb2.py
└── src/
    ├── foo.py
    └── bar.py

I want to do the following import in nb1:

from src.foo import foo_func

If this were a Databricks Repo, that would work fine since I think Databricks repos add the root to sys.path. However, I'm deploying via Databricks Asset Bundles, which deploy to a workspace directory, not a repo. I'm curious if there are any better recommendations for Databricks Asset Bundles deployments, e.g. could it be deployed directly to a repo?
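With the layout above, the usual workaround is to append the project root (one level above the notebook's directory) to sys.path before importing. A hedged sketch, where the workspace path and the `add_project_root` helper are illustrative, not a Databricks API:

```python
import os
import sys

def add_project_root(current_dir: str, levels_up: int = 1) -> str:
    """Append the directory `levels_up` above current_dir to sys.path."""
    root = os.path.abspath(os.path.join(current_dir, *[".."] * levels_up))
    if root not in sys.path:
        sys.path.append(root)
    return root

# In nb1.py (under project/app/), one level up is the project root,
# after which `from src.foo import foo_func` can resolve.
# The path below is a hypothetical workspace location.
root = add_project_root("/Workspace/Users/me@example.com/project/app")
# from src.foo import foo_func
```

The notebook's own directory still has to be determined somehow (hardcoded, derived from the notebook context, or passed in as a parameter), which is the open question for bundle deployments.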

@TimW did you ever solve this? I haven't found a successful way to achieve the same as you've depicted (which we can easily do when using Repos).

Hi @JeremyFord, after some research I think that in my example appending the 'project' directory to sys.path is in fact the recommended way to do this. In studying for the Data Engineering Professional Exam, I came across this resource, which gives some pretty clear examples on how Databricks recommends importing .py modules from outside of your current working directory: https://github.com/databricks-academy/cli-demo/blob/published/notebooks/00_refactoring_to_relative_i....

@JeremyFord, I found this recommendation that is probably the best way to handle this. They suggest passing the path of your bundle root as a parameter to your notebook and then appending it to sys.path. I haven't tried it myself, but it looks like a good approach.
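A sketch of that parameter-based approach. The parameter name `bundle_root` is illustrative; inside Databricks you would read it with `dbutils.widgets.get("bundle_root")`, but here the retrieved value is simulated so the snippet runs anywhere:

```python
import sys

# In the bundle's job task, a parameter such as
#   {"bundle_root": "${workspace.file_path}"}
# would supply the deployed root. In the notebook:
#   bundle_root = dbutils.widgets.get("bundle_root")
# Simulated value for illustration (hypothetical workspace path):
bundle_root = "/Workspace/Users/me@example.com/.bundle/project/dev/files"

if bundle_root not in sys.path:
    sys.path.append(bundle_root)
# from src.foo import foo_func  # now resolves relative to the bundle root
```

The advantage over hardcoding is that the same notebook works across dev/staging/prod targets, since the bundle substitutes the correct deployed path at run time.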

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group