10-04-2024 11:12 PM
Hi everyone,
I’m currently working on a project in Databricks (version 13.3 LTS) and could use some help with importing external Python files as modules into my notebook. I’m aiming to organize my code better and reuse functions across different notebooks.
Could someone please provide detailed steps or best practices for this? Are there any specific configurations I should be aware of, or recommended file structures?
Here’s the structure I’m working with:
I need to import functions from utilities.py to use them in my notebook. However, I’m encountering a "module not found" error when I try to do this.
Thanks for your assistance!
10-05-2024 08:45 AM
Hi @adhi_databricks ,
To accomplish what you need:
1. Make sure that utilities.py is a file, and not a notebook. If you created it as a notebook, you will not be able to import it.
2. Append the path to the utils folder to sys.path and import utilities (not utils.utilities):
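A minimal sketch of step 2 (not the exact code from this post). It assumes utilities.py sits in a folder named utils, and it builds that layout in a temporary directory so it runs anywhere. The `greet` helper is hypothetical. On Databricks you would point `utils_dir` at the real folder instead.

```python
import os
import sys
import tempfile

# Stand-in for the workspace folder. On Databricks this would be the real
# folder that holds utilities.py (the exact path is an assumption here).
project_root = tempfile.mkdtemp()
utils_dir = os.path.join(project_root, "utils")
os.makedirs(utils_dir)

# Hypothetical helper module, written only so the example runs end to end.
with open(os.path.join(utils_dir, "utilities.py"), "w") as f:
    f.write("def greet(name):\n    return f'Hello, {name}!'\n")

# Put the folder that CONTAINS utilities.py on the module search path.
if utils_dir not in sys.path:
    sys.path.append(utils_dir)

# Import the module by its file name, not as utils.utilities.
import utilities

message = utilities.greet("Databricks")
print(message)
```

Because the folder itself is on `sys.path`, Python looks for `utilities.py` directly inside it, which is why the import is `utilities` rather than `utils.utilities`.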
10-05-2024 09:45 AM
Hi @filipniziol,
It works with the Serverless cluster. However, I'm using the 13.3 LTS DBR version, and it's not working there. Is there a specific version that is compatible with this use case?
10-05-2024 10:10 AM - edited 10-05-2024 10:19 AM
Hi @adhi_databricks ,
Since Databricks Runtime 14.0, the default current working directory has changed:
https://docs.databricks.com/en/files/workspace-modules.html
To solve it:
1. Use Serverless, or any version 14+, such as 14.3 LTS or 15.4 LTS.
2. In version 13.3, change the path to match the old behavior. One solution is simply to use the absolute path to the utils folder:
One more thing: when retesting with a different runtime or path, make sure to Clear state in the notebook:
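A short sketch of why the absolute path helps (not the exact code from this post). A relative entry such as "utils" is resolved against whatever the current working directory happens to be, so it points somewhere else when the runtime starts the notebook in a different directory. The temporary `workspace` folder is a stand-in for the real workspace location.

```python
import os
import sys
import tempfile

# Stand-in for the workspace folder; realpath avoids symlink surprises.
workspace = os.path.realpath(tempfile.mkdtemp())
utils_dir = os.path.join(workspace, "utils")
os.makedirs(utils_dir)

original_cwd = os.getcwd()
try:
    # From inside the workspace folder, the relative name finds the folder.
    os.chdir(workspace)
    relative_from_workspace = os.path.abspath("utils")

    # From any other directory, the same relative name points elsewhere.
    os.chdir(tempfile.gettempdir())
    relative_from_elsewhere = os.path.abspath("utils")
finally:
    os.chdir(original_cwd)

print(relative_from_workspace == utils_dir)
print(relative_from_elsewhere == utils_dir)

# An absolute entry does not depend on the current working directory.
if utils_dir not in sys.path:
    sys.path.append(utils_dir)
```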
10-06-2024 07:16 AM
Hey @filipniziol,
Here’s the path to the notebook:
/Workspace/Repos/xyz@email.com/de/usecase/main/data_pipelines/notebooks/main_notebook
And here’s the path to the utils Python file:
/Workspace/Repos/xyz@email.com/de/usecase/main/data_pipelines/utils.py
I’m on version 13.3 of the DBR, and using the absolute path in the Repos folder didn’t work for me. However, it did work in the Workspace/Shared folder, as you mentioned.
Could you help me with this? Thanks!
10-06-2024 07:50 AM - edited 10-06-2024 07:54 AM
Hi @adhi_databricks ,
Here is the article on the current working directory for different versions:
https://docs.databricks.com/en/files/cwd-dbr-14.html
The solution is to change the current working directory to the directory that contains your notebook and then use relative paths (in your case it should be "../"):
import os
os.chdir("/tmp")
Here is the code tested using DBR 13.3 inside /Workspace/Repos:
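A minimal sketch of that approach (not the exact code from this post), assuming the layout described earlier in the thread: the module sits in data_pipelines and the notebook in data_pipelines/notebooks. The layout is rebuilt in a temporary directory so the snippet runs anywhere, and the module is given the hypothetical name `demo_pipeline_utils` to avoid clashing with any other `utils` module on the path.

```python
import os
import sys
import tempfile

# Stand-in for /Workspace/Repos/.../data_pipelines and its notebooks folder.
repo_root = os.path.realpath(tempfile.mkdtemp())
pipelines_dir = os.path.join(repo_root, "data_pipelines")
notebooks_dir = os.path.join(pipelines_dir, "notebooks")
os.makedirs(notebooks_dir)

# Hypothetical module placed one level above the notebook folder.
with open(os.path.join(pipelines_dir, "demo_pipeline_utils.py"), "w") as f:
    f.write("def add_one(x):\n    return x + 1\n")

original_cwd = os.getcwd()
try:
    # Make the notebook folder the current working directory.
    os.chdir(notebooks_dir)

    # Resolve "../" once, so the sys.path entry stays valid
    # even if the working directory changes later.
    parent_dir = os.path.abspath("..")
    if parent_dir not in sys.path:
        sys.path.append(parent_dir)

    import demo_pipeline_utils

    result = demo_pipeline_utils.add_one(41)
finally:
    os.chdir(original_cwd)

print(parent_dir)
print(result)
```

Converting "../" to an absolute path before adding it is a design choice in this sketch: the entry then keeps pointing at the same folder regardless of later `os.chdir` calls.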
10-06-2024 08:34 AM
Hey @filipniziol ,
According to the documentation from the link you shared, when code runs in a path under Workspace/Repos, the current working directory depends on your admin configuration and the cluster's DBR version. Specifically, for workspaces with enableWorkspaceFilesystem set to dbr11.0+, on DBR versions 11.0 and higher the CWD is the directory containing the notebook or script being executed.
I'm using os.getcwd to get the CWD, which reflects where the script is running. However, I'm having trouble with sys.path.append, both with and without the os.chdir command. Any insights?
10-06-2024 08:47 AM - edited 10-06-2024 08:51 AM
Hi @adhi_databricks ,
In your case the file is called utilities and not utils. You need to import the module by its file name, so import utilities.
EDIT:
I realized that the file was first called utilities, and later utils.
Could you share your paths once again, along with the name of the .py file?
Also, is your "main_notebook" a notebook, or is it a folder where the notebook is located?
10-06-2024 08:53 AM
Hey @filipniziol, the name of the file is utils.py and the notebook is main_notebook (a notebook).
Here’s the path to the notebook:
/Workspace/Repos/xyz@email.com/de/usecase/main/data_pipelines/notebooks/main_notebook
And here’s the path to the utils Python file:
/Workspace/Repos/xyz@email.com/de/usecase/main/data_pipelines/utils.py
10-06-2024 08:56 AM
Hmm.. it looks good. Could you clear state and cell output and try once again?
10-06-2024 09:01 AM
I’ve already tried that, but it didn’t work.
10-06-2024 09:19 AM
Hi @adhi_databricks ,
1. Double check the directories Python is using to look for modules:
import sys
... add the path...
print(sys.path)
2. Double check utils.py is a file and not a notebook
3. Try setting the current working directory manually:
import os
os.chdir('/Workspace/Repos/xyz@email.com/de/usecase/main/data_pipelines/notebooks')
4. Always clear the state of the notebook before testing
5. Prioritize your custom path using sys.path.insert(0, path) instead of sys.path.append()
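The checks above can be gathered into one small helper. This is a sketch, not a Databricks API: `diagnose_import` and the module name `diag_demo_module` are hypothetical, and the folder is a temporary directory so the snippet runs anywhere. It relies on the standard `importlib.util.find_spec`, which reports where Python would load a module from, or `None` if it cannot find it. It only checks that a .py file is visible at the given location and says nothing about how notebooks appear on the workspace filesystem.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def diagnose_import(module_name, folder):
    """Collect basic facts about why `import module_name` might fail."""
    candidate = os.path.join(folder, module_name + ".py")
    try:
        spec = importlib.util.find_spec(module_name)
    except ValueError:
        # Raised when an already-loaded module has no usable spec.
        spec = None
    return {
        "cwd": os.getcwd(),
        "folder_on_sys_path": folder in sys.path,
        "py_file_exists": os.path.isfile(candidate),
        # Where Python would actually load the module from, if anywhere.
        "resolved_origin": spec.origin if spec else None,
    }


# Demo with a throwaway module in a temporary folder.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "diag_demo_module.py"), "w") as f:
    f.write("VALUE = 1\n")

before = diagnose_import("diag_demo_module", folder)

# Give the custom folder priority over every other sys.path entry.
sys.path.insert(0, folder)
importlib.invalidate_caches()
after = diagnose_import("diag_demo_module", folder)

print(before)
print(after)
```

If `resolved_origin` points at a different file than the one you expect, another module with the same name is being picked up first, which is the situation `sys.path.insert(0, path)` is meant to address.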
10-06-2024 09:33 AM
Hey @filipniziol, since the DBR version is 13.3 and enableWorkspaceFilesystem is enabled, the CWD is already set to
/Workspace/Repos/xyz@email.com/de/usecase/main/data_pipelines/notebooks
I still prioritized the path using sys.path.insert(0, path), but I'm still facing the "module not found" error 😐
10-06-2024 09:57 AM
Hi @adhi_databricks ,
I am out of ideas in this case. Is utils.py a valid Python file, with no errors in it?
Could you test with some simple code like below?
I am starting to think there is something wrong with the file (although you mentioned it works in /Shared folder).
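One way to rule out a problem with the file itself is to check that it parses as Python before trying to import it. The sketch below is an illustration, not the snippet from this post: `check_python_source` is a hypothetical helper built on the standard `ast.parse`, and the two sample files are created in a temporary folder.

```python
import ast
import os
import tempfile


def check_python_source(path):
    """Return None if the file parses as Python, else a short error message."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    try:
        ast.parse(source, filename=path)
    except SyntaxError as exc:
        return f"{exc.msg} (line {exc.lineno})"
    return None


folder = tempfile.mkdtemp()

# A trivial, valid module of the kind suggested for testing.
good_path = os.path.join(folder, "good_utils.py")
with open(good_path, "w", encoding="utf-8") as f:
    f.write("def hello():\n    return 'hello'\n")

# A deliberately broken file to show what a failure looks like.
bad_path = os.path.join(folder, "bad_utils.py")
with open(bad_path, "w", encoding="utf-8") as f:
    f.write("def broken(:\n    pass\n")

good_result = check_python_source(good_path)
bad_result = check_python_source(bad_path)

print(good_result)
print(bad_result)
```

A file that parses cleanly can still fail at import time, for example because of a missing dependency, so this only rules out syntax problems.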