Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Importing a Python module

tariq
New Contributor III

I'm not sure how a simple thing like importing a module in Python can be so broken in such a product. At first, I was able to make it work using the following:

import sys

# Put the repo's src directory on the module search path
sys.path.append("/Workspace/Repos/Github Repo/sparkling-to-databricks/src")

from utils.some_util import *

I was able to use the imported function. But after I restarted the cluster this no longer worked, even though the path still appears in sys.path.

I also tried the following:

# Tried shipping the single file to the cluster via SparkContext instead
spark.sparkContext.addPyFile("/Workspace/Repos/Github Repo/my-repo/src/utils/some_util.py")

This did not work either. Can someone please tell me what I'm doing wrong here and suggest a solution? Thanks.

4 REPLIES

Anonymous
Not applicable

The docs should help: https://docs.databricks.com/libraries/index.html

Keep in mind that paths on the cluster refer to the distributed filesystem (DBFS), while paths in Python are local to the driver. If you run %fs ls / you'll get a different result than if you run %sh ls /.
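
You can see the same contrast from plain Python (a minimal sketch, assuming you're in a Databricks notebook, where dbutils and display are predefined):

import os

# The DBFS root on the distributed filesystem (same listing as %fs ls /)
display(dbutils.fs.ls("/"))

# The driver node's local filesystem root (same listing as %sh ls /)
print(os.listdir("/"))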

tariq
New Contributor III

So there's no other way than creating a library?

KrishZ
Contributor

I too wonder the same thing. How can importing a Python module be so difficult, and not even documented? lol

No need for libraries.

Here's what worked for me:

Step 1: Upload the module: open a notebook >> File >> Upload Data >> drag and drop your module.

Step 2: Click Next.

Step 3: Copy the Databricks path for your module (this path is displayed in the pop-up you see just after clicking Next).

For me, if my module is named test_module, the path looks like:

  • dbfs:/FileStore/shared_uploads/krishz@company.com/test_module.py

Step 4: Append the above to sys.path (albeit with two changes):

  • Change 1: Change dbfs:/ to /dbfs/
  • Change 2: remove your module name from the path 🙂

Now my path to append looks like:

  • /dbfs/FileStore/shared_uploads/krishz@company.com

Step 5: Append the path:

import sys

# /dbfs/... is the driver-local mount of dbfs:/...
sys.path.append("/dbfs/FileStore/shared_uploads/krishz@company.com")

Step 6: Now you can import it simply:

import test_module

After this, you can write code as if you had imported test_module in a regular Jupyter notebook - no need to worry about any Databricks intricacies.
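
For example, if test_module.py defines a function some_function (a hypothetical name standing in for whatever your module actually defines), you can call it as usual:

result = test_module.some_function()  # some_function is a placeholder name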

Let me know if any step is unclear. Honestly, I don't know why you were recommended to use libraries for such a simple request.

tariq
New Contributor III

Thanks for the reply. The file I have is part of a repo. Is there a way to import dependencies from within a repo? The repo structure looks something like the below:

[image: repo directory structure]

I need to import the some_util module in my_notebook.
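
Concretely, I'd want something like this at the top of my_notebook to keep working after a cluster restart (a sketch; the paths come from my first post, and the exact layout is in the screenshot above):

import os
import sys

# Assuming the modules live under <repo-root>/src, as in my first post
repo_root = "/Workspace/Repos/Github Repo/sparkling-to-databricks"
sys.path.append(os.path.join(repo_root, "src"))

from utils.some_util import *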
