
Importing a Python module

tariq
New Contributor III

I'm not sure how something as simple as importing a module in Python can be so broken in a product like this. At first, I was able to make it work with the following:

import sys
# Put the repo's src directory on the module search path
sys.path.append("/Workspace/Repos/Github Repo/sparkling-to-databricks/src")
from utils.some_util import *

This let me use the imported function. But after I restarted the cluster it no longer worked, even though the path was in sys.path.

I also tried the following:

# Distribute a single .py file to the cluster so it can be imported
spark.sparkContext.addPyFile("/Workspace/Repos/Github Repo/my-repo/src/utils/some_util.py")

This did not work either. Can someone please tell me what I'm doing wrong here and suggest a solution? Thanks.

4 REPLIES

Anonymous
Not applicable

The docs should help: https://docs.databricks.com/libraries/index.html

Keep in mind that paths on the cluster are distributed (DBFS) paths, while paths in Python are local to the driver. If you run %fs ls / you'll get a different result than if you run %sh ls /.
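
For example, you can see the difference from a notebook cell. A minimal sketch, assuming a standard cluster where dbutils is predefined in notebooks and DBFS is FUSE-mounted at /dbfs:

import os

# DBFS (distributed) root -- what %fs ls / shows
print([f.path for f in dbutils.fs.ls("/")])

# Driver-local filesystem root -- what %sh ls / shows
print(os.listdir("/"))

# DBFS is also mounted on the driver at /dbfs, so local Python file
# APIs can reach distributed files through that prefix
print(os.listdir("/dbfs"))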

tariq
New Contributor III

So there's no other way than creating a library?

KrishZ
Contributor

I too wonder the same thing. How can importing a Python module be so difficult and not even documented, lol.

No need for libraries.

Here's what worked for me:

Step 1: Upload the module by opening a notebook >> File >> Upload Data >> drag and drop your module.

Step 2: Click Next.

Step 3: Copy the Databricks path for your module. (This path is displayed in the pop-up that appears right after you click Next.)

For me, if my module is named test_module, the path looks like:

  • dbfs:/FileStore/shared_uploads/krishz@company.com/test_module.py

Step 4: Append the above path to sys.path (albeit with two changes):

  • Change 1: change dbfs:/ to /dbfs/
  • Change 2: remove your module's filename from the end of the path 🙂

Now my path to append looks like:

  • /dbfs/FileStore/shared_uploads/krishz@company.com
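
Both changes can also be done in code. A small sketch, where the input is the path copied in step 3:

import os

# Path copied from the upload pop-up (step 3)
copied = "dbfs:/FileStore/shared_uploads/krishz@company.com/test_module.py"

# Change 1: dbfs:/ -> /dbfs/ (the FUSE mount visible to driver-local Python)
# Change 2: drop the module filename, keeping only its directory
module_dir = os.path.dirname(copied.replace("dbfs:/", "/dbfs/", 1))

print(module_dir)  # /dbfs/FileStore/shared_uploads/krishz@company.com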

Step 5: Append that directory to sys.path:

import sys
# Directory that contains test_module.py (module name removed)
sys.path.append("/dbfs/FileStore/shared_uploads/krishz@company.com")

Step 6:

Now you can import it simply with:

import test_module

After this, you can write code as if you had imported test_module in your usual Jupyter notebook; no need to worry about any Databricks intricacies.
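
Putting steps 4 to 6 together, it's one notebook cell (same path as above). One caveat worth noting: sys.path only lives as long as the Python interpreter, so this cell has to be re-run after every cluster restart:

import sys

# Directory that contains test_module.py (module name excluded, per step 4)
module_dir = "/dbfs/FileStore/shared_uploads/krishz@company.com"

# Re-run after every cluster restart; the guard avoids duplicate entries
if module_dir not in sys.path:
    sys.path.append(module_dir)

import test_module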

Lemme know if any step is unclear. Honestly speaking, I don't know why you were recommended libraries for such a simple request.

tariq
New Contributor III

Thanks for the reply. The file I have is part of a repo. Is there a way to import dependencies from within the repo structure? The repo structure looks something like below:

[image: repo structure]

And I need to import the some_util module in my_notebook.
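
Ideally I'd avoid hard-coding the absolute repo path. A minimal sketch of what I'm hoping for, assuming my_notebook sits one level below the repo root (my layout above is just a guess at the paths) and that a repo notebook's working directory is its own folder, as it is for notebooks in Repos:

import os
import sys

# Locate the repo's src directory relative to this notebook
# (assumes <repo-root>/notebooks/my_notebook and <repo-root>/src/utils/some_util.py)
src_dir = os.path.join(os.path.dirname(os.getcwd()), "src")

# Re-run after a cluster restart; sys.path does not survive one
if src_dir not in sys.path:
    sys.path.append(src_dir)

from utils.some_util import *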
