
Where does Python "import" statement in a Notebook search (on Azure) for libraries?

Newyn
New Contributor III

I know that import can find installed cluster libraries, workspace libraries, etc., and I know how to build wheel libraries externally and upload them.

However, in one project I noticed that import was able to find a folder in, if I remember correctly, a git repository. Essentially, I could import the Python library straight from the folder without installing it as, for example, a wheel library.

I have tried to find information about this feature in the documentation. However, it is a bit hard to find, partly because "import" has a lot of different meanings in the Databricks environment.

So, apart from installed libraries, where does the Python import statement in a notebook look for libraries?

Accepted Solution

I think this might help: https://docs.databricks.com/_static/notebooks/files-in-repos.html

It's a quick explanation of the Python search path (sys.path) and how to alter it in Databricks.
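To make the "path" part concrete, here is a minimal sketch in plain Python of how to see which directories import searches and where a given module would be loaded from. It relies only on the standard library; the package name inhouse_pkg, the helper where_is, and the temporary folder standing in for a repository root are all hypothetical and used purely for illustration. Which directories Databricks itself puts on sys.path for a notebook in a repo is described in the linked notebook, so treat this as a way to check your own environment rather than a statement of what it contains.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def where_is(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# The import statement searches these directories in order.
for entry in sys.path:
    print(entry)

# Hypothetical stand-in for a repository root containing an in-house package.
repo_root = Path(tempfile.mkdtemp())
(repo_root / "inhouse_pkg").mkdir()
(repo_root / "inhouse_pkg" / "__init__.py").write_text("VALUE = 1\n")

# Not importable yet: the folder is not on sys.path.
print(where_is("inhouse_pkg"))

# Once the folder's parent directory is on sys.path, import can find it.
sys.path.append(str(repo_root))
importlib.invalidate_caches()
print(where_is("inhouse_pkg"))
```

Running where_is on your own package name from a notebook inside the repo shows whether the package is being picked up from the repository folder or from an installed library.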


Abishek
Databricks Employee

Hi @Jonas Mellin​

Can you try running the command below from the notebook? It lists the libraries installed in the Databricks Runtime (DBR).

For example:

%sh ls -l /databricks/python3/lib/python3.8/site-packages 
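Note that the path above hard-codes python3.8, so it may differ on runtimes that ship another Python version. As a version-independent alternative, a small sketch using only the Python standard library (available in Python 3.8 and later) can report the site-packages directory and the installed distributions; the variable names are illustrative only.

```python
import sysconfig
from importlib import metadata

# Directory where pure-Python packages are installed for this interpreter.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# Names of all installed distributions visible to this interpreter.
installed = sorted(
    {dist.metadata["Name"] for dist in metadata.distributions() if dist.metadata["Name"]},
    key=str.lower,
)
for name in installed:
    print(name)
```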

Also see the docs on notebook-scoped Python libraries: https://docs.databricks.com/libraries/notebooks-python-libraries.html

If you still need any clarification, please let me know and I will try to explain further 🙂

Newyn
New Contributor III

That is not really what I want. What I want is a folder in a git repository connected to Databricks, so that notebooks can search this folder or the repository for packages developed in-house. I already know how to deploy them as wheel libraries, both on clusters and in workspaces, and how to attach them to Jobs.

It is much more convenient to work in a proper IDE, check in the changes, check them out in Databricks, and test them. I came across this by accident in a project where the package was a folder in the root of the repository. When I ran the tests, which resided in the same repository, import found the folder. Very convenient.

However, I have not had time to figure out how import could find it. I have several questions about this:

  1. Is the import search restricted to the same repository or can it go across to other repositories?
  2. Can a package deployed in this way be imported from a notebook in a workspace?

I do not want to resort to ugly hacks that change sys.path inside the notebooks or the packages.

Newyn
New Contributor III

I have been a bit busy, but I followed up the question with a reply. Thanks for the reminder.

Vidula
Honored Contributor

Hi @Jonas Mellin​ 

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
