Using (Python) import on Azure Databricks

sebg
New Contributor II

Hello,

My team is currently working on Azure Databricks with a mid-sized repo. When we want to import PySpark functions and classes from other notebooks, we currently use

%run <relpath>

which is less than ideal.
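For illustration, the import-based pattern we are after would look something like this (the module and function names below are hypothetical):

# Sketch, assuming the repo contains a file shared/helpers.py defining clean_df():
from shared.helpers import clean_df

cleaned = clean_df(raw_df)  # called like any imported function, instead of %run pulling it into the namespace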

I would like to replicate the functionality of this repo. However, when I clone it into the Repos section of my Azure Databricks instance, it does not work.

The location of the repo is not

/Workspace/Repos/<username>/<repo_name>

The best handle I can get on its location is an Azure Databricks hyperlink:

https://adb-<workspace_id>.13.azuredatabricks.net/?o=<workspace_id>#folder/<repo_id>;

When I print sys.path, I get the following output:

/databricks/python_shell/scripts 
/local_disk0/spark-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/userFiles-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 
/databricks/spark/python
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip
/databricks/jars/spark--driver--driver-spark_3.2_2.12_deploy.jar 
/databricks/jars/spark--maven-trees--ml--10.x--graphframes--org.graphframes--graphframes_2.12--org.graphframes__graphframes_2.12__0.8.2-db1-spark3.2.jar 
/databricks/python_shell 
/usr/lib/python38.zip 
/usr/lib/python3.8 
/usr/lib/python3.8/lib-dynload 
/databricks/python/lib/python3.8/site-packages 
/usr/local/lib/python3.8/dist-packages 
/usr/lib/python3/dist-packages 
/databricks/.python_edge_libs 
/databricks/python/lib/python3.8/site-packages/IPython/extensions

It seems that the second path is where my notebook is being run from, but it is not persistent and the directory is empty.
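For reference, sys.path is a list rather than a callable, so a listing like the one above can be printed with:

import sys

# Print one import-search-path entry per line for readability.
for p in sys.path:
    print(p)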

So, my question is: how can I use Python's import functionality (as shown in the linked example documentation) with my repo in Azure Databricks?

1 REPLY

Kaniz
Community Manager

Hi @Sebastian Gay, the Azure Databricks documentation includes a section on developing notebooks and jobs in Python. Its first subsection links to tutorials for common workflows and tasks, and its second links to APIs, libraries, and key tools.

To work with non-notebook files in Databricks Repos, you must be running Databricks Runtime 8.4 or above.

First, you need to enable Files in Repos.

An admin can enable the Files in Repos feature as follows:

  1. Go to the Admin Console.
  2. Click the Workspace Settings tab.
  3. In the Repos section, click the Files in Repos toggle.

After the feature has been enabled, you must restart your cluster and refresh your browser before you can use Files in Repos.

Additionally, the first time you access a repo after Files in Repos is enabled, you must open the Git dialog. A dialog indicates that you must perform a pull operation to sync non-notebook files in the repo. Select Agree and Pull to sync files. If there are any merge conflicts, another dialog appears, giving you the option of discarding your conflicting changes or pushing your changes to a new branch.

To confirm that Files in Repos is enabled, run

%sh pwd

in a notebook inside a repo:

  • If Files in Repos is not enabled, the response is
/databricks/driver

  • If Files in Repos is enabled, the response is 
/Workspace/Repos/<path to notebook directory>
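
Once Files in Repos is enabled and %sh pwd reports a /Workspace/Repos/... path, ordinary Python imports should resolve against files in the repo. A minimal sketch, assuming the repo contains a file utils/helpers.py (all names below are placeholders):

import sys

# Placeholder path; substitute your own username and repo name.
repo_root = "/Workspace/Repos/<username>/<repo_name>"

# On runtimes that do not put the repo root on sys.path automatically,
# append it so files in the repo can be imported as modules.
if repo_root not in sys.path:
    sys.path.append(repo_root)

from utils.helpers import clean_df  # assumed helper module in the repo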
