How to import a local Python file in a notebook?

jsaddam28
New Contributor III

For example, I have one.py and two.py in Databricks, and I want to use one of the modules from one.py in two.py. On my local machine I usually do this with an import statement like the one below.

two.py:

from one import module1

...

How can I do this in Databricks?
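For reference, a minimal sketch of the local setup described above, assuming module1 is a function defined in one.py:

# one.py
def module1():
    return "hello from one.py"

# two.py
from one import module1

print(module1())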

24 REPLIES

awaiskaleem
New Contributor II

Tried this. %run runs the .py file and prints a print statement from the external file. But what I want is to get a variable from the external file and use it in the current notebook, and that does not work. (I added some print statements, shown in the screenshot below; the variable in the file is called pseg_main.)

[attached screenshot]
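A minimal sketch of how %run shares variables, assuming the external file is saved as a Databricks notebook (hypothetically named helper_notebook here) rather than a plain .py file:

# Cell 1 -- %run must be the only code in the cell
%run ./helper_notebook

# Cell 2 -- names defined in helper_notebook, such as pseg_main, are now in scope
print(pseg_main)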

amanpreetkaur
New Contributor III

Is it possible to import a particular function using the %run statement instead of running the whole notebook?

Dipan
New Contributor II

Databricks is a very rigid environment. It doesn't promote modularizing code. I don't understand why they disable or dissuade the use of such basic concepts, which are generic to all programming languages.

Even after so many years, this is still a problem.

Exactly. I don't understand what the path of the current notebook is if I SSH in.

JavierOrozco
New Contributor III

The way to solve this problem is to add the path of your code to the system path (sys.path), then import modules selectively or import all modules from a file.

You can download an example notebook from here https://github.com/javierorozco/databricks_import_python_module

import sys

# Add the path to sys.path: local or a mounted S3 bucket, e.g. /dbfs/mnt/<path_to_bucket>.
# Note that sys.path entries should be directories, not individual .py files such as test.py.
sys.path.append('/databricks/driver/')
sys.path.append('/databricks/driver/databricks_import_python_module/')
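Once the directory is on sys.path, the modules inside it can be imported as usual; a minimal sketch (my_function is a hypothetical name, check test.py in the repo for its actual contents):

# Import the whole module from the directory appended above
import test

# Or import specific names from it
from test import my_function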

@javier.orozco@realeuesit.com, I was able to work with the file created in your repo (test.py), but with my own modules I am getting an error. Is there anything I am missing?

[attached screenshot]

Ecedysis
New Contributor II

It works for me to upload my Python file to DBFS using the Databricks CLI:

dbfs cp mymodule.py dbfs:/path/to/module/mymodule.py --overwrite

Then the following works:

import sys
sys.path.append('/dbfs/path/to/module')
# the file is /dbfs/path/to/module/mymodule.py
import mymodule
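For illustration, a minimal sketch of what mymodule.py might contain and how the notebook would then use it (greet is a hypothetical function name):

# mymodule.py, uploaded with the dbfs cp command above
def greet(name):
    return f"Hello, {name}!"

# In the notebook, after the sys.path.append and import above
print(mymodule.greet("Databricks"))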

It's a nice hack, but how do I connect to the cluster driver to do real remote SSH development? I could connect via SSH to the driver, but it seems there is a different Python there that has no PySpark.

HananShteingart
New Contributor III

This is very cumbersome for someone who is used to developing data science projects with modules, packages, and classes, not just notebooks. Why doesn't Databricks allow this? I know about databricks-connect, but it does not solve the problem, as the driver runs locally rather than remotely. What I want is a real remote SSH development experience.

StephanieRivera
Valued Contributor II

USE REPOS! 😁

With Repos, a notebook can call a function that is in a file in the same GitHub repo, as long as Files is enabled in the admin panel.

So if I have utils.py with:

import pandas as pd
 
def clean_data():
  # Load wine data
  data = pd.read_csv("/dbfs/databricks-datasets/wine-quality/winequality-white.csv", sep=";")
  print(data)
  # Remove spaces from column names
  data.rename(columns=lambda x: x.replace(' ', '_'), inplace=True)

my notebook can call the above with this:

import utils
 
utils.clean_data()
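If the file does not sit right next to the notebook, one common pattern is to extend sys.path with the relevant repo folder before importing; a minimal sketch, assuming a hypothetical lib/ subfolder holding utils.py:

import os
import sys

# Add the repo subfolder to the import path (lib/ is a hypothetical layout)
sys.path.append(os.path.abspath("./lib"))

import utils
utils.clean_data()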
