Databricks Community

jsaddam28 · 09-04-2015

For example, I have one.py and two.py in Databricks, and I want to use one of the modules from one.py in two.py. On my local machine I usually do this with an import statement like the one below:

two.py:

from one import module1

How do I do this in Databricks?

awaiskaleem · 12-30-2019

I tried this. %run runs the .py file and executes a print statement in the external file. But what I want is to get a variable from the external file and use it in the current notebook, and that doesn't work. (I added some print statements below; the variable in the file is called pseg_main.)
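If the goal is only to read a variable defined in a plain .py file, the standard library's runpy.run_path executes a file and returns its top-level namespace as a dictionary. A minimal sketch, assuming the file is readable from the driver's local filesystem; the file name external_settings.py and its contents are hypothetical stand-ins for the file that defines pseg_main:

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for the external .py file that defines pseg_main;
# in practice this would be a path the driver can read (for example under /dbfs/).
external_file = Path(tempfile.gettempdir()) / "external_settings.py"
external_file.write_text(
    'print("external file ran")\n'
    'pseg_main = "value from external file"\n'
)

# run_path executes the file in a fresh namespace and returns that namespace
# as a dict, so top-level variables can be read by name.
external_globals = runpy.run_path(str(external_file))
pseg_main = external_globals["pseg_main"]
print(pseg_main)
```

The design choice here is explicitness: nothing from the file lands in the current namespace unless you pull it out of the returned dict.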

amanpreetkaur · 05-03-2019

Is it possible to import a particular function using the %run command instead of running the whole notebook?
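For code kept in a .py module rather than a notebook, plain Python still executes the module's top-level code once when you import a single function from it; what keeps that cheap is putting script-style side effects under an `if __name__ == "__main__":` guard, so importing only defines things. A minimal sketch, assuming the shared code can live in a .py file on the path; the module name shared_helpers and the function add_one are hypothetical:

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical module written to a temporary directory for illustration.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "shared_helpers.py").write_text(
    "def add_one(x):\n"
    "    return x + 1\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    # Only runs when the file is executed directly, not on import.\n"
    "    print('running expensive script body')\n"
)

# Put the module's directory first so it takes priority on the search path.
sys.path.insert(0, str(module_dir))

# Brings just the one name into scope; the guarded block above is skipped
# because __name__ is 'shared_helpers' here, not '__main__'.
from shared_helpers import add_one

print(add_one(41))
```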

Dipan · 07-18-2019

Databricks is a very rigid environment. It doesn't encourage modularizing code. I don't understand why they disable or discourage such basic concepts, which are common to all programming languages.

Even after so many years, this is still a problem.

HananShteingart · 11-25-2020

Exactly. I don't understand what the path of the current notebook is when I connect via SSH.

JavierOrozco · 07-18-2019

The way to solve this problem is to add the path of your code to the system path, then import modules selectively or import everything from a file.

You can download an example notebook from here: https://github.com/javierorozco/databricks_import_python_module

import sys

# Add the code's directory to the system path: local, or a mounted S3 bucket, e.g. /dbfs/mnt/<path_to_bucket>
# sys.path entries must be directories, so append the folder, not the .py file itself.
sys.path.append('/databricks/driver/')
sys.path.append('/databricks/driver/databricks_import_python_module/')
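Once a directory is on sys.path, any .py file inside it can be imported either selectively or as a whole module. A minimal sketch, with a temporary directory standing in for a code folder such as /databricks/driver/databricks_import_python_module/ and a hypothetical module example_module; it uses sys.path.insert(0, ...) rather than append, which gives the directory priority over same-named modules elsewhere on the path:

```python
import sys
import tempfile
from pathlib import Path

# Temporary directory standing in for the code folder (assumption for the demo).
code_dir = Path(tempfile.mkdtemp())
(code_dir / "example_module.py").write_text(
    "def greet(name):\n"
    "    return f'hello, {name}'\n"
    "\n"
    "def shout(name):\n"
    "    return greet(name).upper()\n"
)

# Add the directory, not the .py file: the default finders search each
# sys.path entry as a directory (or zip archive) and ignore plain file paths.
if str(code_dir) not in sys.path:
    sys.path.insert(0, str(code_dir))

# Selective import: bring in only the names you need.
from example_module import greet

# Whole-module import: access everything through the module's namespace.
# (`from example_module import *` also works but hides where names come from.)
import example_module

print(greet("databricks"))
print(example_module.shout("databricks"))
```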

shantanuneema · 12-24-2019

@javier.orozco@realeuesit.com, I was able to work with the file created in your repo (test.py), but with my own modules I am getting an error. Is there anything I am missing?

Ecedysis · 05-13-2020

What works for me is uploading my Python file to DBFS using the Databricks CLI:

dbfs cp mymodule.py dbfs:/path/to/module/mymodule.py --overwrite

Then the following works:

import sys
sys.path.append('/dbfs/path/to/module')
# The file is /dbfs/path/to/module/mymodule.py
import mymodule
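One thing to watch with this approach: Python caches imported modules in sys.modules, so after re-uploading mymodule.py with --overwrite, running `import mymodule` again in the same session keeps the old version. importlib.reload re-executes the file. A minimal sketch, with a temporary directory standing in for /dbfs/path/to/module and a hypothetical module mymodule_demo holding a VERSION variable:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Temporary directory standing in for /dbfs/path/to/module (assumption).
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "mymodule_demo.py"
module_file.write_text("VERSION = 1\n")

sys.path.insert(0, str(module_dir))
import mymodule_demo

print(mymodule_demo.VERSION)  # first version

# Simulate re-uploading the file. The new contents have a different length,
# so Python's size check rejects the stale cached bytecode and recompiles.
module_file.write_text("VERSION = 2  # re-uploaded\n")

# A second import statement is a no-op: the cached module object is reused.
import mymodule_demo

# reload re-executes the source and updates the existing module object.
mymodule_demo = importlib.reload(mymodule_demo)
print(mymodule_demo.VERSION)  # updated version
```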

HananShteingart · 11-28-2020

It's a nice hack, but how do I connect to the cluster driver to do real remote SSH development? I could connect to the driver via SSH, but it seems there is a different Python there that has no PySpark.

HananShteingart · 11-28-2020

This is very cumbersome for someone who is used to developing data science projects with modules, packages, and classes, not just notebooks. Why doesn't Databricks allow this? I know about databricks-connect, but it does not solve the problem, because the driver runs locally and not remotely. What I want is a real remote SSH development experience.

StephanieAlba · 10-11-2021

USE REPOS! 😁

With Repos, a notebook can call a function defined in a file in the same GitHub repo, as long as Files is enabled in the admin panel.

So if I have utils.py with:

import pandas as pd

def clean_data():
    # Load wine data
    data = pd.read_csv("/dbfs/databricks-datasets/wine-quality/winequality-white.csv", sep=";")
    print(data)
    # Remove spaces from column names
    data.rename(columns=lambda x: x.replace(' ', '_'), inplace=True)
    # Return the cleaned DataFrame so callers can use it
    return data

My notebook can call the function above with this: