09-04-2015 12:18 AM
For example, I have one.py and two.py in Databricks and I want to use one of the modules from one.py in two.py. Usually I do this on my local machine with an import statement like the one below.
two.py:
from one import module1
.
.
.
How do I do this in Databricks?
05-03-2019 12:13 AM
Is it possible to import a particular function using the %run statement, instead of running the whole notebook?
07-18-2019 07:55 AM
Databricks is a very rigid environment. They don't promote modularizing code. I don't understand why they disable or dissuade the use of such basic concepts, which are generic to all programming languages.
Even after so many years, this is still a problem.
11-25-2020 12:06 PM
Exactly. I don't understand what the path of the current notebook is if I connect over SSH.
07-18-2019 03:18 PM
The way to solve this is to add the path of your code to sys.path, then import modules selectively or import everything from a file.
You can download an example notebook from here: https://github.com/javierorozco/databricks_import_python_module
import sys
# Add the path to sys.path; this can be a local path or a mounted S3 bucket, e.g. /dbfs/mnt/<path_to_bucket>
sys.path.append('/databricks/driver/')
sys.path.append('/databricks/driver/databricks_import_python_module/')
# Note: sys.path entries must be directories, not files, so appending test.py itself is not needed
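Once those directories are on sys.path, a module in them can be imported directly. A minimal sketch, assuming the repo's test.py defines a function called my_function (the function name here is hypothetical):
import test  # resolved via the directory appended to sys.path above
test.my_function()  # hypothetical function defined in test.py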
12-24-2019 08:49 PM
@javier.orozco@realeuesit.com, I was able to work with the file created in your repo (test.py), but with my own modules I am getting an error. Is there anything I am missing?
05-13-2020 06:27 PM
What works for me is to upload my Python file to DBFS using the Databricks CLI:
dbfs cp mymodule.py dbfs:/path/to/module/mymodule.py --overwrite
Then the following works:
import sys
sys.path.append('/dbfs/path/to/module')
#the file is /dbfs/path/to/module/mymodule.py
import mymodule
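One caveat: if mymodule.py is re-uploaded after it has already been imported, the running notebook keeps the cached version. A minimal sketch of forcing a refresh:
import importlib
import mymodule

# Re-import mymodule so the notebook picks up the newly uploaded file
importlib.reload(mymodule)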
11-28-2020 02:13 AM
It's a nice hack, but how do I connect to the cluster driver to do real remote SSH development? I can connect via SSH to the driver, but it seems there is a different Python there which has no PySpark.
11-28-2020 02:11 AM
This is very cumbersome for someone who is used to developing data science projects with modules, packages, and classes, and not just notebooks. Why doesn't Databricks allow this? I know about databricks-connect, but it does not solve the problem, as the driver runs locally and not remotely. What I want is a real remote SSH development experience.
10-11-2021 09:43 AM
USE REPOS! 😁
With Repos, a notebook can call a function defined in a file in the same GitHub repo, as long as Files is enabled in the admin panel.
So if I have utils.py with:
import pandas as pd

def clean_data():
    # Load wine data
    data = pd.read_csv("/dbfs/databricks-datasets/wine-quality/winequality-white.csv", sep=";")
    print(data)
    # Remove spaces from column names
    data.rename(columns=lambda x: x.replace(' ', '_'), inplace=True)
my notebook can call the above with:
import utils
utils.clean_data()
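Since a notebook in a Repo gets the repo root on sys.path, the same pattern should also work when the helper lives in a subfolder, provided the folder is a package. A minimal sketch, assuming a hypothetical layout with helpers/__init__.py and helpers/utils.py in the same repo:
from helpers import utils  # helpers/ is a hypothetical package folder at the repo root
utils.clean_data()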