Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to import a helper module that uses Databricks-specific modules (dbutils)

mjbobak
New Contributor III

I have a main Databricks notebook that runs a handful of functions. In this notebook I import a helper.py file that lives in the same repo, and the import itself succeeds. Inside helper.py there's a function that uses the built-in dbutils. But back in the main notebook, when I call that helper function, I get an error: NameError: name 'dbutils' is not defined. How can I create a helper module that imports cleanly and can still use dbutils?

1 ACCEPTED SOLUTION

Accepted Solutions

mjbobak
New Contributor III

It looks like adding the appropriate imports to the helper.py file fixes everything:

from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession

# Get (or create) the active Spark session, then build a dbutils handle from it
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)
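The fix works because the dbutils object that notebooks get for free can also be constructed explicitly from an active SparkSession. A slightly more defensive variant of the same idea (a sketch only; `get_dbutils` is a hypothetical helper name) also falls back to the notebook-injected global in environments where `pyspark.dbutils.DBUtils` cannot be used:

```python
def get_dbutils(spark=None):
    """Return a dbutils handle usable from a helper module.

    Sketch, assuming the code runs on a Databricks cluster: first try to
    construct DBUtils from the Spark session; if that import fails, fall
    back to the dbutils global injected into the notebook's namespace.
    """
    try:
        from pyspark.dbutils import DBUtils
        from pyspark.sql import SparkSession

        spark = spark or SparkSession.builder.getOrCreate()
        return DBUtils(spark)
    except ImportError:
        # Fallback: look up the notebook-injected global, if any
        import IPython

        return IPython.get_ipython().user_ns.get("dbutils")
```

Calling `get_dbutils()` inside each helper function (rather than at module import time) keeps the helper importable even where dbutils is not available yet.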


5 REPLIES


Atanu
Esteemed Contributor

So the above resolved the issue? Please let us know if you're still stuck. Thanks!

mjbobak
New Contributor III

All set. Issue resolved.

Shawn_Eary
New Contributor III

This is a little off topic, but I'm trying to run a PySpark script in VS Code via Databricks Connect V2:
https://www.youtube.com/watch?v=AP5dGiCU188
When I do that, I get the error mjbobak describes about dbutils not being defined.

When I use mjbobak's code or the code Elisabetta shares on SO 
https://stackoverflow.com/questions/50813493/nameerror-name-dbutils-is-not-defined-in-pyspark
the error goes away, but then I get a runtime error:
"No operations allowed on this path" in response to the following dbutils.fs.ls call:
theFiles = dbutils.fs.ls("/Volumes/myTestData/shawn_test/staging/inbound")

Is there a proper way to define/import dbutils when using Connect V2 to try to debug a PySpark file that is saved locally?
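One route worth trying with Databricks Connect is to get the dbutils handle from the Databricks SDK's WorkspaceClient instead of from pyspark.dbutils. A sketch, assuming the databricks-sdk package is installed and workspace auth is configured (via the DATABRICKS_HOST/DATABRICKS_TOKEN environment variables or a profile in ~/.databrickscfg); whether it also clears the "No operations allowed on this path" error may depend on the workspace and Volume permissions:

```python
def list_volume_files(path="/Volumes/myTestData/shawn_test/staging/inbound"):
    """List files in a Unity Catalog Volume from a local script.

    Sketch for Databricks Connect V2: the SDK's WorkspaceClient exposes a
    dbutils facade that talks to the remote workspace, so no notebook-injected
    dbutils global is needed. Auth is read from the environment or
    ~/.databrickscfg when WorkspaceClient() is constructed.
    """
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()          # resolves workspace URL and credentials
    return w.dbutils.fs.ls(path)   # FileInfo entries for the Volume path
```

The import and client construction live inside the function so the module itself imports cleanly on machines without the SDK configured.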

amitca71
Contributor II

Hi,

I'm facing a similar issue when deploying via dbx.

I have a helper notebook that works fine when executed via Jobs (without any extra imports).

When I deploy it via dbx (to the same cluster), the helper notebook fails on

dbutils.fs.ls(path)

with: NameError: name 'dbutils' is not defined

(The main notebook, which calls the helper function notebook, uses dbutils.widgets and doesn't have any issue.)

(dbx execute my-task --task=silver --cluster-name="my-multi-cluster" builds a wheel and deploys it on the Databricks cluster.)

Adding the imports suggested above doesn't resolve the issue.

Any advice?

Thanks,

Amir
