How to import a helper module that uses databricks specific modules (dbutils)

mjbobak
New Contributor III

I have a main Databricks notebook that runs a handful of functions. In this notebook, I import a helper.py file that lives in the same repo, and the import itself works fine. Inside helper.py there's a function that leverages the built-in dbutils. Back in my main notebook, when I try to execute the helper function that uses dbutils, I get an error: [NameError: name 'dbutils' is not defined]. How can I create a helper module that imports seamlessly and can leverage dbutils?
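A minimal sketch of the setup described above (the file layout matches the question; the function name is hypothetical):

# helper.py, in the same repo as the notebook
def list_files(path):
    # dbutils is injected into notebook globals, not into imported modules,
    # so this raises NameError when called
    return dbutils.fs.ls(path)

# main notebook
import helper              # the import itself succeeds
helper.list_files("/tmp")  # NameError: name 'dbutils' is not defined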

5 REPLIES

mjbobak
New Contributor III

It looks like adding the appropriate imports to the helper.py file corrects the issue:

from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession

# Reuse the cluster's existing session (or create one) and build a
# DBUtils handle from it, so the module has its own dbutils
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)
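For context, a full helper.py with this fix might look like the following (the function name is hypothetical, carried over from the sketch above):

from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

def list_files(path):
    # dbutils is now defined at module level, so calls from the
    # importing notebook resolve it correctly
    return dbutils.fs.ls(path)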

Atanu
Esteemed Contributor

So the above resolved the issue? Please let us know if you're still stuck. Thanks!

mjbobak
New Contributor III

All set, issue resolved.

Shawn_Eary
New Contributor III

This is a little off topic, but I'm trying to run a PySpark script in VS Code via Databricks Connect V2:
https://www.youtube.com/watch?v=AP5dGiCU188
When I do that, I get the error mjbobak describes about dbutils not being defined. 

When I use mjbobak's code or the code Elisabetta shares on Stack Overflow
https://stackoverflow.com/questions/50813493/nameerror-name-dbutils-is-not-defined-in-pyspark
the error goes away, but then I get a runtime error,
"No operations allowed on this path", in response to the following dbutils.fs.ls call:
theFiles = dbutils.fs.ls("/Volumes/myTestData/shawn_test/staging/inbound")

Is there a proper way to define/import dbutils when using Connect V2 to try to debug a PySpark file that is saved locally?
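For what it's worth, with Databricks Connect V2 the dbutils handle can also be obtained from the Databricks SDK's WorkspaceClient rather than from pyspark.dbutils; a minimal sketch, assuming databricks-sdk is installed and authentication is already configured (the Volumes path is the one from the question):

from databricks.sdk import WorkspaceClient

# Picks up credentials from the environment or a configured profile
w = WorkspaceClient()

# w.dbutils proxies fs/secrets calls through the remote workspace,
# so Unity Catalog Volumes paths should be reachable
theFiles = w.dbutils.fs.ls("/Volumes/myTestData/shawn_test/staging/inbound")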

amitca71
Contributor II

Hi,

I'm facing a similar issue when deploying via dbx.

I have a helper notebook that works fine when executed via Jobs (without any imports).

When I deploy it via dbx (to the same cluster), the helper notebook fails on

dbutils.fs.ls(path)

NameError: name 'dbutils' is not defined

(The main notebook, which calls the helper notebook, uses dbutils.widgets and has no issue.)

(dbx execute my-task --task=silver --cluster-name="my-multi-cluster" builds a wheel and deploys it on the Databricks cluster.)

Adding the imports suggested above doesn't resolve the issue.

Any advice?

Thanks,

Amir
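One pattern that may help in the wheel/dbx case (not from this thread; the get_dbutils name is hypothetical) is an accessor that builds DBUtils from the active session when running on a cluster and falls back to the notebook-injected dbutils otherwise; a minimal sketch:

from pyspark.sql import SparkSession

def get_dbutils(spark: SparkSession):
    try:
        # Works on a Databricks cluster, including wheel-based job tasks
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        # Plain open-source pyspark: fall back to the dbutils object
        # Databricks injects into the notebook's IPython namespace
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]

spark = SparkSession.builder.getOrCreate()
dbutils = get_dbutils(spark)
files = dbutils.fs.ls("dbfs:/tmp")  # path is illustrative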
