Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to import a helper module that uses Databricks-specific modules (dbutils)

mjbobak
New Contributor III

I have a main Databricks notebook that runs a handful of functions. In this notebook, I import a helper.py file that lives in the same repo, and the import itself succeeds. Inside helper.py there is a function that uses the built-in dbutils. Back in my main notebook, when I call that helper function, I get an error: NameError: name 'dbutils' is not defined. How can I create a helper module that imports seamlessly and can still use dbutils?
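To make the failure concrete, here is a minimal sketch of such a helper (list_files is a hypothetical name): dbutils is a global that Databricks injects into the notebook's namespace, not into imported modules, so calling the function raises NameError.

```python
# helper.py (minimal repro sketch; `list_files` is a hypothetical name)

def list_files(path):
    # `dbutils` is injected into the notebook's globals by Databricks,
    # not into this module's namespace, so this line raises
    # NameError: name 'dbutils' is not defined
    return dbutils.fs.ls(path)
```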

1 ACCEPTED SOLUTION


mjbobak
New Contributor III

It looks like adding the appropriate imports to the helper.py file fixes the problem:

# helper.py: build a dbutils handle from the active Spark session
from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)
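For environments where pyspark.dbutils may not be importable (it is a Databricks add-on, not part of open-source PySpark), a more defensive variant of the same idea is sometimes used. This is a sketch, not documented Databricks API, and get_dbutils is a hypothetical name:

```python
# helper.py -- defensive sketch; `get_dbutils` is a hypothetical name.
def get_dbutils(spark=None):
    """Best-effort lookup of a dbutils handle; returns None if unavailable."""
    try:
        # Works on a Databricks cluster, where pyspark.dbutils exists
        from pyspark.dbutils import DBUtils
        from pyspark.sql import SparkSession
        spark = spark or SparkSession.builder.getOrCreate()
        return DBUtils(spark)
    except Exception:
        pass
    try:
        # Fall back to the notebook-injected global, if running under IPython
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]
    except Exception:
        return None  # not running on Databricks or in a notebook
```

Helper functions can then call get_dbutils() once at the top instead of relying on a module-level global.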


5 REPLIES


Atanu
Esteemed Contributor

So the above resolved the issue? Please let us know if you're still stuck. Thanks!

mjbobak
New Contributor III

All set, the issue is resolved.

This is a little off topic, but I'm trying to run a PySpark script in VS Code via Databricks Connect V2:
https://www.youtube.com/watch?v=AP5dGiCU188
When I do that, I get the error mjbobak describes about dbutils not being defined.

When I use mjbobak's code, or the code Elisabetta shares on Stack Overflow
https://stackoverflow.com/questions/50813493/nameerror-name-dbutils-is-not-defined-in-pyspark
the error goes away, but then I get a runtime error, "No operations allowed on this path", in response to the following dbutils.fs.ls call:
theFiles = dbutils.fs.ls("/Volumes/myTestData/shawn_test/staging/inbound")

Is there a proper way to define/import dbutils when using Connect V2 to debug a PySpark file that is saved locally?
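One approach worth trying under Databricks Connect V2 is the Databricks SDK's WorkspaceClient, which exposes a dbutils attribute whose fs and secrets helpers are proxied over the workspace API. This is a sketch, assuming the databricks-sdk package is installed and credentials are available in environment variables or ~/.databrickscfg:

```python
# Sketch: getting a dbutils handle in a locally-run script via the Databricks
# SDK (assumes `pip install databricks-sdk` and a configured workspace).
# Guarded so it degrades to None when neither is present.
try:
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()   # picks up host/token from env or ~/.databrickscfg
    dbutils = w.dbutils     # fs / secrets helpers proxied over the REST API
except Exception:
    dbutils = None          # SDK missing or no workspace configured

if dbutils is not None:
    files = dbutils.fs.ls("/Volumes/myTestData/shawn_test/staging/inbound")
```

Whether the fs.ls call on a Volumes path succeeds will still depend on the workspace permissions and runtime in use.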


amitca71
Contributor II

Hi,

I'm facing a similar issue when deploying via dbx.

I have a helper notebook that works fine when executed via Jobs (without any includes).

When I deploy it via dbx (to the same cluster), the helper notebook fails on

dbutils.fs.ls(path)

with NameError: name 'dbutils' is not defined.

(The main notebook, which calls the helper function notebook, uses dbutils.widgets and doesn't have any issue.)

(dbx execute my-task --task=silver --cluster-name="my-multi-cluster" builds a wheel and deploys it to the Databricks cluster.)

Adding the suggested imports doesn't resolve the issue.

Any advice?

thanks,

Amir
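A pattern that sidesteps the lookup problem entirely, and tends to behave the same in notebooks, Jobs, and wheel-based dbx deployments, is to pass the handle in explicitly instead of reading a global. A minimal sketch, with hypothetical names:

```python
# Dependency-injection sketch: the helper takes `dbutils` as a parameter
# instead of reading a global. `list_inbound` is a hypothetical name.

def list_inbound(dbutils, path):
    """Return the file names under `path` using the supplied dbutils handle."""
    return [f.name for f in dbutils.fs.ls(path)]

# In the notebook (or job entry point), where `dbutils` already exists:
#   files = list_inbound(dbutils, "/some/path")
```

Because the helper never mentions a global, it also becomes trivially testable with a stub object standing in for dbutils.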
