09-08-2022 06:31 PM
I have a main Databricks notebook that runs a handful of functions. In this notebook, I import a helper.py file from the same repo, and the import itself succeeds. Inside helper.py there is a function that uses the built-in dbutils. Back in my main notebook, when I call the helper function that uses dbutils, I get an error: [NameError: name 'dbutils' is not defined]. How can I create a helper module that imports seamlessly and can use dbutils?
Accepted Solutions
09-09-2022 06:49 AM
It looks like adding the appropriate imports to the helper.py file fixes the problem:
from pyspark.dbutils import DBUtils
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)
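The NameError arises because dbutils is defined in the notebook's global namespace, while an imported module such as helper.py has its own globals. As an alternative to constructing DBUtils inside the module, a helper can take dbutils as an argument. The following is a minimal sketch of that idea, not something posted in this thread: the names list_file_names and make_fake_dbutils are hypothetical, and the fake object exists only so the helper can be exercised outside Databricks.

```python
from types import SimpleNamespace


def list_file_names(dbutils, path):
    """Return the names of the entries under `path`.

    `dbutils` is passed in by the caller instead of being looked up as a
    global, so this module never depends on the notebook's namespace.
    """
    return [entry.name for entry in dbutils.fs.ls(path)]


def make_fake_dbutils(listing):
    """Build a stand-in for dbutils (hypothetical, for local testing only).

    `listing` maps a path to the list of file names that fs.ls should report.
    """
    def ls(path):
        return [SimpleNamespace(name=name) for name in listing.get(path, [])]

    return SimpleNamespace(fs=SimpleNamespace(ls=ls))


if __name__ == "__main__":
    fake = make_fake_dbutils({"/tmp/inbound": ["a.csv", "b.csv"]})
    print(list_file_names(fake, "/tmp/inbound"))
```

In the main notebook, the call would then look like list_file_names(dbutils, some_path), with the notebook's own dbutils passed through. The trade-off is an extra parameter on each helper in exchange for a module that can be imported and tested anywhere.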
09-13-2022 11:42 AM
So the above resolved the issue? Please let us know if you are still stuck. Thanks.
09-13-2022 11:43 AM
All set. Issue resolved.
08-31-2023 01:51 PM
This is a little off topic, but I'm trying to run a PySpark script in VS Code via Databricks Connect V2:
https://www.youtube.com/watch?v=AP5dGiCU188
When I do that, I get the error mjbobak describes about dbutils not being defined.
When I use mjbobak's code or the code Elisabetta shares on SO
https://stackoverflow.com/questions/50813493/nameerror-name-dbutils-is-not-defined-in-pyspark
the error goes away, but then I get a runtime error:
"No operations allowed on this path" in response to the following dbutils.fs.ls call:
theFiles = dbutils.fs.ls("/Volumes/myTestData/shawn_test/staging/inbound")
Is there a proper way to define/import dbutils when using Connect V2 to try to debug a PySpark file that is saved locally?
12-11-2022 07:51 AM
Hi,
I'm facing a similar issue when deploying via dbx.
I have a helper notebook that works fine when executed via jobs (without any includes),
but when I deploy it via dbx (to the same cluster), the helper notebook fails with:
dbutils.fs.ls(path)
NameError: name 'dbutils' is not defined
(The main notebook, which calls the helper function notebook, uses dbutils.widgets and doesn't have any issue.)
(dbx execute my-task --task=silver --cluster-name="my-multi-cluster" builds a wheel and deploys it on the Databricks cluster.)
Adding the suggested includes doesn't resolve the issue.
Any advice?
Thanks,
Amir

