I have the following command, which runs in my Databricks notebook:
spark.conf.get("spark.databricks.clusterUsageTags.managedResourceGroup")
I have wrapped this command in a function (simplified):
def get_info():
    return spark.conf.get("spark.databricks.clusterUsageTags.managedResourceGroup")
I have then moved this function into a .py module, which I install as a private package in my workspace environment. I can import the function and call it.
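For context, the module looks roughly like this (the package and file names here are made up for illustration):

# my_package/cluster_info.py
def get_info():
    # note: `spark` is never defined or imported anywhere in this file
    return spark.conf.get("spark.databricks.clusterUsageTags.managedResourceGroup")

and in the notebook:

from my_package.cluster_info import get_info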
However, when I call the imported function, I get an error:
get_info()
>>> NameError: name 'spark' is not defined
If I define the same function in the body of the notebook, I can run it without problems.
- Why does moving this function into a separate module force me to make spark available explicitly? What is the proper way to create a separate module with Spark functions, and how should I import them? (A sketch of the kind of thing I mean is below.)
- If possible: what is happening under the hood that makes it work when I define the function in the notebook, but not when I import it?
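For illustration only: I assume a variant like the one below, which takes the session explicitly instead of relying on a global spark, would avoid the NameError, but I don't know whether that (or something like SparkSession.getActiveSession()) is the recommended pattern for a shared module:

from pyspark.sql import SparkSession

def get_info(spark: SparkSession) -> str:
    # the caller (e.g. the notebook) supplies the session explicitly
    return spark.conf.get("spark.databricks.clusterUsageTags.managedResourceGroup")

# in the notebook, where spark already exists as a global
get_info(spark)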