Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

master notebook cannot find the udf registered in the child notebook

andrew0117
Contributor

The master notebook calls a child notebook using dbutils.notebook.run("PathToChildnotebook"). The child notebook defines a user-defined function (UDF) and registers it using spark.udf.register. However, when the child notebook finishes and control returns to the master notebook, the UDF cannot be found, and the master notebook fails with an error when it tries to use it. Why does this happen?

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

@andrew li:

The UDF cannot be found because dbutils.notebook.run() starts the child notebook as a separate ephemeral job with its own Spark session, even though it runs on the same cluster. A function registered with spark.udf.register is a temporary function scoped to the session that registered it, so it is not visible from the Spark session used by the master notebook.

To solve this issue, you can either define and register the UDF in the master notebook, or include the child notebook with the %run magic command instead of dbutils.notebook.run(). %run executes the child notebook inline, in the master notebook's own session, so everything the child defines and registers is available afterwards. Returning the UDF through dbutils.notebook.exit() does not work: exit() can only return a string, so the value that dbutils.notebook.run() hands back is not a function that spark.udf.register() can register.

Here's an example of how you can make the UDF from the child notebook available in the master notebook with %run:

In the child notebook:

def my_udf(x):
  return x + 1
 
spark.udf.register("my_udf", my_udf)

In the master notebook, in a cell by itself:

%run PathToChildnotebook

Then, in a later cell of the master notebook:

spark.sql("SELECT my_udf(1)").show()

In this example, the my_udf UDF is defined and registered in the child notebook. Because %run executes the child notebook in the master notebook's session, the registration happens in that session, and the master notebook can call my_udf from SQL or use the Python function my_udf directly.
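What dbutils.notebook.run() can usefully bring back from a child notebook is plain data, because the value given to dbutils.notebook.exit() comes back as a string. A minimal sketch of that pattern, assuming the result is encoded as JSON; the helper names build_exit_payload and parse_exit_payload and the payload fields are made up for illustration, and the dbutils calls are shown only as comments because they exist only inside a Databricks notebook:

```python
import json


def build_exit_payload(status, rows_written):
    """Encode the child notebook's result as a JSON string (hypothetical helper)."""
    return json.dumps({"status": status, "rows_written": rows_written})


def parse_exit_payload(payload):
    """Decode the string returned by the child notebook (hypothetical helper)."""
    return json.loads(payload)


# Child notebook (assumed usage):
#   dbutils.notebook.exit(build_exit_payload("ok", 42))
#
# Master notebook (assumed usage):
#   raw = dbutils.notebook.run("PathToChildnotebook", timeout_seconds=600)
#   result = parse_exit_payload(raw)

# Local round trip, standing in for the exit()/run() hand-off.
payload = build_exit_payload("ok", 42)
result = parse_exit_payload(payload)
print(result["status"], result["rows_written"])
```

A Python function has no equivalent string form that the master notebook could register, which is why a UDF has to be defined or registered in the session that uses it.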

4 REPLIES 4

Debayan
Esteemed Contributor III

Hi, please let us know the error code.

Also, please tag @Debayan in your next response so that I will be notified. Thanks!

@Debayan Mukherjee the error is: undefined function: my_udf_name. This function is neither a built-in/temporary function nor a persistent function that is qualified as spark_catalog.default.my_udf_name.

To my understanding, dbutils.notebook.run triggered a separate job, which defined and registered a function. Both jobs were executed on the same cluster, so they were within the same Spark session, and a registered UDF is tied to the Spark session. Why can't the master notebook call the UDF defined in the child notebook here?

If I use the %run magic command, the child notebook is executed in the same job, and the master notebook has no issue calling the registered UDF defined in the child notebook.
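One way to act on this observation, sketched under assumptions: keep the function and a small registration helper in one shared place (for example a notebook included with %run, or a Python module both notebooks can import), and call the helper in every session that needs the UDF. The helper name register_udfs is hypothetical; spark is the session object that Databricks notebooks provide.

```python
def my_udf(x):
    # Same function as in the child notebook.
    return x + 1


def register_udfs(spark):
    """Register the shared UDFs on the given SparkSession (hypothetical helper).

    Registration is per session, so each notebook that wants to call
    my_udf from SQL calls this once with its own `spark` object.
    """
    spark.udf.register("my_udf", my_udf)


# In any notebook that needs the UDF (assumed usage):
#   register_udfs(spark)
#   spark.sql("SELECT my_udf(1)").show()
```

With this layout the master notebook no longer depends on what a dbutils.notebook.run() child registered in its own session.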

Debayan
Esteemed Contributor III

Just to reconfirm, for reference: https://docs.databricks.com/notebooks/notebook-workflows.html#run-multiple-notebooks-concurrently. We are looking into whether this error has come up before.
