master notebook cannot find the udf registered in the child notebook

andrew0117
Contributor

The master notebook calls a child notebook with dbutils.notebook.run("PathToChildnotebook"). The child notebook defines a user-defined function (UDF) and registers it with spark.udf.register. However, when the child notebook finishes and control returns to the master notebook, the UDF cannot be found, and the master notebook throws an error as soon as it tries to use the UDF. Why is the UDF not visible to the master notebook?

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

@andrew li​ :

The UDF cannot be found because dbutils.notebook.run() launches the child notebook as a separate, ephemeral job. The session state in which the UDF was defined and registered is torn down when the child finishes, so the temporary function is no longer available in the session used by the master notebook.

To work around this, you can either define and register the UDF directly in the master notebook, or have the child notebook pass the UDF's definition back to the master notebook via dbutils.notebook.exit(), which can only return a string.

Here's an example of how you can pass the UDF as a parameter from the child notebook to the master notebook:

In the child notebook:

def my_udf(x):
    return x + 1

spark.udf.register("my_udf", my_udf)

# dbutils.notebook.exit() can only return a string, so pass the
# UDF's source code rather than the function object itself
import inspect
dbutils.notebook.exit(inspect.getsource(my_udf))

In the master notebook:

# dbutils.notebook.run() returns the string the child passed to exit()
udf_source = dbutils.notebook.run("PathToChildnotebook", timeout_seconds=600)
exec(udf_source)  # recreate my_udf in this notebook's namespace
spark.udf.register("my_udf", my_udf)

In this example, the my_udf UDF is defined and registered in the child notebook, and its source code is handed back as a string through dbutils.notebook.exit(). The dbutils.notebook.run() call in the master notebook runs the child and returns that string; the master notebook then rebuilds the function with exec() and registers it in its own session using spark.udf.register().
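Because the exit/run round trip only ever moves a string, what actually travels between the notebooks is the function's source code, not the function itself. The mechanics can be exercised without Spark at all (a minimal, hypothetical sketch; the source string stands in for what the child notebook would return):

```python
# "Child" side: dbutils.notebook.exit() can only return a string, so the
# function travels as source code (stand-in string for this sketch).
payload = "def my_udf(x):\n    return x + 1\n"

# "Master" side: dbutils.notebook.run() hands back that string; rebuild
# the function with exec() before registering it with Spark.
namespace = {}
exec(payload, namespace)
my_udf = namespace["my_udf"]
print(my_udf(41))  # -> 42
```

In a real notebook, the rebuilt my_udf would then be passed to spark.udf.register as shown above.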


4 REPLIES

Debayan
Esteemed Contributor III

Hi, please let us know the error code.

Also, please tag @Debayan​ with your next response so that I will be notified. Thanks!

@Debayan Mukherjee​ the error code is: undefined function: my_udf_name. This function is neither a built-in/temporary function nor a persistent function qualified as spark_catalog.default.my_udf_name.

To my understanding, dbutils.notebook.run triggers a separate job that defines and registers the function. Both jobs ran on the same cluster, so I assumed they shared the same Spark session, and a registered UDF is tied to the Spark session. Why can't the master notebook call the UDF defined in the child notebook here?

If I use the magic command %run instead, the child notebook executes within the same job, and the master notebook has no issue calling the registered UDF defined in the child notebook.
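That %run behavior is consistent with the session-scoping explanation: %run executes the child inline in the caller's own namespace, so everything it defines survives. Roughly analogous, in plain Python with no Spark (file name and contents are hypothetical):

```python
import pathlib
import tempfile

# Stand-in for the child notebook's code (hypothetical content).
child = pathlib.Path(tempfile.mkdtemp()) / "child_notebook.py"
child.write_text("def my_udf(x):\n    return x + 1\n")

# %run-style inclusion: execute the child's code in the caller's own
# namespace, so my_udf stays defined here afterwards -- unlike
# dbutils.notebook.run, which runs the child as a separate job.
exec(child.read_text(), globals())
print(my_udf(1))  # -> 2
```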

Debayan
Esteemed Contributor III

Just to reconfirm, reference: https://docs.databricks.com/notebooks/notebook-workflows.html#run-multiple-notebooks-concurrently. We are looking into whether this error has been reported before.

