Data Engineering
[UDF_MAX_COUNT_EXCEEDED] Exceeded query-wide UDF limit of 5 UDFs

Yaacoub
New Contributor

In my project I defined a UDF:

 

from pyspark.sql.functions import udf, col, lit
from pyspark.sql.types import IntegerType

@udf(returnType=IntegerType())
def ends_with_one(value, bit_position):
    # Guard against a negative index that is out of range for the string
    if bit_position + len(value) < 0:
        return 0
    return int(value[bit_position] == '1')

spark.udf.register("ends_with_one", ends_with_one)

 

But somehow, instead of being registered once, the UDF gets registered every time I call it:

 

df = df.withColumn('Ends_With_One', ends_with_one(col('Column_To_Check'), lit(-1)))

 

And after a few function calls I get the following error message:

 

[UDF_MAX_COUNT_EXCEEDED] Exceeded query-wide UDF limit of 5 UDFs (limited during public preview). Found 6. The UDFs were: `ends_with_one`,`ends_with_one`,`ends_with_one`,`ends_with_one`,`ends_with_one`,`ends_with_one`.

 

I spent a lot of time researching but I couldn't find my mistake.

3 REPLIES

Kaniz
Community Manager

Hi @Yaacoub,

Your UDF gets registered every time you call it because Spark registers UDFs at the session level, not the notebook level. This means that if you register a UDF multiple times, you can exceed the maximum number of UDFs allowed per query, which is 5 in the public preview. To resolve this issue, define the UDF outside of the loop or function that calls it.
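To make the failure mode concrete, here is a minimal sketch of the anti-pattern described above: re-creating and re-registering the UDF inside a loop, so a single query ends up referencing several copies of the same UDF. The loop and the generated column names are hypothetical, for illustration only:

from pyspark.sql.functions import udf, col, lit
from pyspark.sql.types import IntegerType

# Anti-pattern: each iteration defines and registers a fresh UDF object,
# so one query plan accumulates multiple UDFs until the limit is hit.
for i in range(6):
    @udf(returnType=IntegerType())
    def ends_with_one(value, bit_position):
        if bit_position + len(value) < 0:
            return 0
        return int(value[bit_position] == '1')

    spark.udf.register("ends_with_one", ends_with_one)
    df = df.withColumn(f"Ends_With_One_{i}", ends_with_one(col("Column_To_Check"), lit(-1)))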

Here is an example of how you can modify your code to register the UDF only once:

from pyspark.sql.functions import udf, col, lit
from pyspark.sql.types import IntegerType

# Define and register the UDF once, at the top level of the notebook.
@udf(returnType=IntegerType())
def ends_with_one(value, bit_position):
    if bit_position + len(value) < 0:
        return 0
    return int(value[bit_position] == '1')

spark.udf.register("ends_with_one", ends_with_one)

# You can then call the UDF from a loop, a function, or elsewhere:
df = df.withColumn('Ends_With_One', ends_with_one(col('Column_To_Check'), lit(-1)))

By defining the UDF outside of the loop or function that calls it, you register it only once and prevent repeated registrations from repeated calls.

Also, keep in mind that UDFs can negatively impact query performance, so it's generally good practice to use built-in Spark functions or DataFrame API operations to achieve the same result whenever possible.
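In this case a built-in equivalent looks feasible, because substring() accepts a negative start position that counts from the end of the string. A minimal sketch, assuming bit_position is -1 as in your call (column names taken from your snippet):

from pyspark.sql import functions as F

# Built-in equivalent of ends_with_one(value, -1):
# substring(str, -1, 1) returns the last character; empty strings fall
# through to otherwise(0), mirroring the UDF's out-of-range guard.
df = df.withColumn(
    'Ends_With_One',
    F.when(
        F.length(F.col('Column_To_Check')) >= 1,
        (F.substring(F.col('Column_To_Check'), -1, 1) == '1').cast('int'),
    ).otherwise(0),
)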

jose_gonzalez
Moderator

Hi @Yaacoub,

Just a friendly follow-up. Have you had a chance to review my colleague's reply? Please let us know whether it helps resolve your issue.

Yaacoub
New Contributor

I used the proposed solution and defined the UDF outside of the loop, but I still got the same error. The same code runs on Azure Synapse without any problem. I would appreciate your assistance in addressing this UDF problem.
