11-17-2021 06:37 PM
Hi Team, when I try to register a permanent function I get the error below.
%sql
CREATE FUNCTION simple_udf AS 'SimpleUdf'
USING JAR '/tmp/SimpleUdf.jar';
%sql
select simple_udf(2)
Error details:
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: Can not load class 'SimpleUdf' when registering the function 'default.simple_udf', please make sure it is on the classpath;
11-18-2021 03:21 PM
Hi @Werner Stinckens , yes, I followed the same instructions. I was trying to solve this with a Java program first, and later planned to convert the Python script to a jar file.
Below is the program I used. Using Eclipse I was able to generate the .jar file successfully after adding the jar that contains the "org.apache.hadoop.hive.ql.exec.UDF" class to the project.
Code:
import org.apache.hadoop.hive.ql.exec.UDF;

public class SimpleUdf extends UDF {
    // Hive calls evaluate() once per row; this UDF adds 10 to the input.
    public int evaluate(int value) {
        return value + 10;
    }
}
Jar file link:
http://www.java2s.com/Code/Jar/h/Downloadhive041jar.htm
I then tried the commands below:
%sql
CREATE FUNCTION simple_udf AS 'SimpleUdf'
USING JAR '/tmp/SimpleUdf.jar';
and also
%sql
CREATE FUNCTION simple_udf AS 'SimpleUdf'
USING JAR '/dbfs/tmp/SimpleUdf.jar';
%sql
select simple_udf(2)
I placed the jar in the FileStore of the Databricks cluster, and it kept throwing the same error: "please make sure it is on the classpath".
11-18-2021 12:06 AM
Hi @Manoj! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise I will get back to you soon. Thanks.
11-18-2021 08:44 AM
Hi @Kaniz Fatma , nice meeting you! Have you had a chance to create permanent functions in Databricks using Python scripts? I am trying to anonymize data using permanent views and permanent functions, but I am having trouble creating the permanent functions.
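For context, a minimal sketch of the kind of anonymization logic meant here (the function name and masking scheme are hypothetical examples, not the actual script from this thread):

```python
# Hypothetical example of anonymization logic one might want to expose as a
# permanent SQL function: mask all but the last `keep` characters of a value.
def mask_value(value: str, keep: int = 4) -> str:
    """Replace every character except the last `keep` with '*'."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]
```

Logic like this works fine as a session-scoped Python UDF; the trouble described in this thread is making it *permanent*, which is where the jar route comes in.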
11-18-2021 09:47 AM
Hi @Manoj Kumar Rayalla , the error could be due to the jar not being available on the worker nodes.
11-18-2021 03:23 PM
Hi @Werner Stinckens , yes, I followed the same instructions. I was trying to solve this for a Java program first, and later planned to convert the Python script to a jar file. I used the same program and commands as in my reply above; using Eclipse I was able to generate the .jar file successfully after adding the jar that contains the "org.apache.hadoop.hive.ql.exec.UDF" class to the project.
I placed the jar in the FileStore of the Databricks cluster, and it kept throwing the same error: "please make sure it is on the classpath".
I see you are from Databricks; do you have any documentation that helps me understand this with an example? There is documentation below, but it's not clear.
11-18-2021 12:14 AM
It seems the cluster cannot find the jar. My guess is your path is incorrect or incomplete, or, as the error says, you have to let the cluster know where the jar can be found (the classpath).
11-18-2021 03:19 AM
The correct path is probably /dbfs/tmp/SimpleUdf.jar
11-18-2021 08:19 AM
Hi @Manoj Kumar Rayalla ,
Like @Hubert Dudek mentioned, make sure to check the exact location of your jar file. List the path first to confirm the file is in the right location.
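That check can be sketched like this (assumption: the standard Databricks convention that a file at dbfs:/tmp/... is also visible on the driver at the FUSE mount /dbfs/tmp/...):

```python
# Sketch of the dbfs:/ URI <-> driver-local /dbfs mount relationship on
# Databricks. In a notebook you would list the directory with
# dbutils.fs.ls("dbfs:/tmp/") (or %fs ls /tmp/) to confirm the jar exists.
def dbfs_to_local(path: str) -> str:
    """Translate a dbfs:/ URI to the driver-local /dbfs/ mount path."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path
```

So a jar uploaded to dbfs:/tmp/SimpleUdf.jar would appear at /dbfs/tmp/SimpleUdf.jar on the driver, which is why both spellings show up in this thread.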
11-18-2021 08:41 AM
Thanks for the suggestions @Werner Stinckens @Jose Gonzalez
@Hubert Dudek . No luck changing the path.
Has anyone else gotten the same error while creating permanent functions from a Python script? I see Java and Scala have documentation, but not Python.
Thanks,
Manoj
11-18-2021 09:15 AM
OK, now I see the issue: you want to use Python, but you provide a jar (which is for Java/Scala).
I remember another post asking for SQL functions in Python:
I would think the USING FILE option would work,
as long as you follow the class_name requirements:
"The implementing class should extend one of the base classes as follows:"
Also, the docs literally state Python is possible:
"In addition to the SQL interface, Spark allows you to create custom user defined scalar and aggregate functions using Scala, Python, and Java APIs. See User-defined scalar functions (UDFs) and User-defined aggregate functions (UDAFs) for more information."
11-18-2021 11:15 PM
Ok, so I tried your example myself.
The issue lies within your jar.
So I used your jar and tried it myself, using a class that is in the jar:
What this function does, I have no idea. But it is important that you point to an existing class and use the complete name, not only the last part (so in this case org.apache.hadoop.hive.ql.udf.UDFLog and not just UDFLog).
Hope this helps.
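One way to find the fully qualified names a jar actually exposes (a sketch: a jar is just a zip archive, so the standard-library zipfile module can list its .class entries):

```python
# List the fully qualified class names inside a jar, so CREATE FUNCTION can
# reference one that really exists (a jar is an ordinary zip archive).
import zipfile

def list_classes(jar_path: str) -> list[str]:
    """Return fully qualified class names found in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name[:-len(".class")].replace("/", ".")
            for name in jar.namelist()
            if name.endswith(".class") and "$" not in name  # skip inner classes
        ]
```

Running this against the downloaded hive jar should show entries like org.apache.hadoop.hive.ql.udf.UDFLog, confirming which complete name to pass to CREATE FUNCTION.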
01-26-2022 01:45 PM
Hi @Werner Stinckens @Jose Gonzalez @Hubert Dudek @Kaniz Fatma
Thanks for all the help, I appreciate it. I was able to create the permanent functions and used Eclipse to create the runnable jar. However, does anyone have any idea how to deploy the jar on a SQL endpoint cluster?
01-27-2022 12:14 AM
I doubt that is possible at the moment; it would also go against the use of Photon (which is C++).
01-27-2022 05:10 AM
As the SQL endpoint is serverless, I doubt it is possible.