cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Did any one in the community create Permanent Functions using using python script ? I tried but i am getting the below error, Please advise

Manoj
Contributor II

Hi Team, When i am trying to register a permanant function i am getting the below error.

%sql

CREATE FUNCTION simple_udf AS 'SimpleUdf'

  USING JAR '/tmp/SimpleUdf.jar';

%sql

select simple_udf(2)

Error Details :

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: Can not load class 'SimpleUdf' when registering the function 'default.simple_udf', please make sure it is on the classpath;

1 ACCEPTED SOLUTION

Accepted Solutions

Hi @Werner Stinckens​ , Yes i followed the same instructions and was trying to solve this with a java program then later planning to convert python script to a jar file.

Below is the program that i used , Using Eclipse i was able to generate the .jar file successfully by adding "org.apache.hadoop.hive.ql.exec.UDF" this class related jar into the project,

Code :

import org.apache.hadoop.hive.ql.exec.UDF;

public class SimpleUdf extends UDF {

public int evaluate(int value) {

return value + 10;

}

}

Jar file Link :

http://www.java2s.com/Code/Jar/h/Downloadhive041jar.htm

And then tried the below commands

%sql

CREATE FUNCTION simple_udf AS 'SimpleUdf'

  USING JAR '/tmp/SimpleUdf.jar';

AND

%sql

CREATE FUNCTION simple_udf AS 'SimpleUdf'

  USING JAR '/dbfs/tmp/SimpleUdf.jar';

%sql

select simple_udf(2)

I placed it on the File Store of data bricks cluster and it kept throwing the same error, please make sure the class path is correct.

View solution in original post

15 REPLIES 15

Kaniz
Community Manager
Community Manager

Hi @ Manoj! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.

Manoj
Contributor II

Hi @Kaniz Fatma​ , Nice meeting you! Have you got a chance to create permanent functions in data bricks using python scripts, I am trying to anonymize the data using permanent views and permanent functions.. but i am having trouble in creating permanent functions

Kaniz
Community Manager
Community Manager

Hi @Manoj Kumar Rayalla​ , The error could be due to the unavailability of jars in the worker nodes.

Manoj
Contributor II

Hi @Werner Stinckens​ , Yes i followed the same instructions and was trying to solve this first for java program then later planning to convert python script to a jar file.

Below is the program that i used , Using Eclipse i was able to generate the .jar file successfully by adding "org.apache.hadoop.hive.ql.exec.UDF" this class file related jar into the project,

Code :

import org.apache.hadoop.hive.ql.exec.UDF;

public class SimpleUdf extends UDF {

public int evaluate(int value) {

return value + 10;

}

}

Jar file Link :

http://www.java2s.com/Code/Jar/h/Downloadhive041jar.htm

I placed it on the File Store of data bricks cluster and kept throwing the same error, please make sure the class path is correct.

I see you are from Data Bricks did you guys any documentation that helps me understand with an example...

Below is there but its not clear

https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-functio...

-werners-
Esteemed Contributor III

it seems the cluster cannot find the jar. My guess is your path is incorrect/incomplete or as the error says it: you have to let the cluster know where the jar can be found (classpath)

Hubert-Dudek
Esteemed Contributor III

probably correct path is /dbfs/tmp/SimpleUdf.jar

jose_gonzalez
Moderator
Moderator

hi @Manoj Kumar Rayalla​ ,

Like @Hubert Dudek​  mentioned, make sure to check what is the exact location for your Jar file. Make sure to list your path first to check if the file it located in the right location.

Manoj
Contributor II

Thanks for the Suggestions @Werner Stinckens​  @Jose Gonzalez​ 

@Hubert Dudek​  . No luck in changing the path

Did anyone else got the same error while creating permanent functions using python script? I see java and scala has documentation but not python.

Thanks,

Manoj

-werners-
Esteemed Contributor III

ok now I see the issue: you want to use python, but provide a jar (for java/scala).

I remember another post asking for sql functions in python:

I would think the USING FILE would work.

As long as you follow the class_name requirements.

The implementing class should extend one of the base classes as follows:

  • Should extend UDF or UDAF in org.apache.hadoop.hive.ql.exec package.
  • Should extend AbstractGenericUDAFResolver, GenericUDF, or GenericUDTF in org.apache.hadoop.hive.ql.udf.generic package.
  • Should extend UserDefinedAggregateFunction in org.apache.spark.sql.expressions package.

Also the docs literally state python is possible:

In addition to the SQL interface, Spark allows you to create custom user defined scalar and aggregate functions using Scala, Python, and Java APIs. See User-defined scalar functions (UDFs) and User-defined aggregate functions (UDAFs) for more information.

Hi @Werner Stinckens​ , Yes i followed the same instructions and was trying to solve this with a java program then later planning to convert python script to a jar file.

Below is the program that i used , Using Eclipse i was able to generate the .jar file successfully by adding "org.apache.hadoop.hive.ql.exec.UDF" this class related jar into the project,

Code :

import org.apache.hadoop.hive.ql.exec.UDF;

public class SimpleUdf extends UDF {

public int evaluate(int value) {

return value + 10;

}

}

Jar file Link :

http://www.java2s.com/Code/Jar/h/Downloadhive041jar.htm

And then tried the below commands

%sql

CREATE FUNCTION simple_udf AS 'SimpleUdf'

  USING JAR '/tmp/SimpleUdf.jar';

AND

%sql

CREATE FUNCTION simple_udf AS 'SimpleUdf'

  USING JAR '/dbfs/tmp/SimpleUdf.jar';

%sql

select simple_udf(2)

I placed it on the File Store of data bricks cluster and it kept throwing the same error, please make sure the class path is correct.

-werners-
Esteemed Contributor III

Ok, so I tried your example myself.

The issue lies within your jar.

  1. the jar you provide is not called 'SimpleUdf.jar' but 'hive_0_4_1.jar'.
  2. your jar should contain a class called 'SimpleUdf', but there is no such function in there.

So I used your jar and tried myself, using a class which is in the jar:

image 

image 

What this function does, I have no idea. But it is important that you point to an existing class, and take the complete name, not only the last part (so in this case org.apache.hadoop.hive.ql.udf.UDFLog and not just UDFLog).

Hope this helps.

Manoj
Contributor II

hi @Werner Stinckens​  @Jose Gonzalez​  @Hubert Dudek​ @Kaniz Fatma​ 

Thanks for all the help, Appreciate it. I was able to create permanent functions and use eclipse to create the runnable jar. However, Does anyone have any idea on how to deploy the jar on SQL End Point Cluster.

-werners-
Esteemed Contributor III

I doubt if that is possible at the moment, it would also go against the use of photon (which is C++).

Hubert-Dudek
Esteemed Contributor III

As SQL End Point is serverless I doubt it is possible

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.