Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Has anyone in the community created permanent functions using a Python script? I tried, but I am getting the error below. Please advise.

Manoj
Contributor II

Hi Team, when I try to register a permanent function, I get the error below.

%sql

CREATE FUNCTION simple_udf AS 'SimpleUdf'

  USING JAR '/tmp/SimpleUdf.jar';

%sql

select simple_udf(2)

Error Details :

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: Can not load class 'SimpleUdf' when registering the function 'default.simple_udf', please make sure it is on the classpath;

1 ACCEPTED SOLUTION


Hi @Werner Stinckens, yes, I followed the same instructions. I was trying to solve this with a Java program first, and was planning to convert the Python script to a jar file later.

Below is the program that I used. Using Eclipse, I was able to generate the .jar file successfully after adding the jar that contains the class "org.apache.hadoop.hive.ql.exec.UDF" to the project.

Code :

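import org.apache.hadoop.hive.ql.exec.UDF;

// Hive-style UDF: the engine calls evaluate() for each input value.
public class SimpleUdf extends UDF {
    // Returns the input value plus 10.
    public int evaluate(int value) {
        return value + 10;
    }
}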

Jar file Link :

http://www.java2s.com/Code/Jar/h/Downloadhive041jar.htm

I then tried the commands below:

%sql

CREATE FUNCTION simple_udf AS 'SimpleUdf'

  USING JAR '/tmp/SimpleUdf.jar';

and

%sql

CREATE FUNCTION simple_udf AS 'SimpleUdf'

  USING JAR '/dbfs/tmp/SimpleUdf.jar';

%sql

select simple_udf(2)

I placed the jar in the FileStore of the Databricks cluster, but it kept throwing the same error, asking me to make sure the class is on the classpath.


15 REPLIES

Kaniz_Fatma
Community Manager

Hi @Manoj! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first. Otherwise, I will get back to you soon. Thanks.

Hi @Kaniz Fatma, nice meeting you! Have you had a chance to create permanent functions in Databricks using Python scripts? I am trying to anonymize data using permanent views and permanent functions, but I am having trouble creating the permanent functions.

Kaniz_Fatma
Community Manager

Hi @Manoj Kumar Rayalla, the error could be due to the jar not being available on the worker nodes.

Manoj
Contributor II

Hi @Werner Stinckens, yes, I followed the same instructions. I was trying to solve this for a Java program first, and was planning to convert the Python script to a jar file later.

Below is the program that I used. Using Eclipse, I was able to generate the .jar file successfully after adding the jar that contains the class file "org.apache.hadoop.hive.ql.exec.UDF" to the project.

Code :

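import org.apache.hadoop.hive.ql.exec.UDF;

// Hive-style UDF: the engine calls evaluate() for each input value.
public class SimpleUdf extends UDF {
    // Returns the input value plus 10.
    public int evaluate(int value) {
        return value + 10;
    }
}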

Jar file Link :

http://www.java2s.com/Code/Jar/h/Downloadhive041jar.htm

I placed the jar in the FileStore of the Databricks cluster, but it kept throwing the same error, asking me to make sure the class is on the classpath.

I see you are from Databricks. Do you have any documentation that explains this with an example?

The page below exists, but it is not clear:

https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-functio...

-werners-
Esteemed Contributor III

It seems the cluster cannot find the jar. My guess is that your path is incorrect or incomplete, or, as the error says, you have to let the cluster know where the jar can be found (the classpath).

Hubert-Dudek
Esteemed Contributor III

The correct path is probably /dbfs/tmp/SimpleUdf.jar

jose_gonzalez
Moderator

Hi @Manoj Kumar Rayalla,

As @Hubert Dudek mentioned, check the exact location of your jar file. List the path first to confirm that the file is actually in the location you expect.

Manoj
Contributor II

Thanks for the suggestions, @Werner Stinckens and @Jose Gonzalez.

@Hubert Dudek, no luck after changing the path.

Has anyone else gotten the same error while creating permanent functions using a Python script? I see that Java and Scala have documentation, but Python does not.

Thanks,

Manoj

-werners-
Esteemed Contributor III

OK, now I see the issue: you want to use Python, but you provide a jar (which is for Java/Scala).

I remember another post asking for SQL functions in Python:

I would think USING FILE would work, as long as you follow the class_name requirements.

The implementing class should extend one of the base classes as follows:

  • Should extend UDF or UDAF in org.apache.hadoop.hive.ql.exec package.
  • Should extend AbstractGenericUDAFResolver, GenericUDF, or GenericUDTF in org.apache.hadoop.hive.ql.udf.generic package.
  • Should extend UserDefinedAggregateFunction in org.apache.spark.sql.expressions package.
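The base-class requirement quoted above can be checked locally before building a jar. The following is only a rough sketch: `UdfBaseClassCheck` and `extendsOneOf` are hypothetical names, not part of Spark, Hive, or Databricks, and the check compares class names as strings so that it compiles without Hive or Spark on the classpath. It walks the superclass chain of a candidate class and reports whether any ancestor is one of the base classes listed in the docs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not a Spark or Hive API.
final class UdfBaseClassCheck {

    // Base classes named in the quoted documentation, stored as strings so
    // this sketch has no compile-time dependency on Hive or Spark.
    static final Set<String> ACCEPTED_BASES = new HashSet<>(Arrays.asList(
            "org.apache.hadoop.hive.ql.exec.UDF",
            "org.apache.hadoop.hive.ql.exec.UDAF",
            "org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver",
            "org.apache.hadoop.hive.ql.udf.generic.GenericUDF",
            "org.apache.hadoop.hive.ql.udf.generic.GenericUDTF",
            "org.apache.spark.sql.expressions.UserDefinedAggregateFunction"));

    // Returns true if any ancestor of 'candidate' has a name in 'baseNames'.
    static boolean extendsOneOf(Class<?> candidate, Set<String> baseNames) {
        for (Class<?> c = candidate.getSuperclass(); c != null; c = c.getSuperclass()) {
            if (baseNames.contains(c.getName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // java.lang.String extends none of the accepted base classes.
        System.out.println(extendsOneOf(String.class, ACCEPTED_BASES));
    }
}
```

With the Hive jar on the classpath, passing `SimpleUdf.class` instead of `String.class` should report whether the class from this thread meets the requirement.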

Also, the docs explicitly state that Python is possible:

In addition to the SQL interface, Spark allows you to create custom user defined scalar and aggregate functions using Scala, Python, and Java APIs. See User-defined scalar functions (UDFs) and User-defined aggregate functions (UDAFs) for more information.


-werners-
Esteemed Contributor III

Ok, so I tried your example myself.

The issue lies within your jar.

  1. The jar you provide is not called 'SimpleUdf.jar' but 'hive_0_4_1.jar'.
  2. Your jar should contain a class called 'SimpleUdf', but it contains no such class.

So I used your jar and tried it myself, using a class that is in the jar:

[Two screenshots from the original post are not preserved here.]

I have no idea what this function does. The important thing is that you point to a class that exists in the jar and use its complete name, not only the last part (in this case org.apache.hadoop.hive.ql.udf.UDFLog, not just UDFLog).
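One way to find the complete class name is to list what the jar actually contains. The sketch below is an assumption-laden illustration, not something from the thread: `JarClassLister` is a hypothetical helper that uses only the standard `java.util.jar` API to print the fully qualified name of every class in a jar, so the output can be compared with the name passed to CREATE FUNCTION ... AS.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration only; not a Spark or Databricks API.
final class JarClassLister {

    // Returns the fully qualified names of all .class entries in the jar.
    static List<String> listClassNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String entry = entries.nextElement().getName();
                if (entry.endsWith(".class")) {
                    // "org/example/Foo.class" becomes "org.example.Foo"
                    String withoutSuffix = entry.substring(0, entry.length() - ".class".length());
                    names.add(withoutSuffix.replace('/', '.'));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JarClassLister <path-to-jar>");
            return;
        }
        for (String name : listClassNames(args[0])) {
            System.out.println(name);
        }
    }
}
```

If the name used in CREATE FUNCTION does not appear in this output exactly, including any package prefix, the "Can not load class" error is to be expected.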

Hope this helps.

Manoj
Contributor II

Hi @Werner Stinckens, @Jose Gonzalez, @Hubert Dudek, @Kaniz Fatma,

Thanks for all the help, I appreciate it. I was able to create permanent functions, using Eclipse to build the runnable jar. However, does anyone have any idea how to deploy the jar on a SQL Endpoint cluster?

-werners-
Esteemed Contributor III

I doubt that is possible at the moment. It would also go against the use of Photon (which is written in C++).

Hubert-Dudek
Esteemed Contributor III

As the SQL Endpoint is serverless, I doubt it is possible.
