org.apache.spark.sql.AnalysisException: Undefined function: 'MAX'

TylerTamasaucka
New Contributor II

I am trying to create a JAR for an Azure Databricks job, but some code that works in the notebook interface does not work when called from the library through a job. The weird part is that the job completes its first run successfully, but it fails on any subsequent run. I have to restart my cluster to get it to run again, and then it fails on the second run once more.

I have created a view on a DataFrame:

val df = spark.read.parquet(path)
df.createOrReplaceTempView("table1")

However, when I go to query the view with an aggregate function it yields an error:

val get_max_id_array = spark.sql("SELECT MAX(%s) FROM table1".format(get_id_column_array(0))).first()

Error:

ERROR Uncaught throwable from user code: org.apache.spark.sql.AnalysisException: Undefined function: 'MAX'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7

5 REPLIES

shyam_9
Databricks Employee

Hi @Tyler Tamasauckas,

Please try max(df("column_name")). Also, have a look at the blog post below regarding the max function:

https://www.programcreek.com/scala/org.apache.spark.sql.functions.max
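For illustration, a minimal sketch of that DataFrame-API approach, runnable from a JAR's main() method (the parquet path and column name here are placeholders, not values from the original post):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

object MaxExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()

    // Placeholder path and column name, standing in for the poster's values.
    val df = spark.read.parquet("/mnt/data/table1")
    df.createOrReplaceTempView("table1")

    // DataFrame-API aggregation instead of spark.sql("SELECT MAX(...) FROM table1"):
    // max is resolved through the functions API rather than the SQL parser.
    val maxId = df.agg(max(df("id"))).first()
    println(maxId)
  }
}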

omprakash_scala
New Contributor II

Hi @Tyler Tamasauckas,

I was also facing the same issue with the SQL functions 'upper' and 'hash'.

In the JAR we have to call the SparkSession.builder().getOrCreate() or SparkContext.getOrCreate() API to get the SparkSession/SparkContext instance.

In the JAR, if we use the object-with-main() approach, it works fine the first time, but later it somehow, strangely, loses the instance. I don't know the exact reason for that.

The workaround is to use the "object ... extends App" approach in the JAR; then it works (see the sketch after the note below).

The App trait approach takes about 10 seconds longer than the object-with-main() approach, but only on the first run and only for the first action. This is because the App trait uses the delayed-initialization feature, which applies to all Scala applications.

If we still need to use the main() method approach, define the Spark instance as an implicit and use that implicit wherever the instance is needed.

e.g.

object SomeName {

  // This method receives the SparkSession implicitly.
  def UserDefinedMethod(query: String)(implicit spark: SparkSession) = {
    spark.sql(query)
  }

  def main(args: Array[String]): Unit = {
    implicit val spark = SparkSession.builder().getOrCreate()
    spark…
  }
}

Note: an object that extends App gets the command-line arguments (args) from Scala 2.9 onward.
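For comparison, a minimal sketch of the "object ... extends App" workaround described above (the object name and default query are illustrative, not from the original reply):

import org.apache.spark.sql.SparkSession

// The whole object body runs through App's delayed initialization.
object SomeNameApp extends App {
  implicit val spark: SparkSession = SparkSession.builder().getOrCreate()

  // args is available here from Scala 2.9 onward when extending App.
  val query = if (args.nonEmpty) args(0) else "SELECT 1"
  spark.sql(query).show()
}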

Hi, @omprakash.scala@gmail.com

Could you please tell more about the issue you had and its solution?

We now have a similar problem: a job fails on the second run with the exception "Undefined function: to_unix_timestamp. This function is neither a built-in/temporary function..." and the only fix is to restart the cluster. I tried changing my main class to the "object ... extends App" approach, but it still didn't work.

I searched the internet and this post is the only possible clue I found. Looking forward to your response.

Thanks,

Chen

skaja
New Contributor II

I am facing a similar issue when trying to use the from_utc_timestamp function. I can call the function from a Databricks notebook, but when I use the same function inside my Java JAR and run it as a job in Databricks, it gives the error below.

AnalysisException: Undefined function: from_utc_timestamp. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.from_utc_timestamp.;

iwxshubham
New Contributor II

Did you find the solution?
