11-18-2019 12:59 PM
I am trying to create a JAR for an Azure Databricks job, but some code that works in the notebook interface does not work when the library is called through a job. The strange part is that the job completes its first run successfully but fails on any subsequent run. I have to restart my cluster to get it to run again, and then it fails once more on the second run.
I have created a view on a DataFrame:
val df = spark.read.parquet(path)
df.createOrReplaceTempView("table1")
However, when I go to query the view with an aggregate function it yields an error:
val get_max_id_array = spark.sql("SELECT MAX(%s) FROM table1".format(get_id_column_array(0))).first()
Error:
ERROR Uncaught throwable from user code: org.apache.spark.sql.AnalysisException: Undefined function: 'MAX'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
11-18-2019 11:15 PM
Hi @Tyler Tamasauckas,
Please try max(df("column_name")) instead, and have a look at the blog post below regarding the max function:
https://www.programcreek.com/scala/org.apache.spark.sql.functions.max
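For comparison, here is a minimal sketch of the DataFrame-API alternative (it assumes path and get_id_column_array are defined as in your original post):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

val spark = SparkSession.builder().getOrCreate()
val df = spark.read.parquet(path)

// Aggregate through the DataFrame API instead of SQL text, so no catalog lookup of "MAX" is involved.
val maxIdRow = df.agg(max(df(get_id_column_array(0)))).first()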
02-27-2020 03:50 AM
Hi @Tyler Tamasauckas ,
I was also facing the same issue with the SQL functions 'upper' and 'hash'.
In the JAR we have to call the SparkSession.builder().getOrCreate() or SparkContext.getOrCreate() API to get the SparkSession/SparkContext instance.
If the JAR uses the object-with-main() approach, it works fine the first time, but afterwards it somehow, strangely, loses the instance. I don't know the exact reason for that.
The workaround is to use the "object ... extends App" approach in the JAR; then it works.
The App trait approach takes about 10 seconds longer than an object with a main method, but only on the first run, and only for the first activity. This is because the App trait uses the delayed-initialization feature; this applies to all Scala applications. A sketch of that workaround is shown below.
If we still need to use the main-method approach, define the spark instance as an implicit and use that implicit wherever the instance is needed.
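A minimal sketch of the "extends App" workaround (the object name and the query are placeholders):

import org.apache.spark.sql.SparkSession

object SomeJob extends App {
  // No main() method: the body runs through the App trait's delayed initialization.
  val spark = SparkSession.builder().getOrCreate()
  spark.sql("SELECT MAX(id) FROM table1").show() // placeholder for the actual job logic
}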
e.g.
object SomeName {
  // UserDefinedMethod gets the SparkSession implicitly from the caller's scope.
  def UserDefinedMethod(query: String)(implicit spark: SparkSession) = { spark.sql(query) }

  def main(args: Array[String]): Unit = {
    implicit val spark = SparkSession.builder().getOrCreate()
    spark… // rest of the job logic
  }
}
Note: an object that extends App has access to the command-line arguments (args) from Scala 2.9 onward.
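For illustration (the object name is hypothetical), the arguments arrive through the inherited args field:

object SomeJob extends App {
  // args: Array[String] is provided by the App trait from Scala 2.9 onward.
  val inputPath = args(0)
}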
05-16-2022 06:54 AM
Hi, @omprakash.scala@gmail.com
Could you please tell us more about the issue you had and its solution?
We now have a similar problem: a job fails on the second run with the exception "Undefined function: to_unix_timestamp. This function is neither a built-in/temporary function..." and the only fix is to restart the cluster. I tried changing my main class to the "object ... extends App" approach, but it still didn't work.
I searched the internet and this post is the only possible clue I found. Looking forward to your response.
Thanks,
Chen
10-12-2022 12:57 AM
I am facing a similar issue when trying to use the from_utc_timestamp function. I can call the function from a Databricks notebook, but when I use the same function inside my Java JAR and run it as a job in Databricks, it gives the error below.
AnalysisException: Undefined function: from_utc_timestamp. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.from_utc_timestamp.;
08-15-2024 02:26 AM
Did you find a solution?