Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

AnalysisException when running SQL queries

Merchiv
New Contributor III

When running some SQL queries using spark.sql(...), we sometimes get a variant of the following error:

AnalysisException: Undefined function: current_timestamp. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.current_timestamp.; line 2 pos 35

The missing function varies (in other cases it's UUID()), but all of them are standard Databricks SQL built-ins.

The SQL queries are issued from PySpark code that lives inside a module.

Example:

(not the actual code, but an edited version, since I can't paste everything here)

In the notebook we run:

from sql_utilities import example_log_status
 
example_log_status(id, status)

Which imports code from our module sql_utilities.py:

def example_log_status(id, status):
    # NB: `spark` is expected to be defined somewhere in this module's scope
    query = f"""UPDATE foo.exampleTable
    SET Status = '{status}', ModifiedAt = current_timestamp()
    WHERE RunLogId = '{id}'
    """
    spark.sql(query)

Extra information:

Databricks runtime: 11.3 LTS

This only happens when jobs are scheduled from Azure Data Factory; we were not able to reproduce it by running the queries manually. The jobs are typically scheduled in parallel on the same cluster.

1 ACCEPTED SOLUTION

GRCL
New Contributor III

Here is the resolution for my particular case; it should help with the issue in the original post.

We were using these two lines of code in a library that we created ourselves (a whl library installed through a Databricks cluster global init script); this was needed on v9.1 because the Spark session was not available without it:

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

On v11.3 we found that we now have to replace those two lines with a single one that gets the active session; the previous method seems to be deprecated:

spark = SparkSession.getActiveSession()

And everything works well!


8 REPLIES

BilalAslamDbrx
Honored Contributor III

Please open a support ticket; this might be a bug.

Merchiv
New Contributor III

Thanks, I opened a ticket and I'll update when I have a response.

Anonymous
Not applicable

Hi @Ivo Merchiers​ 

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!

GRCL
New Contributor III

Hi, we have had the same issue for a few days when executing notebooks from Azure Data Factory:

AnalysisException: Undefined function: count. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.count.

We tried a lot of things to find the cause, but haven't found it yet; a support ticket has been opened.

For now we think it is related to the SparkContext (a link to a jar seems to be broken): if we detach and re-attach the notebook (no need to restart it), it works.

Databricks runtime: 11.3 LTS

BilalAslamDbrx
Honored Contributor III

@Clement G​ you did the right thing by opening a support ticket. Please let me know how it goes.


Merchiv
New Contributor III

That was also the suggestion from Databricks support, and it helped in our case.

ashish1
New Contributor III

This is most likely a conflict in library code; you can uninstall some libraries on your cluster and narrow it down to the problematic one.
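When a cluster has many libraries installed, the narrowing-down can be done as a binary search over subsets instead of removing them one at a time. A rough sketch, where is_broken is a hypothetical callback (install only the given subset, re-run the failing job, report whether the error appears):

```python
def find_bad_lib(libs, is_broken):
    """Bisect `libs` down to the single library causing a failure.

    Assumes exactly one library is responsible, so `is_broken(subset)`
    is True exactly when that library is in `subset`.
    """
    while len(libs) > 1:
        half = libs[: len(libs) // 2]
        # Keep whichever half still reproduces the failure.
        libs = half if is_broken(half) else libs[len(half):]
    return libs[0]

# Simulated run: pretend "badlib" is the culprit.
culprit = find_bad_lib(["libA", "libB", "badlib", "libC"],
                       lambda subset: "badlib" in subset)
# culprit == "badlib" after 2 bisection steps
```

With N libraries this needs about log2(N) re-runs of the failing job rather than N, at the cost of reinstalling a different subset each time.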
