AnalysisException when running SQL queries

Merchiv
New Contributor III

When running some SQL queries using spark.sql(...), we sometimes get a variant of the following error:

AnalysisException: Undefined function: current_timestamp. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.current_timestamp.; line 2 pos 35

The function that is missing sometimes changes (in other cases it's uuid(), for example), but it is always a standard Databricks SQL built-in.
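(For reference, a quick sanity check like the following normally succeeds when run manually in a notebook; this is a sketch, not taken from the failing job.)

# Both functions are standard Spark SQL built-ins; in a healthy session this returns a row.
spark.sql("SELECT current_timestamp(), uuid()").show()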

The SQL queries are executed from PySpark code that lives inside a module.

Example:

(not the actual code, but an edited version since I can't paste everything here)

In the notebook we run:

from sql_utilities import example_log_status
 
example_log_status(id, status)

Which imports code from our module sql_utilities.py:

def example_log_status(id, status):
    # `spark` is expected to come from the session set up elsewhere in the module.
    query = f"""UPDATE foo.exampleTable
    SET Status='{status}', ModifiedAt=current_timestamp()
    WHERE RunLogId='{id}'
    """
    spark.sql(query)

Extra information:

Databricks runtime: 11.3 LTS

This only happens when jobs are scheduled from Azure Data Factory; we were not able to replicate it by running these queries manually. The jobs are typically scheduled in parallel on the same cluster.


8 REPLIES

BilalAslamDbrx
Honored Contributor II

Please open a support ticket; this might be a bug.

Merchiv
New Contributor III

Thanks, I opened a ticket and I'll update when I have a response.

Anonymous
Not applicable

Hi @Ivo Merchiers

Hope everything is going great.

Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you.

Cheers!

GRCL
New Contributor III

Hi, we have had the same issue for a few days now when executing notebooks from Azure Data Factory:

AnalysisException: Undefined function: count. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.count.

We tried a lot of things to find where the problem is but haven't found it yet; a support ticket has been opened.

For now we think it is something related to the SparkContext; a link to a JAR seems to be broken. If we detach and re-attach the notebook (no need to restart it), it works.

Databricks runtime: 11.3 LTS

BilalAslamDbrx
Honored Contributor II

@Clement G you did the right thing by opening a support ticket. Please let me know how it goes.

GRCL
New Contributor III
Accepted Solution

Here is the resolution for my particular case; it should also help with the original issue in this thread.

We were using these two lines of code in a library we created ourselves (a .whl library installed via a Databricks cluster global init script); this was needed on runtime 9.1 because the Spark session was not available without it:

from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

On runtime 11.3 we found that we now have to replace those two lines with a single one that gets the active session; the previous method seems deprecated:

from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

And everything works well!

Merchiv
New Contributor III

That was also the suggestion from Databricks support, and it helped in our case.
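For completeness, here is a minimal sketch of the example module from the question with that fix applied (same placeholder table and column names as above, not our actual code):

from pyspark.sql import SparkSession

def example_log_status(id, status):
    # Use the session the calling notebook/job already has, instead of
    # constructing a new SparkSession from a SparkContext at import time.
    spark = SparkSession.getActiveSession()
    query = f"""UPDATE foo.exampleTable
    SET Status='{status}', ModifiedAt=current_timestamp()
    WHERE RunLogId='{id}'
    """
    spark.sql(query)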

ashish1
New Contributor III

This is most likely a conflict in library code; you can uninstall some libraries on your cluster and try to narrow it down to the problematic one.
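For example, a quick sketch (standard library only, nothing Databricks-specific) to list the Python packages visible to the driver, so you can see the candidates to try removing:

# List installed Python packages on the driver to identify candidates.
import importlib.metadata as md

for dist in sorted(md.distributions(), key=lambda d: d.metadata["Name"].lower()):
    print(dist.metadata["Name"], dist.version)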
