cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How to run a saved query from a Notebook (PySpark)

pacman
New Contributor

Hi Team! Noob to Databricks, so apologies if I ask a dumb question.

I have created a relatively large series of queries that fetch and organize the data I want.  I'm ready to drive all of these from a Notebook (likely PySpark).

An example query is saved as test1 under my user account (it shows up under the Queries heading in the left navbar).  I've also created a Notebook under that same account called Investigations (and both the notebook and query show up under Workspace).

My sample code is simple:

query_name = "test1"
results = spark.sql(f"SELECT * FROM {query_name}")

# Show the results
display(results)

 
But it produces the TABLE_OR_VIEW_NOT_FOUND error:

AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `test1` cannot be found. Verify the spelling and correctness of the schema and catalog. If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.

 
I'm sure the issue is just that I'm not properly referencing the query.  Help!!

6 REPLIES 6

Wojciech_BUK
Valued Contributor III

Databricks is just pasting string value "test1" inside your code , so it is equivalent of running SQL command.

SELECT * FROM test1

So it is expecting to read data from existing table test1, but it can't find it

All tables can be referenced by providing catalog name , schema name and table name like: my_catalog.my_schema.test1

Or 

You can point your current session to use certain calatlog and schema , like : USE my_catalog.my_schema

Then it will always try to search a table in this schema and you can refere to table like you did.

I am not sure if this is what you are looking for but if it won't help,  please attach screeahot from your UI

Walter_C
Databricks Employee
Databricks Employee

The error you're encountering is because you're trying to execute a SQL query on a saved query name, which is not a table or view in your database. The spark.sql() function is used to run SQL queries directly on tables or views in your Spark session.

If you want to execute a saved query, you need to fetch the SQL text of the saved query first. However, Databricks does not provide a built-in function to fetch the SQL text of a saved query directly in a notebook.

As a workaround, you can manually copy the SQL text of your saved query and use it in your notebook. Here's an example:

 

# Replace this with the SQL text of your saved query
sql_text = """
SELECT *
FROM your_table
WHERE your_condition
"""

results = spark.sql(sql_text)

# Show the results
display(results)
 

 

In this example, replace sql_text with the actual SQL text of your saved query. This way, you're running the SQL query directly on your data, not trying to run it on the saved query name.

If you have a large number of saved queries and manually copying the SQL text is not feasible, you might need to consider using Databricks' REST API or CLI to automate the process of fetching the SQL text of your saved queries.

uday_satapathy
Databricks Employee
Databricks Employee

You can do something like this using databricks-sdk:

Here's the saved query

Screenshot 2024-10-17 at 2.48.51โ€ฏPM.png

 

Pyspark code:

import re
from pyspark.sql import DataFrame
from databricks.sdk import WorkspaceClient

def run_saved_query(query_id:str, **params) -> DataFrame:
  query_body, args = get_query_by_id(query_id)
  query = replace_double_curly(query_body)
  df = spark.sql(query, **params)
  return df

def get_query_by_id(query_id):
  w = WorkspaceClient()
  q = w.queries.get(query_id)
  query_body = q.query
  query_params = [p.name for p in q.options.parameters]
  return query_body, query_params

def replace_double_curly(text):
    # Adjust regex to match the outermost double curly braces without capturing nested ones
    def strip_outer_whitespace(match):
        inner_text = match.group(1)
        return f'{{{inner_text.strip()}}}'
    
    # Match only the outermost {{...}} braces
    result = re.sub(r'\{\{\s*([^{}]*?)\s*\}\}', strip_outer_whitespace, text)
    return result

params = {"year": 2016}
df = run_saved_query('d045b229-e5d0-4de0-bcbc-bec6cbba61d7', **params)
df.display()

Screenshot 2024-10-17 at 2.54.03โ€ฏPM.png

 

With the new update to the Databricks SQL editor, how would one do this now? A query's id is no longer in the query URL

Walter_C
Databricks Employee
Databricks Employee

How does it shows to you when opening a saved query in the new editor? I have checked on my side and it is still showing the id.

Walter_C
Databricks Employee
Databricks Employee

You can also get the query ids by listing the queries through API call https://docs.databricks.com/api/workspace/queries/list 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group