topic Re: How to run a saved query from a Notebook (PySpark) in Data Engineering

How to run a saved query from a Notebook (PySpark)

pacman — Fri, 29 Dec 2023 18:17:46 GMT

Hi Team! Noob to Databricks, so apologies if I ask a dumb question.

I have created a relatively large series of queries that fetch and organize the data I want. I'm ready to drive all of these from a Notebook (likely PySpark).

An example query is saved as test1 under my user account (it shows up under the Queries heading in the left navbar). I've also created a Notebook under that same account called Investigations (and both the notebook and query show up under Workspace).

My sample code is simple:

query_name = "test1"
results = spark.sql(f"SELECT * FROM {query_name}")

# Show the results
display(results)

But it produces the TABLE_OR_VIEW_NOT_FOUND error:

AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `test1` cannot be found. Verify the spelling and correctness of the schema and catalog. If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.

I'm sure the issue is just that I'm not properly referencing the query. Help!!

Re: How to run a saved query from a Notebook (PySpark)

Wojciech_BUK — Fri, 29 Dec 2023 23:13:33 GMT

Databricks is just pasting string value "test1" inside your code , so it is equivalent of running SQL command.

SELECT * FROM test1

So it is expecting to read data from existing table test1, but it can't find it

All tables can be referenced by providing catalog name , schema name and table name like: my_catalog.my_schema.test1

You can point your current session to use certain calatlog and schema , like : USE my_catalog.my_schema

Then it will always try to search a table in this schema and you can refere to table like you did.

I am not sure if this is what you are looking for but if it won't help, please attach screeahot from your UI

Re: How to run a saved query from a Notebook (PySpark)

Walter_C — Sat, 23 Mar 2024 14:14:52 GMT

The error you're encountering is because you're trying to execute a SQL query on a saved query name, which is not a table or view in your database. The spark.sql() function is used to run SQL queries directly on tables or views in your Spark session.

If you want to execute a saved query, you need to fetch the SQL text of the saved query first. However, Databricks does not provide a built-in function to fetch the SQL text of a saved query directly in a notebook.

As a workaround, you can manually copy the SQL text of your saved query and use it in your notebook. Here's an example:

# Replace this with the SQL text of your saved query
sql_text = """
SELECT *
FROM your_table
WHERE your_condition
"""

results = spark.sql(sql_text)

# Show the results
display(results)

In this example, replace sql_text with the actual SQL text of your saved query. This way, you're running the SQL query directly on your data, not trying to run it on the saved query name.

If you have a large number of saved queries and manually copying the SQL text is not feasible, you might need to consider using Databricks' REST API or CLI to automate the process of fetching the SQL text of your saved queries.

Re: How to run a saved query from a Notebook (PySpark)

uday_satapathy — Thu, 17 Oct 2024 18:54:47 GMT

You can do something like this using databricks-sdk:

Here's the saved query

Pyspark code:

import re from pyspark.sql import DataFrame from databricks.sdk import WorkspaceClient def run_saved_query(query_id:str, **params) -> DataFrame: query_body, args = get_query_by_id(query_id) query = replace_double_curly(query_body) df = spark.sql(query, **params) return df def get_query_by_id(query_id): w = WorkspaceClient() q = w.queries.get(query_id) query_body = q.query query_params = [p.name for p in q.options.parameters] return query_body, query_params def replace_double_curly(text): # Adjust regex to match the outermost double curly braces without capturing nested ones def strip_outer_whitespace(match): inner_text = match.group(1) return f'{{{inner_text.strip()}}}' # Match only the outermost {{...}} braces result = re.sub(r'\{\{\s*([^{}]*?)\s*\}\}', strip_outer_whitespace, text) return result params = {"year": 2016} df = run_saved_query('d045b229-e5d0-4de0-bcbc-bec6cbba61d7', **params) df.display()

Re: How to run a saved query from a Notebook (PySpark)

cloudy_data — Thu, 26 Dec 2024 22:48:46 GMT

With the new update to the Databricks SQL editor, how would one do this now? A query's id is no longer in the query URL

Re: How to run a saved query from a Notebook (PySpark)

Walter_C — Fri, 03 Jan 2025 22:20:21 GMT

How does it shows to you when opening a saved query in the new editor? I have checked on my side and it is still showing the id.

Re: How to run a saved query from a Notebook (PySpark)

Walter_C — Fri, 03 Jan 2025 22:23:05 GMT

You can also get the query ids by listing the queries through API call https://docs.databricks.com/api/workspace/queries/list

Re: How to run a saved query from a Notebook (PySpark)

aethorimn_cgr — Wed, 25 Jun 2025 23:58:40 GMT

@uday_satapathy Hi Uday. Do you know if this method works for many users? In case I need to share the script so a teammate may use it.