Databricks Community

ayush25091995 · ‎07-02-2024

Hi,

we are using SQL history query API by selecting catalog and schema name directly from SQL editor instead of passing it through query, we are not getting the schema name and catalog name in query text for that particular id.

So, how can we get the schema and catalog name for that types of queries because the queryid also does not match with the respective system generated query.

Thanks,

Ayush

raphaelblg · ‎07-02-2024

Hi @ayush25091995 , check the "Query Profile" tab in the UI or check the Spark plans through the Spark UI.

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks

View solution in original post

raphaelblg · ‎07-02-2024

Hi @ayush25091995 , check the "Query Profile" tab in the UI or check the Spark plans through the Spark UI.

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks

ayush25091995 · ‎07-02-2024

Hi @raphaelblg , we are using REST API for sql warehouse to get the query history and query is being written in sql editor with selected catalog and schema name in drop down of editor, so for that particulars queries, we are just getting table name in response, not the schema and catalog name

mtajmouati · ‎07-02-2024

Hi,

You can try :

# Function to extract schema and catalog from query history
def extract_schema_catalog(query_history):
    for query in query_history.get("res", []):
        query_id = query.get("query_id")
        query_text = query.get("query_text")
        catalog = query.get("metadata", {}).get("details", {}).get("catalog")
        schema = query.get("metadata", {}).get("details", {}).get("schema")
        print(f"Query ID: {query_id}")
        print(f"Query Text: {query_text}")
        print(f"Catalog: {catalog}")
        print(f"Schema: {schema}")

ayush25091995 · ‎07-03-2024

Hi @mtajmouati, there is no field/key called metadata in API response for sql query history.
https://docs.databricks.com/api/azure/workspace/queryhistory/list

mtajmouati · ‎07-03-2024

True ! try this :

import requests
import json

# Define your Databricks workspace URL and API token
databricks_instance = "https://<your-databricks-instance>"
api_token = "dapi<your-api-token>"

# Fetch SQL query history
def get_query_history():
    url = f"{databricks_instance}/api/2.0/sql/history/queries"
    headers = {
        "Authorization": f"Bearer {api_token}"
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()

# Retrieve query history
query_history = get_query_history()

# Print retrieved query history for inspection
print(json.dumps(query_history, indent=4))

def extract_query_metadata(query_history):
    for query in query_history.get("res", []):
        query_id = query.get("query_id")
        query_text = query.get("query_text")
        user_name = query.get("user_name")
        # Fetch metadata from your Databricks environment or catalog
        # This is a placeholder function
        catalog, schema = fetch_catalog_and_schema(query_text, user_name)
        print(f"Query ID: {query_id}")
        print(f"Query Text: {query_text}")
        print(f"Catalog: {catalog}")
        print(f"Schema: {schema}")
        print("---")

# Placeholder function to fetch catalog and schema based on query text and user context
def fetch_catalog_and_schema(query_text, user_name):
    # Implement logic to fetch catalog and schema
    # For example, parse the query text to identify table names
    # and cross-reference with known metadata
    # Return default values if necessary
    catalog = "default_catalog"
    schema = "default_schema"
    return catalog, schema

# Extract and print metadata
extract_query_metadata(query_history)

ayush25091995 · ‎07-03-2024

@mtajmouati, i got your point and even thought same but the issue is, lets say, i have got table name after parsing query text, and if i match with metadata to find catalog and schema, there could be possibility that same table might be present in multiple schema and catalog right?
How will i make sure the query that i run for my catalog and schema is which one? below image if you see, after selecting catalog and schema, i am just running simple query and in query history i am getting without schema and catalog for below query, so how will i make sure i ran query for this particular catalog and schema? what if multiple entries match with same table name right.

SQL_Editor:-

Query history:-