Louis_Frolio
Databricks Employee

Greetings @ADBricksExplore,

Short answer: there isn’t a supported public API that returns the “Substatements / Subqueries” panel you see in the Query History or Profile UI. The GraphQL endpoints the UI relies on are internal and not stable or supported for automation.

What you can do—reliably and with supported surfaces—is reconstruct those child statements using the Query History system table or the Query History API / CLI, keyed off the parent statement’s session and run metadata.

What works today (supported and durable)

  • Use the system.query.history system table to pull every statement executed in the same job task run and/or session as the parent CALL. The parent CALL and each child SQL statement show up as separate rows with their own statement_id values.

  • system.query.history includes statement_id, session_id, and a rich query_source struct with job metadata, including job_task_run_id, which is the cleanest way to correlate children back to the parent.

  • You can jump from any system.query.history row to the UI profile using statement_id, but there’s no public API to programmatically fetch the operator-level DAG or profile JSON beyond manual UI export.

  • If you prefer APIs or the CLI, the Query History REST API or the databricks query-history list --include-metrics command works well; you just filter client-side by session, time window, or job_task_run_id.

  • Query History system table retention is currently published as 365 days, so plan periodic ingestion if you need longer-term analysis.

  • In serverless workspaces, Query History captures SQL and Python statements from notebooks and jobs as well.

How to pull child statements for a given CALL

  1. Look up the parent context

    Once you have the parent statement_id (from the UI or logs), pull its execution metadata:

 

SELECT
  statement_id,
  session_id,
  start_time,
  end_time,
  executed_by,
  compute,
  query_source
FROM system.query.history
WHERE statement_id = '<PARENT_STATEMENT_ID>';

  • session_id identifies the execution session.

  • query_source.job_info.job_task_run_id, when present, identifies the specific Job Task Run.

  2a. Preferred: correlate by Job Task Run

If job_task_run_id is populated, use it to fetch everything executed by that task and exclude the parent:

WITH parent AS (
  SELECT
    query_source.job_info.job_task_run_id AS jtr_id,
    statement_id AS parent_id
  FROM system.query.history
  WHERE statement_id = '<PARENT_STATEMENT_ID>'
)
SELECT h.*
FROM system.query.history h
JOIN parent p
  ON h.query_source.job_info.job_task_run_id = p.jtr_id
WHERE h.statement_id <> p.parent_id
ORDER BY h.start_time;

This mirrors how the UI groups sub-statements under a CALL within a job task run and is the most robust approach.

  2b. Fallback: session_id plus time window

If job_task_run_id isn’t available, fall back to session and time bounds:

WITH parent AS (
  SELECT
    session_id,
    start_time AS t0,
    end_time   AS t1,
    statement_id AS parent_id
  FROM system.query.history
  WHERE statement_id = '<PARENT_STATEMENT_ID>'
)
SELECT h.*
FROM system.query.history h
JOIN parent p
  ON h.session_id = p.session_id
WHERE h.start_time >= p.t0
  AND (p.t1 IS NULL OR h.end_time <= p.t1)
  AND h.statement_id <> p.parent_id
ORDER BY h.start_time;
  3. Make it incremental and durable

    For ongoing analysis, ingest Query History deltas into your own table using update_time or start_time:

 

INSERT INTO my_org.sql_substatement_log
SELECT
  h.account_id,
  h.workspace_id,
  h.statement_id,
  h.session_id,
  h.execution_status,
  h.statement_text,
  h.statement_type,
  h.start_time,
  h.end_time,
  h.execution_duration_ms,
  h.read_rows,
  h.read_bytes,
  h.produced_rows,
  h.written_bytes,
  h.query_source
FROM system.query.history h
WHERE h.update_time > TIMESTAMPADD(MINUTE, -15, CURRENT_TIMESTAMP);
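
Note that a fixed lookback window will re-read statements that a previous run already ingested. One way to keep the load idempotent is a MERGE keyed on statement_id; this is a sketch, assuming my_org.sql_substatement_log has the same columns as the SELECT above:

```sql
-- Upsert on statement_id so overlapping lookback windows
-- don't create duplicate rows in the target table.
MERGE INTO my_org.sql_substatement_log AS t
USING (
  SELECT
    account_id, workspace_id, statement_id, session_id,
    execution_status, statement_text, statement_type,
    start_time, end_time, execution_duration_ms,
    read_rows, read_bytes, produced_rows, written_bytes,
    query_source
  FROM system.query.history
  WHERE update_time > TIMESTAMPADD(MINUTE, -15, CURRENT_TIMESTAMP)
) AS s
ON t.statement_id = s.statement_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Re-matching on statement_id also picks up rows whose execution_status changed between runs.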

Helpful realities to be aware of

  • There is no documented public API to fetch the UI’s “Substatements” block or the full operator-level DAG programmatically. Manual UI export is the only supported way to get that JSON today.

  • For lightweight pulls, the CLI is convenient:

    databricks query-history list --include-metrics --max-results 1000 …

  • You can filter Query History by compute, user, time range, and more via UI or API, then jump to the profile using statement_id when you need deeper inspection.

  • Profiles are not generated for results served entirely from cache. If you need a profile, rerun the query with a trivial change.

  • Make sure system tables are enabled in your account and metastore; otherwise system.query.history won’t be available.

 

End-to-end pattern (SQL)

Given a parent CALL statement_id, fetch its children and persist them:

WITH parent AS (
  SELECT
    statement_id AS parent_id,
    coalesce(CAST(query_source.job_info.job_task_run_id AS STRING), '') AS jtr_id,
    session_id,
    start_time AS t0,
    end_time   AS t1
  FROM system.query.history
  WHERE statement_id = '<PARENT_STATEMENT_ID>'
),
children AS (
  SELECT h.*
  FROM system.query.history h
  JOIN parent p
    ON (
         (p.jtr_id <> '' AND CAST(h.query_source.job_info.job_task_run_id AS STRING) = p.jtr_id)
         OR
         (p.jtr_id = ''  AND h.session_id = p.session_id
                       AND h.start_time >= p.t0
                       AND (p.t1 IS NULL OR h.end_time <= p.t1))
       )
  WHERE h.statement_id <> p.parent_id
)
INSERT INTO my_org.sql_substatement_log
SELECT
  account_id,
  workspace_id,
  statement_id,
  session_id,
  execution_status,
  statement_text,
  statement_type,
  start_time,
  end_time,
  execution_duration_ms,
  read_rows,
  read_bytes,
  produced_rows,
  written_bytes,
  query_source
FROM children;

API and CLI equivalents

  • REST: GET /api/2.0/sql/history/queries; filter by time range or warehouse, then correlate by session_id or job_task_run_id after identifying the parent.

  • CLI: databricks query-history list --include-metrics, followed by client-side filtering.
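
The client-side correlation step after a REST or CLI pull can be sketched in Python. The logic mirrors the session-plus-time-window fallback above; the field names used here (query_id, session_id, query_start_time_ms, query_end_time_ms) are assumptions about the response shape, so verify them against your API version before relying on this:

```python
# Given rows already fetched from the Query History API/CLI, pick sibling
# statements from the parent's session that started and finished inside
# the parent's time window. Field names are assumptions -- check them
# against the actual response shape of your API version.
def child_statements(rows: list[dict], parent_id: str) -> list[dict]:
    parent = next(r for r in rows if r["query_id"] == parent_id)
    t0 = parent["query_start_time_ms"]
    t1 = parent.get("query_end_time_ms")  # may be absent while still running
    children = [
        r for r in rows
        if r["query_id"] != parent_id
        and r.get("session_id") == parent.get("session_id")
        and r["query_start_time_ms"] >= t0
        and (t1 is None or r.get("query_end_time_ms", t1) <= t1)
    ]
    return sorted(children, key=lambda r: r["query_start_time_ms"])
```

The same function works on CLI JSON output, since the CLI wraps the same REST response.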

Why the GraphQL approach doesn’t work

The GraphQL endpoints visible in browser network traces are internal UI interfaces. They are not supported for automation and won’t work reliably outside the browser context. The supported surfaces are the UI, REST API, CLI, and system tables—with system.query.history being the recommended option for scalable, repeatable ingestion.

 

Hope this helps, Louis.