Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Want to see logs for lineage view run events

jitendrajha11
New Contributor II

Hi All,

I need your help. When I run a job it completes successfully, and when I open the job run and click Lineage > View run events, I see the steps below.

  1. Job Started: The job is triggered.
  2. Waiting for Cluster: The job waits for the cluster to be ready.
  3. Cluster Ready: The cluster becomes ready to execute the job.
  4. Started Running: The job starts running.
  5. Succeeded: The job completes successfully after processing the data.

I want to see detailed logs for all five stages. I am storing the cluster logs in a volume, where I can see the driver, eventlog, executor, etc. folders, but I have checked them all and cannot find this information. In which folder are these events stored?

5 REPLIES

bianca_unifeye
New Contributor III

https://docs.databricks.com/aws/en/jobs/monitor#export-job-runs

In the article, look for the section on exporting job runs.

For the compute:

  1. On the compute page, click the Advanced toggle.
  2. Click the Logging tab.
  3. Select a destination type.
  4. Enter the Log path.

https://docs.databricks.com/aws/en/compute/configure
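
If you prefer to set this via the API instead of the UI, the same setting lives in the cluster definition. A minimal sketch, assuming you are editing a job's new_cluster spec; the runtime version, node type, and log path below are placeholders, not values from this thread:

# Hedged sketch: the Advanced > Logging UI steps above map to the cluster_log_conf
# block of a cluster (or a job's new_cluster) definition.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",   # placeholder Databricks Runtime
    "node_type_id": "i3.xlarge",           # placeholder node type
    "num_workers": 2,
    # Equivalent of choosing a destination type and log path in the Logging tab:
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs/my-job"}  # placeholder log path
    },
}
# After a run, the driver, executor, and eventlog folders are typically delivered
# under <destination>/<cluster-id>/.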

nayan_wylde
Esteemed Contributor

The stages you mentioned—Job Started, Waiting for Cluster, Cluster Ready, Started Running, Succeeded—are Databricks job lifecycle events, not Spark events.
They are stored in Databricks internal job service, not in the driver/executor logs. You can access them via:

  1. Jobs UI → View Run Events (what you already did).
  2. Databricks REST API: use the Get a single job run endpoint (https://docs.databricks.com/api/azure/workspace/jobs/getrun) to retrieve detailed lifecycle events programmatically.

If you want to persist these lifecycle logs, you need to export them via the API and then write them to your volume or external storage.

The driver/event/executor logs will only show Spark-related execution details, not cluster provisioning or job trigger events.
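
A minimal sketch of that export flow, assuming a personal access token and a Unity Catalog volume; the host, token, run ID, and volume path are placeholders, and the fields come from the Jobs 2.1 runs/get response:

# Hedged sketch: fetch lifecycle details for one run and persist them to a volume.
import json
import requests

HOST = "https://<your-workspace-url>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"       # placeholder token
RUN_ID = 123456789                      # placeholder job run ID

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": RUN_ID},
)
resp.raise_for_status()
run = resp.json()  # contains start_time, end_time, state, and per-task timings

# UC volumes are FUSE-mounted, so a plain file write works from a notebook.
out_path = f"/Volumes/<catalog>/<schema>/<volume>/run_events/{RUN_ID}.json"  # placeholder
with open(out_path, "w") as f:
    json.dump(run, f, indent=2)

Scheduling a small notebook like this after each run (or iterating over runs returned by the runs/list endpoint) gives you a persisted history of the lifecycle events.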

In Jobs UI → View Run Events I am not able to see anything. Please find the attachment and provide the information step by step.

jitendrajha11
New Contributor II

Hi Team/Member,

When I run a job it completes successfully, and when I open the job run and click Lineage > View run events, I see the steps below (screenshot attached). Where will I find the logs for the stages shown in the screenshot?

  1. Job Started: The job is triggered.
  2. Waiting for Cluster: The job waits for the cluster to be ready.
  3. Cluster Ready: The cluster becomes ready to execute the job.
  4. Started Running: The job starts running.
  5. Succeeded: The job completes successfully after processing the data.

mitchellg-db
Databricks Employee

Hi there,

I vibe-coded* a query where I was able to derive most of your events from the system tables:

If you have SELECT access to system tables, this could be an efficient way to gather the events. You could set up a Spark Declarative Pipeline to perform incremental refreshes to build history. Again, the phases are derived from timestamps in the system tables, rather than being clearly labeled. I did NOT verify that they match the run events in the Job UI, so you may want to do that still:
  1. Job Started: job_start_time
  2. Waiting for Cluster: job_start_time -> first_task_start
  3. Cluster Ready: first_task_start timestamp
  4. Execution: first_task_start -> last_task_end
  5. Result: result_state + job_end_time
# Job Lifecycle Timeline Analysis
# Purpose: Derive job lifecycle phases by comparing job-level and task-level timing
# Dependencies: system.lakeflow.job_run_timeline, system.lakeflow.job_task_run_timeline, system.lakeflow.jobs
# Assumptions: Unity Catalog enabled, user has SELECT permissions on system tables

job_lifecycle_df = spark.sql("""
------------------------------------------------------------------------
-- CTE: job_runs
-- Source: system.lakeflow.job_run_timeline
-- Purpose: Extract the top-level job run records with start/end times
-- 
-- This table contains one row per job run with the overall run timing.
-- It does NOT break down into phases (waiting, running, etc.) - that's
-- why we need to join with task-level data to derive those phases.
-- 
-- Filtering to last 7 days for performance - adjust as needed.
------------------------------------------------------------------------
WITH job_runs AS (
  SELECT
    job_id,
    run_id,
    run_name,
    trigger_type,
    run_type,
    result_state,
    termination_code,
    period_start_time AS job_start_time,
    period_end_time AS job_end_time
  FROM system.lakeflow.job_run_timeline
  WHERE period_end_time >= CURRENT_DATE - INTERVAL 7 DAYS
),

------------------------------------------------------------------------
-- CTE: task_timing
-- Source: system.lakeflow.job_task_run_timeline
-- Purpose: Aggregate task-level timing to determine when actual work started/ended
-- 
-- The key insight: the GAP between job_start_time and the first task starting
-- represents "waiting for cluster" time. Tasks can't start until compute is ready.
-- 
-- We use MIN(period_start_time) to find when the first task began (cluster ready)
-- and MAX(period_end_time) to find when all tasks completed.
-- 
-- Join key: job_run_id in this table maps to run_id in job_run_timeline.
-- (Note: run_id in this table is the task's own run ID, not the parent job run)
------------------------------------------------------------------------
task_timing AS (
  SELECT
    job_id,
    job_run_id,
    MIN(period_start_time) AS first_task_start,
    MAX(period_end_time) AS last_task_end,
    COUNT(DISTINCT task_key) AS task_count
  FROM system.lakeflow.job_task_run_timeline
  WHERE period_end_time >= CURRENT_DATE - INTERVAL 7 DAYS
  GROUP BY job_id, job_run_id
),

------------------------------------------------------------------------
-- CTE: latest_jobs
-- Source: system.lakeflow.jobs
-- Purpose: Get the current job name for each job_id
-- 
-- The jobs table is versioned - every time a job config changes, a new row
-- is added with an updated change_time. Without deduplication, joining
-- directly causes row multiplication (one output row per job version).
-- 
-- ROW_NUMBER with PARTITION BY job_id ORDER BY change_time DESC assigns
-- rn=1 to the most recent version. We filter to rn=1 in the join.
------------------------------------------------------------------------
latest_jobs AS (
  SELECT
    job_id,
    name,
    ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY change_time DESC) AS rn
  FROM system.lakeflow.jobs
)

------------------------------------------------------------------------
-- Main SELECT
-- Purpose: Join the three CTEs to produce the job lifecycle view
-- 
-- Lifecycle phases derived:
--   1. Job Started:           job_start_time (from job_run_timeline)
--   2. Waiting for Cluster:   job_start_time -> first_task_start
--   3. Cluster Ready:         first_task_start timestamp
--   4. Execution:             first_task_start -> last_task_end
--   5. Result:                result_state + job_end_time
-- 
-- Additional derived metrics:
--   - cleanup_duration: time between last task completing and job officially ending
--   - total_duration: end-to-end job time
-- 
-- LEFT JOINs used because:
--   - Some jobs may fail before any tasks start (no task_timing records)
--   - Some jobs may have been deleted (no latest_jobs record)
-- 
-- Filtered to run_type = 'JOB_RUN' to exclude SUBMIT_RUN and WORKFLOW_RUN
-- which have different semantics.
------------------------------------------------------------------------
SELECT
  j.job_id,
  j.run_id,
  jobs.name AS job_name,
  j.trigger_type,
  j.run_type,
  
  -- Stage 1: Job Started
  j.job_start_time,

  -- Stage 5: Result
  j.job_end_time,
  j.result_state,
  j.termination_code,
  
  -- Stages 2-3: Waiting for Cluster / Cluster Ready (gap between job start and first task start)
  t.first_task_start AS cluster_ready_time,
  ROUND(TIMESTAMPDIFF(MILLISECOND, j.job_start_time, t.first_task_start) / 1000.0, 3) AS waiting_for_cluster_sec,
  
  -- Stage 4: Execution (first task start to last task end)
  t.last_task_end AS execution_end_time,
  ROUND(TIMESTAMPDIFF(MILLISECOND, t.first_task_start, t.last_task_end) / 1000.0, 3) AS execution_duration_sec,
  
  -- Cleanup time (last task end to job end)
  ROUND(TIMESTAMPDIFF(MILLISECOND, t.last_task_end, j.job_end_time) / 1000.0, 3) AS cleanup_duration_sec,
  
  -- Total duration
  ROUND(TIMESTAMPDIFF(MILLISECOND, j.job_start_time, j.job_end_time) / 1000.0, 3) AS total_duration_sec,
  
  -- Task count for context
  t.task_count

FROM job_runs j
LEFT JOIN task_timing t
  ON j.job_id = t.job_id AND j.run_id = t.job_run_id
LEFT JOIN latest_jobs jobs
  ON j.job_id = jobs.job_id AND jobs.rn = 1
WHERE j.run_type = 'JOB_RUN'
ORDER BY j.job_start_time DESC
""")

display(job_lifecycle_df)
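
If you want history beyond the 7-day lookback in the query, one simple option is to append the result to a Delta table on a schedule. A hedged sketch; the table name is a placeholder, not from this thread:

# Hedged sketch: persist the derived lifecycle rows for long-term history.
(
    job_lifecycle_df
    .write
    .mode("append")  # consider a MERGE on (job_id, run_id) instead to avoid duplicates on re-runs
    .saveAsTable("main.monitoring.job_lifecycle_events")  # placeholder catalog.schema.table
)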

*Note: In accordance with our community Generative AI policy, I did personally verify the results in a Databricks workspace.