Hi @Subhrajyoti thanks for your question!
I'm not sure if you have tried this already, but by combining listener logs with structured tabular data, you can create a clear mapping between Spark job executions and the corresponding notebook code. You could leverage Spark's SparkListener interface to capture job, stage, and task information programmatically: implement a custom listener that logs job start and end events along with their properties (e.g., job ID, associated stages, and the triggering command). This lets you capture metadata about jobs as they run, enabling correlation with your notebook code.
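Wiring a full SparkListener into a Python notebook means implementing the Java interface via Py4J, which gets verbose, so a lighter-weight, pure-PySpark alternative that surfaces much of the same job/stage metadata is the status tracker. Below is only a rough sketch of that alternative; the job group name and the input path are placeholders I made up:

# Rough sketch: correlate notebook code with Spark jobs via the status tracker
# instead of a full custom listener (pure PySpark, no Py4J callback needed).
sc = spark.sparkContext
sc.setJobGroup("etl_step_1", "ETL Step 1: Load Data")  # illustrative group name

df = spark.read.parquet("/path/to/input")  # placeholder action that triggers jobs
df.count()

tracker = sc.statusTracker()
for job_id in tracker.getJobIdsForGroup("etl_step_1"):
    job_info = tracker.getJobInfo(job_id)
    if job_info is None:
        continue
    for stage_id in job_info.stageIds:
        stage_info = tracker.getStageInfo(stage_id)
        print(job_id, job_info.status, stage_id,
              stage_info.name if stage_info else None)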
Then, you can log the relationships between Spark jobs and the triggering notebook code to a structured store such as a Delta table. Use Python logging or spark.sparkContext.setJobDescription to attach a meaningful description (e.g., the code snippet or notebook cell) to each job, e.g.:
spark.sparkContext.setJobDescription("ETL Step 1: Load Data")
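In practice you would set the description right before the action it should label and clear it again afterwards, so unrelated jobs that run later do not inherit it. A minimal sketch, where the input path and target table are just placeholders:

spark.sparkContext.setJobDescription("ETL Step 1: Load Data")
try:
    df = spark.read.format("delta").load("/path/to/raw")  # placeholder input path
    df.write.format("delta").mode("append").saveAsTable("bronze.raw_events")  # placeholder target
finally:
    spark.sparkContext.setJobDescription(None)  # reset so later jobs aren't mislabeled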
...And finally, write the metadata to the table during execution for post-hoc analysis, e.g.:
import datetime

# job_id / stage_id / task_id come from your listener or status tracker;
# the notebook name and step description mirror setJobDescription above
spark.sql(f"""
    INSERT INTO spark_job_metadata VALUES (
        '{job_id}', '{stage_id}', '{task_id}',
        'example_notebook', 'ETL Step 1', '{datetime.datetime.now()}'
    )
""")