Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Deriving a relation between spark job and underlying code

Subhrajyoti
New Contributor

For one of our requirements, we need to derive a relation between the Spark job, stage, and task IDs and the underlying code executed when a workflow job is triggered on a job cluster. So far we have been able to develop a relation between the workflow job ID and the Spark job, stage, and task IDs.

Please suggest how we should proceed to derive a tabular relation between the Spark job ID and the underlying code in the notebook.

 

 

1 REPLY

VZLA
Databricks Employee

Hi @Subhrajyoti, thanks for your question!

I'm not sure if you have tried this already, but by combining listener logs with structured tabular data, you can create a clear mapping between Spark job executions and the corresponding notebook code. You could leverage Spark’s SparkListener interface to capture job, stage, and task information programmatically: implement a custom listener that logs job start and end events along with their properties (e.g., job ID, associated stage IDs, and the triggering command). This lets you capture metadata about jobs as they run, enabling correlation with your notebook code.
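As a lighter-weight starting point than a full custom listener, PySpark's StatusTracker already exposes job and stage IDs at runtime, so polling it while a cell runs lets you snapshot which Spark jobs are active. A minimal sketch (`snapshot_active_jobs` is a hypothetical helper name; the only API it assumes is the documented pyspark `statusTracker()` with `getActiveJobsIds()` and `getJobInfo()`):

```python
def snapshot_active_jobs(sc):
    """Return one dict per active Spark job with its job and stage IDs.

    `sc` is a pyspark SparkContext, or anything exposing the same
    statusTracker() interface (which also makes this easy to unit-test).
    """
    tracker = sc.statusTracker()
    rows = []
    for job_id in tracker.getActiveJobsIds():
        info = tracker.getJobInfo(job_id)
        if info is not None:  # the job may finish between the two calls
            rows.append({
                "job_id": info.jobId,
                "stage_ids": list(info.stageIds),
                "status": info.status,
            })
    return rows
```

Polling this from a background thread while a notebook cell executes gives you (job ID, stage IDs) snapshots that you can later join with the cell label you set via setJobDescription.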

Then you can log the relationships between Spark jobs and the triggering notebook code into a structured store such as a Delta table, and use Python logging or spark.sparkContext.setJobDescription to associate each job with a meaningful description (e.g., the code snippet or notebook cell), e.g.:

spark.sparkContext.setJobDescription("ETL Step 1: Load Data")
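One way to make that labelling systematic is a small context manager that sets the job description for the duration of a code block and records what ran when, so every Spark job launched inside the block carries the same label. A sketch (`job_step` is a hypothetical helper, not a Databricks API; it only assumes `setJobDescription` on the context passed in):

```python
import contextlib
import datetime

@contextlib.contextmanager
def job_step(sc, description, log):
    """Tag all Spark jobs started inside the block with `description`.

    `sc` is a SparkContext; `log` is any list-like sink the caller
    wants the (description, started, ended) records appended to.
    """
    start = datetime.datetime.now()
    sc.setJobDescription(description)
    try:
        yield
    finally:
        sc.setJobDescription(None)  # reset so later cells aren't mislabelled
        log.append({"description": description,
                    "started": start,
                    "ended": datetime.datetime.now()})

# Example usage in a notebook:
# log = []
# with job_step(spark.sparkContext, "ETL Step 1: Load Data", log):
#     df = spark.read.table("raw_events")  # jobs here show this label in the Spark UI
```

Resetting the description in the `finally` block matters: setJobDescription is sticky on the thread, so without the reset, later cells would inherit the previous label.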

...And finally, write metadata to the table during execution for post-hoc analysis, e.g.:

import datetime

# job_id, stage_id and task_id come from the listener / status tracker above;
# string interpolation is fine for a demo, but see below for a safer write path.
spark.sql(f"""
    INSERT INTO spark_job_metadata VALUES (
        '{job_id}', '{stage_id}', '{task_id}', 'example_notebook', 'ETL Step 1', '{datetime.datetime.now()}'
    )
""")

 
