Hi @Subhrajyoti, thanks for your question!
I'm not sure if you have tried this already, but by combining listener logs with structured tabular data you can build a clear mapping between Spark job executions and the corresponding notebook code. You could leverage Spark's SparkListener interface to capture job, stage, and task information programmatically: implement a custom listener that logs job start and end events along with their properties (e.g., job ID, associated stage IDs, and the triggering command). This lets you capture metadata about jobs as they run and correlate it with your notebook code.
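There is no first-class PySpark wrapper around SparkListener, so from a Python notebook this usually means implementing the JVM interface through py4j. Below is a rough sketch of that idea (the listener name is made up, and the registration relies on py4j internals such as sc._jsc and sc._gateway, which are not a stable public API, so expect to adapt it to your Spark/Databricks runtime version):

from pyspark.sql import SparkSession
from pyspark.java_gateway import ensure_callback_server_started

spark = SparkSession.builder.getOrCreate()  # already defined in a Databricks notebook
sc = spark.sparkContext

class JobMetadataListener(object):
    """Collects job-level metadata as jobs start and finish."""
    def __init__(self):
        self.records = []

    def onJobStart(self, jobStart):
        props = jobStart.properties()  # java.util.Properties, may be null
        self.records.append({
            "job_id": jobStart.jobId(),
            "stage_ids": jobStart.stageIds().mkString(","),  # Scala Seq -> "0,1,2"
            "description": props.getProperty("spark.job.description") if props else None,
            "event": "start",
        })

    def onJobEnd(self, jobEnd):
        self.records.append({"job_id": jobEnd.jobId(), "event": "end"})

    # SparkListenerInterface declares many more callbacks; default them to no-ops
    def __getattr__(self, name):
        return lambda *args, **kwargs: None

    class Java:
        implements = ["org.apache.spark.scheduler.SparkListenerInterface"]

ensure_callback_server_started(sc._gateway)  # needed for JVM -> Python callbacks
listener = JobMetadataListener()
sc._jsc.sc().addSparkListener(listener)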
Then, you can log the relationships between Spark jobs and the triggering notebook code to a structured storage system such as a Delta table, and use Python logging or spark.sparkContext.setJobDescription to associate each job with a meaningful description (e.g., the code snippet or notebook cell that triggered it):
spark.sparkContext.setJobDescription("ETL Step 1: Load Data")
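The insert further down assumes a spark_job_metadata Delta table already exists. A minimal sketch of what that table could look like (the table name comes from the insert below; the column names are just assumptions chosen to line up with its VALUES order):

spark.sql("""
    CREATE TABLE IF NOT EXISTS spark_job_metadata (
        job_id STRING,
        stage_id STRING,
        task_id STRING,
        notebook_name STRING,
        job_description STRING,
        logged_at TIMESTAMP
    ) USING DELTA
""")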
And finally, write the metadata to the table during execution so it is available for post-analysis, e.g.:
import datetime

# job_id, stage_id, and task_id are the values captured by the listener above;
# the string timestamp is cast to the TIMESTAMP column on insert
spark.sql(f"""
    INSERT INTO spark_job_metadata VALUES (
        '{job_id}', '{stage_id}', '{task_id}', 'example_notebook', 'ETL Step 1', '{datetime.datetime.now()}'
    )
""")