How to identify the goal of a specific Spark job?
01-31-2025 07:33 AM
I'm analyzing the performance of a DBR/Spark request. In this case, the cluster is created using a custom image, and then we run a job on it.
I've dug into the "Spark UI" part of the DBR interface and identified 3 jobs that appear to account for an outsized share of execution time: `write at WriteIntoDeltaCommand.scala:85`, `collect at GenerateSymlinkManifest.scala:295`, and `execute at DeltaOptimizedWriterExec.scala:130`. While the UI lets me dig into more detail, it doesn't seem to specify anywhere what the *purpose* of each job is. Is there anywhere I can look to find out why Spark decided it needed to execute these particular 3 jobs?
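For reference, here's a minimal sketch (assuming PySpark on DBR; the DataFrame and output path are hypothetical placeholders) of how I can label my *own* code with `setJobDescription` so the description shows up in the Spark UI's job list, but that still doesn't explain the jobs that Spark/Delta generates internally:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Hypothetical DataFrame and output path, just for illustration.
events_df = spark.range(1_000_000).withColumnRenamed("id", "event_id")
output_path = "/tmp/delta/events"

# Any jobs triggered while this description is set show it in the
# Spark UI's "Description" column instead of the bare call site.
sc.setJobDescription("Write events table to Delta")
events_df.write.format("delta").mode("overwrite").save(output_path)

# Clear the description so later jobs aren't mislabeled.
sc.setJobDescription(None)
```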
01-31-2025 10:37 AM
The Spark jobs are determined by your Spark code. You can look at the Spark plan to understand what operations each Spark job/stage is executing.
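For example (a minimal sketch, assuming PySpark; `df` is a hypothetical query standing in for whatever your job actually runs), you can print the plan before triggering the action and then match the operator names against the stages and jobs you see in the Spark UI:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical query standing in for whatever your job actually runs.
df = (
    spark.range(10_000_000)
    .withColumn("bucket", F.col("id") % 100)
    .groupBy("bucket")
    .count()
)

# Print the parsed, analyzed, optimized, and physical plans without running anything.
df.explain(mode="extended")

# Or just the physical plan with per-operator details (Spark 3.0+).
df.explain(mode="formatted")
```

The SQL / DataFrame tab in the Spark UI is also helpful here: each query shows its plan as a graph along with the job IDs it spawned, which is usually the quickest way to map a job like `write at WriteIntoDeltaCommand.scala:85` back to the write that triggered it.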

