<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to identify the goal of a specific Spark job? - Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/how-to-identify-the-goal-of-a-specific-spark-job/m-p/108105#M9704</link>
    <description>&lt;P&gt;I'm analyzing the performance of a DBR/Spark request. In this case, the cluster is created using a custom image, and then we run a job on it.&lt;/P&gt;&lt;P&gt;I've dug into the "Spark UI" part of the DBR interface and identified 3 jobs that appear to account for an outsized amount of execution time: `write at WriteIntoDeltaCommand.scala:85`, `collect at GenerateSymlinkManifest.scala:295`, and `execute at DeltaOptimizedWriterExec.scala:130`. While the UI lets me dig into more detail, it doesn't seem to specify anywhere what the *purpose* of each job is. Is there anywhere I can look to find out why Spark decided it needed to execute these 3 jobs in particular?&lt;/P&gt;</description>
    <pubDate>Fri, 31 Jan 2025 15:33:54 GMT</pubDate>
    <dc:creator>mrstevegross</dc:creator>
    <dc:date>2025-01-31T15:33:54Z</dc:date>
    <item>
      <title>How to identify the goal of a specific Spark job?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-to-identify-the-goal-of-a-specific-spark-job/m-p/108105#M9704</link>
      <description>&lt;P&gt;I'm analyzing the performance of a DBR/Spark request. In this case, the cluster is created using a custom image, and then we run a job on it.&lt;/P&gt;&lt;P&gt;I've dug into the "Spark UI" part of the DBR interface and identified 3 jobs that appear to account for an outsized amount of execution time: `write at WriteIntoDeltaCommand.scala:85`, `collect at GenerateSymlinkManifest.scala:295`, and `execute at DeltaOptimizedWriterExec.scala:130`. While the UI lets me dig into more detail, it doesn't seem to specify anywhere what the *purpose* of each job is. Is there anywhere I can look to find out why Spark decided it needed to execute these 3 jobs in particular?&lt;/P&gt;</description>
      <pubDate>Fri, 31 Jan 2025 15:33:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-to-identify-the-goal-of-a-specific-spark-job/m-p/108105#M9704</guid>
      <dc:creator>mrstevegross</dc:creator>
      <dc:date>2025-01-31T15:33:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to identify the goal of a specific Spark job?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-to-identify-the-goal-of-a-specific-spark-job/m-p/108171#M9705</link>
      <description>&lt;P&gt;The Spark jobs are determined by your Spark code. You can look at the Spark plan to understand what operations each Spark job/stage is executing.&lt;/P&gt;
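&lt;P&gt;For example, here is a minimal sketch, where `df` stands in for whatever DataFrame your job writes:&lt;/P&gt;&lt;PRE&gt;// Print the logical and physical plans Spark builds for this DataFrame.
// The call-site job descriptions shown in the UI (e.g. `write at WriteIntoDeltaCommand.scala:85`)
// map back to the operations you will see in these plans.
df.explain(true)&lt;/PRE&gt;&lt;P&gt;On Databricks, the SQL/DataFrame tab of the Spark UI also lists each query along with links to the jobs it spawned, which helps tie a job like `collect at GenerateSymlinkManifest.scala:295` back to its purpose (generating the symlink manifest after a Delta write).&lt;/P&gt;</description>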
      <pubDate>Fri, 31 Jan 2025 18:37:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-to-identify-the-goal-of-a-specific-spark-job/m-p/108171#M9705</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2025-01-31T18:37:25Z</dc:date>
    </item>
  </channel>
</rss>