How to identify the goal of a specific Spark job?
01-31-2025 07:33 AM
I'm analyzing the performance of a DBR/Spark request. In this case, the cluster is created using a custom image, and then we run a job on it.
I've dug into the "Spark UI" part of the DBR interface and identified 3 jobs that appear to account for an outsized share of execution time: `write at WriteIntoDeltaCommand.scala:85`, `collect at GenerateSymlinkManifest.scala:295`, and `execute at DeltaOptimizedWriterExec.scala:130`. While the UI lets me dig into more detail, it doesn't seem to specify anywhere what the *purpose* of each job is. Is there anywhere I can look to find out why Spark decided it needed to execute these particular 3 jobs?
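For reference, here's a minimal sketch (assuming PySpark on DBR; the DataFrame and output path are hypothetical placeholders) of how I can label my *own* code with `setJobDescription` so the description shows up in the Spark UI's job list, but that still doesn't explain the jobs that Spark/Delta generates internally:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Hypothetical DataFrame and output path, just for illustration.
events_df = spark.range(1_000_000).withColumnRenamed("id", "event_id")
output_path = "/tmp/delta/events"

# Any jobs triggered while this description is set show it in the
# Spark UI's "Description" column instead of the bare call site.
sc.setJobDescription("Write events table to Delta")
events_df.write.format("delta").mode("overwrite").save(output_path)

# Clear the description so later jobs aren't mislabeled.
sc.setJobDescription(None)
```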
01-31-2025 10:37 AM
The Spark jobs are determined by your Spark code. You can look at the Spark plan to understand what operations each Spark job/stage is executing.
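For example (a minimal sketch, assuming PySpark; `df` is a hypothetical query standing in for whatever your job actually runs), you can print the plan before triggering the action and then match the operator names against the stages and jobs you see in the Spark UI:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical query standing in for whatever your job actually runs.
df = (
    spark.range(10_000_000)
    .withColumn("bucket", F.col("id") % 100)
    .groupBy("bucket")
    .count()
)

# Print the parsed, analyzed, optimized, and physical plans without running anything.
df.explain(mode="extended")

# Or just the physical plan with per-operator details (Spark 3.0+).
df.explain(mode="formatted")
```

The SQL / DataFrame tab in the Spark UI is also helpful here: each query shows its plan as a graph along with the job IDs it spawned, which is usually the quickest way to map a job like `write at WriteIntoDeltaCommand.scala:85` back to the write that triggered it.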

