Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to improve Spark UI Job Description for pyspark?

igorgatis
New Contributor II

I find it quite hard to understand the Spark UI for my pyspark pipelines. For example, when one writes `spark.read.table("sometable").show()`, the Jobs tab shows:

[screenshot: Spark UI Jobs list showing jobs 15 and 16 with generic descriptions]

I learned that the `DataFrame` API may actually spawn helper jobs before running the actual job. In the example above, job 15 collects data that is used in job 16. In both cases, the description gives no clue about what is going on.
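As an aside, the fan-out is easy to confirm programmatically. Below is a minimal sketch using the real `setJobGroup` and `statusTracker` APIs; the group name and table name are placeholders, and `spark` is the session Databricks notebooks provide:

```python
sc = spark.sparkContext  # `spark` is the session Databricks provides

# Tag everything triggered inside this group.
sc.setJobGroup("preview-sometable", "read sometable and show 20 rows")
spark.read.table("sometable").show()

# Two or more job IDs here confirms the extra data-collection job.
print(sc.statusTracker().getJobIdsForGroup("preview-sometable"))
```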

Clicking the job 15 link shows a stage that looks like this:

[screenshot: stage view for job 15]

The stage's link, in turn, leads to:

[screenshot: stage detail page]

Job 16 is quite similar, though it at least mentions the table name. Things get messier when the DAG gets more complex.

Is there a recommended way to improve this? I'm aware of `setJobDescription` and `setLocalProperty` (with `callSite.short` and `callSite.long`), but dealing with them directly is also not easy.
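For reference, here is the kind of wrapper I have been experimenting with: a context manager (the name `spark_job_description` is mine, not a Spark API) that sets the job description and the `callSite.*` properties around an action and restores the previous values afterwards. `setJobDescription`, `setLocalProperty`, and `getLocalProperty` are the real `SparkContext` APIs:

```python
from contextlib import contextmanager

@contextmanager
def spark_job_description(spark, short, long=None):
    """Hypothetical wrapper: labels every job triggered inside the block,
    then restores whatever labels were set before."""
    sc = spark.sparkContext
    previous = {
        "spark.job.description": sc.getLocalProperty("spark.job.description"),
        "callSite.short": sc.getLocalProperty("callSite.short"),
        "callSite.long": sc.getLocalProperty("callSite.long"),
    }
    sc.setJobDescription(short)
    sc.setLocalProperty("callSite.short", short)
    sc.setLocalProperty("callSite.long", long or short)
    try:
        yield
    finally:
        # Setting a key back to its old value (possibly None) restores or clears it.
        for key, value in previous.items():
            sc.setLocalProperty(key, value)

# Both the data-collection job and the show job now carry this label
# in the Spark UI Jobs tab.
with spark_job_description(spark, "preview sometable"):
    spark.read.table("sometable").show()
```

One caveat: these are thread-local properties, so the wrapper only labels jobs triggered from the thread that entered the block.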

 

1 REPLY

jose_gonzalez
Databricks Employee

Hi @igorgatis,

A polite reminder: have you had a chance to review my colleague's reply? Please let us know if it helped resolve your question.
