Always configure job timeouts and notifications. They help you spot slowness early, whatever its cause. But alerting alone is not enough: you also need to investigate and fix whatever is actually making the job slow.
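For example, on Databricks a run-duration cap plus a failure alert can be set on the job itself. Here is a minimal sketch of a Jobs API-style settings payload - the job name, email address, and timeout value are placeholders, so check the exact field names against your workspace's Jobs API version.

```python
# Hedged sketch: a Databricks Jobs API-style settings payload (field names assumed
# from Jobs API 2.1; adjust to your API version). A timeout_seconds cap turns a
# stuck run into a failed run, and the on_failure notification surfaces it.
job_settings = {
    "name": "nightly_etl",                       # hypothetical job name
    "timeout_seconds": 3 * 60 * 60,              # kill runs that exceed 3 hours
    "email_notifications": {
        "on_failure": ["data-team@example.com"]  # alerted when a run fails or times out
    },
}

# The payload would typically be sent to the Jobs API, e.g. with requests:
# requests.post(f"{host}/api/2.1/jobs/create", headers=auth_headers, json=job_settings)
```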
- The first step is to identify the problem: compare run times of the same job across different runs (see the sketch after this list).
- Next, dig into the details of the job: check the SQL query plan, the read time, the cloud storage request duration, and so on in the Spark UI.
- External factors such as storage and network can also affect the run time. The logs and a few system-level commands can tell you whether the VM is still up and healthy.
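To compare run times stage by stage, the Spark UI exposes the same numbers over its REST API. Here is a rough sketch that pulls per-stage run times for a good and a bad run and ranks the biggest regressions - the UI address and application IDs are placeholders, and on Databricks you would usually read the same numbers off the Stages tab instead.

```python
# Hedged sketch: fetch per-stage timings from the Spark UI REST API for two runs
# of the same job and compare them side by side.
import requests

SPARK_UI = "http://localhost:4040"   # assumption: Spark UI reachable at this address

def stage_runtimes(app_id):
    """Return {stage name: executor run time in ms} for one application."""
    stages = requests.get(f"{SPARK_UI}/api/v1/applications/{app_id}/stages").json()
    return {s["name"]: s["executorRunTime"] for s in stages}

good = stage_runtimes("app-20240101-0001")   # a run with normal duration (placeholder id)
bad = stage_runtimes("app-20240102-0001")    # the slow run (placeholder id)

# Print stages whose run time grew the most between the two runs.
for name in sorted(bad, key=lambda n: bad[n] - good.get(n, 0), reverse=True):
    print(f"{name[:60]:60s} good={good.get(name, 0):>10} ms  bad={bad[name]:>10} ms")
```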
While comparing the two runs (good and bad), try answering the following (a few quick checks are sketched after the list):
- Is this an intermittent issue, or did performance degrade after a certain point in time?
- If it has been consistently slow since a certain date, was there a DBR (Databricks Runtime) change?
- Is the data volume the same?
- Have the cluster configs changed?
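Some of these questions can be answered straight from a notebook. The sketch below assumes a Databricks notebook where spark and dbutils are already available, and uses a placeholder input path; it captures the Spark version, the input data volume, and the effective cluster config so the two runs can be diffed.

```python
# Hedged sketch: quick environment checks worth capturing for both the good and
# the bad run. spark.version is standard; the input path is a placeholder.
print("Spark version:", spark.version)        # compare with the good run (DBR change?)

# Data volume: compare input sizes between the two runs.
input_path = "/mnt/raw/events/2024-01-02/"    # placeholder path for this run's input
files = dbutils.fs.ls(input_path)             # dbutils is provided by Databricks notebooks
print("input files:", len(files), "total bytes:", sum(f.size for f in files))

# Cluster config: dump the effective Spark conf so the two runs can be diffed.
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(key, "=", value)
```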
While comparing the DAGs and SQL plans:
- Look for the stages that took the most time.
- Apply filters early and reduce the data size.
- Check the join strategies and stage metrics (see the sketch after this list).
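To make that concrete, here is a sketch of the plan-level fixes those bullets point at. The table and column names are made up, but the pattern is the same: filter early, select only what you need, broadcast the small side of a join, then read the plan.

```python
# Hedged sketch: filter pushdown, column pruning, and a broadcast join hint,
# followed by an explain() to inspect scan sizes and the chosen join strategy.
from pyspark.sql.functions import broadcast, col

events = spark.table("raw.events")   # placeholder large fact table
users = spark.table("raw.users")     # placeholder small dimension table

result = (
    events
    .filter(col("event_date") == "2024-01-02")   # push the filter as early as possible
    .select("user_id", "event_type")             # drop unused columns to cut scan size
    .join(broadcast(users), "user_id")           # broadcast the small side to avoid a shuffle
)

result.explain(mode="formatted")   # check the scan size and the join type in the physical plan
```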
In the logs:
- Check for errors and warnings.
- Jump to the timestamp when the stage was delayed.
- Compare it with a run that finished in the expected time (a log-filtering sketch follows).
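If the logs are large, a small script can narrow them down to the warnings and errors inside the window when the stage stalled. This sketch assumes a local log file and the default log4j-style timestamp format - adjust both to wherever your platform delivers its logs.

```python
# Hedged sketch: print WARN/ERROR lines that fall inside the time window when the
# slow stage was running. Log path, window, and timestamp format are assumptions.
from datetime import datetime

LOG_PATH = "driver.log"                          # placeholder log file
WINDOW_START = datetime(2024, 1, 2, 3, 15)       # when the slow stage was submitted
WINDOW_END = datetime(2024, 1, 2, 3, 45)         # when it should have finished

with open(LOG_PATH) as fh:
    for line in fh:
        if " WARN " not in line and " ERROR " not in line:
            continue
        try:
            # assumes lines start with a "YY/MM/DD HH:MM:SS" timestamp (log4j default)
            ts = datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")
        except ValueError:
            continue
        if WINDOW_START <= ts <= WINDOW_END:
            print(line.rstrip())
```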
Bonus tip - Enable speculative execution so Spark re-runs slow tasks in parallel on other executors: spark.speculation=true.
Keep a lookout for my next post with more tuning tips for slow tasks and jobs.
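Here is one way to wire that up at session creation time (on Databricks you would usually put these in the cluster's Spark config instead). The multiplier and quantile values below are just illustrative; note that speculation helps with straggler nodes, not with data skew, and it costs extra resources.

```python
# Hedged sketch: enabling speculative execution when building the session.
# spark.speculation.* are standard Spark settings; the values are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("speculation-demo")                      # hypothetical app name
    .config("spark.speculation", "true")              # re-launch slow tasks on other executors
    .config("spark.speculation.multiplier", "1.5")    # "slow" = 1.5x the median task time
    .config("spark.speculation.quantile", "0.75")     # only after 75% of tasks have finished
    .getOrCreate()
)
```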