Katalin555
New Contributor II

Hi @Alberto_Umana ,
Yes, I checked and did not see any other information. We are using Driver: Standard_DS5_v2 · Workers: Standard_E16a_v4 · 1-6 workers. At the stage where the pipeline fails, the shuffle information was:

  • Shuffle Read Size / Records: 257.1 MiB / 49459142
  • Shuffle Write Size / Records: 16.8 MiB / 1535990

5 tasks on 5 nodes succeed, then the next task is tried 4 times on 4 different workers and fails each time with:

ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Command exited with code 134
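As far as I understand, exit code 134 follows the usual shell convention of 128 + signal number, which would make it signal 6 (SIGABRT), so the executor process seems to have aborted itself rather than being killed from outside (a SIGKILL, as with the kernel OOM killer, would normally show up as 137). A minimal sketch of that decoding, assuming a Linux worker; the helper name `signal_from_exit_code` is just something I made up for illustration:

```python
import signal


def signal_from_exit_code(code: int):
    """Return the signal name implied by a shell-style exit code, or None.

    Assumes the common convention: exit code = 128 + signal number.
    """
    if code <= 128:
        return None  # ordinary exit status, not a signal
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # no known signal with that number on this platform


print(signal_from_exit_code(134))
```

On a Linux machine this should print the signal name for number 6.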

Memory utilization looks OK on all workers.

One example:

Katalin555_0-1740141827630.png