@ashishCh
The [CANNOT_OPEN_SOCKET] failures stem from PySpark's default socket-based data-transfer path, which is used when collecting rows back to Python (e.g., .collect(), .first(), .take()). During that transfer, the local handshake to a JVM-opened ephemeral port on 127.0.0.1 can intermittently time out or be refused.
This can happen when, for example, a Spot Instance is terminated mid-job, or an executor becomes unresponsive under memory/CPU pressure.
To mitigate this error, could you add the following Spark configuration to your job compute clusters:
spark.databricks.pyspark.useFileBasedCollect true
This switches the data transfer mechanism from sockets to temporary files, thereby avoiding reliance on the local network layer.
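As a sketch of where this setting would go, assuming the job cluster is defined through the Jobs API (the `new_cluster` and `spark_conf` keys below follow the standard cluster spec; other fields are omitted for brevity):

```json
{
  "new_cluster": {
    "spark_conf": {
      "spark.databricks.pyspark.useFileBasedCollect": "true"
    }
  }
}
```

Alternatively, the same key/value pair can be entered under the cluster's Spark config section in the UI. Note that this is a cluster-level setting, so it takes effect on the next cluster (re)start rather than mid-session.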