@ashishCh
The [CANNOT_OPEN_SOCKET] failures stem from PySpark's default socket-based data-transfer path, which is used when collecting rows back to Python (e.g., .collect(), .first(), .take()). During that transfer, the local handshake to a JVM-opened ephemeral port on 127.0.0.1 can intermittently time out or be refused.
This can happen when, for example, a Spot Instance is terminated mid-job, or an executor becomes unresponsive under memory/CPU pressure.
To mitigate this error, could you add the following Spark configuration to your job compute clusters:
spark.databricks.pyspark.useFileBasedCollect true
This switches the data transfer mechanism from sockets to temporary files, thereby avoiding reliance on the local network layer.
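As a sketch of where this setting would go, assuming the job cluster is defined through the Jobs API (the `new_cluster` and `spark_conf` keys below follow the standard cluster spec; other fields are omitted for brevity):

```json
{
  "new_cluster": {
    "spark_conf": {
      "spark.databricks.pyspark.useFileBasedCollect": "true"
    }
  }
}
```

Alternatively, the same key/value pair can be entered under the cluster's Spark config section in the UI. Note that this is a cluster-level setting, so it takes effect on the next cluster (re)start rather than mid-session.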