Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

[CANNOT_OPEN_SOCKET] Can not open socket: ["tried to connect to ('127.0.0.1', 45287)

timo82
Visitor

Hello,

After Databricks updated the runtime from release 15.4.24 to release 15.4.25, we are getting the following error in all jobs:

[CANNOT_OPEN_SOCKET] Can not open socket: ["tried to connect to ('127.0.0.1', 45287)

What can we do here?

Greetings

2 REPLIES

Advika
Databricks Employee

Hello @timo82!

Can you try adding 'spark.databricks.pyspark.useFileBasedCollect': 'true' to your Spark config?
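If the job clusters are defined through an API payload rather than the UI, Spark settings usually live under the `spark_conf` field of the cluster spec. A minimal sketch of adding the suggested key to an existing spec follows; the helper name `with_file_based_collect` and the example spec values are hypothetical and only there for illustration.

```python
import copy

# Key and value taken from the suggestion above.
FILE_BASED_COLLECT_KEY = "spark.databricks.pyspark.useFileBasedCollect"


def with_file_based_collect(cluster_spec: dict) -> dict:
    """Return a copy of a cluster spec with the suggested setting in spark_conf."""
    spec = copy.deepcopy(cluster_spec)
    # Create spark_conf if the spec does not have one yet, then set the key.
    spec.setdefault("spark_conf", {})[FILE_BASED_COLLECT_KEY] = "true"
    return spec


# Hypothetical job cluster spec; only the spark_conf entry matters here.
example_spec = {"spark_version": "15.4.x-scala2.12", "num_workers": 2}
updated_spec = with_file_based_collect(example_spec)
```

In the cluster UI, the equivalent would be a line `spark.databricks.pyspark.useFileBasedCollect true` in the Spark config box, followed by a cluster restart.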

Vasireddy
Contributor

Hey @timo82,

This error indicates that Python workers cannot communicate with the JVM after the maintenance update. Since it's affecting all jobs after upgrading to 15.4.25, try these steps:

--> Completely restart the cluster (stop, then start, not just restart) to reinitialize the socket listeners.
--> Check init scripts: temporarily remove any cluster init scripts and test whether jobs succeed without them, as maintenance updates can introduce incompatibilities.
--> Review Spark configurations: check the driver logs for deprecated or conflicting Spark configs that may have changed between 15.4.24 and 15.4.25.

Code workarounds:
--> Add a warmup operation: insert a simple action like df.limit(1).collect() at the start of your jobs, before the main processing, to establish the connection.
--> Implement retry logic: wrap the initial Spark actions in try/except blocks, as socket errors can be transient during startup.

The code workarounds can help with timing and initialization issues that may cause the socket error between Python workers and the JVM.
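A minimal sketch of the warmup and retry ideas combined, in Python. The helper `run_with_retry`, its parameters, and the choice to match on the text "CANNOT_OPEN_SOCKET" in the exception message are assumptions for illustration, not a Databricks API; adjust the attempts and delay to your jobs.

```python
import time


def run_with_retry(action, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call action(); retry only if the error message mentions CANNOT_OPEN_SOCKET."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            # Assumption: the socket error can be recognized by its message text.
            transient = "CANNOT_OPEN_SOCKET" in str(exc)
            if not transient or attempt == attempts:
                raise  # unrelated error, or out of attempts
            sleep(delay_seconds)


# Usage at the start of a job (assumes an existing DataFrame `df`):
# run_with_retry(lambda: df.limit(1).collect())
```

Errors that do not look like the socket error are re-raised immediately, so the retry does not hide unrelated failures.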

If still failing:
--> Check the cluster access mode: verify you're using the appropriate access mode (Shared or Single User) for your workload.
--> Increase cluster resources: scale up memory if the errors are intermittent under load.
--> Roll back to 15.4.24: if this is blocking production, temporarily revert while investigating further.
--> Contact Databricks support: since this affects all jobs after a maintenance update, there may be a regression in 15.4.25.

 

harisankar
