Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

[CANNOT_OPEN_SOCKET] Can not open socket: ["tried to connect to ('127.0.0.1', 45287)

timo82
New Contributor

Hello,

After Databricks updated the Runtime from Release 15.4.24 to Release 15.4.25, all of our jobs fail with the error:

[CANNOT_OPEN_SOCKET] Can not open socket: ["tried to connect to ('127.0.0.1', 45287)

What can we do here?

Greetings

1 ACCEPTED SOLUTION


Vasireddy
Contributor II

Yes, exactly. Changing spark_version from 15.4.x-scala2.12 to 15.4.24-scala2.12 pins your cluster to the 15.4.24 patch and prevents it from auto-upgrading to the problematic 15.4.25 release.

harisankar


7 REPLIES

Advika
Databricks Employee

Hello @timo82!

Can you try adding 'spark.databricks.pyspark.useFileBasedCollect': 'true' to your Spark config?
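For reference, on a job cluster that setting goes in the cluster's Spark config; in a bundle-style YAML like the one shared further down this thread, it would sit under spark_conf (a sketch of the placement, not a guaranteed fix):

 spark_conf:
   spark.databricks.pyspark.useFileBasedCollect: true

Judging by the name, this presumably routes collected results through files instead of the local socket, which could sidestep CANNOT_OPEN_SOCKET failures.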

Vasireddy
Contributor II

Hey @timo82,

This error indicates that the Python workers cannot communicate with the JVM after the maintenance update. Since it affects all jobs after upgrading to 15.4.25, try these steps:

--> Completely restart the cluster: stop it and then start it (not just "restart") to reinitialize the socket listeners
--> Check init scripts: temporarily remove any cluster init scripts and test whether the jobs succeed without them, as maintenance updates can introduce incompatibilities
--> Review Spark configurations: check the driver logs for deprecated or conflicting Spark configs that may have changed between 15.4.24 and 15.4.25

Code workarounds:
--> Add a warmup operation: insert a simple action such as df.limit(1).collect() at the start of your jobs, before the main processing, to establish the connection
--> Implement retry logic: wrap the initial Spark actions in try/except blocks, as socket errors can be transient during startup

These workarounds address the timing and initialization issues behind the socket error between the Python workers and the JVM; a sketch of both combined follows below.
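A minimal sketch of the warmup-plus-retry idea, assuming a standard PySpark session (the function name, retry count, and backoff values are illustrative, not from Databricks docs):

 import time
 from pyspark.sql import SparkSession

 def warmup_spark(spark: SparkSession, retries: int = 3) -> None:
     # Run a trivial action first so the Python<->JVM socket is established
     # before the real job logic; retry because startup socket errors can be transient.
     for attempt in range(retries):
         try:
             spark.range(1).collect()  # cheap action that exercises the collect path
             return
         except Exception:
             if attempt == retries - 1:
                 raise  # give up after the last attempt
             time.sleep(5 * (attempt + 1))  # simple linear backoff between attempts

 spark = SparkSession.builder.getOrCreate()
 warmup_spark(spark)
 # ... main job logic follows ...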

If it's still failing:
--> Check the cluster access mode: verify you're using the appropriate access mode (Shared or Single User) for your workload
--> Increase cluster resources: scale up memory if the errors are intermittent under load
--> Roll back to 15.4.24: if this is blocking production, temporarily revert while investigating further
--> Contact Databricks support: since this affects all jobs after a maintenance update, there may be a regression in 15.4.25

 

harisankar

Thanks for the details.

How can we roll back to 15.4.24?

We only configure the cluster type in the YAML, not the runtime version.

 

job_clusters:
  - job_cluster_key: default
    new_cluster:
      spark_version: 15.4.x-scala2.12
      node_type_id: Standard_D64s_v3
      autoscale:
        min_workers: 1
        max_workers: 5
      enable_elastic_disk: true
      data_security_mode: SINGLE_USER
      spark_conf:
        spark.databricks.pip.ignoreSSL: true
        spark.sql.inMemoryColumnarStorage.compressed: true
        spark.sql.adaptive.enabled: true
        spark.sql.adaptive.coalescePartitions.enabled: true
        spark.databricks.delta.schema.autoMerge.enabled: true
        spark.databricks.adaptive.autoOptimizeShuffle.enabled: true
        spark.executor.heartbeatInterval: 300000
        spark.network.timeout: 320000
        spark.sql.codegen: true
 

Greetings

timo82
New Contributor

spark_version: 15.4.x-scala2.12

to

spark_version: 15.4.24-scala2.12 

Correct?

Vasireddy
Contributor II

Yes, exactly. Changing spark_version from 15.4.x-scala2.12 to 15.4.24-scala2.12 pins your cluster to the 15.4.24 patch and prevents it from auto-upgrading to the problematic 15.4.25 release.

harisankar

Hansjoerg
New Contributor II

@Vasireddy 
Using Bundles doesn't seem to allow providing a fixed patch version:

Error: cannot update job: INVALID_PARAMETER_VALUE: Invalid spark version 15.4.24-scala2.12.
  with databricks_job.pdv-partnerbul-dbxservice-housekeeping,
  on bundle.tf

Vasireddy
Contributor II

Hi @Hansjoerg,

Apologies for the confusion earlier. You're right: Bundles don't allow pinning to a specific patch version like 15.4.24.

Your best option is to skip Bundles for now and use the regular Databricks Jobs setup (via the UI or the Jobs API), where you can specify exactly 15.4.24-scala2.12 and avoid the broken 15.4.25 version.

This will let you roll back to the working version while Databricks fixes the socket issue in 15.4.25.
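If it helps, here's a rough sketch of pinning the version through the Jobs API 2.1 update endpoint with plain Python; the environment variable names and the job ID are placeholders for your setup:

 import os
 import requests

 host = os.environ["DATABRICKS_HOST"]    # workspace URL, e.g. https://adb-....azuredatabricks.net
 token = os.environ["DATABRICKS_TOKEN"]  # token with permission to edit the job
 job_id = 123                            # placeholder: your job's ID

 # Partial update: only the job_clusters block is replaced, pinning the patch release.
 resp = requests.post(
     f"{host}/api/2.1/jobs/update",
     headers={"Authorization": f"Bearer {token}"},
     json={
         "job_id": job_id,
         "new_settings": {
             "job_clusters": [{
                 "job_cluster_key": "default",
                 "new_cluster": {
                     "spark_version": "15.4.24-scala2.12",
                     "node_type_id": "Standard_D64s_v3",
                     "autoscale": {"min_workers": 1, "max_workers": 5},
                 },
             }],
         },
     },
     timeout=30,
 )
 resp.raise_for_status()

Once Databricks fixes 15.4.25, you can switch spark_version back to 15.4.x-scala2.12 the same way.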

harisankar
