Databricks Community

HansAdriaans · ‎09-18-2024

Hi, I'm running a Databricks pipeline hourly using Python notebooks checked out from Git, with on-demand compute (r6gd.xlarge: 32 GB, 4 CPUs, Graviton). Most of the time the pipeline runs without problems. However, sometimes the first notebook fails with the following error right after the cluster starts:

```
Can not open socket: ["tried to connect to ('127.0.0.1', 36349), but an error occurred: [Errno 111] Connection refused"].
```

This happens on the first interaction between the notebook and Spark, when running this command:

```
km_per_nm = (
    spark.read.table(assumptions_table)
    .where(f.col("item") == "km_per_nm")
    .collect()[0]
    .asDict()
    .get("value")
)
```

I know that this could be a sign of an OOM issue, but the result of that query is a single value from one row. The entire table is only 1.5 KB, and the driver has 32 GiB of memory.
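One way to tell a transient startup problem from a persistent one would be to wrap that first Spark call in a small retry and see whether a later attempt succeeds. The following is only a sketch under that assumption: `retry_call` is a hypothetical helper written for illustration, not a Databricks or PySpark API, and the attempt count and delay are arbitrary.

```python
import time


def retry_call(fn, attempts=3, delay_s=5.0, retry_on=(Exception,)):
    """Call fn() and retry on the listed exceptions.

    Hypothetical helper for illustration; not part of Databricks or PySpark.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as err:
            last_error = err
            # Log each failure so the job output shows how often it happens.
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                time.sleep(delay_s)
    # Every attempt failed: surface the last error unchanged.
    raise last_error


# Usage in the notebook (assumes `spark`, `f`, and `assumptions_table`
# are already defined, as in the command above):
# km_per_nm = retry_call(
#     lambda: spark.read.table(assumptions_table)
#     .where(f.col("item") == "km_per_nm")
#     .collect()[0]
#     .asDict()
#     .get("value")
# )
```

If a second attempt reliably succeeds, that would point toward a timing issue at cluster start rather than memory pressure, though it would not explain the root cause.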

Can you help me with ideas on where to look?