Re: Code on cluster runs indefinitely

Yogasathyandrun · Sunday

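print("Hello") only exercises the Python process on the driver, while SELECT 1 has to go through Spark SQL. The fact that the first eventually works but the second never completes suggests the cluster may be running but not fully initialized for Spark workloads.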

A few things I’d check first:

Cluster Event Log for any provisioning or startup errors.
Spark UI → Executors to confirm workers/executors are actually coming up.
Driver logs for startup exceptions or connectivity issues.
Whether this is a single-node cluster or a cluster with separate workers.
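
While you go through those, it can help to put a timeout around the test query so the notebook cell returns instead of spinning forever. Below is a rough sketch of how I would do that, not anything Databricks-specific: run_with_timeout is a hypothetical helper I made up for this post, and the 60-second limit in the comment is an arbitrary choice. The helper itself is plain Python, so it runs anywhere; time.sleep stands in for the hanging query.

```python
import threading
import time


def run_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread (hypothetical helper, not a Databricks API).

    Returns ("ok", value), ("error", exception) or ("timeout", None).
    The worker thread is NOT killed on timeout; it is only abandoned.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = ("ok", fn())
        except Exception as exc:
            outcome["result"] = ("error", exc)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)  # wait at most timeout_s seconds
    if thread.is_alive():
        return ("timeout", None)
    return outcome.get("result", ("error", None))


# Driver-only work finishes right away.
print(run_with_timeout(lambda: "Hello", 1.0))

# A call that takes longer than the limit is reported as a timeout.
print(run_with_timeout(lambda: time.sleep(2), 0.2))

# In a Databricks notebook, where `spark` is provided by the runtime,
# the equivalent check would be something like:
# print(run_with_timeout(lambda: spark.sql("SELECT 1").collect(), 60))
```

If the driver-only call comes back "ok" and the Spark call comes back "timeout", that points at Spark initialization rather than the notebook itself. The timed-out query keeps running in the background, so you may still need to cancel the cell or restart the cluster afterwards.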

One other thing that stands out is the use of m4.large, which is a fairly old instance family. If possible, try spinning up a small cluster on a newer instance type (for example m5 or m6 generation) and see if the behavior changes.

Also, which Databricks Runtime version are you running, and does the issue occur immediately after cluster startup or only after the cluster has been idle for some time?

Data Engineer | Apache Spark | Delta Lake | Databricks