Hi @BST, when you run a Spark job in cluster mode, three pieces are involved: a cluster manager (e.g., YARN, Mesos, or Kubernetes), a driver program, and worker nodes that host executors. You submit the application to the cluster manager, which allocates resources and launches the driver on one of the cluster's nodes (under YARN, for example, it runs inside the ApplicationMaster container). The executors on the worker nodes then run the actual tasks.

One clarification: data location does not influence where the driver is placed. The cluster manager puts the driver wherever it has resources available. Data locality matters at the task level instead: when the driver schedules tasks, it prefers executors on the same node as the data (or as close as possible, e.g., the same rack) to minimize network transfer.
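For concreteness, here's roughly what a cluster-mode submission looks like. The class name, jar path, and resource sizes are just placeholders for illustration:

```bash
# Submit in cluster mode: the driver is launched inside the cluster
# (under YARN, in the ApplicationMaster container), not on the machine
# where you run spark-submit. Class and jar names are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --num-executors 4 \
  --executor-memory 4g \
  --executor-cores 2 \
  --class com.example.MyApp \
  my-app.jar
```

Once the job is running, the Spark UI shows the locality level each task achieved (PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY), and you can tune how long the scheduler waits for a local slot via the `spark.locality.wait` setting.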