Re: Cluster setup for ML work for Pandas in Spark,...

Anonymous · 01-21-2022

Plain Python code (pandas, scikit-learn, and so on) runs only on the driver. Distributed Spark code runs on the workers.

Here are some cluster tips:

If you're doing ML, then use an ML runtime.

If your code isn't distributed (plain pandas, for example), use a single node cluster, since extra workers would just sit idle.

Don't use autoscaling for ML.

For deep learning, use GPUs.

Size the cluster to match your data size.
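
To make the tips above concrete, here is a rough sketch that turns them into a cluster spec. Treat all of it as an assumption rather than an official recipe: `build_cluster_spec` is a made-up helper, the runtime strings and node types are placeholders (AWS-style names), the 3x memory headroom and 32 GB per worker are rule-of-thumb guesses, and the field names follow the Databricks Clusters API as I understand it, so check them against the docs for your workspace.

```python
import math


def build_cluster_spec(data_gb, distributed, deep_learning,
                       worker_memory_gb=32.0, memory_headroom=3.0):
    """Hypothetical helper: build a cluster spec dict from the tips above."""
    # Tips: use an ML runtime, and a GPU one for deep learning.
    # Both version strings and node types are placeholders, not recommendations.
    if deep_learning:
        runtime, node_type = "10.4.x-gpu-ml-scala2.12", "g4dn.xlarge"
    else:
        runtime, node_type = "10.4.x-cpu-ml-scala2.12", "i3.xlarge"

    spec = {"spark_version": runtime, "node_type_id": node_type}

    if not distributed:
        # Tip: non-distributed work gets a single node cluster (driver only).
        spec["num_workers"] = 0
        spec["spark_conf"] = {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        }
        spec["custom_tags"] = {"ResourceClass": "SingleNode"}
        return spec

    # Tips: no autoscaling (fixed num_workers, no "autoscale" block),
    # and size the cluster for the data. The headroom factor is a guess
    # at how much memory the data needs once it is loaded and processed.
    needed_gb = data_gb * memory_headroom
    spec["num_workers"] = max(1, math.ceil(needed_gb / worker_memory_gb))
    return spec


# Example: 100 GB of data, distributed training, no deep learning.
spec = build_cluster_spec(data_gb=100, distributed=True, deep_learning=False)
print(spec["num_workers"])  # 10 workers under the assumptions above
```

The worker count is fixed instead of using an `autoscale` range, which is how the "no autoscaling" tip shows up in the spec. Adjust the headroom and memory numbers to your own workload; they are only starting points.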