Anonymous
01-21-2022 12:00 PM
Plain Python code (for example pandas or scikit-learn) runs only on the driver. Distributed/Spark code runs on the workers.
Here are some cluster tips:
If you're doing ML, use an ML runtime.
If you're not running distributed code, use a single-node cluster.
Don't use autoscaling for ML.
For deep learning, use GPUs.
Size the cluster to fit your data.
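
The tips above can be put together in a small helper that builds a cluster spec. This is only a rough sketch: `build_cluster_spec` and the cluster name are made up for illustration, and the runtime version strings and node types are assumptions (AWS-style examples), so check what your workspace actually offers. The dict keys follow the shape I'd expect for a Databricks cluster definition, with a fixed `num_workers` instead of an `autoscale` block.

```python
def build_cluster_spec(distributed: bool, deep_learning: bool, num_workers: int = 2) -> dict:
    """Build an illustrative cluster spec that follows the tips above.

    Hypothetical helper: version strings and node types are assumptions.
    """
    if distributed and num_workers < 1:
        raise ValueError("a distributed cluster needs at least one worker")

    # ML runtime in both cases; GPU flavor only for deep learning (assumed versions).
    runtime = "10.4.x-gpu-ml-scala2.12" if deep_learning else "10.4.x-cpu-ml-scala2.12"
    # Assumed AWS node types: a GPU instance for deep learning, CPU otherwise.
    node_type = "g4dn.xlarge" if deep_learning else "i3.xlarge"

    spec = {
        "cluster_name": "ml-example",  # made-up name
        "spark_version": runtime,
        "node_type_id": node_type,
    }

    if distributed:
        # Fixed size: no "autoscale" block, per the tip about ML workloads.
        spec["num_workers"] = num_workers
    else:
        # Single-node cluster: driver only, Spark runs in local mode.
        spec["num_workers"] = 0
        spec["spark_conf"] = {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        }
        spec["custom_tags"] = {"ResourceClass": "SingleNode"}

    return spec


# Example: scikit-learn on one machine vs. distributed deep learning.
single_node = build_cluster_spec(distributed=False, deep_learning=False)
gpu_cluster = build_cluster_spec(distributed=True, deep_learning=True, num_workers=4)
```

How many workers to pass in still depends on your data size, so treat the default of 2 as a placeholder rather than a recommendation.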