06-11-2021 01:40 AM
As a general rule of thumb, Spark is useful when data becomes too large to process comfortably on a single machine. For example, Python users love pandas, but once DataFrames start to approach the 1-10 million row mark, processing on a single machine becomes difficult.
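One rough way to see whether a pandas DataFrame is getting close to that point is to measure its in-memory footprint with `DataFrame.memory_usage(deep=True)`. The sketch below builds a hypothetical 1-million-row DataFrame (the column names and data are made up for illustration). The real limit depends on your machine's RAM and on what you do with the data, since many pandas operations create intermediate copies.

```python
import numpy as np
import pandas as pd


def footprint_mb(df: pd.DataFrame) -> float:
    """Return the approximate in-memory size of a pandas DataFrame in megabytes."""
    # deep=True also counts the Python objects behind object (e.g. string) columns
    return df.memory_usage(deep=True).sum() / 1024**2


# Hypothetical example: 1 million rows with two numeric columns and one string column
n = 1_000_000
df = pd.DataFrame({
    "id": np.arange(n, dtype="int64"),
    "value": np.random.default_rng(0).random(n),
    "category": np.where(np.arange(n) % 2 == 0, "a", "b"),
})

print(f"{len(df):,} rows use about {footprint_mb(df):.0f} MB")
```

String columns usually dominate the total, so the same row count can be cheap or expensive depending on the schema.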
A great aspect of Spark on Databricks is that you only use the compute you need. So if you are working with a relatively small dataset that is still too big for a single machine, you can spin up a cluster with just 1-2 workers.
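A small cluster like that can be described as a request body for the Databricks Clusters API. The following is a minimal sketch, assuming the `POST /api/2.0/clusters/create` endpoint; the cluster name, runtime version, and node type are hypothetical placeholders, and the 1-2 worker autoscaling range and 30-minute auto-termination are illustrative choices, not recommendations.

```python
import json

# Hypothetical request body for the Databricks Clusters API (POST /api/2.0/clusters/create).
# spark_version and node_type_id are placeholders: substitute a runtime version and
# instance type that are actually available in your workspace.
small_cluster = {
    "cluster_name": "small-spark-job",  # hypothetical name
    "spark_version": "<runtime-version>",
    "node_type_id": "<node-type>",
    # Let the cluster scale between 1 and 2 workers instead of provisioning a large fixed size
    "autoscale": {"min_workers": 1, "max_workers": 2},
    # Terminate after 30 idle minutes so you are not paying for unused compute
    "autotermination_minutes": 30,
}

print(json.dumps(small_cluster, indent=2))
```

Using `autoscale` rather than a fixed `num_workers` keeps the cluster at one worker when the load is light, which fits the idea of paying only for the compute you need.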