Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-16-2021 03:14 PM
The modeling algorithms in Spark MLlib will only accept a vectorized column as input. This is done for reasons of efficiency and scaling.
The vector assembler will express the features efficiently using techniques like spark vector, which allow a larger amount of data to be handled with less memory. This helps the modeling algorithms run efficiently even on large data columns.