What is the difference between DataFrame.first(), head(), head(n), and take(n), show(), show(n)?
06-17-2015 02:38 PM
06-17-2015 02:48 PM
Sorted Data
If your data is sorted using either sort() or ORDER BY, these operations are deterministic: first()/head() return the first row, and head(n)/take(n) return the first n rows.
show()/show(n) return Unit (void) and print the first 20 rows (show()) or the first n rows (show(n)) in tabular form.
These operations may require a shuffle if there are any aggregations, joins, or sorts in the underlying query.
Unsorted Data
If the data is not sorted, these operations are not guaranteed to return the first or top-n elements in any particular order, and a shuffle may not be required.
show()/show(n) return Unit (void) and print up to 20 rows (show()) or up to n rows (show(n)) in tabular form, in no particular order.
If no shuffle is required (no aggregations, joins, or sorts), these operations are optimized to inspect only as many partitions as needed to satisfy the request, which is likely a much smaller subset of the dataset's partitions.
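To make the partition point concrete, here is a toy model in plain Python. It is only a sketch of the idea, not Spark's actual implementation: the function name take_from_partitions and the sample partitions are made up for illustration. It reads partitions in order and stops as soon as n rows have been collected.

```python
def take_from_partitions(partitions, n):
    """Toy model (hypothetical helper, not Spark code): read partitions in
    order and stop as soon as n rows have been collected."""
    rows = []
    scanned = 0
    for part in partitions:
        if len(rows) >= n:
            break  # enough rows already, remaining partitions are never read
        scanned += 1
        rows.extend(part[: n - len(rows)])
    return rows, scanned


# Four unsorted "partitions" of a small dataset (made-up values).
partitions = [[5, 9], [2, 7], [1, 8], [4, 6]]

rows, scanned = take_from_partitions(partitions, 3)

# What a sorted query would have returned for the same n.
smallest = sorted(x for part in partitions for x in part)[:3]
```

Here only two of the four partitions are read, and the rows that come back are simply the ones encountered first, not the three smallest values.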
02-14-2026 12:48 PM
These actions return data to the driver:
first() : Returns the first row of the DataFrame as a single Row.
head() : Same as first(); returns the first row.
head(n): Returns the first n rows as an array (Scala) or list (Python).
take(n): Like head(n), returns the first n rows as an array (Scala) or list (Python).
These actions display data and return nothing:
show(): Prints the first 20 rows in a tabular format.
show(n): Prints the first n rows in a tabular format.