Sorted Data
If your data is sorted using either sort() or ORDER BY, these operations are deterministic: first()/head() return the first element, and head(n)/take(n) return the top n elements.
show()/show(n) return Unit (void) and print the first rows in tabular form: up to 20 for show(), or up to n for show(n).
These operations may require a shuffle if there are any aggregations, joins, or sorts in the underlying query.
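Why sorting makes the result deterministic can be illustrated with a small plain-Python model. This is only a sketch, not Spark code: partitions are modeled as lists, and take_sorted is a hypothetical helper that picks each partition's top-n candidates and then merges them. Spark's actual physical plan may differ.

```python
import heapq

def take_sorted(partitions, n):
    """Model of take(n) on sorted data (hypothetical helper, not a Spark API).

    Each partition contributes its n smallest rows; the candidates are then
    merged and the overall n smallest are returned.
    """
    candidates = []
    for part in partitions:
        candidates.extend(heapq.nsmallest(n, part))
    return heapq.nsmallest(n, candidates)

# The same seven rows, distributed across partitions in two different ways.
layout_a = [[5, 1, 9], [3, 7], [2, 8]]
layout_b = [[8, 2], [9, 5, 3], [7, 1]]

# Both layouts yield the same top-3, because the result depends only on
# the sort order and not on where each row happens to live.
print(take_sorted(layout_a, 3))
print(take_sorted(layout_b, 3))
```

Both calls print the same three rows, which is the property the sorted case relies on.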
Unsorted Data
If the data is not sorted, these operations are not guaranteed to return the first or top-n elements, and a shuffle may not be required.
show()/show(n) return Unit (void) and print up to 20 rows (or up to n rows for show(n)) in tabular form, in no particular order.
If no shuffle is required (no aggregations, joins, or sorts), these operations are optimized to inspect only as many partitions as are needed to satisfy the operation, which is likely a much smaller subset of the dataset's partitions.
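The idea of inspecting only enough partitions can be sketched in plain Python. This is a simplified model under stated assumptions, not Spark's implementation: partitions are lists, the function name take_incremental is hypothetical, and the growth factor of 4 between rounds is chosen purely for illustration.

```python
def take_incremental(partitions, n, scale_up_factor=4):
    """Model of take(n) on unsorted data (hypothetical, not a Spark API).

    Scans one partition first, then progressively larger batches, and stops
    as soon as n rows have been collected. Returns the rows together with
    the number of partitions that were inspected.
    """
    rows = []
    scanned = 0
    batch = 1  # assumption: the first round looks at a single partition
    while len(rows) < n and scanned < len(partitions):
        for part in partitions[scanned:scanned + batch]:
            # Take only as many rows as are still missing.
            rows.extend(part[: n - len(rows)])
        scanned = min(scanned + batch, len(partitions))
        batch *= scale_up_factor  # assumption: grow the batch each round
    return rows, scanned

# 100 partitions of 10 rows each.
partitions = [list(range(i * 10, i * 10 + 10)) for i in range(100)]

rows, scanned = take_incremental(partitions, 5)
print(len(rows), scanned)   # 5 rows from 1 partition

rows, scanned = take_incremental(partitions, 25)
print(len(rows), scanned)   # 25 rows from 5 partitions
```

In this model, asking for 25 rows touches 5 of the 100 partitions. The rows returned are simply whichever ones come first in partition order, which is why the unsorted case gives no top-n guarantee.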