11-04-2021 01:45 PM
@John Constantine, as the documentation puts it: "The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle." https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.collect_list.htm...
Generally, relying on collect_list in production is not the best solution, since the order of the collected elements is not guaranteed. There are usually other ways to achieve what is needed.
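To make the quoted behaviour concrete, here is a small plain-Python sketch (no Spark needed) that mimics what a shuffle does to collected results. The rows, the `seq` ordering column, and the helper names are all made up for illustration. The random reordering stands in for a shuffle and is not how Spark actually works internally.

```python
import random
from collections import defaultdict

# Hypothetical rows: (group key, sequence number, value).
rows = [
    ("a", 1, "x"), ("a", 2, "y"), ("a", 3, "z"),
    ("b", 1, "p"), ("b", 2, "q"),
]

def collect_after_shuffle(rows, seed):
    """Mimic collect_list after a shuffle: rows arrive in an arbitrary order."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # stand-in for a shuffle
    groups = defaultdict(list)
    for key, seq, value in shuffled:
        groups[key].append((seq, value))   # collected in arrival order
    return dict(groups)

def collect_sorted(rows, seed):
    """Same collection, but each group is sorted by its explicit sequence number."""
    return {
        key: [value for _, value in sorted(pairs)]
        for key, pairs in collect_after_shuffle(rows, seed).items()
    }

# Arrival order for group "a" can vary from one simulated run to the next.
arrival_orders = {
    tuple(value for _, value in collect_after_shuffle(rows, seed)["a"])
    for seed in range(20)
}

# Sorting on the carried sequence number gives the same result every run.
sorted_results = [collect_sorted(rows, seed) for seed in range(20)]
```

The idea is to carry an explicit ordering column along with each value and sort on it after collecting, instead of trusting row order. In PySpark the same pattern is usually written as sort_array(collect_list(struct("seq", "value"))), with the ordering column first in the struct. Treat that as a sketch to adapt to your own schema, not a drop-in fix.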
My blog: https://databrickster.medium.com/