09-03-2024 05:59 AM - edited 09-04-2024 07:15 AM
Hi there, you can use a map for this. Build a map column with the creatively named create_map that assigns each category its position in your preferred order, then sort by the value each row looks up in that map.
The code will look something like this (I haven't tested it, so treat it as pseudocode):
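from pyspark.sql.functions import create_map, lit, col
categories = ['small', 'medium', 'large', 'xlarge']
# Wrap each category in lit(): create_map treats bare strings as column names
category_rank = create_map([val for (i, category) in enumerate(categories) for val in (lit(category), lit(i))])
# gives a map column: map(small, 0, medium, 1, large, 2, xlarge, 3)
# Look up each row's category in the map and sort by its position in the list
display(df.orderBy(category_rank[col('category_col')]))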
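If it helps to see why this gives the custom order, here is the same idea in plain Python, no Spark needed. It's only a sketch: the `rows` list is made-up sample data, and sending unknown categories to the end is my own choice rather than anything Spark does.

```python
# Desired custom order, same list as in the Spark snippet
categories = ['small', 'medium', 'large', 'xlarge']

# Map each category to its position: {'small': 0, 'medium': 1, ...}
rank = {category: i for i, category in enumerate(categories)}

# Made-up sample values standing in for the category column
rows = ['large', 'small', 'xlarge', 'medium', 'small']

# Sort by the looked-up position instead of alphabetically;
# anything not in the list gets len(categories) and lands last
ordered = sorted(rows, key=lambda c: rank.get(c, len(categories)))

print(ordered)  # ['small', 'small', 'medium', 'large', 'xlarge']
```

In the Spark version, a category that isn't in the list will most likely look up as null instead, so it's worth checking where those rows end up in your sorted output.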