holly
Databricks Employee

Hi there, you can use a map column. Build a map from each category to its position in your custom order with the creatively named create_map, then sort by the value the map returns for each row's category.

The code will look something like this (I have not tested it, so treat it as pseudocode):

from pyspark.sql.functions import create_map, lit, col

categories=['small', 'medium', 'large', 'xlarge']

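# wrap each category in lit(); a bare string would be read as a column name
# (named category_order so it does not shadow Python's built-in map)
category_order = create_map([val for (i, category) in enumerate(categories) for val in (lit(category), lit(i))])

# gives a map column: map(small, 0, medium, 1, large, 2, xlarge, 3)


# look up each row's category in the map and sort by its position
display(df.orderBy(category_order[col('category_col')]))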
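If the comprehension is hard to read, here is a rough plain-Python sketch (no Spark needed) of what it builds: a flat key, value, key, value list, which is the argument layout create_map expects. The sample rows and the names flat_pairs, rank and sorted_rows are made up for illustration, and a dict stands in for the Spark map column.

```python
from itertools import chain

categories = ['small', 'medium', 'large', 'xlarge']

# Flatten (category, position) pairs into key1, value1, key2, value2, ...
# In the Spark version each of these items is wrapped in lit().
flat_pairs = list(chain.from_iterable(
    (category, i) for i, category in enumerate(categories)
))

# The same lookup as a plain dict: category -> position in the custom order
rank = dict(zip(flat_pairs[0::2], flat_pairs[1::2]))

# Hypothetical sample rows of (id, category), sorted by the category's rank
rows = [('b', 'large'), ('a', 'xlarge'), ('c', 'small'), ('d', 'medium')]
sorted_rows = sorted(rows, key=lambda row: rank[row[1]])

print(flat_pairs)
print(sorted_rows)
```

In this sketch a category that is missing from the list raises a KeyError, so it is worth checking that every value in your column appears in categories before relying on the order.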