Dashboard use case - order of bars
09-02-2024 04:56 AM
On a Spark DataFrame, is there a smart way to set the order of a categorical feature explicitly, equivalent to pd.Categorical(values, categories=[...], ordered=True) in pandas? The use case here is a dashboard in Databricks, and I want the bars to be arranged in a specific order.
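For comparison, this is the pandas behaviour being referred to. In pandas the explicit order goes in `categories=`, and `ordered=True` marks the categorical as ordered, so sorting follows that list instead of alphabetical order. The column name `size` and its values are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical); alphabetical order would be large, medium, small, xlarge
df = pd.DataFrame({"size": ["large", "small", "xlarge", "medium", "small"]})

# Declare the explicit category order
df["size"] = pd.Categorical(
    df["size"],
    categories=["small", "medium", "large", "xlarge"],
    ordered=True,
)

# Sorting now follows the declared category order
sorted_df = df.sort_values("size")
print(sorted_df["size"].tolist())  # ['small', 'small', 'medium', 'large', 'xlarge']
```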
09-03-2024 05:59 AM - edited 09-04-2024 07:15 AM
Hi there, you can use a map. Build one with the creatively named create_map function so that each category maps to its position in your list, then sort by the value looked up from that map.
The code will look something like this (I haven't tested it, so treat it as pseudocode):
from pyspark.sql.functions import create_map, lit, col
categories = ['small', 'medium', 'large', 'xlarge']
# Wrap keys and ranks in lit(): bare strings would be read as column names
order_map = create_map([lit(v) for (i, category) in enumerate(categories) for v in (category, i)])
# order_map is map(small -> 0, medium -> 1, large -> 2, xlarge -> 3)
display(df.orderBy(order_map[col('category_col')]))
09-03-2024 06:32 AM
Thanks! One question: this code orders the whole DataFrame based on the create_map logic. However, I want to show several figures on a dashboard, each with its own sorting logic. I don't think this method will work for that use case?
09-04-2024 07:25 AM
Ah, I think I see. Let's say your dataset has category_col1 with the values {S, M, L, XL} and category_col2 with {XS, S, M}, and you want to sort the data by category_col1 and then category_col2.
If you want to fix the order for the user, you can repeat the create_map step to make map_1 and map_2, then order by both looked-up columns. You can do this as part of your pipeline and save the results, including the looked-up rank columns, to your table so the ordering isn't only available on the DataFrame.
BUT
If you want end users to be able to re-sort the final Databricks visualisation or table by clicking on values, that's something we don't support at the moment. I think it's a sensible ask, so I'll raise it with our BI team.
09-04-2024 07:27 AM
Thanks for your effort!