10-13-2021 05:45 PM
Show all distinct values per column in dataframe
Problem Statement:
I want to see all the distinct values per column for my entire table, but a SQL query with a collect_set() on every column is not dynamic and too long to write.
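Use this code to get a single row in which each column holds an array of that column's distinct values:
%python
from pyspark.sql.functions import col, collect_set
# One aggregate per column: collect_set(c) gathers the distinct non-null values
# of column c into an array, aliased back to the column's own name.
distincts = df.agg(*(collect_set(col(c)).alias(c) for c in df.columns))
# Single-row result with one array column per input column.
distincts.display()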
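If you would rather stay in SQL, the long query from the problem statement does not have to be typed by hand: it can be generated from the column list. This is a minimal sketch, assuming a plain Python helper whose output you pass to spark.sql(); build_distinct_values_sql is a hypothetical name, not a Spark API. The table name is inserted as-is, column names are backtick-quoted, and the optional sort flag wraps each set in sort_array() because collect_set() returns values in no guaranteed order. Like the DataFrame version, collect_set() skips nulls.

```python
from typing import List


def build_distinct_values_sql(table: str, columns: List[str], sort: bool = False) -> str:
    """Build a Spark SQL query with one collect_set() per column.

    Hypothetical helper: it only builds a string and never talks to Spark.
    `table` is inserted verbatim, so it may be a qualified name like db.tbl.
    """

    def quote(name: str) -> str:
        # Backtick-quote identifiers so names with spaces or dots still work;
        # a literal backtick inside a name is escaped by doubling it.
        return "`" + name.replace("`", "``") + "`"

    exprs = []
    for c in columns:
        expr = f"collect_set({quote(c)})"
        if sort:
            # collect_set() gives no ordering guarantee; sort for stable output.
            expr = f"sort_array({expr})"
        exprs.append(f"{expr} AS {quote(c)}")
    return f"SELECT {', '.join(exprs)} FROM {table}"


if __name__ == "__main__":
    print(build_distinct_values_sql("my_table", ["id", "country"]))
```

For example, in a Databricks notebook, spark.sql(build_distinct_values_sql("my_table", spark.table("my_table").columns)).display() should give the same single-row result as the DataFrame code above; my_table is a placeholder for your own table.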
Labels: Collect_set, Distinct Values, Pyspark, Values
1 REPLY
Anonymous
10-14-2021 10:25 AM
@Artem Yevtushenko - This is great! Thank you for sharing!

