Data Engineering


Artem_Yevtushen
New Contributor III

Show all distinct values per column in dataframe

Problem Statement:

I want to see all the distinct values per column for my entire table, but writing a SQL query with collect_set() on every column by hand is tedious and has to be rewritten whenever the schema changes.

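This PySpark snippet builds one collect_set() aggregation per column from df.columns, so it works for any schema. Note that collect_set() ignores nulls and does not guarantee the order of values in each array. Use it to show the output below: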

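%python

from pyspark.sql.functions import col, collect_set

# Build one collect_set() aggregation per column, aliased back to the column name.
distincts = df.agg(*(collect_set(col(c)).alias(c) for c in df.columns))
# display() is the Databricks notebook renderer; the result is a single row of arrays.
distincts.display()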

[Image: collect_set table]
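For anyone who would rather stay in SQL, the same idea can be sketched by generating the query text from the column list instead of typing it out. This is only a rough sketch and not part of the original post: the helper name build_distinct_sql and the table name my_table are made up for illustration, and it assumes Spark SQL's backtick quoting for identifiers (a literal backtick escaped by doubling it).

```python
def build_distinct_sql(table, columns):
    """Return a Spark SQL query with one collect_set() per column.

    Hypothetical helper for illustration; the caller supplies the
    table name and the list of column names.
    """
    if not columns:
        raise ValueError("columns must not be empty")

    def quote(name):
        # Assumption: identifiers are backtick-quoted, with a literal
        # backtick escaped by doubling it.
        return "`" + name.replace("`", "``") + "`"

    select_list = ", ".join(
        f"collect_set({quote(c)}) AS {quote(c)}" for c in columns
    )
    return f"SELECT {select_list} FROM {table}"


# "my_table" and the column names are placeholders.
query = build_distinct_sql("my_table", ["id", "country"])
print(query)
```

In a notebook the column list could come from df.columns, and the resulting string could be passed to spark.sql(query). The DataFrame version in the post is usually simpler, since it needs no string building or identifier quoting.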

1 REPLY

Anonymous
Not applicable

@Artem Yevtushenko - This is great! Thank you for sharing!
