Databricks Community

satya · ‎09-08-2016

In pandas I usually do df['columnname'].unique(). How can I get the same result from a PySpark DataFrame?

raela · ‎04-04-2017

df.select("columnname").distinct().show()

AbhishekYada · ‎12-16-2018

This code returns data that is not iterable: I can see the distinct values, but I am not able to iterate over them in code. Is there another way that lets me do that? I tried using toPandas() to convert it into a pandas DataFrame and then get an iterable of the unique values, but I am running into a 'Pandas not found' error. How can I install pandas in my PySpark environment when my local machine already has pandas installed?
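A 'Pandas not found' error from toPandas() usually suggests that the Python interpreter running the PySpark session cannot import pandas, even if another interpreter on the same machine can. The sketch below is only a diagnostic, not a fix: it prints which interpreter is in use and whether pandas is importable from it. The helper name pandas_available is made up for illustration.

```python
import importlib.util
import sys


def pandas_available() -> bool:
    """Return True if pandas can be imported by the current interpreter."""
    # find_spec looks the module up without actually importing it.
    return importlib.util.find_spec("pandas") is not None


# Run this inside the same session or notebook that raised the error.
print("Python used by this session:", sys.executable)
print("pandas importable here:", pandas_available())
```

If this reports that pandas is not importable, one option is to install pandas with that same interpreter. Which interpreter PySpark picks can depend on settings such as the PYSPARK_PYTHON environment variable, so the answer may differ between setups.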

AbimaelDomingue · ‎08-06-2021

If you just want to print the results and not use the results for other processing, this is the way to go.

ShuminWu · ‎06-14-2017

Hi, I tried using .distinct().show() as advised, but I am getting the error TypeError: 'DataFrame' object is not callable.

The DataFrame was read from a CSV file using spark.read.csv, and other functions such as describe work on it. Is there any reason for this? How should I go about retrieving the list of unique values in this case?

Sorry if the question is very basic; I am new to this. Thanks!

Rodneyjoyce · ‎04-22-2019

To get the count of the distinct values:

from pyspark.sql import functions as F
df.select(F.countDistinct("colx")).show()

Or to count the number of records for each distinct value:

df.groupBy("colx").count().orderBy("colx").show()

AnujGupta · ‎11-22-2020

Thanks for this. The latter worked well for me. Sorry for my ignorance, but what is F in the first one? The code works without the F.

ldfo · ‎07-01-2020

Hi, this worked for me.

distinct_ids = [x.id for x in data.select('id').distinct().collect()]

Ger_Martinez · ‎09-02-2020

Nice approach, and very "pythonic" too.

AbimaelDomingue · ‎08-06-2021

If you want to use the values for further processing, this is the way to go.

AbimaelDomingue · ‎08-06-2021

If you want to use the results for some other data processing, this is the way to go.


How to get unique values of a column in a PySpark DataFrame
