12-20-2022 01:01 AM
Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.
Example dataset:
# Prepare Data
data = [('A', 1),
        ('A', 2),
        ('B', 3)]
# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
If I run the following code in Databricks:
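Presumably the snippet was a bare column comparison along these lines (an assumption based on the rest of the thread):
# Hypothetical reconstruction (assumption): a bare comparison on a Spark column
# only builds a Column expression; it is not evaluated against the rows.
df['col_1'] == 'A'
# Column<'(col_1 = A)'>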
In the output, I don't see whether the condition is met. If I create a pandas DataFrame:
import pandas as pd
pdf = pd.DataFrame(data, columns=columns)
I can check whether the condition is met for all rows:
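Presumably something like the following, assuming the condition is col_1 == 'A' (the exact snippet isn't shown):
# Assumption: in pandas the comparison is evaluated eagerly and returns a
# boolean Series with one True/False value per row.
pdf['col_1'] == 'A'
# 0     True
# 1     True
# 2    False
# Name: col_1, dtype: bool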
How can I get the same output when working with a Spark DataFrame?
12-20-2022 01:21 AM
Hi @Mohammad Saber
Since your comparison evaluates to a Column object, you just need to select it from df.
The following will work.
df.select(df['col_1'] == 'A').show()
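For the example data above, the output should look roughly like this (the auto-generated column name can vary slightly between Spark versions):
+-----------+
|(col_1 = A)|
+-----------+
|       true|
|       true|
|      false|
+-----------+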
Hope this helps... Do mark it as an answer if it does.
Cheers
12-20-2022 04:12 AM
Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?
12-20-2022 04:23 AM
12-20-2022 04:26 AM
Thanks. I meant changing the column name. I tried:
df.select(df['col_1'] == 'A').alias('Condition').show()
But it didn't work.
12-20-2022 04:30 AM
The alias needs to be wrapped inside the select function to rename the column.
df.select((df['col_1'] == 'A').alias("Condition")).show()
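Putting it together, a minimal end-to-end sketch; the 'Condition' name and the withColumn variant are only illustrative:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('A', 1), ('A', 2), ('B', 3)], ['col_1', 'col_2'])

# Select only the boolean result, renamed via alias
df.select((df['col_1'] == 'A').alias('Condition')).show()

# Or keep the original columns and add the flag as a new column
df.withColumn('Condition', F.col('col_1') == 'A').show()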
12-20-2022 03:57 AM
Hi, you can use the display() or show() function; that will give you the expected results.
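For example, either of these should work in a Databricks notebook (the 'Condition' alias is just illustrative):
display(df.select((df['col_1'] == 'A').alias('Condition')))   # Databricks display
df.select((df['col_1'] == 'A').alias('Condition')).show()     # plain Spark show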