How to see if condition is True / False for all rows in a DataFrame?

Mado
Valued Contributor II

Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.

Example dataset:

# Prepare data
data = [('A', 1),
        ('A', 2),
        ('B', 3)]

# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)

If I run the following code in Databricks:

[image: screenshot of the code and its output, not preserved]

The output doesn't show whether the condition is met. If I create a pandas DataFrame:

import pandas as pd
pdf = pd.DataFrame(data, columns=columns)

I can check whether the condition is met for all rows:

[image: screenshot of the pandas output, not preserved]
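
Since the screenshot is not preserved, here is a minimal sketch of the kind of pandas check described. It assumes the condition is col_1 == 'A' (the one used in the replies); the names mask, all_match and any_match are illustrative only.

```python
import pandas as pd

data = [("A", 1), ("A", 2), ("B", 3)]
columns = ["col_1", "col_2"]
pdf = pd.DataFrame(data, columns=columns)

# An element-wise comparison returns a boolean Series, one value per row.
mask = pdf["col_1"] == "A"
print(mask)

# Reduce the Series to a single answer when that is what you need.
all_match = bool(mask.all())  # True only if every row satisfies the condition
any_match = bool(mask.any())  # True if at least one row satisfies it
print(all_match, any_match)
```

In pandas the comparison is evaluated eagerly, which is why the per-row result is visible right away.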

How can I get the same output when working with a Spark DataFrame?

Accepted Solutions

UmaMahesh1
Honored Contributor III

Hi @Mohammad Saber,

A comparison such as df['col_1'] == 'A' returns a Column object, so you need to evaluate it against the DataFrame, for example inside df.select.

The following will work:

df.select(df['col_1'] == 'A').show()

Hope this helps. Please mark it as an answer if it does.

Cheers

Replies

Mado
Valued Contributor II

Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?

UmaMahesh1
Honored Contributor III

You can use when/otherwise and supply the condition you want. It works like a SQL CASE WHEN expression.

E.g.

from pyspark.sql.functions import when

df.select(when(df['col_1'] == 'A', "Condition1").otherwise("Condition2")).show()

Mado
Valued Contributor II

Thanks. I meant changing the column name. I tried:

df.select( df['col_1']=='A').alias('Condition').show()

But, it didn't work.

UmaMahesh1
Honored Contributor III

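The alias needs to be applied to the column expression inside the select call. In your attempt, alias is called on the DataFrame returned by select, so it names the DataFrame rather than the column.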

df.select((df['col_1'] == 'A').alias("Condition")).show()

Ajay-Pandey
Esteemed Contributor III

Hi, you can use the display() or show() function, which will give you the expected results.
