Data Engineering
How to see if condition is True / False for all rows in a DataFrame?

Mado
Valued Contributor II

Assume that I have a Spark DataFrame, and I want to see, row by row, whether each record satisfies a condition.

Example dataset:

# Prepare data
data = [
    ('A', 1),
    ('A', 2),
    ('B', 3),
]

# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)

If I run the following code in Databricks:

[image: screenshot of the code and its output]

The output doesn't show whether the condition is met. If I create a pandas DataFrame:

import pandas as pd
pdf = pd.DataFrame(data, columns=columns)

I can check whether the condition is met for each row:

[image: screenshot of the pandas output]

How can I get the same output when working with a Spark DataFrame?

1 ACCEPTED SOLUTION

UmaMahesh1
Honored Contributor III

Hi @Mohammad Saber,

A comparison such as df['col_1'] == 'A' only builds a Column object; nothing is evaluated until you pass it to the DataFrame, for example through select.

The following will work.

df.select(df['col_1'] == 'A').show()

Hope this helps. Please mark it as the answer if it does.

Cheers


6 REPLIES

Mado
Valued Contributor II

Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?

UmaMahesh1
Honored Contributor III

You can use when(...).otherwise(...) and supply the values you want for each case. It works like a SQL CASE WHEN expression.

E.g.

from pyspark.sql.functions import when
df.select(when(df['col_1'] == 'A', "Condition1").otherwise("Condition2")).show()


Mado
Valued Contributor II

Thanks. I meant changing the column name. I tried:

df.select( df['col_1']=='A').alias('Condition').show()

But it didn't work.

UmaMahesh1
Honored Contributor III

The alias needs to be applied to the column expression inside select. Calling alias on the result of select aliases the DataFrame, not the column.

df.select((df['col_1'] == 'A').alias("Condition")).show()

Ajay-Pandey
Esteemed Contributor III

Hi, you can use the display() or show() function, which will give you the expected results.
