Data Engineering

How to see if a condition is True / False for all rows in a DataFrame?

Mado
Valued Contributor II

Assume I have a Spark DataFrame and want to check whether each record satisfies a condition.

Example dataset:

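# Prepare data (in a Databricks notebook, `spark` is already defined;
# getOrCreate() simply returns that existing session)
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

data = [('A', 1),
        ('A', 2),
        ('B', 3)]

# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)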

If I run the following code in Databricks:

[screenshot]

In the output, I don't see whether the condition is met. If I create a pandas DataFrame instead:

import pandas as pd
pdf = pd.DataFrame(data, columns=columns)

I can check whether the condition is met for every row:

[screenshot]
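The screenshot is not recoverable. Assuming it showed an element-wise comparison, a minimal pandas sketch of that check looks like this; the final .all() call is an extra step (not in the original post) that reduces the per-row result to a single "all rows" answer:

```python
import pandas as pd

# Same sample data as the Spark example above
data = [('A', 1), ('A', 2), ('B', 3)]
columns = ['col_1', 'col_2']
pdf = pd.DataFrame(data, columns=columns)

# Element-wise comparison: a boolean Series with one value per row
mask = pdf['col_1'] == 'A'
print(mask)

# Reduce the mask to one value: True only if every row satisfies the condition
print(mask.all())
```

In pandas the comparison is evaluated immediately, which is why the values appear at once; Spark builds an expression instead, as the answers below explain.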

How can I get the same output when working with Spark DataFrame?

Accepted Solution

UmaMahesh1
Honored Contributor III

Hi @Mohammad Saber,

Since df['col_1'] == 'A' on its own is just a Column object (an unevaluated expression, not data), you need to select it from df to see the result.

The following will work.

df.select(df['col_1'] == 'A').show()

[screenshot]

Hope this helps. Please mark it as the answer if it does.

Cheers

Uma Mahesh D


6 Replies

Mado
Valued Contributor II

Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?

UmaMahesh1
Honored Contributor III

You can use when(...).otherwise(...) with the condition you want. It works like a SQL CASE WHEN expression.

E.g.

from pyspark.sql.functions import when

df.select(when(df['col_1'] == 'A', "Condition1").otherwise("Condition2")).show()

[screenshot]

Uma Mahesh D

Mado
Valued Contributor II

Thanks. I meant changing the column name. I tried:

df.select( df['col_1']=='A').alias('Condition').show()

But it didn't work.

UmaMahesh1
Honored Contributor III

The alias needs to be applied to the column expression inside select() to rename the column. Calling .alias() on the result of select() aliases the DataFrame, not the column.

df.select((df['col_1'] == 'A').alias("Condition")).show()

Uma Mahesh D

Ajay-Pandey
Esteemed Contributor III

Hi, you can use the display() or show() function on the DataFrame to see the expected results.

Ajay Kumar Pandey
