12-20-2022 01:01 AM
Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.
Example dataset:
# Prepare data
data = [('A', 1),
        ('A', 2),
        ('B', 3)]
# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
If I run the following code in Databricks:
In the output, I don't see whether the condition is met. If I create a pandas DataFrame:
import pandas as pd
pdf = pd.DataFrame(data, columns=columns)
I can check whether the condition is met for all rows:
How can I get the same output when working with a Spark DataFrame?
Accepted Solutions
12-20-2022 01:21 AM
Hi @Mohammad Saber
Since the comparison returns a Column object rather than data, you need to pass it to df.select() to evaluate it.
The following will work.
df.select(df['col_1'] == 'A').show()
Hope this helps. Please mark it as the answer if it does.
Cheers
12-20-2022 04:12 AM
Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?
12-20-2022 04:23 AM
You can use "when ... otherwise" and supply the condition you want. It works like a SQL CASE WHEN expression.
E.g.
from pyspark.sql.functions import when
df.select(when(df['col_1'] == 'A', "Condition1").otherwise("Condition2")).show()
12-20-2022 04:26 AM
Thanks. I meant changing the column name. I tried:
df.select( df['col_1']=='A').alias('Condition').show()
But it didn't work.
12-20-2022 04:30 AM
The alias needs to be applied to the column expression inside the select function, not to the DataFrame that select returns.
df.select((df['col_1'] == 'A').alias("Condition")).show()
12-20-2022 03:57 AM
Hi, you can use the display() or show() function, which will give you the expected results.

