12-20-2022 01:01 AM
Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.
Example dataset:
# Prepare Data
data = [('A', 1),
        ('A', 2),
        ('B', 3)]

# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
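For reference, df.show(truncate=False) prints something like this:

+-----+-----+
|col_1|col_2|
+-----+-----+
|A    |1    |
|A    |2    |
|B    |3    |
+-----+-----+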
If I run df['col_1'] == 'A' in Databricks, the output is just a Column object, so I can't see whether the condition is met for each row. If I create a pandas DataFrame:
import pandas as pd
pdf = pd.DataFrame(data, columns=columns)
I can check whether the condition is met for all rows:
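For example, the comparison below returns a boolean Series, one True/False per row:

pdf['col_1'] == 'A'

0     True
1     True
2    False
Name: col_1, dtype: bool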
How can I get the same output when working with a Spark DataFrame?
12-20-2022 01:21 AM
Hi @Mohammad Saber
Since your comparison evaluates to a Column object, you need to select it on the DataFrame. The following will work:
df.select(df['col_1'] == 'A').show()
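That prints one boolean per row (the column header may appear as (col_1=A) or (col_1 = A), depending on your Spark version):

+-----------+
|(col_1 = A)|
+-----------+
|       true|
|       true|
|      false|
+-----------+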
Hope this helps... Do mark it as an answer if it helps.
Cheers
12-20-2022 04:12 AM
Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?
12-20-2022 04:26 AM
Thanks. I meant changing the column name. I tried:
df.select(df['col_1'] == 'A').alias('Condition').show()
But it didn't work.
12-20-2022 04:30 AM
The alias needs to be wrapped inside the select function to rename the column:
df.select((df['col_1'] == 'A').alias("Condition")).show()
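If you would rather keep the original columns and add the flag alongside them, a withColumn sketch also works (the name 'Condition' here is just an example):

df.withColumn('Condition', df['col_1'] == 'A').show()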
12-20-2022 03:57 AM
Hi, you can use the display() or show() function; that will give you the expected results.
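For example, in a Databricks notebook, combining display() with the alias from above:

display(df.select((df['col_1'] == 'A').alias('Condition')))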