12-20-2022 01:01 AM
Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.
Example dataset:
# Prepare data
data = [('A', 1),
        ('A', 2),
        ('B', 3)]
# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
If I run the following code in Databricks:
In the output, I don't see whether the condition is met. If I create a pandas DataFrame:
import pandas as pd
pdf = pd.DataFrame(data, columns=columns)
I can check whether the condition is met for all rows:
How can I get the same output when working with a Spark DataFrame?
12-20-2022 01:21 AM
Hi @Mohammad Saber
The comparison df['col_1'] == 'A' returns a Column object rather than data, so you need to pass it to df.select() to evaluate it for each row.
The following will work.
df.select(df['col_1'] == 'A').show()
Hope this helps. Please mark it as the answer if it does.
Cheers
12-20-2022 04:12 AM
Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?
12-20-2022 04:23 AM
12-20-2022 04:26 AM
Thanks. I meant changing the column name. I tried:
df.select( df['col_1']=='A').alias('Condition').show()
But it didn't work.
12-20-2022 04:30 AM
The alias needs to be applied to the column expression inside select(), not to the result of select():
df.select((df['col_1'] == 'A').alias("Condition")).show()
12-20-2022 03:57 AM
Hi, you can use the display() or show() function, which will give you the expected results.