12-20-2022 01:01 AM
Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.
Example dataset:
# Prepare Data
data = [('A', 1),
        ('A', 2),
        ('B', 3)]
# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
If I run the following code in Databricks:
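Presumably the snippet was a bare column comparison along these lines (an assumption based on the rest of the thread):
# Hypothetical reconstruction (assumption): a bare comparison on a Spark column
# only builds a Column expression; it is not evaluated against the rows.
df['col_1'] == 'A'
# Column<'(col_1 = A)'>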
In the output, I don't see whether the condition is met. If I create a pandas DataFrame:
import pandas as pd
pdf = pd.DataFrame(data, columns=columns)
I can check whether the condition is met for all rows:
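Presumably something like the following, assuming the condition is col_1 == 'A' (the exact snippet isn't shown):
# Assumption: in pandas the comparison is evaluated eagerly and returns a
# boolean Series with one True/False value per row.
pdf['col_1'] == 'A'
# 0     True
# 1     True
# 2    False
# Name: col_1, dtype: bool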
How can I get the same output when working with a Spark DataFrame?
12-20-2022 01:21 AM
Hi @Mohammad Saber
Since your comparison evaluates to a Column object, you just need to select it from df.
The following will work.
df.select(df['col_1'] == 'A').show()
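For the example data above, the output should look roughly like this (the auto-generated column name can vary slightly between Spark versions):
+-----------+
|(col_1 = A)|
+-----------+
|       true|
|       true|
|      false|
+-----------+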
Hope this helps... Do mark it as an answer if it does.
Cheers
12-20-2022 04:12 AM
Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?
12-20-2022 04:23 AM
12-20-2022 04:26 AM
Thanks. I meant changing the column name. I tried:
df.select(df['col_1'] == 'A').alias('Condition').show()
But it didn't work.
12-20-2022 04:30 AM
The alias needs to be wrapped inside the select function to rename the column.
df.select((df['col_1'] == 'A').alias("Condition")).show()
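Putting it together, a minimal end-to-end sketch; the 'Condition' name and the withColumn variant are only illustrative:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('A', 1), ('A', 2), ('B', 3)], ['col_1', 'col_2'])

# Select only the boolean result, renamed via alias
df.select((df['col_1'] == 'A').alias('Condition')).show()

# Or keep the original columns and add the flag as a new column
df.withColumn('Condition', F.col('col_1') == 'A').show()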
12-20-2022 03:57 AM
Hi, you can use the display() or show() function; that will give you the expected results.
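For example, either of these should work in a Databricks notebook (the 'Condition' alias is just illustrative):
display(df.select((df['col_1'] == 'A').alias('Condition')))   # Databricks display
df.select((df['col_1'] == 'A').alias('Condition')).show()     # plain Spark show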