12-20-2022 01:01 AM
Assume that I have a Spark DataFrame, and I want to see whether its records satisfy a condition.
Example dataset:
# Prepare Data
data = [('A', 1),
        ('A', 2),
        ('B', 3)]
# Create DataFrame
columns= ['col_1', 'col_2']
df = spark.createDataFrame(data = data, schema = columns)
df.show(truncate=False)
If I run the following code in Databricks:
In the output, I don't see whether the condition is met for each row. If I instead create a pandas DataFrame:
import pandas as pd
pdf = pd.DataFrame(data, columns=columns)
I can check whether the condition is met for every row:
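(Assuming the condition is the same `col_1 == 'A'` check used later in the thread, the pandas version would look like this:)

```python
import pandas as pd

data = [('A', 1), ('A', 2), ('B', 3)]
columns = ['col_1', 'col_2']
pdf = pd.DataFrame(data, columns=columns)

# Element-wise comparison returns a boolean Series, one value per row
condition = pdf['col_1'] == 'A'
print(condition.tolist())  # [True, True, False]
```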
How can I get the same output when working with Spark DataFrame?
12-20-2022 01:21 AM
Hi @Mohammad Saber,
The comparison returns a Column object, so you need to evaluate it with a select on df. The following will work:
df.select(df['col_1'] == 'A').show()
Hope this helps. Do mark it as an answer if it does.
Cheers
12-20-2022 04:12 AM
Thanks. How can I change the column name "(col_1=A)" to e.g. "Condition"?
12-20-2022 04:26 AM
Thanks. I meant changing the column name. I tried:
df.select(df['col_1'] == 'A').alias('Condition').show()
But it didn't work.
12-20-2022 04:30 AM
The alias needs to be wrapped inside the select function to rename the column:
df.select((df['col_1'] == 'A').alias("Condition")).show()
12-20-2022 03:57 AM
Hi, you can use the display() or show() function; that will give you the expected results.