PySpark column object not callable using "when otherwise" transformation
07-26-2022 01:46 PM
The very first "when" function results in the posted error message (see image). The print statement for the count of df_td_amm works. A printSchema of the "df_td_amm" data frame confirms that "AGE" is a column. A select statement is also successful, so I'm wondering why the column isn't accessible in the "when" function. I've also tried referencing the column as df_td_amm.AGE and get the same error.
I've tried this on two other small dataframes not created using Pandas and encounter the same error. For two other multi-column dataframes, the "when" function works fine referencing the columns as expected.
import pandas as pd
import pyspark.sql
from pyspark.sql.functions import *
from pyspark.sql.types import *
months = list(range(26))
pdf = pd.DataFrame(months, columns = ['AGE'])
df_td_amm = spark.createDataFrame(pdf).cache()
print(df_td_amm.count())
df_age_bucket = df_td_amm \
.withColumn("COMPOUNDING_FREQ", \
when(col("AGE") <= 3, "00-03 Months")
.when(col("AGE") <= 6, "04-06 Months")
.when(col("AGE") <= 9, "07-09 Months")
.other("Other"))
display(df_age_bucket)
Labels: Pyspark
07-27-2022 04:22 AM
The syntax is when(...).otherwise(...), not other(...).
And there are some backslashes missing.
08-17-2022 01:56 PM
Hi @Karl Lacher,
Just a friendly follow-up. Did Werner's response help? If it did, please mark it as best. Otherwise, please let us know if you still need help.

