01-10-2022 12:23 AM
Hi @Shivers Robert
Try something like this:
import pyspark.sql.functions as F

def year_sum(year, column_year, column_sum):
    return F.when(
        F.col(column_year) == year, F.col(column_sum)
    ).otherwise(F.lit(None))

display(df.select(*[F.sum(year_sum(i, 'year', 'your_column_variable')).alias(str(i)) for i in [2018, 2019]]))
# Or you can use the pivot method
display(df.groupby(F.lit('fake')).pivot('year').agg(F.sum('your_column_variable')).drop('fake'))

Let me know if it works.
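If you want to sanity-check the output, here is a rough plain-Python sketch of what the two queries should compute. The sample rows and the helper names `sum_for_years` and `pivot_sum` are made up for illustration and are not part of PySpark. It assumes nulls are skipped by the sum, and that a year with no non-null values comes back as null.

```python
from collections import defaultdict

# Hypothetical sample rows: (year, your_column_variable); None stands for a null.
rows = [(2018, 10), (2018, 5), (2019, 7), (2019, None), (2020, 3)]


def sum_for_years(rows, years):
    """Mimic F.sum(F.when(year == y, value)) for each requested year."""
    result = {}
    for y in years:
        matched = [v for (yr, v) in rows if yr == y and v is not None]
        # A sum over no non-null values is treated as null (None) here.
        result[str(y)] = sum(matched) if matched else None
    return result


def pivot_sum(rows):
    """Mimic the single-group pivot: one column per distinct year in the data."""
    groups = defaultdict(list)
    for yr, v in rows:
        if v is not None:
            groups[yr].append(v)
        else:
            groups[yr]  # make sure the year still gets a column
    return {str(yr): (sum(vals) if vals else None) for yr, vals in sorted(groups.items())}


conditional = sum_for_years(rows, [2018, 2019])
pivoted = pivot_sum(rows)
print(conditional)  # only the years that were listed explicitly
print(pivoted)      # every distinct year found in the data
```

The difference to keep in mind: the first approach only returns columns for the years you list, while the pivot returns a column for every distinct year. If you only want specific years from the pivot, you can pass them as the second argument, e.g. pivot('year', [2018, 2019]).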