I am trying to unpivot a PySpark DataFrame, but I don't get the correct results.
Sample dataset:
# Prepare Data
data = [("Spain", 101, 201, 301),
        ("Taiwan", 102, 202, 302),
        ("Italy", 103, 203, 303),
        ("China", 104, 204, 304)]
# Create DataFrame
columns = ["Country", "2018", "2019", "2020"]
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
Below is the code I have tried:
from pyspark.sql import functions as F
unpivotExpr = "stack(3, '2018', 2018, '2019', 2019, '2020', 2020) as (Year, CPI)"
unPivotDF = df.select("Country", F.expr(unpivotExpr))
unPivotDF.show()
And the results:
+-------+----+----+
|Country|Year| CPI|
+-------+----+----+
|  Spain|2018|2018|
|  Spain|2019|2019|
|  Spain|2020|2020|
| Taiwan|2018|2018|
| Taiwan|2019|2019|
| Taiwan|2020|2020|
|  Italy|2018|2018|
|  Italy|2019|2019|
|  Italy|2020|2020|
|  China|2018|2018|
|  China|2019|2019|
|  China|2020|2020|
+-------+----+----+
As you can see, the value of the "CPI" column is identical to the "Year" column (2018, 2019, 2020) instead of the actual CPI values (101, 201, 301, ...), which is not expected.
Any idea how to solve this issue?