12-12-2022 09:07 PM
I am trying to unpivot a PySpark DataFrame, but I don't get the correct results.
Sample dataset:
# Prepare Data
data = [("Spain", 101, 201, 301), \
("Taiwan", 102, 202, 302), \
("Italy", 103, 203, 303), \
("China", 104, 204, 304)
]
# Create DataFrame
columns= ["Country", "2018", "2019", "2002"]
df = spark.createDataFrame(data = data, schema = columns)
df.show(truncate=False)
Below is the code I have tried:
from pyspark.sql import functions as F
unpivotExpr = "stack(3, '2018', 2018, '2019', 2019, '2020', 2020) as (Year, CPI)"
unPivotDF = df.select("Country", F.expr(unpivotExpr))
unPivotDF.show()
And the results:
As you can see in the above image, value of column "CPI" is the same as column "Year" which is not expected.
Any idea to solve this issue?
12-12-2022 11:18 PM
Hi @Mohammad Saber
The issue is because the column name is similar to a literal value and it is taking that constant value for all the keys provided.
To avoid this you can give more proper column names like below.
Else if you have a way where you can explicitly say to spark that 2018 is the column value and not a literal, you can try that too..
Hope this helps...
Cheers
12-12-2022 11:18 PM
Hi @Mohammad Saber
The issue is because the column name is similar to a literal value and it is taking that constant value for all the keys provided.
To avoid this you can give more proper column names like below.
Else if you have a way where you can explicitly say to spark that 2018 is the column value and not a literal, you can try that too..
Hope this helps...
Cheers
08-12-2023 04:56 AM
on a another dummie example i can't reproduce this error, is there an explanation as to why this happens?
12-08-2023 12:32 AM
If i have columns names as below how can i get data
05-02-2024 05:40 PM
You can also use backticks around the column names that would otherwise be recognised as numbers.
from pyspark.sql import functions as F unpivotExpr = "stack(3, '2018', `2018`, '2019', `2019`, '2020', `2020`) as (Year, CPI)" unPivotDF = df.select("Country", F.expr(unpivotExpr)) unPivotDF.show()
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group