cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Using "Select Expr" and "Stack" to Unpivot PySpark DataFrame doesn't produce expected results

Mado
Valued Contributor II

I am trying to unpivot a PySpark DataFrame, but I don't get the correct results.

Sample dataset:

# Prepare Data
data = [("Spain", 101, 201, 301), \
        ("Taiwan", 102, 202, 302), \
        ("Italy", 103, 203, 303), \
        ("China", 104, 204, 304)
  ]
 
# Create DataFrame
columns= ["Country", "2018", "2019", "2002"]
df = spark.createDataFrame(data = data, schema = columns)
df.show(truncate=False)

image 

Below is the code I have tried:

from pyspark.sql import functions as F
 
unpivotExpr = "stack(3, '2018', 2018, '2019', 2019, '2020', 2020) as (Year, CPI)"
unPivotDF = df.select("Country", F.expr(unpivotExpr))
unPivotDF.show()

And the results:

imageAs you can see in the above image, value of column "CPI" is the same as column "Year" which is not expected.

Any idea to solve this issue?

1 ACCEPTED SOLUTION

Accepted Solutions

UmaMahesh1
Honored Contributor III

Hi @Mohammad Saber​ 

The issue is because the column name is similar to a literal value and it is taking that constant value for all the keys provided.

To avoid this you can give more proper column names like below.

imageElse if you have a way where you can explicitly say to spark that 2018 is the column value and not a literal, you can try that too..

Hope this helps...

Cheers

View solution in original post

3 REPLIES 3

UmaMahesh1
Honored Contributor III

Hi @Mohammad Saber​ 

The issue is because the column name is similar to a literal value and it is taking that constant value for all the keys provided.

To avoid this you can give more proper column names like below.

imageElse if you have a way where you can explicitly say to spark that 2018 is the column value and not a literal, you can try that too..

Hope this helps...

Cheers

arggg
New Contributor II

on a another dummie example i can't reproduce this error, is there an explanation as to why this happens?

srinu1246
New Contributor II

If i have columns names as below how can i get data

 

unpivottest = "stack(2,'Turnover (Sas  m)',Turnover (Sas  m),'abc %', abc %) as (kpi_name, kpi_value)"                        
Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.