
Is there a way to CONCAT two dataframes on either of the axis (row/column) and transpose the dataframe in PySpark?

RiyazAli
Contributor III

I'm reshaping my DataFrame to meet a requirement, and I need to concatenate two DataFrames and then transpose the result. I've done this before in pandas, where the syntax is as follows:

import pandas as pd

df1 = pd.DataFrame(some_dict)
df2 = pd.DataFrame(some_dict)

new_df = pd.concat([df1, df2], axis="columns")  # stack the DataFrames side by side

trans_df = new_df.transpose()  # or simply new_df.T

Is there a way I could do this in PySpark? Any leads would be greatly appreciated.

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Riyaz Ali, here's a generic transpose method (TransposeDF) that can transpose a Spark DataFrame. Click here to get complete details of the technique. Please let me know if it helps.

6 REPLIES

RiyazAli
Contributor III

Hi @Kaniz Fatma,

I no longer see the answer you posted, but I see you were suggesting `union`. As I understand it, `union` stacks DataFrames on top of one another and expects them to have the same schema / column names.

In my situation, I have two DataFrames with different columns (and schemas) but the same number of records. I want to stack them side by side.

For example: DF1 has 2 columns (a and b) and 10 rows, and DF2 has 3 columns (x, y and z) and the same 10 rows. I want the resulting DataFrame to have 10 rows and 5 columns (a, b, x, y and z).

Thank you 😀

Kaniz
Community Manager

Hi @Riyaz Ali, this recipe helps you stack two DataFrames horizontally in PySpark. Please let me know if that helps.

RiyazAli
Contributor III

Thanks @Kaniz Fatma, this has solved half of the problem. The other half is that I need to transpose the PySpark DataFrame. Any help on this one?

Kaniz
Community Manager

Hi @Riyaz Ali, here's a generic transpose method (TransposeDF) that can transpose a Spark DataFrame. Click here to get complete details of the technique. Please let me know if it helps.

RiyazAli
Contributor III

Greatly appreciate the help, @Kaniz Fatma!

Even though I had to make multiple tweaks to the TransposeDF function, it gave me a starting point. Thanks for the prompt response; I was able to wrap up this issue within the day!

Kaniz
Community Manager

@Riyaz Ali, awesome!

We've sent you the certification coupon too. Please confirm on the thread.
