Is there a way to CONCAT two dataframes on either axis (row/column) and transpose the dataframe in PySpark?
06-06-2022 07:21 PM
I'm reshaping my DataFrame per a requirement and ran into a situation where I need to concatenate 2 DataFrames and then transpose the result. I've done this previously in pandas, where the syntax goes as below:
import pandas as pd
df1 = pd.DataFrame(some_dict)
df2 = pd.DataFrame(some_dict)
new_df = pd.concat([df1, df2], axis=1)  # stacking the dfs side by side (axis="columns")
trans_df = new_df.transpose()  # or simply new_df.T
Is there a way I could do this in PySpark? Any leads would be greatly appreciated.
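There is no direct PySpark equivalent of `pd.concat(..., axis=1)`, so one common workaround is to add a positional index to each DataFrame and join on it. Below is a minimal sketch of that idea; `df1`, `df2`, and the index column name are assumptions for illustration, not from the thread:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

def add_index(df, idx_col="_idx"):
    # monotonically_increasing_id() is unique but not consecutive, so use it only
    # as an ordering key and derive a consecutive row number from it.
    # Note: a window with no partitionBy moves all rows to a single partition.
    w = Window.orderBy(F.monotonically_increasing_id())
    return df.withColumn(idx_col, F.row_number().over(w))

# df1 and df2 are assumed to have the same number of rows
combined = (
    add_index(df1)
    .join(add_index(df2), on="_idx", how="inner")
    .drop("_idx")
)
```

Because Spark DataFrames have no inherent row order, this pairs rows by whatever order each DataFrame currently happens to have; if the pairing must match a specific order, sort both DataFrames explicitly first.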
Labels: Dataframes, Pyspark
06-06-2022 11:45 PM
Hi @Kaniz Fatma ,
I no longer see the answer you posted, but I see you were suggesting `union`. As I understand it, `union` is used to stack DataFrames one on top of another when they share the same schema / column names.
In my situation, I have 2 different DataFrames with different columns (and schemas) but the same number of records, and I want to stack them side by side.
For example: DF1 has 2 columns (a and b) and 10 rows, and DF2 has 3 columns (x, y, and z) with the same 10 rows. I want the resulting DataFrame to have 10 rows and 5 columns (a, b, x, y, and z).
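For reference, a sketch of an alternative way to line the two DataFrames up by position, using `zipWithIndex` on the underlying RDD (which assigns consecutive indexes without forcing everything into one partition). The `spark` session and the DataFrame/column names are assumed for illustration:

```python
from pyspark.sql.types import StructType, StructField, LongType

def with_row_index(df, spark, idx_col="_row_idx"):
    # zipWithIndex numbers rows by their current partition order
    schema = StructType(df.schema.fields + [StructField(idx_col, LongType(), False)])
    indexed = df.rdd.zipWithIndex().map(lambda pair: (*pair[0], pair[1]))
    return spark.createDataFrame(indexed, schema)

# stack DF1 (a, b) and DF2 (x, y, z) side by side into 10 rows x 5 columns
result = (
    with_row_index(df1, spark)
    .join(with_row_index(df2, spark), on="_row_idx", how="inner")
    .drop("_row_idx")
)
```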
Thank you 😀
06-07-2022 01:15 AM
Thanks @Kaniz Fatma, this has solved half of the problem; the other half is that I need to transpose the PySpark DataFrame. Any help on this one?
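PySpark has no built-in DataFrame transpose. One common workaround (not necessarily the `TransposeDF` function mentioned later in this thread) is to unpivot with `stack()` and then `pivot()` back. The sketch below assumes a DataFrame `df` with an identifier column `id` and value columns `c1`..`c3`; since all transposed values end up in a single column, they are cast to string:

```python
from pyspark.sql import functions as F

value_cols = ["c1", "c2", "c3"]  # hypothetical column names
stack_expr = "stack({n}, {pairs}) as (col_name, col_value)".format(
    n=len(value_cols),
    pairs=", ".join(f"'{c}', cast({c} as string)" for c in value_cols),
)

# unpivot to (id, col_name, col_value), then pivot the id values back out as columns
transposed = (
    df.select("id", F.expr(stack_expr))
      .groupBy("col_name")
      .pivot("id")
      .agg(F.first("col_value"))
)
```

For small DataFrames, `df.toPandas().T` is a simpler option, at the cost of collecting all the data to the driver.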
06-07-2022 03:32 AM
Greatly appreciate the help @Kaniz Fatma!!
Even though I had to make multiple tweaks to the TransposeDF function, it gave me the idea to begin with. Thanks for the prompt response; I was able to wrap up this issue within the day!

