03-31-2016 01:53 PM
How can we compare two data frames using PySpark?
I need to validate my output against another dataset.
03-31-2016 03:22 PM
>>> df1.subtract(df2)
As per the API docs, it returns a new DataFrame containing rows in this DataFrame but not in the other DataFrame.
This is equivalent to EXCEPT in SQL.
04-04-2016 06:38 AM
It's giving only the rows which are not in the other data frame. Is there anything that validates all the column values in both DataFrames?
04-04-2016 10:20 AM
@Siddartha Paturu If that is the case, I would recommend doing a join between the two DataFrames and then comparing all the columns.
04-05-2016 08:36 AM
How can we compare the columns?
07-20-2016 08:21 AM
I am also stuck with this situation recently. Can somebody help me with how to compare columns in this scenario? @Siddartha Paturu, please help me out with this if you have already found the solution. Thanks in advance.
09-20-2016 03:29 PM
I am stuck with the same issue. Any new updates on this? Is there any solution to this problem?
09-24-2016 01:27 AM
Try using the all.equal function. It does not sort the data frames, but it checks each cell in one data frame against the same cell in the other one. You can also use the identical() function.
I would like to share a link which may help to solve your problem: https://goo.gl/pgLaEd
06-28-2018 06:53 AM
I think the best bet in such a case is to take an inner join (equivalent to an intersection) by putting a condition on those columns which necessarily need to have the same value in both DataFrames. For example,
let df1 and df2 be two DataFrames. df1 has columns (A, B, C) and df2 has columns (D, C, B); then you can create a new DataFrame which would be the intersection of df1 and df2 conditioned on columns B and C.
df3 = df1.join(df2, [df1.B == df2.B, df1.C == df2.C], how='inner')
df3 will contain only those rows from df1 and df2 where the above condition is satisfied.