PySpark DataFrame column comparison

UmaMahesh1
Honored Contributor III

I have a string column whose values are elements concatenated with hyphens. For example, three values from that column look like this:

Row 1: A-B-C-D-E-F

Row 2: A-B-G-C-D-E-F

Row 3: A-B-G-D-E-F

I want to compare each row with the previous one and create a column describing what has changed. Specifically, I need four comparisons:

  1. whether the first element changed
  2. whether the last element changed
  3. which elements were added, ignoring the first and last elements
  4. which elements were removed, ignoring the first and last elements

So my output would look like this:

Row 1: null

Row 2: G added

Row 3: C removed
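
To make the four comparisons concrete, here is a minimal sketch of the per-pair logic in plain Python. It is not a PySpark solution in itself. The function name `describe_change`, the wording of the change labels, and returning null when nothing changed are all assumptions for illustration. It also assumes that elements do not repeat within a string and that only membership matters, not position.

```python
from typing import List, Optional


def describe_change(prev: Optional[str], curr: str, sep: str = "-") -> Optional[str]:
    """Describe what changed between two consecutive hyphen-joined strings."""
    if prev is None:
        return None  # the first row has nothing to compare against

    p, c = prev.split(sep), curr.split(sep)
    changes: List[str] = []

    # Comparisons 1 and 2: first and last elements.
    if p[0] != c[0]:
        changes.append(f"first changed {p[0]} -> {c[0]}")
    if p[-1] != c[-1]:
        changes.append(f"last changed {p[-1]} -> {c[-1]}")

    # Comparisons 3 and 4: everything except the first and last elements.
    p_mid, c_mid = p[1:-1], c[1:-1]
    added = [e for e in c_mid if e not in p_mid]
    removed = [e for e in p_mid if e not in c_mid]
    if added:
        changes.append(", ".join(added) + " added")
    if removed:
        changes.append(", ".join(removed) + " removed")

    # Assumption: no change at all is reported as None (null).
    return "; ".join(changes) if changes else None


rows = ["A-B-C-D-E-F", "A-B-G-C-D-E-F", "A-B-G-D-E-F"]
# Pair each row with the one before it; the first row gets None.
results = [describe_change(prev, curr) for prev, curr in zip([None] + rows[:-1], rows)]
print(results)
```

In PySpark, the previous row's value could presumably be fetched with `lag` over a `Window` ordered by some ordering column (the sample above does not show one, so that column is an assumption), and a function like this could then be applied through a UDF.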

Any ideas or suggestions?

Uma Mahesh D