NhatHoang
Valued Contributor II

Hi,

I think you can follow these steps:

1. Use window function to create a new column by shifting, then your df will look like this

id value lag

1 A-B-C-D-E-F null

2 A-B-G-C-D-E-F A-B-C-D-E-F

3 A-B-G-D-E-F A-B-G-C-D-E-F

2. Use split() to convert string to array.

id value lag

1 ['A', 'B', 'C', 'D', 'E', 'F'] null

2 ['A', 'B', 'G', 'C', 'D', 'E', 'F'] ['A', 'B', 'C', 'D', 'E', 'F']

3 ['A', 'B', 'G', 'D', 'E', 'F'] ['A', 'B', 'G', 'C', 'D', 'E', 'F']

3. Create a column using array_except('value', 'lag') to find element in column 'value' but not in column 'lag'

4. Create a column using array_except('lag', 'value') to find element in column 'lag' but not in column 'value'

5. Then, you concat these two columns above, you will have the comparison.

Hope it works.

View solution in original post