python dataframe or hiveSql update based on predecessor value?

as999 · 12-02-2021

I have a million rows that I need to update. The update should find the value with the highest count among the predecessor rows in the same source data and use it to replace a differing value on another row.

For example.

Original DF.

```
sno  Object  Name    shape   rating
1    Fruit   apple   round   1.0
2    Fruit   apple   round   2.0
3    Fruit   apple   square  2.5
4    Fruit   orange  round   1.5
```

Required Target DF.

```
sno  Object  Name    shape   rating
1    Fruit   apple   round   1.0
2    Fruit   apple   round   2.0
3    Fruit   apple   round   2.5   <-- automatically detect the difference in shape column and update from square to round
4    Fruit   orange  round   1.5
```

Please advise how to achieve this in Databricks using PySpark, Hive SQL, or Scala.
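
To make the rule concrete, here is a minimal plain-Python sketch of the logic only, not a Databricks solution. It assumes rows are grouped by Object and Name, that the most frequent shape in each group wins, and that ties go to the shape seen first. The function name `fill_majority_shape` is made up for illustration.

```python
from collections import Counter, defaultdict

# Sample rows from the example above: (sno, Object, Name, shape, rating)
rows = [
    (1, "Fruit", "apple", "round", 1.0),
    (2, "Fruit", "apple", "round", 2.0),
    (3, "Fruit", "apple", "square", 2.5),
    (4, "Fruit", "orange", "round", 1.5),
]


def fill_majority_shape(rows):
    """Replace each row's shape with the most frequent shape in its (Object, Name) group.

    Hypothetical helper: the grouping key and the tie-breaking rule are assumptions.
    """
    # Pass 1: count how often each shape occurs per (Object, Name) group.
    counts = defaultdict(Counter)
    for _sno, obj, name, shape, _rating in rows:
        counts[(obj, name)][shape] += 1

    # Pick the most frequent shape per group; ties go to the shape seen first.
    majority = {key: c.most_common(1)[0][0] for key, c in counts.items()}

    # Pass 2: rewrite the shape column, leaving every other column untouched.
    return [
        (sno, obj, name, majority[(obj, name)], rating)
        for sno, obj, name, _shape, rating in rows
    ]


target = fill_majority_shape(rows)
for row in target:
    print(row)
```

On the sample data only row 3 changes, from square to round. On Databricks the same two passes would presumably become a grouped count joined back to the original rows, which is the part I am asking about.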
