Python DataFrame or Hive SQL update based on predecessor value?
12-02-2021 05:54 AM
I have a million rows that I need to update. The update should look up the most frequent value among the preceding rows in the same source data and use it to replace a differing value on a later row.
For example.
Original DF.
```
sno Object Name shape rating
1 Fruit apple round 1.0
2 Fruit apple round 2.0
3 Fruit apple square 2.5
4 Fruit orange round 1.5
```
Required Target DF.
```
sno Object Name shape rating
1 Fruit apple round 1.0
2 Fruit apple round 2.0
3 Fruit apple round 2.5 <-- automatically detect the difference in shape column and update from square to round
4 Fruit orange round 1.5
```
Please advise how to achieve this in Databricks using PySpark, Hive SQL, or Scala.
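To make the requirement concrete, here is a minimal sketch of the logic in pandas rather than PySpark, so it can be run without a cluster. It rests on two assumptions that are not stated above: that "predecessor" means the other rows sharing the same `Object` and `Name`, and that the replacement value is the most frequent `shape` in that group. The helper name `most_frequent` is made up for illustration.

```python
import pandas as pd

# Sample data copied from the "Original DF" table above.
df = pd.DataFrame({
    "sno": [1, 2, 3, 4],
    "Object": ["Fruit", "Fruit", "Fruit", "Fruit"],
    "Name": ["apple", "apple", "apple", "orange"],
    "shape": ["round", "round", "square", "round"],
    "rating": [1.0, 2.0, 2.5, 1.5],
})


def most_frequent(values: pd.Series) -> str:
    """Return the most common value in a group (hypothetical helper).

    Series.mode() returns all tied values in sorted order, so a tie is
    resolved by taking the first one in that order.
    """
    return values.mode().iloc[0]


# Assumption: rows are grouped by (Object, Name). Every row in a group
# gets the group's most frequent shape; all other columns are untouched.
df["shape"] = df.groupby(["Object", "Name"])["shape"].transform(most_frequent)

print(df)
```

With the sample rows, row 3 changes from square to round and the rest stay as they are. On Databricks, the same idea could probably be expressed by counting `shape` per (`Object`, `Name`) group, keeping the top-ranked shape per group, and joining that back to the source rows, but I have not verified that at the million-row scale.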
Labels:
- Pyspark
- Python Dataframe
- Scala