cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

python dataframe or hiveSql update based on predecessor value?

as999
New Contributor III

I have a million in rows that I need to update which looks for the highest count of the predecessor from the same source data and replaces the same value on a different row.  

For example.

Original DF.

sno Object Name  shape  rating

1  Fruit apple round  1.0

2  Fruit apple round  2.0

3  Fruit apple square 2.5

4  Fruit orange round  1.5

```

Required Target DF.

```

sno Object Name  shape rating

1  Fruit apple round 1.0

2  Fruit apple round 2.0

3  Fruit apple round 2.5 <-- automatically detect the difference in shape column and update from square to round

4  Fruit orange round 1.5

```

Pls advise, how to achieve it in databrick using either i.e pyspark or hiveSQL or scala

3 REPLIES 3

-werners-
Esteemed Contributor III

so you want to determine the max number of occurances for a group key?

That is easy: create a df: df:groupBy(Object, Name, Shape).agg(count("*"))

Then join this df with the original and replace the original shape column.

as999
New Contributor III

thanks for reply, can you please elaborate how to join with original and replace the shape column?

-werners-
Esteemed Contributor III

basically you have to create a dataframe (or use a window function, that will also work) which gives you the group combination with the most occurances. So a window/groupby on object, name, shape with a count().

Then you have to determine which shape has the max(count) for a object/name combo.

can also be done using groupby or window.

Finally you filter on this max et voila.

If you use window functions you can avoid a join I think (doin this out of my head).

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.