Retrieve a row from an indexed Spark DataFrame

Orianh — Thu, 12 May 2022 08:30:00 GMT

Hello guys,

I'm having an issue when trying to get a row's values from a Spark DataFrame.

I have a DF with an index column, and I need to be able to return a row based on its index in the fastest way possible.

I tried partitioning by the index column (partitionBy) and running optimize with zorder on the index column, but it still takes too long to get a row. (zorder didn't change anything in the table files.)

Retrieving a row's values takes between 0.5 and 4 seconds.

Some code I tried:

row = df.where(df.index == x).collect()
--
row = df.where(df.index == x).take(1)

When using where, does Spark scan all the indexes, or does it skip the rest once it finds the right one?

If someone has a better approach, I would like to know.

Thanks for your help!
