Hello guys,
I'm having an issue retrieving a single row's values from a Spark DataFrame.
I have a DataFrame with an index column, and I need to be able to return a row by its index value as fast as possible.
I tried partitionBy on the index column and OPTIMIZE with ZORDER on the index column, but it still takes too long to get a row (ZORDER didn't change anything in the table files).
Retrieving a row's values takes anywhere from ~0.5 s to ~4 s.
Some code I tried:
row = df.where(df.index == x).collect()
--
row = df.where(df.index == x).take(1)
When using where, does Spark scan all the index values, or does it stop as soon as it finds the matching one and skip the rest?
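For what it's worth, here is my understanding of the difference between the two calls, as a plain-Python analogy (not actual PySpark, and the names below are my own): collect() materializes every matching row from every partition, while take(1) can stop once it has one result.

```python
# Plain-Python analogy of the two PySpark calls above (hypothetical helpers):
# collect_matches() mimics df.where(df.index == x).collect() - visits every row.
# take_one() mimics df.where(df.index == x).take(1) - stops at the first hit.
rows = [{"index": i, "value": i * 10} for i in range(100_000)]

def collect_matches(rows, x):
    # Scans the whole list even after a match is found.
    return [r for r in rows if r["index"] == x]

def take_one(rows, x):
    # Returns as soon as one matching row is found.
    for r in rows:
        if r["index"] == x:
            return [r]
    return []
```

My assumption is that in real Spark, take(1) launches the scan on a few partitions first and only expands if nothing is found, so combined with partition pruning on the index column it could return much sooner than collect() - but I'd like confirmation.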
If someone has a better approach, I would like to know.
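One fallback I'm considering (my own idea, assuming the table fits in driver memory, which may not hold for you): collect the DataFrame once and index it in a Python dict, so repeated lookups become O(1) instead of a Spark job each time. On the Spark side this would be roughly `lookup = {r["index"]: r.asDict() for r in df.collect()}`; below is a plain-Python demonstration of the same pattern:

```python
# Hypothetical fallback: build an in-memory index once, then look up by key.
# 'records' stands in for the collected DataFrame rows.
records = [{"index": i, "value": i * 10} for i in range(1000)]
lookup = {r["index"]: r for r in records}

def get_row(x):
    # Returns the row dict for index x, or None if absent.
    return lookup.get(x)
```

The obvious downside is the one-time collect cost and the memory ceiling, so I'd still prefer a way to make the per-row Spark lookup itself fast.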
Thanks for your help!