
Retrieve a row from an indexed Spark DataFrame

Orianh
Valued Contributor II

Hello guys,

I'm having an issue when trying to get a row's values from a Spark DataFrame.

I have a DataFrame with an index column, and I need to be able to return a row based on its index as fast as possible.

I tried partitioning by the index column (partitionBy) and running OPTIMIZE with ZORDER on the index column, but it still takes too long to get a row. (ZORDER didn't change anything in the table files.)

Retrieving a row's values takes between 0.5 and 4 seconds.

Some code I tried:

row = df.where(df.index == x).collect()
--
row = df.where(df.index == x).take(1)
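
For anyone who wants to reproduce the timings, here is a minimal sketch of how each lookup could be measured. The helper name `time_lookup` and the dictionary stand-in are hypothetical and only there so the snippet runs without a Spark session; with the real DataFrame, the `fetch` callable would wrap one of the two calls above.

```python
import time
from typing import Any, Callable, List


def time_lookup(fetch: Callable[[Any], Any], key: Any, repeats: int = 5) -> List[float]:
    """Call fetch(key) `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(key)
        durations.append(time.perf_counter() - start)
    return durations


# With the DataFrame from this post, fetch could be (assumption, not run here):
#   fetch = lambda x: df.where(df.index == x).take(1)
# Stand-in lookup table so the sketch runs without Spark:
rows = {i: ("row", i) for i in range(1000)}
timings = time_lookup(lambda x: rows[x], key=42)
print(f"min={min(timings):.6f}s max={max(timings):.6f}s")
```

Repeating the call several times helps separate a slow first run from the steady-state lookup time.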

When using where, does Spark scan all the index values, or does it skip the rest once it finds the matching one?

If someone has a better approach, I would like to know.

Thanks for your help!
