Re: How to loop over a Spark DataFrame with Scala?

Pierrek20 · 10-11-2018

Hello! I'm a rookie with Spark in Scala. Here is my problem (thanks in advance for your help):

My input DataFrame looks like this:

index  bucket  time   ap  station  rssi
0      1       00:00  1   1        -84.0
1      1       00:00  1   3        -67.0
2      1       00:00  1   4        -82.0
3      1       00:00  1   2        -68.0
4      1       00:00  2   5        -68.0
5      2       00:15  1   3        -83.0
6      2       00:15  1   2        -82.0
7      2       00:15  1   4        -80.0
8      2       00:15  1   1        -72.0
9      2       00:15  2   5        -72.0
10     3       00:30  1   4        -85.0
11     3       00:30  1   3        -77.0
12     3       00:30  1   2        -70.0
13     3       00:30  2   5        -70.0

I would like to write an algorithm that does the following:

 for each ap
   for each station
     for each bucket
       if rssi(previous bucket) < rssi(bucket)
         print message
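As a starting point, the nested loop above can be sketched in plain Scala collections, without Spark, assuming the rows fit in a local `Seq`. `Reading`, `NestedLoopSketch` and `risingRssi` are hypothetical names chosen for illustration, and "previous bucket" is taken to mean the previous reading recorded for the same (ap, station) pair:

```scala
// Plain Scala collections only; no Spark required.
// Reading, NestedLoopSketch and risingRssi are hypothetical names for illustration.
final case class Reading(bucket: Int, ap: Int, station: Int, rssi: Double)

object NestedLoopSketch {
  // The sample rows from the table above (index and time columns left out).
  val sample: Seq[Reading] = Seq(
    Reading(1, 1, 1, -84.0), Reading(1, 1, 3, -67.0), Reading(1, 1, 4, -82.0),
    Reading(1, 1, 2, -68.0), Reading(1, 2, 5, -68.0), Reading(2, 1, 3, -83.0),
    Reading(2, 1, 2, -82.0), Reading(2, 1, 4, -80.0), Reading(2, 1, 1, -72.0),
    Reading(2, 2, 5, -72.0), Reading(3, 1, 4, -85.0), Reading(3, 1, 3, -77.0),
    Reading(3, 1, 2, -70.0), Reading(3, 2, 5, -70.0)
  )

  // Returns (ap, station, bucket) for every bucket whose rssi is higher than the
  // rssi of the previous bucket recorded for the same (ap, station) pair.
  def risingRssi(readings: Seq[Reading]): Seq[(Int, Int, Int)] =
    readings
      .groupBy(r => (r.ap, r.station)) // "for each ap, for each station"
      .toSeq
      .sortBy(_._1)                    // deterministic output order
      .flatMap { case ((ap, station), rs) =>
        rs.sortBy(_.bucket)            // "for each bucket", oldest first
          .sliding(2)                  // pairs of (previous bucket, bucket)
          .collect { case Seq(prev, cur) if prev.rssi < cur.rssi => (ap, station, cur.bucket) }
      }

  def main(args: Array[String]): Unit =
    risingRssi(sample).foreach { case (ap, station, bucket) =>
      println(s"ap $ap, station $station: rssi rose at bucket $bucket")
    }
}
```

On the sample table this reports five rises: (ap 1, station 1, bucket 2), (1, 2, 3), (1, 3, 3), (1, 4, 2) and (2, 5, 3).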

I don't know how to do this in Scala...

Here is my start:

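import org.apache.spark.sql.SparkSession

object coveralg {
    def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("coveralg").getOrCreate()
        import spark.implicits._
        // header: the first line holds the column names
        // inferSchema: lets Spark infer numeric types (e.g. rssi as double) instead of
        // reading every column as a string, so rssi comparisons are numeric
        val input_data = spark.read.format("csv")
            .option("header", "true")
            .option("inferSchema", "true")
            .load(args(0))
    }
}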

But I don't know how to loop over a DataFrame and select the values I need for the if.
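For the loop-and-if approach asked about here, one possible sketch sorts the DataFrame, collects it to the driver and compares each row with the one before it, reading values by column name with `Row.getAs`. It assumes the data is small enough to `collect()`; `DriverLoopSketch` and `risingRssi` are hypothetical names, and the casts are there in case the CSV was read without `inferSchema`:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import scala.collection.mutable.ArrayBuffer

// DriverLoopSketch and risingRssi are hypothetical names for illustration.
object DriverLoopSketch {
  // Loops over the rows on the driver. collect() copies every row into driver
  // memory, so this only suits data small enough to fit there.
  def risingRssi(df: DataFrame): List[String] = {
    val rows = df
      .select(
        col("ap").cast("int").as("ap"), // casts cover CSVs read as strings
        col("station").cast("int").as("station"),
        col("bucket").cast("int").as("bucket"),
        col("rssi").cast("double").as("rssi"))
      .orderBy("ap", "station", "bucket") // rows of one (ap, station) become adjacent
      .collect()

    val messages = ArrayBuffer.empty[String]
    for (i <- 1 until rows.length) {
      val prev = rows(i - 1)
      val cur = rows(i)
      val ap = cur.getAs[Int]("ap")
      val station = cur.getAs[Int]("station")
      // Only compare with the previous row if it belongs to the same (ap, station).
      val sameSeries = prev.getAs[Int]("ap") == ap && prev.getAs[Int]("station") == station
      if (sameSeries && prev.getAs[Double]("rssi") < cur.getAs[Double]("rssi"))
        messages += s"ap $ap, station $station: rssi rose at bucket ${cur.getAs[Int]("bucket")}"
    }
    messages.toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("coveralg").getOrCreate()
    val input = spark.read.option("header", "true").csv(args(0))
    risingRssi(input).foreach(println)
    spark.stop()
  }
}
```

From the `main` in the question, calling `DriverLoopSketch.risingRssi(input_data).foreach(println)` would print one line per rise.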

Thank you for your answer

Anonymous · 12-12-2018

Hi, would you mind explaining what you'd like the code to do? I'm not sure I understand at the moment. After that, I'll happily suggest what it might look like in Spark 🙂

Eve · 11-19-2019

Looping is not always necessary. When I do need to go through the rows, I use the foreach method, something like the following:

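// aps is a DataFrame; collect() pulls every row to the driver, so keep it for small data
aps.collect().foreach(row => println(row)) // replace println with your own logic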
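To illustrate that an explicit loop is often unnecessary for this question, here is a sketch using a Spark window function: `lag` reads the rssi of the previous row in each (ap, station) partition ordered by bucket, and a filter keeps the rises. `foreach` is then only used to print the few matching rows. `WindowSketch` and `risingRssi` are hypothetical names, and it assumes `bucket` and `rssi` are numeric (for example, read with `inferSchema`), since string columns would be compared and sorted as text:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag}

// WindowSketch and risingRssi are hypothetical names for illustration.
object WindowSketch {
  // No explicit loop: lag() reads the rssi of the previous row in the same
  // (ap, station) partition, ordered by bucket, and filter() keeps the rises.
  def risingRssi(df: DataFrame): DataFrame = {
    val byApStation = Window.partitionBy("ap", "station").orderBy("bucket")
    df.withColumn("prev_rssi", lag(col("rssi"), 1).over(byApStation))
      .filter(col("prev_rssi") < col("rssi")) // first bucket has null prev_rssi and is dropped
      .select("ap", "station", "bucket", "prev_rssi", "rssi")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("coveralg").getOrCreate()
    val input = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(args(0))
    // Only the matching rows are brought back to the driver to print.
    risingRssi(input).collect().foreach { row =>
      println(s"ap ${row.getAs[Int]("ap")}, station ${row.getAs[Int]("station")}: " +
        s"rssi rose from ${row.getAs[Double]("prev_rssi")} to ${row.getAs[Double]("rssi")} " +
        s"at bucket ${row.getAs[Int]("bucket")}")
    }
    spark.stop()
  }
}
```

One design point: `lag` compares with the previous row that exists for that (ap, station), so if a station has no reading in some bucket, the comparison spans the gap rather than flagging it.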