Re: Scaling issue for inference with a spark.mllib...

admo · ‎03-22-2022

The idea for the job would be something like this in Scala:

feature_dataset.foreachPartition { block =>
  // Process each partition in chunks of 10,000 rows
  block.grouped(10000).foreach { chunk =>
    run_inference_and_write_to_db(chunk)
  }
}

Would you know how to do this with PySpark and RDDs?
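
For reference, a rough sketch of how the same pattern might look in Python. This is only an outline under assumptions: `chunked` and `process_partition` are hypothetical helper names, `run_inference_and_write_to_db` is a placeholder for the function in the Scala snippet above, and `feature_dataset` is assumed to be a DataFrame. Python iterators have no built-in equivalent of Scala's `grouped`, so the chunking is done with `itertools.islice`.

```python
from itertools import islice


def chunked(iterator, size):
    """Yield lists of up to `size` items, consuming the iterator lazily."""
    iterator = iter(iterator)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_inference_and_write_to_db(chunk):
    # Placeholder (assumption): score the rows in `chunk` and write
    # the results to the database.
    pass


def process_partition(rows, chunk_size=10000):
    # `rows` is the iterator Spark passes in for one partition.
    for chunk in chunked(rows, chunk_size):
        run_inference_and_write_to_db(chunk)


# On a cluster, assuming `feature_dataset` is a DataFrame:
# feature_dataset.rdd.foreachPartition(process_partition)
```

Since only one chunk is materialized at a time, memory use per task should stay bounded by the chunk size rather than the partition size.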