@jack karthik What have you tried? Have you tried cast()?
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
df.select(df("colA").cast("string"))
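If you want to keep the other columns rather than select only the cast one, here is a rough sketch using withColumn. The column names, the sample data, and the local SparkSession setup are just assumptions for illustration, so adapt them to your own dataframe.

```scala
import org.apache.spark.sql.SparkSession

object CastExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists
    val spark = SparkSession.builder()
      .appName("cast-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: colA starts out as an integer column
    val df = Seq((1, "a"), (2, "b")).toDF("colA", "colB")

    // Replace colA with a string version of itself, keeping colB untouched
    val casted = df.withColumn("colA", df("colA").cast("string"))

    // Print the schema to check the new type of colA
    casted.printSchema()

    spark.stop()
  }
}
```

cast() also accepts a DataType such as StringType from org.apache.spark.sql.types if you prefer that over the string name.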
Refer to the programming guide to see the algorithms available in MLlib:
http://spark.apache.org/docs/latest/ml-classification-regression.html
There is no KNN in MLlib, so you might want to try one of the algorithms that is available.
What's the purpose of creating those smaller dataframes? Are you trying to write them out to separate files?
You could just use filter to split the dataframe by gender, and then call sample on each resulting dataframe if you need random samples.
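Something along these lines might work as a starting point. The column name "gender", the values "M" and "F", the sampling fraction, and the seed are all assumptions on my part since I haven't seen your schema.

```scala
import org.apache.spark.sql.SparkSession

object SplitByGender {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists
    val spark = SparkSession.builder()
      .appName("split-by-gender")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data standing in for the real dataframe
    val df = Seq(
      ("alice", "F"), ("bob", "M"), ("carol", "F"),
      ("dave", "M"), ("erin", "F"), ("frank", "M")
    ).toDF("name", "gender")

    // One filtered dataframe per gender value
    val females = df.filter($"gender" === "F")
    val males = df.filter($"gender" === "M")

    // Random sample of each subset, without replacement.
    // The fraction is a per-row probability, so the sample size is approximate.
    val femaleSample = females.sample(withReplacement = false, fraction = 0.5, seed = 42L)
    val maleSample = males.sample(withReplacement = false, fraction = 0.5, seed = 42L)

    femaleSample.show()
    maleSample.show()

    spark.stop()
  }
}
```

If the end goal is really to write separate files per gender, df.write.partitionBy("gender") may be simpler than building the smaller dataframes by hand.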