SiddSingal
Databricks Employee

Hi Nithin,

You can use the DataFrame's randomSplit function. For example:

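import spark.implicits._  // enables .toDF on RDDs (usually pre-imported in Databricks notebooks)

val df = sc.parallelize(1 to 10000).toDF("value")
// Five equal weights; randomSplit normalizes them, so each split targets 20% of the rows
val splitDF = df.randomSplit(Array(1.0, 1.0, 1.0, 1.0, 1.0))
val Array(df1, df2, df3, df4, df5) = splitDF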

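The catch is that randomSplit does not produce perfectly even splits. Each row is assigned to a split at random according to the weights, so each of the five DataFrames will hold roughly 2,000 rows, but usually not exactly 2,000. With as many records as you have, this probably isn't a big concern. Would that work for you?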
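To see why the sizes vary, here is a small plain-Scala simulation that needs no Spark. It is a simplified model, assuming each row independently draws a uniform random number in [0, 1) and lands in the split whose normalized cumulative-weight interval contains it. `RandomSplitSketch` and `splitCounts` are hypothetical names, and this is a sketch of the idea, not Spark's actual implementation.

```scala
import scala.util.Random

// Simplified model (an assumption, not Spark's source) of how rows get
// assigned to splits: one uniform draw per row, bucketed by cumulative weight.
object RandomSplitSketch {

  // Returns how many of the n rows fall into each split.
  def splitCounts(n: Int, weights: Array[Double], seed: Long): Array[Int] = {
    require(weights.nonEmpty && weights.forall(_ >= 0.0) && weights.sum > 0.0)
    // Normalize the weights so they sum to 1.
    val total = weights.sum
    // Upper bound of each split's interval, e.g. 0.2, 0.4, ..., 1.0 for five equal weights.
    val upperBounds = weights.map(_ / total).scanLeft(0.0)(_ + _).tail
    val rng = new Random(seed)
    val counts = Array.fill(weights.length)(0)
    for (_ <- 0 until n) {
      val x = rng.nextDouble()
      // First split whose upper bound exceeds x; fall back to the last split
      // in case floating-point rounding leaves the final bound slightly below 1.0.
      val idx = upperBounds.indexWhere(x < _) match {
        case -1 => weights.length - 1
        case i  => i
      }
      counts(idx) += 1
    }
    counts
  }

  def main(args: Array[String]): Unit = {
    val counts = splitCounts(10000, Array(1.0, 1.0, 1.0, 1.0, 1.0), seed = 42L)
    // Each split holds roughly 2000 rows, but usually not exactly 2000.
    println(counts.mkString(", "))
  }
}
```

Running `main` prints five counts near 2000 that always add up to 10000. Changing the seed changes which split each row lands in, so the individual sizes shift while the total stays exact.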

Sidd