06-20-2016 03:01 PM
Hi Nithin,
You can use the DataFrame's randomSplit function. For example:
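import sqlContext.implicits._  // provides .toDF on an RDD; already imported in spark-shell (spark.implicits._ in Spark 2.x)
val df = sc.parallelize(1 to 10000).toDF("value")
// randomSplit expects Array[Double]; five equal weights are normalized to 0.2 each
val Array(df1, df2, df3, df4, df5) = df.randomSplit(Array(1.0, 1.0, 1.0, 1.0, 1.0))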
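The drawback is that the splits are not exactly even: each row is assigned to a split at random according to the weights, so in this example each DataFrame will hold close to, but usually not exactly, 2,000 rows. With as many records as you have, this may not be a big concern. Would this be okay?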
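To illustrate why the sizes only come out approximately equal, here is a simplified plain-Scala sketch (no Spark) of the idea behind randomSplit: normalize the weights into cumulative bounds, then drop each row into the bucket whose range contains an independent uniform random draw. For comparison it also deals rows round-robin by position, which is a different technique that gives exactly even buckets. The names SplitSketch, approxSplit and evenSplit are hypothetical, and this is not Spark's actual implementation.

```scala
import scala.util.Random

object SplitSketch {
  // Approximate split (simplified model of randomSplit): each row gets one
  // uniform draw in [0, 1) and lands in the bucket whose cumulative-weight
  // range [bounds(i), bounds(i + 1)) contains that draw.
  def approxSplit[T](rows: Seq[T], weights: Array[Double], seed: Long): Array[Seq[T]] = {
    val n = weights.length
    val total = weights.sum
    val bounds = weights.map(_ / total).scanLeft(0.0)(_ + _) // n + 1 cumulative bounds
    val rng = new Random(seed)
    val tagged = rows.toVector.map { r =>
      val u = rng.nextDouble()
      // last bound <= u gives the bucket; clamp guards against rounding at the top end
      (math.min(bounds.lastIndexWhere(_ <= u), n - 1), r)
    }
    Array.tabulate(n)(i => tagged.collect { case (b, r) if b == i => r })
  }

  // Exactly even alternative (a different technique): deal rows round-robin
  // by position, so bucket sizes differ by at most one.
  def evenSplit[T](rows: Seq[T], n: Int): Array[Seq[T]] = {
    val indexed = rows.zipWithIndex
    Array.tabulate(n)(i => indexed.collect { case (r, idx) if idx % n == i => r })
  }

  def main(args: Array[String]): Unit = {
    val rows = 1 to 10000
    val approx = approxSplit(rows, Array.fill(5)(1.0), seed = 42L)
    println(approx.map(_.size).mkString(", ")) // each near 2000, rarely exactly 2000
    val even = evenSplit(rows, 5)
    println(even.map(_.size).mkString(", "))   // 2000, 2000, 2000, 2000, 2000
  }
}
```

On a real DataFrame the round-robin approach would need a stable row index (for example from zipWithIndex on the underlying RDD), which costs an extra pass over the data, so the random split is usually the cheaper choice when approximate sizes are acceptable.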
Sidd