ShankarReddy
New Contributor II

I think this should work:

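// Read every file under the path as (file name, binary stream) pairs.
JavaPairRDD<String, PortableDataStream> jrdd = javaSparkContext.binaryFiles("<path_to_file>");
// collectAsMap() brings all entries to the driver, so this only suits small inputs.
Map<String, PortableDataStream> mp = jrdd.collectAsMap();
// f is the local java.io.File to write to; try-with-resources closes the stream.
// The enclosing method must handle or declare IOException.
try (OutputStream os = new FileOutputStream(f)) {
    for (PortableDataStream pd : mp.values()) {
        os.write(pd.toArray());
    }
    os.flush();
}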

Then pass the file to the JAXB unmarshaller. I'm not sure whether there is a better way.
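
One possible alternative, only as a sketch: since pd.toArray() already returns the bytes, they could be wrapped in a ByteArrayInputStream and handed to the parser directly, skipping the temporary file. JAXB's Unmarshaller.unmarshal also accepts an InputStream. The example below uses the JDK's built-in DOM parser as a stand-in for JAXB (which is not bundled with recent JDKs), and the class name, method name, and sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class InMemoryXmlSketch {
    // Parses XML held in a byte array without writing a temporary file.
    // The DOM parser here is a stand-in for the JAXB unmarshaller.
    public static String rootElementName(byte[] xmlBytes) throws Exception {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getNodeName();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the bytes returned by pd.toArray().
        byte[] xmlBytes = "<order><id>1</id></order>".getBytes(StandardCharsets.UTF_8);
        System.out.println(rootElementName(xmlBytes)); // prints "order"
    }
}
```

With JAXB on the classpath, the same InputStream could be passed to unmarshal instead of the DOM parser. This still assumes each file is small enough to hold in memory.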