I have a dataframe with abt 2 million text rows (1gb). I partition it into about 700 parititons as thats the no of cores available on my cluster exceutors. I run the transformations extracting medical information and then write the results in parquet...