I am running the code below on Databricks Runtime 15.4 LTS and it works fine on an all-purpose cluster.
processed_counts = df.rdd.mapPartitions(process_partition).reduce(lambda x, y: x + y)
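For context, a minimal sketch of the pattern (assuming process_partition does some per-row work and yields a single count per partition, so the reduce just sums the per-partition counts; the real function is more involved):

# Sketch only (assumption): process_partition iterates the rows of one
# partition, does the per-row processing, and yields one count so that
# .reduce(lambda x, y: x + y) returns the total number of processed rows.
def process_partition(rows):
    count = 0
    for row in rows:
        # ... actual per-row processing goes here ...
        count += 1
    yield count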
When I run the same code on a job cluster, it throws the error below. I verified the cluster settings and they are the same in both cases.
[NOT_IMPLEMENTED] rdd is not implemented.
---------------------------------------------------------------------------
PySparkNotImplementedError                Traceback (most recent call last)
line 150
    147 print(f"after repartition {df.count()} rows.")
    149
--> 150 processed_counts = df.rdd.mapPartitions(process_partition).reduce(lambda x, y: x + y)
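For reference, a quick check I can run on both clusters (my assumption is that the job cluster hands out a Spark Connect session, where the RDD API is not available, which would explain the PySparkNotImplementedError even though the cluster settings look identical):

# Sketch of a diagnostic (assumes the Databricks notebook/job provides the
# global `spark` session): on Spark Connect the session class is
# pyspark.sql.connect.session.SparkSession, and df.rdd is not supported there.
print(type(spark))
print(spark.version)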