spark throws error while using [NOT_IMPLEMENTED] r...

mh7 · 01-07-2025

I am running this code on 15.4 LTS, and it works fine on an all-purpose cluster.

processed_counts = df.rdd.mapPartitions(process_partition).reduce(lambda x, y: x + y)

When I run the same code on a job cluster, it throws the error below. I verified the cluster settings, and they look fine in both cases.

[NOT_IMPLEMENTED] rdd is not implemented.

---------------------------------------------------------------------------
PySparkNotImplementedError                Traceback (most recent call last)
line 150
    147 print(f"after repartition {df.count()} rows.")
    149
    150 processed_counts = df.rdd.mapPartitions(process_partition).reduce(lambda x, y: x + y)
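For anyone hitting the same message: PySpark raises it when `df.rdd` is accessed through a Spark Connect session, which has no RDD API. Whether the job cluster here is actually running through Spark Connect is an assumption, since the post does not say. If it is, one possible workaround is to do the per-partition work with `DataFrame.mapInPandas` instead. Below is a minimal sketch, assuming `process_partition` only returns a count per partition. The names `count_rows`, `process_batches`, and `total_processed` are hypothetical and not from the original code.

```python
def count_rows(batches):
    """Hypothetical stand-in for process_partition: count rows across all batches."""
    return sum(len(batch) for batch in batches)


def process_batches(batches):
    """mapInPandas callback: receives an iterator of pandas DataFrames for one
    partition and yields a single-row DataFrame holding that partition's count."""
    import pandas as pd  # imported here so it resolves on the executors

    yield pd.DataFrame({"processed": [count_rows(batches)]})


def total_processed(df):
    """Sum the per-partition counts with the DataFrame API only (no df.rdd)."""
    partials = df.mapInPandas(process_batches, schema="processed long")
    # agg with a dict avoids importing pyspark.sql.functions in this sketch.
    # Note: the sum is None if the DataFrame produces no rows at all.
    return partials.agg({"processed": "sum"}).collect()[0][0]
```

With this in place, `processed_counts = total_processed(df)` would replace the `df.rdd.mapPartitions(...).reduce(...)` line. The real `process_partition` logic would go where `count_rows` is called.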
