miklos
Databricks Employee

It looks like the following property is set quite high, so when you cache the dataset, your executors reserve most of their heap for cached blocks:

spark.storage.memoryFraction=0.9

Lowering that value will likely resolve the issue. Take a look at the upstream tuning docs:

http://spark.apache.org/docs/latest/tuning.html
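
As a sketch, you could pass a lower value at submit time (0.6 was the documented default for this setting; the application class and jar below are placeholders for your own):

```shell
# Lower the fraction of executor heap reserved for cached data
spark-submit \
  --conf spark.storage.memoryFraction=0.6 \
  --class com.example.MyApp \
  myapp.jar
```

The same key can also be set on the SparkConf in your application code before the SparkContext is created.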