I know this question/topic is not very specific, but perhaps asking it will be useful to people other than me.
I am a newbie to Spark, and while I've been able to get my current model training and data transformations running, they take awfully long, and I'm seeing symptoms of Spark not (yet) being properly optimized (by me) for what I'm doing: executors often sit idle, the last few tasks frequently take forever compared to the first 99%, and other assorted issues.
Where is the best place to go to learn how to diagnose and fix Spark performance issues? I'm relatively confident that what I'm experiencing is not Databricks-specific, and based on my preliminary research it seems like Spark's performance can vary a LOT depending on whether it's been tuned properly for the use case at hand. I just don't know the best/fastest way to become a Spark whisperer :-).