How to read and optimize physical plans in Spark for TB- and PB-scale data workflows

praveenm00
Databricks Partner

In one of my Amazon interviews, for a Big Data Engineer role, I was asked about this particular skill: reading and understanding physical plans in Spark to optimize massive data loads. But I thought Spark handles these plan optimizations automatically through Adaptive Query Execution. Am I missing something? If so, how do I address this gap? It would be great if anyone with experience in this could share some good resources.

Thank you!