1. I have a lot of data transformations that result in df1, and df1 is then the starting point for 10 different transformation paths. At first I tried using .cache() and .count() on df1, but it was very slow. I switched from caching to saving df1 as a Delta table, and performance improved significantly. I thought caching was the better approach, so what could be the issue here? I checked, and I do have the IO cache enabled in the Spark conf.
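For context, this is roughly the pattern I'm comparing (a minimal sketch, not my real pipeline: the placeholder for df1 and the checkpoint path are made up, and I'm assuming a Databricks-style environment where Delta is available):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder for the real upstream transformations that produce df1
df1 = spark.range(1_000_000).withColumnRenamed("id", "key")

# Approach A: cache in memory/disk, then force materialization with an action;
# all 10 downstream paths should then read from the cached copy
df1.cache()
df1.count()

# Approach B: write df1 out as a Delta table and read it back, so every
# downstream path starts from a fully materialized source on storage
df1.write.format("delta").mode("overwrite").save("/tmp/checkpoints/df1")  # hypothetical path
df1_checkpoint = spark.read.format("delta").load("/tmp/checkpoints/df1")
```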
2. What is the most efficient way to read a small (less than 10 MB) Excel file? Is it better to read it with pandas and convert the result to a Spark DataFrame, or to use a library like crealytics' spark-excel?
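To make the comparison concrete, here's a sketch of the two options I mean (the file paths are hypothetical; the pandas route needs openpyxl installed for .xlsx files, and the second route assumes the com.crealytics:spark-excel library is installed on the cluster):

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Option 1: read the whole file with pandas on the driver (fine for < 10 MB),
# then convert to a Spark DataFrame
pdf = pd.read_excel("/dbfs/FileStore/data/small_file.xlsx")  # hypothetical path
sdf = spark.createDataFrame(pdf)

# Option 2: read directly through the crealytics spark-excel connector
sdf2 = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/data/small_file.xlsx")  # hypothetical path
)
```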