Thank you for your reply, @shan_chandra. I looked at this code and tried doing the same thing. The cluster uses 2 nodes at most, even though there are 60 available. I believe the advantage of using Databricks is its distributed compute, but I'm not sure how to use it effectively.