Louis_Frolio
Databricks Employee

@excavator-matt I’d recommend a quick refresher on the Pandas API on Spark to understand the implementation details. This video breaks it down clearly: https://youtu.be/tdZDotqKtps?si=pcIzCUYs2s_TeQKx

Hope this helps. — Louis

With Python as the go-to language for data science, pandas has gained immense popularity in the data science community because it is simple to learn and use while remaining powerful, expressive, and flexible. As data volumes grow, however, a key drawback of pandas is its inability to scale with them.