Koalas DataFrames are distributed across a Databricks cluster in the same way Spark DataFrames are, because Koalas implements the pandas API on top of Spark. A pandas DataFrame, by contrast, lives only in memory on the Spark driver node. If you are a pandas user working on a multi-node cluster, you should use Koalas to process the data so that the work is spread across the worker nodes. If your data fits in the memory of a single machine, a single-node Databricks cluster running pandas may be all you need.
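Because Koalas mirrors the pandas API, the same pandas-style code can run on either backend. The sketch below assumes that on Spark 3.2+ Koalas is available as `pyspark.pandas` and on older runtimes as the standalone `databricks.koalas` package. The helper name `to_distributed` and its fallback to plain pandas when neither is installed are illustrative choices, not part of either library. Note that calling `to_pandas()` on a distributed result collects all of its rows into the driver's memory, which is the same single-machine limit that plain pandas has.

```python
import pandas as pd


def to_distributed(pdf: pd.DataFrame):
    """Hypothetical helper: return a distributed pandas-API DataFrame if one is available.

    Assumption: Spark >= 3.2 exposes Koalas as pyspark.pandas; older runtimes use the
    standalone databricks.koalas package. Without either, stay on single-node pandas.
    """
    try:
        import pyspark.pandas as ps  # pandas API on Spark (Koalas merged into PySpark)
        return ps.from_pandas(pdf)
    except ImportError:
        pass
    try:
        import databricks.koalas as ks  # standalone Koalas package
        return ks.from_pandas(pdf)
    except ImportError:
        return pdf  # no Spark available: the data stays in local pandas memory


# The same pandas-style calls work whichever backend was chosen.
pdf = pd.DataFrame({"store": ["a", "b", "a"], "sales": [10, 20, 30]})
df = to_distributed(pdf)
totals = df.groupby("store")["sales"].sum()

# A distributed result must be collected to the driver before treating it as pandas.
local_totals = totals if isinstance(totals, pd.Series) else totals.to_pandas()
```

Keeping the transformation code identical across backends means you can prototype with pandas on a small sample and then run the same logic distributed once the data outgrows the driver.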