Why is Spark creating 5 jobs and 200 tasks?

mordex · 12-05-2025

I am trying to read 1000 small CSV files, each about 30 KB in size, which are stored in a Databricks volume.

Below is the code I am running:

df = spark.read.format('csv').options(header=True).load('/path')

df.collect()

Why is it creating 5 jobs? Why do jobs 1-3 have 200 tasks each, while job 4 has 1 task and job 5 has 32 tasks? Moreover, the total data size is only 1000 * 30 KB = 30 MB, so why is it not read as a single partition?
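
To make the partition part of the question concrete, here is a rough sketch in plain Python (no Spark needed) of how I understand the file source packs files into partitions. It assumes the default values of spark.sql.files.maxPartitionBytes (128 MB) and spark.sql.files.openCostInBytes (4 MB). The default parallelism of 32 is only my guess, based on the 32 tasks in job 5. The function names are hypothetical, and the sketch ignores newer settings such as spark.sql.files.minPartitionNum, so treat it as an approximation rather than Spark's exact logic.

```python
MB = 1024 * 1024

def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB,  # assumed default
                    open_cost_bytes=4 * MB):       # assumed default
    """Approximate target partition size used by the file source."""
    # Every file is padded with the "open cost" before sizing partitions.
    total = sum(file_sizes) + len(file_sizes) * open_cost_bytes
    bytes_per_core = total // default_parallelism
    return min(max_partition_bytes, max(open_cost_bytes, bytes_per_core))

def estimate_partitions(file_sizes, default_parallelism,
                        max_partition_bytes=128 * MB,
                        open_cost_bytes=4 * MB):
    """Greedily pack files (largest first) and count resulting partitions."""
    split = max_split_bytes(file_sizes, default_parallelism,
                            max_partition_bytes, open_cost_bytes)
    partitions = 0
    current = 0  # padded bytes already placed in the open partition
    for size in sorted(file_sizes, reverse=True):
        # Close the open partition if this file would overflow it.
        if current > 0 and current + size > split:
            partitions += 1
            current = 0
        current += size + open_cost_bytes
    if current > 0:
        partitions += 1  # close the last partition
    return partitions

# 1000 files of 30 KB each; a parallelism of 32 is a guess, not a known value.
files = [30 * 1024] * 1000
print(max_split_bytes(files, default_parallelism=32))
print(estimate_partitions(files, default_parallelism=32))
```

Under these assumptions each 30 KB file is counted as roughly 4 MB, so the 30 MB of data is sized as about 4 GB and the estimate comes out to 32 partitions, which would match job 5. This only models the final scan; it says nothing about jobs 1-4.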

Please help.