Why is Spark creating 5 jobs and 200 tasks?

mordex
New Contributor III

I am trying to read 1000 small CSV files, each about 30 KB, stored in a Databricks volume.

Below is the code I am running:

df = spark.read.format('csv').options(header=True).load('/path')

df.collect()

 

Why does this create 5 jobs? Why do jobs 1-3 have 200 tasks each, while job 4 has 1 task and job 5 has 32 tasks? Also, the total data size is only 1000 * 30 KB = 30 MB, so why doesn't Spark read it as a single partition?

Please help.
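For reference, here is a rough sketch of the partition arithmetic in plain Python. It is a simplified model of how Spark packs files into read partitions, assuming the default values of spark.sql.files.maxPartitionBytes (128 MB) and spark.sql.files.openCostInBytes (4 MB). The function name estimate_partitions and the default parallelism of 32 are hypothetical, since the cluster's real settings are not given here. The point it illustrates is that each file is charged its size plus the 4 MB open cost, so 1000 tiny files are treated as roughly 4 GB rather than 30 MB.

```python
MB = 1024 * 1024
KB = 1024


def estimate_partitions(num_files, file_size, default_parallelism,
                        max_partition_bytes=128 * MB, open_cost=4 * MB):
    """Simplified model of Spark's file-partition packing (an approximation,
    not Spark's actual code). Assumes all files have the same size."""
    # Every file is charged its length plus a fixed "open cost".
    total_bytes = num_files * (file_size + open_cost)
    bytes_per_core = total_bytes // default_parallelism
    max_split_bytes = min(max_partition_bytes, max(open_cost, bytes_per_core))

    partitions = 0
    current_size = 0
    files_in_current = 0
    for _ in range(num_files):
        # Close the current partition if the next file would not fit.
        if files_in_current > 0 and current_size + file_size > max_split_bytes:
            partitions += 1
            current_size = 0
            files_in_current = 0
        current_size += file_size + open_cost
        files_in_current += 1
    if files_in_current > 0:
        partitions += 1
    return partitions


# 1000 files of 30 KB each; the parallelism of 32 is an assumed value.
print(estimate_partitions(1000, 30 * KB, default_parallelism=32))
```

Under these assumed values the sketch yields 32 partitions, which would be consistent with the 32 tasks in job 5, though that depends on the cluster's actual configuration. The partition count Spark really chose can be checked with df.rdd.getNumPartitions().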

 
