12-05-2025 12:26 AM
I am trying to read 1,000 small CSV files, each about 30 KB, which are stored in a Databricks volume.
Below is the code I am running:
df = spark.read.format('csv').options(header=True).load('/path')
df.collect()
Why does this create 5 jobs? Why do jobs 1-3 have 200 tasks each, while job 4 has 1 task and job 5 has 32 tasks? Moreover, the total data size is only 1000 * 30 KB = 30 MB, so why is it not read as a single partition?
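For context, here is a rough sketch in plain Python (no Spark needed) of how I understand Spark packs files into read partitions. It assumes the default values of spark.sql.files.maxPartitionBytes (128 MB) and spark.sql.files.openCostInBytes (4 MB), and it assumes a default parallelism of 32, which is only my guess for this cluster. The function names are my own and not part of any Spark API. The point is that every file is padded with the 4 MB open cost, so 1,000 tiny files are treated as roughly 4 GB rather than 30 MB.

```python
# Sketch of Spark's file-partition sizing as I understand it.
# All defaults below are assumptions; check them with spark.conf.get(...).
MB = 1024 * 1024
KB = 1024


def max_split_bytes(file_sizes, max_partition_bytes=128 * MB,
                    open_cost=4 * MB, min_partition_num=32):
    """Target partition size: each file is padded with the open cost."""
    total = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total // min_partition_num
    return min(max_partition_bytes, max(open_cost, bytes_per_core))


def pack_files(file_sizes, max_split, open_cost=4 * MB):
    """Greedily pack files (largest first) into partitions."""
    partitions, current, current_size = [], [], 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition if this file would overflow it.
        if current and current_size + size > max_split:
            partitions.append(current)
            current, current_size = [], 0
        current_size += size + open_cost
        current.append(size)
    if current:
        partitions.append(current)
    return partitions


file_sizes = [30 * KB] * 1000          # 1,000 files of 30 KB each
split = max_split_bytes(file_sizes)    # assumed parallelism of 32
partitions = pack_files(file_sizes, split)
print(len(partitions), len(partitions[0]))
```

Under these assumptions the sketch yields 32 partitions of up to 32 files each, which would line up with the 32 tasks in the last job, but I am not sure this is the real explanation. On the cluster itself, df.rdd.getNumPartitions() should show the actual number.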
Please help.
Labels: Spark