Partitioning or Processing : Reading CSV file with size of 5 to 9 GB

NanthakumarYoga
New Contributor

Hi Team,

Would you please guide me on the following?

Instance with 28 GB and 8 cores:

1. How does Databricks read 5 to 9 GB files from Blob storage? (Is the full file loaded directly into one node's memory?)

2. How many tasks will be created based on the cores? How many executors will be allocated? How are tasks sized relative to the file size?

Regards,
Nantha

 

1 REPLY

Kaniz
Community Manager

Hi @NanthakumarYoga, Databricks reads data from Blob storage in a distributed way, breaking it into partitions that are processed by separate Spark tasks.
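
For illustration, here is a minimal PySpark sketch of such a read. The storage account, container, and file path are placeholders, not details from your environment:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Spark plans the read lazily and splits the file into partition-sized
# chunks; it does not pull the whole 5-9 GB file into one node's memory.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")  # convenient, but adds an extra pass over the data
    .csv("wasbs://mycontainer@mystorageaccount.blob.core.windows.net/data/large_file.csv")
)

print(df.rdd.getNumPartitions())  # one read task per partition
```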


- The size of partitions is user-controlled (for example via spark.sql.files.maxPartitionBytes), enabling efficient processing of large files without memory issues; see the sketch after this list.
- One executor runs per worker node in Databricks, so the number of executors depends on the cluster configuration.
- Each executor is a JVM process and can run multiple tasks; the number of concurrent tasks per executor equals the number of cores allocated to it.
- The total number of parallel tasks is determined by the total cores across all executors, so an instance with 8 cores can run up to 8 tasks in parallel.
- Tasks are not designed around file size directly: Spark splits the data into partitions, and each partition is processed as a separate task, so the number of partitions (which you can control) determines the number of tasks.
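
As a rough sketch of those knobs (the config key and API calls are standard Spark; the sizes and counts below are only example values):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Maximum bytes per read partition (default 128 MB). An ~8 GB CSV read
# with 128 MB partitions yields roughly 64 read tasks; with 8 cores,
# those run 8 at a time, i.e. about 8 waves of tasks.
spark.conf.set("spark.sql.files.maxPartitionBytes", 128 * 1024 * 1024)

df = spark.read.option("header", "true").csv(
    "wasbs://mycontainer@mystorageaccount.blob.core.windows.net/data/large_file.csv"
)

print(df.rdd.getNumPartitions())  # inspect the resulting partition count

# Adjust after the read if needed: repartition() shuffles to any count,
# while coalesce() cheaply reduces the count without a full shuffle.
df = df.repartition(64)
```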