Hi @NanthakumarYoga, Databricks reads data from Blob storage in a distributed way, breaking the data into partitions processed by separate tasks in Spark.
- Partition size can be controlled by the user (for example via `spark.sql.files.maxPartitionBytes`, which defaults to 128 MB), so large files can be processed without loading them entirely into memory.
- One executor runs per worker node in Databricks.
- The number of parallel tasks is determined by the number of cores across all executors.
- An instance with eight cores can therefore run up to eight tasks in parallel.
- The number of executors depends on the cluster configuration, specifically the number of worker nodes.
- Each executor is a JVM process and can run multiple tasks.
- The number of tasks an executor can run concurrently equals the number of cores allocated to that executor (with the default of one core per task).
- Tasks are mapped to partitions, not to files: a large file is split into multiple partitions, and each partition is processed as a separate task.
- The number of partitions can be controlled by the user, for example with `repartition()` or by tuning `spark.sql.files.maxPartitionBytes`.
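To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch (plain Python, not actual Spark API calls; the 128 MB default and the one-core-per-task assumption match standard Spark behavior) of how partition count and task parallelism interact:

```python
import math

def num_partitions(file_size_bytes: int,
                   max_partition_bytes: int = 128 * 1024**2) -> int:
    """Rough partition count Spark derives when reading a splittable file:
    one partition per max_partition_bytes chunk (default 128 MB)."""
    return max(1, math.ceil(file_size_bytes / max_partition_bytes))

def task_waves(partitions: int, executors: int, cores_per_executor: int) -> int:
    """Number of sequential 'waves' of tasks needed, given that the cluster
    can run (executors x cores_per_executor) tasks in parallel."""
    parallel_slots = executors * cores_per_executor
    return math.ceil(partitions / parallel_slots)

# A 10 GB file with the 128 MB default yields 80 partitions.
parts = num_partitions(10 * 1024**3)

# Two 8-core workers (one executor each) give 16 parallel task slots,
# so the 80 tasks complete in 5 waves.
waves = task_waves(parts, executors=2, cores_per_executor=8)
print(parts, waves)  # 80 5
```

Lowering `spark.sql.files.maxPartitionBytes` (or calling `repartition()` with a higher count) produces more, smaller partitions, which increases parallelism at the cost of per-task overhead.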