Problem Statement
In one of my recent projects, I faced a significant challenge: writing a huge dataset of 11,582,763,212 rows and 2,068 columns to a Databricks managed Delta table.
The initial write operation took 22.4 hours using the following setup:
- Cluster Configuration:
  - Driver: Standard_E4ads_v5 (4 cores, 32 GB)
  - Workers: Standard_E4ads_v5 (4 cores, 32 GB), autoscaling 2–10
  - Databricks Runtime: 15.4.28
- Spark Configurations:
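A rough sketch of the session settings in effect during the slow write, assuming the ambient spark session of a Databricks notebook; only the shuffle partition count is stated explicitly in this post, and the auto-compaction line is inferred from the behavior described below:

```python
# Session settings during the initial (slow) write. Only the shuffle
# partition count is stated explicitly; the auto-compaction setting is
# inferred from the Delta overhead described below.
spark.conf.set("spark.sql.shuffle.partitions", "16")                   # far too low for ~11.6B rows
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")   # auto-compaction active during the write
```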

Why Was It So Slow?
- Low Parallelism: With spark.sql.shuffle.partitions=16, each shuffle partition had to handle roughly 724 million rows.
- Underpowered Cluster: Even at the autoscaling maximum of 10 workers, only 40 cores were available for 11.5B rows and 2,068 columns.
- Wide Rows: 2,068 columns per row meant a huge shuffle footprint and heavy memory pressure.
- Delta Overhead: Auto-compaction running during the write added extra steps to an already heavy job.
Optimization Strategy
1. Increase Shuffle Partitions
Reason: More partitions → smaller chunks → better parallelism → less skew.
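A minimal sketch of this change, assuming the source DataFrame is called df; 2,000 partitions (about 5–6 million rows each) is an illustrative target, not necessarily the value used in the project:

```python
# Raise shuffle parallelism so each task handles millions of rows instead of
# hundreds of millions. 2000 is an illustrative target, not a prescribed value.
spark.conf.set("spark.sql.shuffle.partitions", "2000")

# Optionally repartition the source DataFrame before writing so the write
# tasks are evenly sized.
df = df.repartition(2000)
```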

2. Partition the Delta Table
Reason: Reduces file size per partition and improves query performance.
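A sketch of the partitioned write; the post does not name the partition column, so partition_col and the table name are placeholders (a date or other low-cardinality column is the usual choice):

```python
# Write the managed Delta table physically partitioned by a low-cardinality
# column. partition_col and the three-level table name are placeholders.
(df.write
   .format("delta")
   .mode("overwrite")
   .partitionBy("partition_col")
   .saveAsTable("catalog.schema.table_name"))
```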

3. Adjust Cluster Configuration
Reason: Handles massive shuffle and sort for wide rows.
- Recommended: 8–12 workers of Standard_E8ads_v5 (8 cores, 64 GB each)
- Total: 64–96 cores, 512–768 GB memory
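For reference, a sketch of such a cluster expressed with Databricks Clusters API fields; the spark_version string and the exact autoscale bounds are assumptions:

```python
# Upgraded cluster spec (Databricks Clusters API fields). The spark_version
# identifier and autoscale bounds are assumptions, not the exact values used.
cluster_spec = {
    "spark_version": "15.4.x-scala2.12",          # assumed identifier for DBR 15.4
    "node_type_id": "Standard_E8ads_v5",          # 8 cores, 64 GB per worker
    "driver_node_type_id": "Standard_E8ads_v5",
    "autoscale": {"min_workers": 8, "max_workers": 12},
    "spark_conf": {
        "spark.sql.shuffle.partitions": "2000",   # illustrative, see step 1
    },
}
```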
4. Disable Auto-Compact During Initial Load
Reason: Avoids extra compaction steps during heavy write.
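A sketch using the standard Databricks Delta session settings; whether optimized writes were also disabled in the original project is an assumption:

```python
# Turn off auto-compaction for the session during the initial bulk load;
# compaction is deferred to an explicit OPTIMIZE run afterwards.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false")
# Disabling optimized writes as well is optional (and assumed here), since
# the extra pre-write shuffle can slow down a very wide initial load.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")
```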

5. Post-Write Optimization Workflow
Reason: Compacts small files and improves query performance.
- Write Data: Focus on efficient partitioning and parallelism
- Optimize: spark.sql("OPTIMIZE table_name ZORDER BY (important_columns)")
- Vacuum: spark.sql("VACUUM table_name RETAIN 168 HOURS")
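Put together, the workflow looks roughly like this when each phase runs as its own job; the table name, partition column, and Z-ORDER columns are placeholders:

```python
# Three separate jobs instead of one giant DAG.

# Job 1 - write: bulk load with partitioning and high shuffle parallelism.
(df.write
   .format("delta")
   .mode("overwrite")
   .partitionBy("partition_col")
   .saveAsTable("catalog.schema.table_name"))

# Job 2 - optimize: compact small files and co-locate frequently filtered columns.
spark.sql("OPTIMIZE catalog.schema.table_name ZORDER BY (important_col1, important_col2)")

# Job 3 - vacuum: remove files no longer referenced, retaining 7 days (168 hours) of history.
spark.sql("VACUUM catalog.schema.table_name RETAIN 168 HOURS")
```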
Why This Order?
Combining write, optimize, and vacuum in one job creates a huge DAG with multiple shuffles and risks OOM. Splitting them into separate jobs:
- Write → efficient distribution
- Optimize → file compaction
- Vacuum → cleanup
Expected Impact
- Original runtime: 35+ hours
- After optimization: 10–12 hours (with a larger cluster and tuned configs)
- For 11B rows × 2,068 columns: with chunked writes (sketched below) and an upgraded cluster → 8–12 hours instead of days
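Chunked writes here means breaking the load into several smaller append jobs, for example one per partition-key value. A rough sketch, with partition_col and the table name as placeholders (the original project's chunking scheme isn't shown):

```python
from pyspark.sql import functions as F

# Append one partition-key chunk at a time rather than pushing all ~11.6B
# rows through a single write. Assumes partition_col has low cardinality.
chunk_keys = [r[0] for r in df.select("partition_col").distinct().collect()]

for key in chunk_keys:
    (df.filter(F.col("partition_col") == key)
       .write
       .format("delta")
       .mode("append")
       .partitionBy("partition_col")
       .saveAsTable("catalog.schema.table_name"))
```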
Key Takeaways
- Parallelism and partitioning are critical for large-scale writes.
- Cluster sizing matters more than you think.
- Separate write, optimize, and vacuum for better performance and smaller DAGs.
- Disable auto-compaction during initial load and run OPTIMIZE later.