Cluster configuration
Monday - last edited Monday
Hi,
Please help me choose and configure a cluster. I need to process and merge 6 million records into Azure SQL DB. At the end of the week, 9 billion records need to be processed and merged into Azure SQL DB, with a few transformations to load the data into dimension and fact tables. The setup should be cost-effective.
Tuesday
Option 1: Daily Load (6M Records), Cost-Optimized
Cluster Mode: Single Node
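VM Type: Standard_DS4_v2 (8 vCPUs, 28 GB RAM) or Standard_E4ds_v5 (4 vCPUs, 32 GB RAM)
Workers: 0 (a Single Node cluster has no workers; Spark runs on the driver)
Driver Node: the VM type above, which does all of the processing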
Databricks Runtime: 13.x LTS (Photon Optional)
Terminate after: 10-15 mins of inactivity
Autoscaling: Disabled
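If you create clusters through the REST API or automation rather than the UI, Option 1 maps roughly onto the payload below. This is a minimal sketch, assuming the field names of the Databricks Clusters API "create" request (num_workers, spark_conf, custom_tags, runtime_engine, autotermination_minutes). The runtime key "13.3.x-scala2.12" (13.3 LTS), the cluster name, and the helper function itself are hypothetical choices for illustration; check the runtime keys your workspace actually offers.

```python
import json


def single_node_cluster_spec(node_type="Standard_E4ds_v5", photon=False, idle_minutes=15):
    """Build a hypothetical Single Node cluster spec for the daily 6M-record load.

    Field names follow the Databricks Clusters API create request; the
    runtime key and cluster name below are assumptions, not fixed values.
    """
    return {
        "cluster_name": "daily-load-single-node",  # hypothetical name
        "spark_version": "13.3.x-scala2.12",       # assumed key for 13.3 LTS
        "node_type_id": node_type,
        "num_workers": 0,  # Single Node: no workers, the driver runs all Spark tasks
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",  # use every core on the driver VM
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
        # Photon is optional for this load, so it is off unless requested.
        "runtime_engine": "PHOTON" if photon else "STANDARD",
        # Shut the cluster down after this many idle minutes (10-15 suggested).
        "autotermination_minutes": idle_minutes,
    }


if __name__ == "__main__":
    print(json.dumps(single_node_cluster_spec(), indent=2))
```

The printed JSON could be sent to the clusters create endpoint (POST /api/2.0/clusters/create). There is deliberately no "autoscale" block, since a Single Node cluster has nothing to scale.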
Option 2: Weekly Load (9B Records), Autoscaling
Cluster Mode: Multi-node with Autoscaling
Worker Nodes: Standard_E8ds_v5 (8 vCPUs, 64 GB RAM)
Min Workers: 2
Max Workers: 8 (autoscaling)
Driver Node: Standard_E8ds_v5
Databricks Runtime: 13.x LTS (Photon Enabled)
Terminate after: 10-15 mins of inactivity
Autoscaling: Enabled
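Option 2 can be expressed the same way. This sketch again assumes the Databricks Clusters API create field names: an "autoscale" object with min_workers/max_workers replaces a fixed num_workers, and runtime_engine "PHOTON" turns Photon on. The runtime key, cluster name, and both helper functions are hypothetical. The capacity helper just multiplies out the Standard_E8ds_v5 size quoted above, so its constants only hold for that VM type.

```python
import json

# Per-worker size of Standard_E8ds_v5, as listed in Option 2.
WORKER_VCPUS = 8
WORKER_RAM_GB = 64


def autoscaling_cluster_spec(min_workers=2, max_workers=8, idle_minutes=15):
    """Build a hypothetical autoscaling cluster spec for the weekly 9B-record load."""
    if min_workers > max_workers:  # basic sanity check on the scaling range
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": "weekly-load-autoscaling",  # hypothetical name
        "spark_version": "13.3.x-scala2.12",        # assumed key for 13.3 LTS
        "node_type_id": "Standard_E8ds_v5",         # worker VM type
        "driver_node_type_id": "Standard_E8ds_v5",  # driver uses the same VM type
        # Databricks adds or removes workers within this range as load changes.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "runtime_engine": "PHOTON",
        "autotermination_minutes": idle_minutes,
    }


def max_worker_capacity(spec):
    """Return (vCPUs, RAM in GB) across all workers at full scale-out."""
    n = spec["autoscale"]["max_workers"]
    return n * WORKER_VCPUS, n * WORKER_RAM_GB


if __name__ == "__main__":
    spec = autoscaling_cluster_spec()
    print(json.dumps(spec, indent=2))
    print(max_worker_capacity(spec))
```

With the suggested range, full scale-out gives 8 × 8 = 64 worker vCPUs and 8 × 64 = 512 GB of worker RAM. At the minimum of 2 workers you pay for only a quarter of that when the job is lighter.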

