Cluster configuration
Monday
Please help me choose a cluster configuration. I need to process and merge 6 million records into Azure SQL DB. At the end of the week, 9 billion records need to be processed and merged into Azure SQL DB, and a few transformations need to be performed to load the data into dim and fact tables. Cost effectiveness is a priority.
Tuesday
It will depend on the transformations and how you're loading them. Assuming it's mostly in Spark, I recommend starting small with a job compute cluster and autoscaling enabled for cost efficiency. For the daily loads (6 million records), a driver and 2–4 workers of Standard_DS3_v2 or Standard_E4ds_v4 should suffice. For the weekly load (9 billion records), scale up to 8–16 workers of Standard_E8ds_v4 or similar, optionally with spot instances to reduce cost. Enabling Photon should also help with cost/performance if it's a SQL-heavy workload. A sketch of what that cluster spec could look like is below.
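
For reference, here's a rough sketch of what the weekly job's cluster definition could look like as a `new_cluster` block in the Databricks Jobs API JSON. The runtime version and worker bounds are assumptions you should tune to your workload; the node type is the Standard_E8ds_v4 suggested above. Setting `first_on_demand` to 1 keeps the driver on an on-demand VM while workers use spot capacity with fallback to on-demand, and `spot_bid_max_price: -1` caps the spot bid at the on-demand price.

```json
{
  "new_cluster": {
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_E8ds_v4",
    "runtime_engine": "PHOTON",
    "autoscale": {
      "min_workers": 8,
      "max_workers": 16
    },
    "azure_attributes": {
      "availability": "SPOT_WITH_FALLBACK_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    }
  }
}
```

The daily job can reuse the same shape with a smaller node type (e.g. Standard_E4ds_v4) and `min_workers` 2 / `max_workers` 4; autoscaling will keep you near the floor when the load is light.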

