<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cluster configuration in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113550#M44568</link>
    <description>&lt;P&gt;It will depend on the transformations and how you're loading them. Assuming it's mostly in spark, I recommend starting small using a job compute cluster with autoscaling enabled for cost efficiency. For daily loads (6 million records), a driver and 2–4 workers of Standard_DS3_v2 or Standard_E4ds_v4 should suffice. For weekly loads (9 billion records), scale up to 8–16 workers using Standard_E8ds_v4 or similar, optionally with spot instances to reduce cost. Enabling Photon should also help with cost/performance optimization if it's a SQL-heavy workloads.&lt;/P&gt;</description>
    <pubDate>Tue, 25 Mar 2025 17:33:47 GMT</pubDate>
    <dc:creator>Shua42</dc:creator>
    <dc:date>2025-03-25T17:33:47Z</dc:date>
    <item>
      <title>Cluster configuration</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113463#M44542</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Please help me configure/choose the cluster configuration. I need to process and merge 6 million records into Azure SQL DB. At the end of the week, 9 billion records need to be processed and merged into Azure SQL DB, and a few transformations need to be performed to load the data into dim and fact tables. considering cost effective&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 25 Mar 2025 05:36:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113463#M44542</guid>
      <dc:creator>Pu_123</dc:creator>
      <dc:date>2025-03-25T05:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster configuration</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113550#M44568</link>
      <description>&lt;P&gt;It will depend on the transformations and how you're loading them. Assuming it's mostly in spark, I recommend starting small using a job compute cluster with autoscaling enabled for cost efficiency. For daily loads (6 million records), a driver and 2–4 workers of Standard_DS3_v2 or Standard_E4ds_v4 should suffice. For weekly loads (9 billion records), scale up to 8–16 workers using Standard_E8ds_v4 or similar, optionally with spot instances to reduce cost. Enabling Photon should also help with cost/performance optimization if it's a SQL-heavy workloads.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Mar 2025 17:33:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113550#M44568</guid>
      <dc:creator>Shua42</dc:creator>
      <dc:date>2025-03-25T17:33:47Z</dc:date>
    </item>
  </channel>
</rss>