Data Engineering

Cluster configuration

Pu_123
New Contributor

Please help me choose a cluster configuration. I need to process and merge 6 million records per day into Azure SQL DB. At the end of the week, 9 billion records need to be processed and merged into Azure SQL DB, with a few transformations to load the data into dim and fact tables. Cost effectiveness is a priority.

1 REPLY

Shua42
Databricks Employee

It will depend on the transformations and how you're loading them. Assuming it's mostly in Spark, I recommend starting small with a job compute cluster with autoscaling enabled for cost efficiency. For the daily loads (6 million records), a driver and 2–4 workers of Standard_DS3_v2 or Standard_E4ds_v4 should suffice. For the weekly loads (9 billion records), scale up to 8–16 workers using Standard_E8ds_v4 or similar, optionally with spot instances to reduce cost. Enabling Photon should also help with cost/performance if it's a SQL-heavy workload.
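As a rough starting point, here's a minimal sketch of a Jobs API (2.1) payload along the lines described above: autoscaling job compute, Photon, and Azure spot instances with on-demand fallback. The job name, notebook path, and DBR version string are placeholders to adapt to your workspace, and the worker counts just mirror the weekly-load sizing suggested here, not a tested configuration.

```python
import json

# Sketch of a Jobs API 2.1 create payload for the weekly load.
# Anything marked "placeholder" is an assumption, not a requirement.
job_payload = {
    "name": "weekly-dim-fact-load",  # placeholder job name
    "tasks": [
        {
            "task_key": "merge_to_azure_sql",
            "notebook_task": {
                "notebook_path": "/Repos/etl/weekly_merge"  # placeholder path
            },
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # placeholder DBR LTS version
                "node_type_id": "Standard_E8ds_v4",   # weekly-load sizing from above
                "autoscale": {"min_workers": 8, "max_workers": 16},
                "runtime_engine": "PHOTON",           # Photon for SQL-heavy work
                "azure_attributes": {
                    # Spot instances with fallback to on-demand to reduce cost
                    "availability": "SPOT_WITH_FALLBACK_AZURE",
                    "first_on_demand": 1,             # keep the driver on-demand
                },
            },
        }
    ],
}

print(json.dumps(job_payload, indent=2))
```

For the daily run, the same shape with Standard_E4ds_v4 and autoscale set to 2–4 workers should track the sizing above. You can submit the payload with POST /api/2.1/jobs/create or paste the new_cluster block into the job's JSON editor in the UI.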
