It will depend on the transformations and how you're loading the data. Assuming the work is mostly in Spark, I recommend starting small with a job compute cluster and autoscaling enabled for cost efficiency. For the daily loads (6 million records), a driver and 2-4 workers of Standard_DS3_v2 or Standard_E4ds_v4 should suffice. For the weekly loads (9 billion records), scale up to 8-16 workers of Standard_E8ds_v4 or similar, optionally with spot instances to reduce cost. Enabling Photon should also help with cost/performance if it's a SQL-heavy workload.
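
As a rough sketch, the job's cluster spec could look something like the dict below (passed as `new_cluster` when you define the job). The runtime version, node types, autoscale ranges, and spot settings are illustrative assumptions you'd tune for your own workload:

```python
# Illustrative new_cluster spec for a Databricks job (daily ~6M-record load).
# All values here are assumptions -- adjust the runtime version, node types,
# and worker counts to match your actual transformations.
daily_job_cluster = {
    "spark_version": "14.3.x-scala2.12",        # example LTS runtime
    "node_type_id": "Standard_E4ds_v4",         # worker node type
    "driver_node_type_id": "Standard_E4ds_v4",  # driver node type
    "runtime_engine": "PHOTON",                 # Photon for SQL-heavy work
    "autoscale": {"min_workers": 2, "max_workers": 4},
    "azure_attributes": {
        "availability": "SPOT_WITH_FALLBACK_AZURE",  # spot with on-demand fallback
        "first_on_demand": 1,                        # keep the driver on-demand
    },
}

# The weekly ~9B-record load uses the same shape, just scaled up, e.g.:
weekly_overrides = {
    "node_type_id": "Standard_E8ds_v4",
    "autoscale": {"min_workers": 8, "max_workers": 16},
}
```

Keeping the two loads as separate jobs with their own cluster specs lets each one autoscale independently, and the job cluster terminates when the run finishes, so you only pay for the actual processing time.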