<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cluster configuration in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113550#M44568</link>
    <description>&lt;P&gt;It will depend on the transformations and how you're loading them. Assuming it's mostly in spark, I recommend starting small using a job compute cluster with autoscaling enabled for cost efficiency. For daily loads (6 million records), a driver and 2–4 workers of Standard_DS3_v2 or Standard_E4ds_v4 should suffice. For weekly loads (9 billion records), scale up to 8–16 workers using Standard_E8ds_v4 or similar, optionally with spot instances to reduce cost. Enabling Photon should also help with cost/performance optimization if it's a SQL-heavy workloads.&lt;/P&gt;</description>
    <pubDate>Tue, 25 Mar 2025 17:33:47 GMT</pubDate>
    <dc:creator>Shua42</dc:creator>
    <dc:date>2025-03-25T17:33:47Z</dc:date>
    <item>
      <title>Cluster configuration</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113463#M44542</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Please help me configure/choose the cluster configuration. I need to process and merge 6 million records into Azure SQL DB. At the end of the week, 9 billion records need to be processed and merged into Azure SQL DB, and a few transformations need to be performed to load the data into dim and fact tables. considering cost effective&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 25 Mar 2025 05:36:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113463#M44542</guid>
      <dc:creator>Pu_123</dc:creator>
      <dc:date>2025-03-25T05:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster configuration</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113550#M44568</link>
      <description>&lt;P&gt;It will depend on the transformations and how you're loading them. Assuming it's mostly in spark, I recommend starting small using a job compute cluster with autoscaling enabled for cost efficiency. For daily loads (6 million records), a driver and 2–4 workers of Standard_DS3_v2 or Standard_E4ds_v4 should suffice. For weekly loads (9 billion records), scale up to 8–16 workers using Standard_E8ds_v4 or similar, optionally with spot instances to reduce cost. Enabling Photon should also help with cost/performance optimization if it's a SQL-heavy workloads.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Mar 2025 17:33:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-configuration/m-p/113550#M44568</guid>
      <dc:creator>Shua42</dc:creator>
      <dc:date>2025-03-25T17:33:47Z</dc:date>
    </item>
  </channel>
</rss>