<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Performance issue: Running 50 notebooks from ADF in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/performance-issue-running-50-notebooks-from-adf/m-p/47860#M28213</link>
    <description>&lt;P&gt;I have a process in Data Factory that loads CDC changes from SQL Server and then triggers a notebook that merges them into the bronze and silver zones. A single notebook takes about 1 minute to run, but when all 50 notebooks are fired at once, the whole process takes 25 minutes.&lt;/P&gt;&lt;P&gt;There are not many changes in the SQL tables. When the notebooks run, the cluster must scale up, and the process takes much longer to finish.&lt;/P&gt;&lt;P&gt;Is it really a big deal for a cluster to run 50 notebooks in parallel?&lt;/P&gt;&lt;P&gt;Cluster config: 12.2 LTS, access mode: shared&lt;/P&gt;&lt;P&gt;Photon enabled&lt;/P&gt;&lt;P&gt;Workers: 2-8, Standard DS3 v2&lt;/P&gt;&lt;P&gt;Driver: Standard DS3 v2&lt;/P&gt;&lt;P&gt;Here is a screenshot from Ganglia; the load starts at 06:00.&lt;/P&gt;</description>
    <pubDate>Tue, 03 Oct 2023 13:28:06 GMT</pubDate>
    <dc:creator>alesventus</dc:creator>
    <dc:date>2023-10-03T13:28:06Z</dc:date>
    <item>
      <title>Performance issue: Running 50 notebooks from ADF</title>
      <link>https://community.databricks.com/t5/data-engineering/performance-issue-running-50-notebooks-from-adf/m-p/47860#M28213</link>
      <description>&lt;P&gt;I have a process in Data Factory that loads CDC changes from SQL Server and then triggers a notebook that merges them into the bronze and silver zones. A single notebook takes about 1 minute to run, but when all 50 notebooks are fired at once, the whole process takes 25 minutes.&lt;/P&gt;&lt;P&gt;There are not many changes in the SQL tables. When the notebooks run, the cluster must scale up, and the process takes much longer to finish.&lt;/P&gt;&lt;P&gt;Is it really a big deal for a cluster to run 50 notebooks in parallel?&lt;/P&gt;&lt;P&gt;Cluster config: 12.2 LTS, access mode: shared&lt;/P&gt;&lt;P&gt;Photon enabled&lt;/P&gt;&lt;P&gt;Workers: 2-8, Standard DS3 v2&lt;/P&gt;&lt;P&gt;Driver: Standard DS3 v2&lt;/P&gt;&lt;P&gt;Here is a screenshot from Ganglia; the load starts at 06:00.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Oct 2023 13:28:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/performance-issue-running-50-notebooks-from-adf/m-p/47860#M28213</guid>
      <dc:creator>alesventus</dc:creator>
      <dc:date>2023-10-03T13:28:06Z</dc:date>
    </item>
  </channel>
</rss>

