<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Limited concurrent running DLT's within a pipeline in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/limited-concurrent-running-dlt-s-within-a-pipeline/m-p/106789#M42584</link>
    <description>&lt;P&gt;Hi Champions!&lt;/P&gt;&lt;P&gt;We are currently seeing some unexplained limitations when running pipelines with more than 50 DLT tables. It looks like some background calculation determines the maximum number of concurrently running DLT tables, which in our case is 16.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Pipeline Config:&lt;BR /&gt;&lt;/STRONG&gt;Cloud: Azure Databricks&lt;BR /&gt;Product Edition: Pro&lt;BR /&gt;Channel: Current&lt;BR /&gt;Pipeline mode: Triggered&lt;BR /&gt;Storage option: Unity Catalog&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Initial Cluster Config:&lt;BR /&gt;&lt;/STRONG&gt;Cluster Policy: None&lt;BR /&gt;Cluster mode: Enhanced autoscaling&lt;BR /&gt;Min / Max Workers: 1/6&lt;BR /&gt;Photon: enabled&lt;BR /&gt;Worker / Driver type: Standard_F8&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Analyzing the Issue:&lt;BR /&gt;&lt;/STRONG&gt;I checked the details column of the "cluster_resources" events returned by the pipeline's event_log TVF. The value of "num_task_slots" is limited to 16. 
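&lt;/P&gt;&lt;P&gt;The check above can also be reproduced in a few lines of plain Python. This is a minimal sketch, and it rests on several assumptions. The details values of the "cluster_resources" events are assumed to have been exported as JSON strings, with the slot count under a cluster_resources.num_task_slots key. That field path, the helper name max_task_slots, and the sample rows are all assumptions made for illustration. None of them is a documented Databricks API:&lt;/P&gt;&lt;PRE&gt;
```python
import json


def max_task_slots(details_rows):
    """Return the largest num_task_slots value found in the given JSON strings.

    Assumed (hypothetical) payload shape per row:
        {"cluster_resources": {"num_task_slots": 16}}
    Rows without that key are skipped; None is returned if no row has it.
    """
    slots = []
    for raw in details_rows:
        resources = json.loads(raw).get("cluster_resources", {})
        if "num_task_slots" in resources:
            slots.append(resources["num_task_slots"])
    return max(slots) if slots else None


# Hypothetical sample rows shaped like the assumed details payload.
rows = [
    '{"cluster_resources": {"num_task_slots": 16}}',
    '{"cluster_resources": {"num_task_slots": 16}}',
    '{"other": {}}',
]
print(max_task_slots(rows))  # 16
```
&lt;/PRE&gt;&lt;P&gt;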
This was my first hint that this number might influence the maximum number of concurrently running DLT tables.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JulianKrger_0-1737625978665.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14297i898B08387BB29D83/image-size/medium?v=v2&amp;amp;px=400" role="button" title="JulianKrger_0-1737625978665.png" alt="JulianKrger_0-1737625978665.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Attempts to increase the concurrency:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Changed the worker and driver types&lt;/LI&gt;&lt;LI&gt;Increased the driver and worker sizes&lt;/LI&gt;&lt;LI&gt;Changed the product edition&lt;/LI&gt;&lt;LI&gt;Enabled and disabled Photon&lt;/LI&gt;&lt;LI&gt;Tried all cluster modes (fixed size, legacy autoscaling, enhanced autoscaling)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;One change did increase "num_task_slots": setting the minimum number of workers to 5:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JulianKrger_1-1737626627178.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14298iDACBEEEF086F91C9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="JulianKrger_1-1737626627178.png" alt="JulianKrger_1-1737626627178.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;After that change, "num_task_slots" increased to 40.&lt;BR /&gt;(I cannot work out why "num_task_slots" is 16 with 4 workers but 40 with 5 workers.)&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Concurrency still limited:&lt;BR /&gt;&lt;/STRONG&gt;When I extend my query against the event_log to calculate the number of concurrently running DLT tables, the pipeline still executes at most 16 at the same time:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" 
image-alt="JulianKrger_2-1737626698802.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14299iE1E0E1989391D8A9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="JulianKrger_2-1737626698802.png" alt="JulianKrger_2-1737626698802.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;So maybe "num_task_slots" does not influence the number of concurrently running DLT tables?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Open questions:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Does "num_task_slots" have anything to do with the maximum parallelism?&lt;/LI&gt;&lt;UL&gt;&lt;LI&gt;If yes:&lt;/LI&gt;&lt;UL&gt;&lt;LI&gt;Why does the pipeline still cap at 16 concurrently running DLT tables when the number has increased to 40?&lt;/LI&gt;&lt;LI&gt;How does Databricks calculate "num_task_slots"?&lt;/LI&gt;&lt;/UL&gt;&lt;LI&gt;If no:&lt;/LI&gt;&lt;UL&gt;&lt;LI&gt;What does "num_task_slots" tell me?&lt;/LI&gt;&lt;LI&gt;What else determines the maximum number of concurrently running DLT tables?&lt;/LI&gt;&lt;/UL&gt;&lt;/UL&gt;&lt;LI&gt;How can I increase the number of concurrently running DLT tables? Is there a cluster or DLT configuration I can set?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I have not found any documented limit or calculation that answers these questions.&lt;BR /&gt;&lt;BR /&gt;I hope one of you champions out there can help me with this.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Best regards&lt;/P&gt;&lt;P&gt;Julian&lt;/P&gt;</description>
    <pubDate>Thu, 23 Jan 2025 10:07:01 GMT</pubDate>
    <dc:creator>JulianKrüger</dc:creator>
    <dc:date>2025-01-23T10:07:01Z</dc:date>
    <item>
      <title>Limited concurrent running DLT's within a pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/limited-concurrent-running-dlt-s-within-a-pipeline/m-p/106789#M42584</link>
      <pubDate>Thu, 23 Jan 2025 10:07:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limited-concurrent-running-dlt-s-within-a-pipeline/m-p/106789#M42584</guid>
      <dc:creator>JulianKrüger</dc:creator>
      <dc:date>2025-01-23T10:07:01Z</dc:date>
    </item>
    <item>
      <title>Re: Limited concurrent running DLT's within a pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/limited-concurrent-running-dlt-s-within-a-pipeline/m-p/107722#M42901</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/116053"&gt;@JulianKrüger&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
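&lt;P&gt;You can double-check the cap of 16 reported in the question from each table's start and end times. Sweep over the start and end events in time order, keep a running count, and record its peak. This is a minimal Python sketch that assumes those times have already been extracted from the event log as (start, end) pairs. The helper name max_concurrency and the sample timings are hypothetical:&lt;/P&gt;&lt;PRE&gt;
```python
def max_concurrency(intervals):
    """Return the peak number of overlapping [start, end) intervals.

    Each interval is a (start, end) pair, e.g. the start and end time of one
    table update (hypothetical input). An interval that ends at the same moment
    another starts is not counted as overlapping with it.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))  # a table starts running
        events.append((end, -1))   # a table stops running
    # Sort by time; at equal times the -1 events sort first, so ends are applied before starts.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Hypothetical timings in seconds since the pipeline update started.
flows = [(0, 10), (2, 6), (4, 12), (10, 15)]
print(max_concurrency(flows))  # 3
```
&lt;/PRE&gt;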
&lt;P&gt;• The "num_task_slots" parameter in Databricks Delta Live Tables (DLT) pipelines is related to the concurrency of tasks within a pipeline. It determines the number of concurrent tasks that can be executed. However, this parameter does not directly determine the maximum number of concurrent running DLT pipelines within a workspace.&lt;BR /&gt;• A pipeline might still be capped at 16 concurrent running DLTs even if the "num_task_slots" has been increased to 40 due to other limitations or configurations in the system, such as cluster configurations or workspace-level limits that are not directly influenced by the "num_task_slots" parameter.&lt;BR /&gt;• The "num_task_slots" is calculated based on the available resources and the specific configurations of the cluster, such as the number of workers and the instance types used. Enhanced autoscaling and other cluster settings can also impact how resources are allocated for task slots.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jan 2025 08:06:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limited-concurrent-running-dlt-s-within-a-pipeline/m-p/107722#M42901</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2025-01-30T08:06:52Z</dc:date>
    </item>
  </channel>
</rss>

