<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT issue - slow download speed in DLT clusters in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-issue-slow-download-speed-in-dlt-clusters/m-p/101424#M40658</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;H2 class="mb-2 mt-6 text-lg first:mt-3"&gt;Possible Causes and Solutions&lt;/H2&gt;
&lt;OL class="marker:text-textOff list-decimal pl-8"&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Network Configuration:&lt;/STRONG&gt;&lt;/SPAN&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;&lt;SPAN&gt;The private connectivity setup might be affecting DLT clusters differently.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Cluster Configuration:&lt;/STRONG&gt;&lt;/SPAN&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;&lt;SPAN&gt;Ensure DLT clusters are properly sized for the workload&lt;/SPAN&gt;&lt;SPAN&gt;&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Consider using a larger driver node for complex transformations&lt;/SPAN&gt;&lt;SPAN&gt;&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Dependency Management:&lt;/STRONG&gt;&lt;/SPAN&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;&lt;SPAN&gt;Consider using cluster pools to reduce startup times.&lt;/SPAN&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
    <pubDate>Mon, 09 Dec 2024 07:50:08 GMT</pubDate>
    <dc:creator>Sidhant07</dc:creator>
    <dc:date>2024-12-09T07:50:08Z</dc:date>
    <item>
      <title>DLT issue - slow download speed in DLT clusters</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-issue-slow-download-speed-in-dlt-clusters/m-p/97732#M39732</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I'm encountering some issues with my DLT pipelines. In short: installing the cluster libraries and dependencies (using %pip installs) takes a long time because of horribly slow download speeds.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;These are the symptoms:&lt;BR /&gt;&lt;/STRONG&gt;- From all-purpose clusters: 300-800 mb/s&lt;BR /&gt;- From job clusters: roughly 300-800 mb/s&lt;BR /&gt;- From DLT clusters: roughly 5-25 mb/s&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;These are the effects:&lt;/STRONG&gt;&lt;BR /&gt;- A single iteration of developing my DLT pipeline takes 10-15 minutes, because that is how long it takes to pull in the dependencies. This is not a workable development flow.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;I have the following remarks:&lt;BR /&gt;&lt;/STRONG&gt;- The download speed is consistent across the dependencies; it does not seem to be particularly slow for certain dependencies.&lt;BR /&gt;- Yes, as a workaround I could trim some of the dependencies depending on the flow I'm working on. This is not desired, and downloading pyspark (roughly 300 MB) would be a hassle with these speeds.&lt;BR /&gt;- The infra is set up in an environment with private connectivity.&lt;/P&gt;&lt;P&gt;I'm trying to get a grasp on whether this is usual behavior, and if not, what the problem might be.&lt;/P&gt;&lt;P&gt;Does anyone have experience with something like this?&lt;/P&gt;&lt;P&gt;Please let me know what additional information you would need to help me out here.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Nov 2024 09:20:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-issue-slow-download-speed-in-dlt-clusters/m-p/97732#M39732</guid>
      <dc:creator>JesseSchouten</dc:creator>
      <dc:date>2024-11-05T09:20:55Z</dc:date>
    </item>
    <item>
      <title>Re: DLT issue - slow download speed in DLT clusters</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-issue-slow-download-speed-in-dlt-clusters/m-p/101424#M40658</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;H2 class="mb-2 mt-6 text-lg first:mt-3"&gt;Possible Causes and Solutions&lt;/H2&gt;
&lt;OL class="marker:text-textOff list-decimal pl-8"&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Network Configuration:&lt;/STRONG&gt;&lt;/SPAN&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;&lt;SPAN&gt;The private connectivity setup might be affecting DLT clusters differently.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Cluster Configuration:&lt;/STRONG&gt;&lt;/SPAN&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;&lt;SPAN&gt;Ensure DLT clusters are properly sized for the workload&lt;/SPAN&gt;&lt;SPAN&gt;&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Consider using a larger driver node for complex transformations&lt;/SPAN&gt;&lt;SPAN&gt;&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Dependency Management:&lt;/STRONG&gt;&lt;/SPAN&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;&lt;SPAN&gt;Consider using cluster pools to reduce startup times.&lt;/SPAN&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Mon, 09 Dec 2024 07:50:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-issue-slow-download-speed-in-dlt-clusters/m-p/101424#M40658</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2024-12-09T07:50:08Z</dc:date>
    </item>
  </channel>
</rss>

