DLT issue - slow download speed in DLT clusters

JesseSchouten
New Contributor

Hi all,

I'm running into an issue with my DLT pipelines. In short: installing the cluster libraries and dependencies (via %pip install) takes a long time because download speeds are extremely slow.

These are the symptoms (download speed by cluster type):
- From all-purpose clusters: 300-800 mb/s
- From job clusters: ~300-800 mb/s
- From DLT clusters: ~5-25 mb/s
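To compare cluster types like-for-like, the same measurement can be run from a notebook on each one. Below is a minimal sketch under that assumption. The helper name `measure_download_mb_per_s` is hypothetical (not a Databricks API). It streams a single URL with Python's standard `urllib.request` and reports megabytes (10^6 bytes) per second. Because "mb/s" above could mean megabits or megabytes, keep in mind that 1 MB/s equals 8 Mb/s.

```python
import time
import urllib.request


def measure_download_mb_per_s(url: str, chunk_size: int = 1 << 20) -> float:
    """Download `url` once and return average throughput in MB/s (10**6 bytes).

    The body is read in chunks and discarded, so nothing is written to disk.
    """
    total_bytes = 0
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    # Guard against a zero-length interval for tiny or local files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / elapsed / 1_000_000
```

Usage, with a hypothetical URL: `measure_download_mb_per_s("https://example.com/some-large-file.bin")`. Pointing it at a file from the same package index your %pip installs use keeps the comparison close to the real workload.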

These are the effects:
- A single iteration of developing my DLT pipeline takes 10-15 minutes, because that's how long it takes to install the dependencies. This is not a workable development flow.

I have the following remarks:
- The download speed is consistent across dependencies; no particular dependency seems to be slower than the others.
- Yes, as a workaround I could trim some of the dependencies depending on the flow I'm working on. This is not desirable, and downloading pyspark (~300 MB) would still be a hassle at these speeds.
- The infrastructure is set up in an environment with private connectivity.
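Here is a quick back-of-the-envelope check of the pyspark remark, using the speed ranges from the symptoms list. This sketch assumes "mb/s" means megabytes per second. If it means megabits, multiply the size by 8 (300 MB = 2400 Mb) and every time grows eightfold. The slowdown factor itself does not depend on the unit. The function and constant names are illustrative only.

```python
def download_seconds(size: float, speed: float) -> float:
    """Transfer time for `size` at `speed`; both must use the same unit (MB and MB/s, or Mb and Mb/s)."""
    return size / speed


def slowdown_range(fast: tuple[float, float], slow: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest slowdown factor between two (low, high) speed ranges."""
    return fast[0] / slow[1], fast[1] / slow[0]


PYSPARK_SIZE = 300               # ~300 MB, as quoted in the post (assumed megabytes)
ALL_PURPOSE_SPEEDS = (300, 800)  # reported range for all-purpose and job clusters
DLT_SPEEDS = (5, 25)             # reported range for DLT clusters

for label, (low, high) in {"all-purpose/job": ALL_PURPOSE_SPEEDS, "DLT": DLT_SPEEDS}.items():
    # Fastest time uses the high speed, slowest time uses the low speed.
    fastest = download_seconds(PYSPARK_SIZE, high)
    slowest = download_seconds(PYSPARK_SIZE, low)
    print(f"{label}: {fastest:.1f}-{slowest:.1f} s")

print("slowdown: %.0fx-%.0fx" % slowdown_range(ALL_PURPOSE_SPEEDS, DLT_SPEEDS))
```

Under the megabytes assumption, the DLT range works out to 12-60 s for pyspark alone (96-480 s if the unit is megabits). The other cluster types come in at roughly 0.4-1 s, which is a 12x to 160x slowdown.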

I'm trying to understand whether this is expected behavior and, if not, what the problem might be.

Does anyone have experience with something like this?

Please let me know what additional information you need to help me out here.


Sidhant07
Databricks Employee

Hi,

Possible Causes and Solutions

  1. Network Configuration:
    • The private connectivity setup might be affecting DLT clusters differently.
  2. Cluster Configuration:
    • Ensure DLT clusters are properly sized for the workload.
    • Consider using a larger driver node for complex transformations.
  3. Dependency Management:
    • Consider using cluster pools to reduce startup times.
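The sketch below shows the pooling suggestion as the `clusters` section of a DLT pipeline's JSON settings. It is built in Python so it can be printed and pasted into the pipeline's JSON settings. It assumes the pipeline cluster spec accepts `instance_pool_id` and `driver_instance_pool_id`, as regular cluster specs do. The pool IDs and worker count are placeholders. With pools, node types come from the pools, so a larger driver means pointing `driver_instance_pool_id` at a pool with a bigger instance type. Pools mainly shorten VM acquisition at startup. On their own, they would not be expected to change the %pip download throughput described in the question.

```python
import json

# Placeholder values: substitute real pool IDs from your workspace.
# Node types are taken from the pools, so no node_type_id is set here.
pipeline_clusters = {
    "clusters": [
        {
            "label": "default",                             # settings apply to the pipeline's default cluster
            "instance_pool_id": "<worker-pool-id>",         # hypothetical worker pool
            "driver_instance_pool_id": "<driver-pool-id>",  # hypothetical (larger) driver pool
            "num_workers": 2,                               # placeholder sizing
        }
    ]
}

print(json.dumps(pipeline_clusters, indent=2))
```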
       
