DLT pipeline slow streaming (root cause needs to be identified)

EDDatabricks
Databricks Partner

Dear support,

We have a set of DLT pipelines that stream incoming data at a very low rate, and we need to find the root cause of this delay.

To give more insight into the setup of the DLT pipelines, here are some details and metrics for the source table:

-The source table has 63.000.077.072 records

-The source table has 2 partition columns which map directly to column values

-The source table has 4 partition columns which are calculated from column values

-The streaming query also filters the source table on one partition column and one non-partition column at the same time

-Record counts per partition for the targeted partition column:

partition,records
p1,2082775
p2,932645
p3,2808
p4,5
p5,2
p6,30990942
p7,80
p8,143623
p9,1735803700
p10,4819113815
p11,4749727822
p12,12491237547
p13,17198069143
p14,18333204664
p15,3638767501

-The partition of interest is p15 and holds 3.638.767.501 records

-After filtering on the partition column and the non-partition column, 76.929.237 records of interest remain to be streamed

-The following options are used while streaming:

option("maxBytesPerTrigger", 1024 * 1024 * <MB_PER_TRIGGER_PROPERTY>)
option("ignoreChanges", "true")
option("startingTimestamp", <CUT_OFF_PROPERTY>)

MB_PER_TRIGGER_PROPERTY=10
CUT_OFF_PROPERTY=a given date
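
For reference, the options above are put together roughly as follows. This is only a minimal sketch, not our actual pipeline code: the helper name `build_stream_options` and the table name in the trailing comment are hypothetical, and the cut-off is left as a placeholder.

```python
# Minimal sketch of how the Delta streaming options above are assembled.
# `build_stream_options` is a hypothetical helper, not part of any library.

def build_stream_options(mb_per_trigger: int, cut_off: str) -> dict:
    """Return the reader options as strings, ready to pass to readStream."""
    return {
        # Soft cap on the amount of data read per micro-batch, in bytes.
        "maxBytesPerTrigger": str(1024 * 1024 * mb_per_trigger),
        # Re-emit rewritten files instead of failing on updates/deletes.
        "ignoreChanges": "true",
        # Only read changes committed at or after this timestamp.
        "startingTimestamp": cut_off,
    }

# MB_PER_TRIGGER_PROPERTY=10; the cut-off stays a placeholder here.
options = build_stream_options(10, "<CUT_OFF_PROPERTY>")
print(options["maxBytesPerTrigger"])

# In the pipeline the options would be applied along these lines
# (hypothetical table name):
#   spark.readStream.format("delta").options(**options).table("source_table")
```

With MB_PER_TRIGGER_PROPERTY=10 this limits each micro-batch to roughly 10 MiB of source data.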

-The DLT pipelines have the following specs in terms of processing power:

"node_type_id": "Standard_E8ds_v4",
"driver_node_type_id": "Standard_E8ds_v4",
"autoscale": {
   "min_workers": 1,
   "max_workers": 1,
   "mode": "LEGACY"
},
"photon": false

The problem we observe is the following:

Data reaches the destination tables very slowly. For instance, only 2 million records have reached the destination table in 50+ hours of streaming.
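
To make the observed rate concrete, here is a back-of-the-envelope calculation based only on the figures in this post. It is a rough sketch that assumes the rate stays constant and treats "50+ hours" as exactly 50, so the real throughput is at most what it prints; the variable names are illustrative.

```python
# Back-of-the-envelope throughput from the figures in this post.
records_written = 2_000_000        # records that reached the destination table
elapsed_hours = 50                 # "50+ hours", so the real rate is at most this
records_of_interest = 76_929_237   # records remaining after both filters

rate_per_hour = records_written / elapsed_hours
rate_per_second = rate_per_hour / 3600
# Days needed for all records of interest, assuming a constant rate.
eta_days = records_of_interest / rate_per_hour / 24

print(f"{rate_per_hour:,.0f} records/hour")
print(f"{rate_per_second:.1f} records/second")
print(f"{eta_days:.0f} days to stream all records of interest")
```

That is about 40,000 records per hour, or roughly 11 records per second. At that pace, streaming all records of interest would take around 80 days.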

Note: There are 4 DLT pipelines streaming concurrently from the same source table, each appending to a different destination table.
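
To put the partition figures above in perspective, the following sketch recomputes the totals from the numbers in this post. It assumes that the records of interest all lie within p15, since that is the partition the query filters on; the variable names are illustrative.

```python
# Record counts per partition, copied from the table in this post.
partition_counts = {
    "p1": 2_082_775, "p2": 932_645, "p3": 2_808, "p4": 5, "p5": 2,
    "p6": 30_990_942, "p7": 80, "p8": 143_623,
    "p9": 1_735_803_700, "p10": 4_819_113_815, "p11": 4_749_727_822,
    "p12": 12_491_237_547, "p13": 17_198_069_143, "p14": 18_333_204_664,
    "p15": 3_638_767_501,
}

total_records = sum(partition_counts.values())
p15_records = partition_counts["p15"]
records_of_interest = 76_929_237  # after partition + non-partition filters

# Share of the whole table held by p15, and share of p15 that passes the filters.
p15_share = p15_records / total_records
filter_selectivity = records_of_interest / p15_records

print(f"total records: {total_records:,}")
print(f"p15 share of table: {p15_share:.2%}")
print(f"records of interest within p15: {filter_selectivity:.2%}")
```

The partition counts add up to the 63.000.077.072 records of the source table. p15 holds roughly 5.8% of the table, and roughly 2.1% of the records in p15 pass the additional filter on the non-partition column.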

Best regards,

EDDatabricks