Hi Team,
I am looking for some advice on performance-tuning my bronze layer, which uses Delta Live Tables (DLT).
I have the following code, which is very simple yet very effective:
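import dlt

# `schema` is the StructType of the incoming JSON, assumed to be defined earlier in the notebook
@dlt.create_table(
    name="bronze_events",
    comment="New raw data ingested from storage account landing zone.",
)
def bronze_events():
    # Auto Loader ("cloudFiles") incrementally picks up new JSON files from the landing zone
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .schema(schema)
        .load("abfss://data@<some storage account>.dfs.core.windows.net/0_Landing")
    )
    return df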
That code generates this DAG.
At first it ran quite fast, but as the days go by it keeps getting slower: from 2 to 5 to 12 minutes. Silver and Gold both run in under a minute, so I am wondering what performance tuning I should do on the bronze layer.
Cheers,
G