
try and except in DLT pipelines

PearceR
New Contributor III

Good Morning,

I am having some issues with my DLT pipeline.

I have a scenario where I am loading bronze and silver tables programmatically from a SQL database (each row corresponds to a table to create). This leaves me in a situation where sometimes only half the silver tables are defined in the pipeline.

This is causing me some issues, as my gold tables may require three silver tables to build. So, for example, if silver_A is not defined in the pipeline (it could be removed from the SQL table and thus never built), then my gold table fails!

I tried to get around this by using a try and except:

import dlt
from pyspark.sql.functions import monotonically_increasing_id

@dlt.table(name="gold")
def live_gold():
    """Load data into the gold table."""
    try:
        # Read data from the upstream silver table
        data = dlt.read("silver_A")
    except Exception:
        # Fall back to an empty DataFrame if the table is missing
        data = spark.createDataFrame([], schema=schema)

    df = (
        data
        .groupBy("id", "campaign_title")
        .count()
        .withColumnRenamed("id", "src_campaign_id")
        .withColumn("campaign_id", monotonically_increasing_id())
    )
    return df

If the table can't be read by dlt.read(), it creates an empty DataFrame to use instead.
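One alternative I have been considering is checking against an explicit list of expected tables instead of relying on a bare except (which can also swallow unrelated errors). Below is a minimal Spark-free sketch of that pattern; `read_table` and `AVAILABLE_TABLES` are hypothetical stand-ins for dlt.read() and the set of tables actually defined in the pipeline, just to illustrate the control flow:

```python
# Hypothetical stand-in for the set of silver tables the pipeline defined.
AVAILABLE_TABLES = {"silver_B", "silver_C"}

def read_table(name):
    """Stand-in for dlt.read(): raises if the table is not defined."""
    if name not in AVAILABLE_TABLES:
        raise ValueError(f"Table {name} is not defined in this pipeline")
    return [{"id": 1, "campaign_title": "x"}]  # dummy rows

def read_or_empty(name):
    """Explicit existence check instead of catching every exception."""
    if name in AVAILABLE_TABLES:
        return read_table(name)
    return []  # stand-in for an empty DataFrame with the right schema

# Missing table falls back cleanly; defined table reads normally.
assert read_or_empty("silver_A") == []
assert read_or_empty("silver_B") == [{"id": 1, "campaign_title": "x"}]
```

The advantage is that the fallback is driven by the same configuration that defines the tables, so a genuine read error still surfaces instead of silently producing an empty gold table.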

This is causing me some weird issues. When I run my DLT pipeline, it doesn't pick up the hierarchy of tables: it loads gold with no data (hitting the except). However, it also loads my silver table with data, just not connected to the gold!

The only time I can get it to work is when I remove the source data and load it back in (deleted from the storage account and then re-added). My silver table streams from a bronze table, as is standard practice. It works once, with the proper hierarchy and loading through to gold, but if I run it a second time, the same issue returns.

To me it seems like the try/except just isn't functioning properly. I am unsure how to inspect the logs further to investigate what DLT is doing under the hood. I know that it figures out the table hierarchy before processing, so maybe it's something to do with that.

I have also thought about creating empty silver table objects programmatically, rather than handling it in the gold table; any opinions on that would be great.
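For context, the placeholder idea would mean generating one table definition per expected silver table, emitting an empty one when the source row is absent, so the dependency graph stays stable. A minimal Spark-free sketch of that loop is below; `EXPECTED_SILVER`, `SOURCE_ROWS`, and `make_table` are hypothetical names, and plain functions stand in for what would be @dlt.table-decorated functions in the real pipeline:

```python
# Hypothetical config: the tables the gold layer expects, and the
# source rows actually present (silver_A is missing this run).
EXPECTED_SILVER = ["silver_A", "silver_B", "silver_C"]
SOURCE_ROWS = {"silver_B": [1, 2], "silver_C": [3]}

def make_table(name):
    # Factory function so `name` is captured per table rather than
    # shared across loop iterations (the classic late-binding pitfall
    # when registering DLT tables in a loop).
    def load():
        # Empty placeholder keeps the table defined even when the
        # source row was removed from the SQL config.
        return SOURCE_ROWS.get(name, [])
    load.__name__ = name
    return load

tables = {name: make_table(name) for name in EXPECTED_SILVER}

assert tables["silver_A"]() == []      # placeholder, graph stays intact
assert tables["silver_B"]() == [1, 2]  # normal load
```

With placeholders defined this way, the gold table can read every expected silver table unconditionally, and the try/except in the gold definition becomes unnecessary.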

To summarise, my questions are:

  • Is try/except supported in DLT pipelines (for loading tables)?
  • How do I look at the logs to figure out in what order my DLT code is being processed, and what's going on?
  • Is there a better way of handling missing tables when creating gold tables?

Thanks,

Robbie

1 REPLY 1

Anonymous
Not applicable

Hi @Robert Pearce

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.
