Databricks Community

Dlt · ‎12-27-2023

Background.

I have created a DLT pipeline in which I am creating a temporary table. There are 5 such temporary tables. When I executed these in an independent notebook, they all worked fine with DLT.

Now I have merged this notebook (keeping the exact same code) with another bronze layer notebook.

Each temporary table is in a separate cell. But with this consolidated notebook I am getting the above-mentioned error.

So I am NOT sure what the issue is here: if DLT code worked independently earlier, why would it fail when combined with other bronze layer code?

This is happening due to a feature issue with DLT, where we can add multiple notebooks but cannot set up a sequence.

Thanks

BR_DatabricksAI · ‎12-27-2023

Please use the Workflows and Jobs option and associate each notebook with the job in order to enable sequential processing.

https://docs.databricks.com/en/workflows/jobs/create-run-jobs.html

This functionality is also available for DLT tables, though I have not used it with DLT tables.

Wojciech_BUK · ‎12-27-2023

You probably messed up your code, or alternatively, the runtime has been upgraded, and it stopped working.

DLT is 100% declarative and never runs in any kind of sequence; instead, it figures out the dependencies between tables and builds the execution DAG (putting code in separate notebooks is just a way of keeping your code clean).
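
To make the point about dependency resolution concrete, here is a toy model in plain Python. It is not how DLT is implemented; the dataset names are taken from this thread, the `resolve_order` helper is hypothetical, and the error text simply mirrors the message quoted in the thread. It only illustrates that the order in which tables are declared does not matter, while a reference to a name that is not defined in the pipeline fails.

```python
from graphlib import TopologicalSorter

# Toy pipeline: each dataset maps to the set of datasets it reads from.
# The declaration order is deliberately scrambled, like cells in a notebook.
datasets = {
    "clean_table": {"temp_table"},
    "bad_data": {"temp_table"},
    "temp_table": {"bronze_layer_table"},
    "bronze_layer_table": set(),
}


def resolve_order(datasets):
    """Return an execution order, or raise if a dependency is undefined."""
    for name, deps in datasets.items():
        for dep in sorted(deps):
            if dep not in datasets:
                raise LookupError(
                    f"Failed to read dataset '{dep}'. "
                    "Dataset is not defined in the pipeline."
                )
    # static_order() yields every dataset after all of its dependencies.
    return list(TopologicalSorter(datasets).static_order())


order = resolve_order(datasets)
print(order)
```

In this model, declaring the tables in one notebook or in several makes no difference; what matters is that every name a table reads from is known to the resolver.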

Maybe you can attach your notebook and a screenshot of the error in the DLT pipeline.

There is also another thing you can try:

Create a new DLT pipeline.
Target a new schema.
Put your final bronze notebook.
Run.

If it runs okay, there is a chance that DLT bugged out.

Dlt · ‎12-27-2023

My notebook is like below, and there are 5 of these, varying by table. I had this earlier in a separate notebook, which ran very well. But now I have merged it with the bronze layer notebook and I ran into issues.

@Dlt.table(
    name="Temp Table",
    table_properties={"quality" : "silver"},
    Temporary=True
)
@Dlt.expect_all(rules)
def Temp_Table():
    return (
        spark.sql("SELECT * FROM bronze_layer_table")
        .withColumn("is_bad_data", expr(quarantine_rules))
    )

@Dlt.table(
    name="Clean Table",
    table_properties={"quality" : "silver"}
)
def get_clean_data():
    return (
        dlt.read("Temp Table")
        .filter("is_bad_data=false")
    )

@Dlt.table(
    name="Bad Data",
    table_properties={"quality" : "silver"}
)
def get_bad_data():
    return (
        dlt.read("Temp Table")
        .filter("is_bad_data=true")
    )

Error

Failed to resolve flow due to upstream failure.

Failed to read dataset 'Temp Table'. Dataset is not defined in the pipeline.

Wojciech_BUK · ‎12-27-2023

I don't believe your code was working before, at least not the code you pasted above, which has spaces in the table names: DLT threw errors saying it could not register them.
I made some corrections to the code and it works OK.

I added underscores "_" to the table names, changed the decorators from "@Dlt" to "@dlt", and changed "Temporary" to "temporary".

I had to drop your expectations and fake one column with a static value.

import dlt

from pyspark.sql.functions import lit

@dlt.table(
    name="Temp_Table",
    table_properties={"quality" : "silver"},
    temporary=True
)
def Temp_Table():
    return (
        spark.sql("SELECT * FROM priv_wojciech_bukowski.dss_gold.dim_dss_date")
        .withColumn("is_bad_data", lit('xxx'))
    )


@dlt.table(
    name="Clean_Table",
    table_properties={"quality" : "silver"}
)
def get_clean_data():
    return (
        dlt.read("Temp_Table")
        .filter("is_bad_data='xxx'")
    )

Dlt · ‎12-27-2023

Please note the code had worked earlier when I was running it via a separate notebook; these errors are just typos.

Considering the code has no syntax issues, what would go wrong with the same code when it is placed below the bronze layer code, to have just one notebook instead of two?

Wojciech_BUK · ‎12-27-2023

Maybe you materialized the table and later replaced it with a temporary table (just a guess).

There were some changes recently so that DLT now has only tables and materialized views, and they still let you use the legacy syntax, so there is a chance that something ran on a certain version of the DLT pipeline and does not work on a new version (and you don't have control over the version).

Again, that is just a guess, as I did not see your code and pipelines before and after the changes, or the artifacts created by DLT.

When you merge code and pipelines there is always a chance something goes wrong, especially in DLT, as you don't have control over objects like in the classic approach 😕

I could not replicate your issue with the code that you provided.

Dlt · ‎12-28-2023

Hello

I dropped all existing objects, deleted the old DLT pipeline, and created a new one with the same name, but the same problem is seen.

Not sure why it complains about temporary tables that would be created at runtime. I even tried removing the temporary flag, but got the same problem. Not sure what's wrong here; I am running out of options.

Wojciech_BUK · ‎12-28-2023

You can attach your notebook with the code here.
I have not seen your code or the full trace.

If I were you, I would find the exact line of code where the error occurs, remove that entire DLT table section, and check whether the rest works.
Then I would add it back, trying to resolve the errors one by one; maybe you will find a pattern.
But you can also attach your code (as a file), so someone can import it and help you.

Dlt · ‎01-02-2024

The issue is fixed now. I used the LIVE qualifier for all the tables I referenced, and then it started working.

Thanks for all your help

Thanks
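
For readers landing on this thread, a minimal sketch of what that fix might look like, based on the snippet posted earlier. The thread never shows the final code, so this is an assumption: the `rules` dictionary and its `id` column are hypothetical placeholders, and the table names are the underscore versions suggested above. It runs only inside a DLT pipeline, where the `dlt` module and the `spark` session are provided.

```python
import dlt
from pyspark.sql.functions import expr

# Hypothetical expectation; the real rules are not shown in the thread.
rules = {"valid_id": "id IS NOT NULL"}
# A row is flagged as bad when it violates at least one rule.
quarantine_rules = "NOT({0})".format(" AND ".join(rules.values()))


@dlt.table(
    name="temp_table",
    table_properties={"quality": "silver"},
    temporary=True,
)
@dlt.expect_all(rules)
def temp_table():
    # The LIVE qualifier marks bronze_layer_table as a dataset defined in
    # this pipeline, so DLT can record the dependency on it.
    return (
        spark.sql("SELECT * FROM LIVE.bronze_layer_table")
        .withColumn("is_bad_data", expr(quarantine_rules))
    )


@dlt.table(name="clean_table", table_properties={"quality": "silver"})
def clean_table():
    # dlt.read() looks the name up among the pipeline's own datasets.
    return dlt.read("temp_table").filter("is_bad_data = false")
```

One plausible reading of the original error is that the unqualified read of the bronze table could not be resolved as a pipeline dataset, so the temporary table failed and every table reading from it reported an upstream failure.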

Dlt · ‎12-28-2023

Hello,

If I use the code you created above, the error is like below, for each of the 5 temp tables:

pyspark.errors.exceptions.AnalysisException: Failed to read dataset 'Temp_Table'. Dataset is not defined in the pipeline.

Below is the high-level flow of my DLT pipeline.

Step1 - 5 bronze-level tables are created and loaded from JSON files.

Step2 - 5 temp tables are created from the 5 bronze tables (created in Step1 above), with a derived Boolean bad flag.

Step3 - 5 clean and 5 quarantine tables are created by separating good and bad data based on the bad flag.

Step4 - 5 gold layer tables are created from the 5 clean tables created in Step3.

Earlier I had a separate notebook for each step, which worked great. But when I combined all of these into one notebook, I ran into issues which I am NOT able to understand. Each table is in a separate cell in all steps.