<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Seeing results of materialized views while running notebooks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/seeing-results-of-materialized-views-while-running-notebooks/m-p/80476#M36030</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thank you for your response! I had a question about the suggestion in point 2. I did some research before this and I couldn't find a way to build a function that could recognize the context (pipeline or standalone). Could you expand on how that function could look and/or which commands I need to use?&lt;/P&gt;</description>
    <pubDate>Thu, 25 Jul 2024 07:39:34 GMT</pubDate>
    <dc:creator>AndreasB</dc:creator>
    <dc:date>2024-07-25T07:39:34Z</dc:date>
    <item>
      <title>Seeing results of materialized views while running notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/seeing-results-of-materialized-views-while-running-notebooks/m-p/79133#M35680</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;My team is currently trying out Delta Live Tables (DLT) for managing our ETL pipelines. An issue we're encountering is that we have notebooks that transform data using Spark SQL. We include these in a DLT pipeline, and we want to both run the pipeline as a whole, and go into a specific notebook, run that and be able to see the materialized views that we create (we use dlt.table()), without having to change the schemas in the FROM statements,&lt;/P&gt;&lt;P&gt;For example, say we have two notebooks, Customer and Sales in a DLT pipeline.&lt;/P&gt;&lt;P&gt;In Customer we have the following code :&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table()
def CUSTOMER_TABLE():
    df = spark.sql("""  
        
        SELECT
            Customer_KEY AS CUSTOMER_SK,
            Name         AS CUSTOMER_NAME
        FROM CUSTOMER.EXTRACT_CUSTOMER

 
    """)
    return df&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CUSTOMER_TABLE gets saved to the schema DIMENSIONS in Unity Catalog.&lt;/P&gt;&lt;P&gt;In Sales we have:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;.table()
def SALES_TABLE():
    df = spark.sql("""  
        
        SELECT
            Item            AS ITEM_NAME,
            Amount          AS SALES_AMOUNT
			C.CUSTOMER_NAME
        FROM SALES.EXTRACT_SALES
		LEFT JOIN MAIN.DIMENSIONS.CUSTOMER_TABLE AS C
 
    """)
    return df&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We can run the notebook Sales and see the results from SALES_TABLE via display(df). However, if we trigger the DLT pipeline, we get the following warning:&lt;/P&gt;&lt;P&gt;"&lt;BR /&gt;Your query 'SALES_TABLE' reads from '&amp;lt;catalog&amp;gt;.&amp;lt;schema&amp;gt;.CUSTOMER_TABLE' but must read from 'LIVE.CUSTOMER_TABLE' instead.&lt;BR /&gt;Always use the LIVE keyword when referencing tables from the same pipeline so that DLT can track the dependencies in the pipeline.&lt;BR /&gt;"&lt;/P&gt;&lt;P&gt;If we then change from MAIN.DIMENSIONS.CUSTOMER_TABLE --&amp;gt; LIVE.CUSTOMER_TABLE, we can run the DLT pipeline. But running the Sales notebook&lt;BR /&gt;itself doesn't work, we get an error that says:&lt;/P&gt;&lt;P&gt;"&lt;BR /&gt;Failed to read dataset 'CUSTOMER_TABLE'. Dataset is not defined in the pipeline.&lt;BR /&gt;"&lt;/P&gt;&lt;P&gt;Right now we have a workaround that dynamically changes the schema names with the help of a parameter in the DLT pipeline&lt;BR /&gt;from (Settings --&amp;gt; Advanced --&amp;gt; Configuration).&lt;/P&gt;&lt;P&gt;Is there a better solution to this? It feels natural to be able to run DLT pipelines and the individual notebooks without&lt;BR /&gt;having to change schema names.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jul 2024 14:58:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/seeing-results-of-materialized-views-while-running-notebooks/m-p/79133#M35680</guid>
      <dc:creator>AndreasB</dc:creator>
      <dc:date>2024-07-17T14:58:26Z</dc:date>
    </item>
    <item>
      <title>Re: Seeing results of materialized views while running notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/seeing-results-of-materialized-views-while-running-notebooks/m-p/80476#M36030</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thank you for your response! I had a question about the suggestion in point 2. I did some research before this and I couldn't find a way to build a function that could recognize the context (pipeline or standalone). Could you expand on how that function could look and/or which commands I need to use?&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jul 2024 07:39:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/seeing-results-of-materialized-views-while-running-notebooks/m-p/80476#M36030</guid>
      <dc:creator>AndreasB</dc:creator>
      <dc:date>2024-07-25T07:39:34Z</dc:date>
    </item>
  </channel>
</rss>

