<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT Pipeline with only views in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104151#M41671</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/20512"&gt;@aliacovella&lt;/a&gt;,&lt;/P&gt;
&lt;P class="p1"&gt;The error message you received indicates that no tables are defined by the libraries of the pipeline, which typically occurs when all top-level definitions are views.&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;In Delta Live Tables, a pipeline must include at least one table definition. Views alone are not sufficient to define a pipeline. This is because views in DLT are meant to be derived from tables, and the pipeline needs at least one table to anchor the transformations.&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;To resolve this issue, you can define a table in your pipeline alongside your views. Here is an example of how you can modify your pipeline to include a table definition:&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;import dlt&lt;/P&gt;
&lt;P class="p1"&gt;from pyspark.sql.functions import *&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;# Define a table&lt;/P&gt;
&lt;P class="p1"&gt;@dlt.table&lt;/P&gt;
&lt;P class="p1"&gt;def users_table():&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;return spark.read.table("some_catalog.public.users")&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;# Define a view based on the table&lt;/P&gt;
&lt;P class="p1"&gt;@dlt.view&lt;/P&gt;
&lt;P class="p1"&gt;def users_view():&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;return dlt.read("users_table")&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;In this example, users_table is defined as a table, and users_view is a view that references the users_table. This ensures that your pipeline has at least one table definition, which should resolve the error you encountered.&lt;/P&gt;</description>
    <pubDate>Fri, 03 Jan 2025 20:16:34 GMT</pubDate>
    <dc:creator>Alberto_Umana</dc:creator>
    <dc:date>2025-01-03T20:16:34Z</dc:date>
    <item>
      <title>DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104149#M41669</link>
      <description>&lt;P&gt;I'm trying to create a pipeline containing a view from a federated source. In this case, I'd like to just create materialized views from the federation and schedule the pipeline for execution. If I define a pipeline with only something like the following:&lt;/P&gt;&lt;PRE&gt;@dlt.view(name="users_view")
def users_view():
    return spark.read.table("some_catalog.public.users")&lt;/PRE&gt;&lt;P&gt;I get the following error:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;com.databricks.pipelines.execution.core.ExecutionFailedException: [DLT ERROR CODE: NO_TABLES_IN_PIPELINE] No tables are defined by the libraries of this pipeline. This error usually occurs when flows defined without defining the table they target, or when all top-level definitions are views.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Is it not possible to create a pipeline with just views, or is there some other way I should be doing this?&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 20:12:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104149#M41669</guid>
      <dc:creator>aliacovella</dc:creator>
      <dc:date>2025-01-03T20:12:36Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104151#M41671</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/20512"&gt;@aliacovella&lt;/a&gt;,&lt;/P&gt;
&lt;P class="p1"&gt;The error message you received indicates that no tables are defined by the libraries of the pipeline, which typically occurs when all top-level definitions are views.&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;In Delta Live Tables, a pipeline must include at least one table definition. Views alone are not sufficient to define a pipeline. This is because views in DLT are meant to be derived from tables, and the pipeline needs at least one table to anchor the transformations.&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;To resolve this issue, you can define a table in your pipeline alongside your views. Here is an example of how you can modify your pipeline to include a table definition:&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;import dlt&lt;/P&gt;
&lt;P class="p1"&gt;from pyspark.sql.functions import *&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;# Define a table&lt;/P&gt;
&lt;P class="p1"&gt;@dlt.table&lt;/P&gt;
&lt;P class="p1"&gt;def users_table():&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;return spark.read.table("some_catalog.public.users")&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;# Define a view based on the table&lt;/P&gt;
&lt;P class="p1"&gt;@dlt.view&lt;/P&gt;
&lt;P class="p1"&gt;def users_view():&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;return dlt.read("users_table")&lt;/P&gt;
&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;In this example, users_table is defined as a table, and users_view is a view that references the users_table. This ensures that your pipeline has at least one table definition, which should resolve the error you encountered.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 20:16:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104151#M41671</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-01-03T20:16:34Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104197#M41675</link>
      <description>&lt;P&gt;Thanks, that makes sense. So if I create a table against a federated data source, when changes occur in the source table, does it automatically handle the change data capture or does it perform a table scan on the federated source to determine changes and update the DLT?&lt;/P&gt;</description>
      <pubDate>Sat, 04 Jan 2025 13:44:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104197#M41675</guid>
      <dc:creator>aliacovella</dc:creator>
      <dc:date>2025-01-04T13:44:11Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104198#M41676</link>
      <description>&lt;P&gt;From what I can see, it looks like it queries all the records again. I tried looking into the apply_changes API, but that seems to require a streaming table, and it appears that streaming tables are not supported from JDBC sources; in this case the source happens to be a Postgres database. Is streaming from something like Kinesis the only option for supporting CDC here?&lt;/P&gt;</description>
      <pubDate>Sat, 04 Jan 2025 14:49:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104198#M41676</guid>
      <dc:creator>aliacovella</dc:creator>
      <dc:date>2025-01-04T14:49:45Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104200#M41677</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/20512"&gt;@aliacovella&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;That is correct:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;CDC with Delta Live Tables&lt;/STRONG&gt;: The &lt;CODE&gt;apply_changes&lt;/CODE&gt; API is designed to simplify CDC with Delta Live Tables by processing changes from a change data feed (CDF). This API requires a streaming table to function.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Streaming Table Requirement&lt;/STRONG&gt;: Since streaming tables are not supported from JDBC sources, you cannot directly use a Postgres database as a streaming source for the &lt;CODE&gt;apply_changes&lt;/CODE&gt; API.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Alternative Streaming Sources&lt;/STRONG&gt;: To implement CDC, you would need to use a streaming source that is supported by Databricks, such as Amazon Kinesis. This would involve setting up a Kinesis stream to capture changes from your Postgres database and then using this stream as the source for your Delta Live Tables pipeline.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Configuration and Metrics&lt;/STRONG&gt;: When using Kinesis, you can configure various options such as &lt;CODE&gt;maxFetchDuration&lt;/CODE&gt; and &lt;CODE&gt;minFetchPeriod&lt;/CODE&gt; to optimize the streaming query performance. Additionally, Kinesis provides metrics like &lt;CODE&gt;avgMsBehindLatest&lt;/CODE&gt;, &lt;CODE&gt;maxMsBehindLatest&lt;/CODE&gt;, and &lt;CODE&gt;minMsBehindLatest&lt;/CODE&gt; to monitor the streaming query's progress.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
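&lt;P class="_1t7bu9h1 paragraph"&gt;As a rough sketch of points 1&amp;ndash;3 (the stream name, region, key column, and record schema below are placeholders for illustration, not details from this thread; this code only runs inside a DLT pipeline):&lt;/P&gt;
&lt;PRE&gt;import dlt
from pyspark.sql.functions import col, expr, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Hypothetical schema for DMS-style change records; adjust to your payload.
change_schema = StructType([
    StructField("user_id", StringType()),
    StructField("email", StringType()),
    StructField("op", StringType()),   # e.g. insert / update / delete
    StructField("ts", StringType()),
])

@dlt.view
def users_changes():
    # Placeholder stream name and region.
    raw = (spark.readStream
           .format("kinesis")
           .option("streamName", "users-cdc")
           .option("region", "us-east-1")
           .option("initialPosition", "trim_horizon")
           .load())
    # The Kinesis source delivers the payload in the binary "data" column.
    return (raw.selectExpr("CAST(data AS STRING) AS json")
            .select(from_json(col("json"), change_schema).alias("r"))
            .select("r.*"))

# Declare the target streaming table, then apply the change feed to it.
dlt.create_streaming_table("users")

dlt.apply_changes(
    target="users",
    source="users_changes",
    keys=["user_id"],
    sequence_by=col("ts"),
    apply_as_deletes=expr("op = 'delete'"),
)&lt;/PRE&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;The target table declared with create_streaming_table satisfies the NO_TABLES_IN_PIPELINE requirement discussed earlier, and apply_changes handles ordering and deletes based on the sequence and operation columns you map from your change records.&lt;/P&gt;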
&lt;P class="_1t7bu9h1 paragraph"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 04 Jan 2025 16:07:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104200#M41677</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-01-04T16:07:07Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104215#M41683</link>
      <description>&lt;P&gt;Informative.&lt;/P&gt;</description>
      <pubDate>Sat, 04 Jan 2025 21:35:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104215#M41683</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2025-01-04T21:35:56Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104249#M41693</link>
      <description>&lt;P&gt;Thanks, I originally tried the Kinesis route, and it worked well but was hoping there was a simpler solution. I'll go with that then.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 05 Jan 2025 16:11:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104249#M41693</guid>
      <dc:creator>aliacovella</dc:creator>
      <dc:date>2025-01-05T16:11:24Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104262#M41696</link>
      <description>&lt;P&gt;No problem, if you have any other questions let me know!&lt;/P&gt;</description>
      <pubDate>Sun, 05 Jan 2025 23:39:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104262#M41696</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-01-05T23:39:21Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline with only views</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104398#M41725</link>
      <description>&lt;P&gt;Thanks, I do have one more question. In the following scenario:&lt;BR /&gt;&lt;BR /&gt;A new migration via AWS Database Migration Service has been configured with the CDC going to Kinesis&lt;BR /&gt;A new DLT pipeline has been configured to read from the Kinesis stream&lt;BR /&gt;An initial load of potentially millions of records needs to occur&lt;BR /&gt;&lt;BR /&gt;Is the recommended approach to just let the DLT pipeline process the entire stream, or is there a more efficient approach I should consider?&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jan 2025 16:30:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipleline-with-only-views/m-p/104398#M41725</guid>
      <dc:creator>aliacovella</dc:creator>
      <dc:date>2025-01-06T16:30:34Z</dc:date>
    </item>
  </channel>
</rss>

