<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I have created a DLT pipeline which  reads data from json files which are stored in databricks volum in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/i-have-created-a-dlt-pipeline-which-reads-data-from-json-files/m-p/61263#M31748</link>
    <description>&lt;P&gt;I have created a DLT pipeline that reads data from JSON files stored in a Databricks volume and writes it into a streaming table.&lt;BR /&gt;This was working fine.&lt;BR /&gt;When I tried to read the data inserted into the table and compare the values with precalculated ones in the same DLT pipeline, it failed.&lt;/P&gt;&lt;P&gt;Is it because DLT treats this as an initialization stage and executes these comparisons before setting up the tables or inserting data into them?&lt;/P&gt;</description>
    <pubDate>Tue, 20 Feb 2024 13:20:33 GMT</pubDate>
    <dc:creator>zero234</dc:creator>
    <dc:date>2024-02-20T13:20:33Z</dc:date>
    <item>
      <title>I have created a DLT pipeline which  reads data from json files which are stored in databricks volum</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-created-a-dlt-pipeline-which-reads-data-from-json-files/m-p/61263#M31748</link>
      <description>&lt;P&gt;I have created a DLT pipeline that reads data from JSON files stored in a Databricks volume and writes it into a streaming table.&lt;BR /&gt;This was working fine.&lt;BR /&gt;When I tried to read the data inserted into the table and compare the values with precalculated ones in the same DLT pipeline, it failed.&lt;/P&gt;&lt;P&gt;Is it because DLT treats this as an initialization stage and executes these comparisons before setting up the tables or inserting data into them?&lt;/P&gt;</description>
      <pubDate>Tue, 20 Feb 2024 13:20:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-created-a-dlt-pipeline-which-reads-data-from-json-files/m-p/61263#M31748</guid>
      <dc:creator>zero234</dc:creator>
      <dc:date>2024-02-20T13:20:33Z</dc:date>
    </item>
    <item>
      <title>Re: I have created a DLT pipeline which  reads data from json files which are stored in databricks v</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-created-a-dlt-pipeline-which-reads-data-from-json-files/m-p/61304#M31758</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99891"&gt;@zero234&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Yes, your assumption matches mine. Your pipeline reads data from JSON files, inserts it into a streaming table, and then tries to compare values in the table with precalculated values before any data has been written. This leads to a comparison against an empty table, resulting in the error.&lt;/P&gt;&lt;P&gt;Possible solutions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Don't perform the comparison within the same notebook as table creation. Create a separate notebook or trigger that runs after the table has received data. This ensures the comparison happens only when there is actual data to compare. You can also set this up with a job that triggers the DLT pipeline first and runs the comparison afterwards.&lt;/LI&gt;&lt;LI&gt;Before comparing, modify the comparison code to explicitly check whether the streaming table has received any data. You can use table.isEmpty() or similar logic to confirm there is data before proceeding.&lt;/LI&gt;&lt;/OL&gt;</description>
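      <content:encoded><![CDATA[
The second suggestion above (check for data before comparing) could be sketched roughly as follows. This is a minimal illustration, not code from the thread: the function name `compare_if_populated`, the column names, and the `FakeTable` stand-in are all hypothetical. It assumes the check runs in a separate notebook or job task after the pipeline update, and that the table object is a PySpark DataFrame, which provides `isEmpty()` from Spark 3.3 onward.

```python
from typing import Any, Dict, List, Optional, Tuple


def compare_if_populated(
    df: Any,
    expected: Dict[Any, Any],
    key_col: str,
    value_col: str,
) -> Optional[Dict[Any, Tuple[Any, Any]]]:
    """Compare table values with precalculated ones, skipping empty tables.

    `df` is assumed to be a PySpark DataFrame; any object exposing
    isEmpty() and collect() works. Returns None when the table has no rows,
    otherwise a dict of {key: (expected, actual)} for every mismatch.
    """
    if df.isEmpty():
        # Nothing has been ingested yet, so there is nothing to compare.
        return None

    # collect() pulls all rows to the driver, so this suits small tables only.
    actual = {row[key_col]: row[value_col] for row in df.collect()}

    mismatches: Dict[Any, Tuple[Any, Any]] = {}
    for key, want in expected.items():
        got = actual.get(key)  # None when the key is missing from the table
        if got != want:
            mismatches[key] = (want, got)
    return mismatches


class FakeTable:
    """Hypothetical stand-in for a DataFrame so the sketch runs without Spark."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def isEmpty(self) -> bool:
        return len(self._rows) == 0

    def collect(self) -> List[Dict[str, Any]]:
        return list(self._rows)
```

In a Databricks notebook, the stand-in would be replaced by something like `spark.read.table("catalog.schema.my_streaming_table")`, where the table name is a placeholder. A `None` result then signals that the pipeline has not written any data yet, which can be handled separately from a real mismatch.
      ]]></content:encoded>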
      <pubDate>Wed, 21 Feb 2024 02:01:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-created-a-dlt-pipeline-which-reads-data-from-json-files/m-p/61304#M31758</guid>
      <dc:creator>Palash01</dc:creator>
      <dc:date>2024-02-21T02:01:31Z</dc:date>
    </item>
    <item>
      <title>Re: I have created a DLT pipeline which  reads data from json files which are stored in databricks v</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-created-a-dlt-pipeline-which-reads-data-from-json-files/m-p/61357#M31769</link>
      <description>&lt;P&gt;Keep your DLT code separate from your comparison code, and run your comparison code once your DLT data has been ingested.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Feb 2024 12:24:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-created-a-dlt-pipeline-which-reads-data-from-json-files/m-p/61357#M31769</guid>
      <dc:creator>AmanSehgal</dc:creator>
      <dc:date>2024-02-21T12:24:47Z</dc:date>
    </item>
  </channel>
</rss>

