<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT CDC SCD-1 pipeline not showing stats when reading from parquet file in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-cdc-scd-1-pipeline-not-showing-stats-when-reading-from/m-p/51648#M29224</link>
    <description>&lt;P&gt;My bad: after waiting a bit and doing a proper screen refresh, the numbers do show up.&lt;/P&gt;</description>
    <pubDate>Tue, 14 Nov 2023 16:46:33 GMT</pubDate>
    <dc:creator>JVesely</dc:creator>
    <dc:date>2023-11-14T16:46:33Z</dc:date>
    <item>
      <title>DLT CDC SCD-1 pipeline not showing stats when reading from parquet file</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cdc-scd-1-pipeline-not-showing-stats-when-reading-from/m-p/51043#M28942</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I followed the tutorial here:&amp;nbsp;&lt;A href="https://docs.databricks.com/en/delta-live-tables/cdc.html#how-is-cdc-implemented-with-delta-live-tables" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/delta-live-tables/cdc.html#how-is-cdc-implemented-with-delta-live-tables&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The only change I made is that the data is read from Parquet files instead of being appended to a table. In practice, this means:&lt;/P&gt;&lt;P&gt;Original:&lt;/P&gt;&lt;PRE&gt;&lt;SPAN class=""&gt;@dlt&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;view&lt;/SPAN&gt;
&lt;SPAN class=""&gt;def&lt;/SPAN&gt; &lt;SPAN class=""&gt;users&lt;/SPAN&gt;&lt;SPAN class=""&gt;():&lt;/SPAN&gt;
  &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN class=""&gt;spark&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;readStream&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;format&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"delta"&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;table&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"cdc_data.users"&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;My code&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;@dlt.view
def vcp_analyte_source():
  return (
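    # Auto Loader ("cloudFiles") incrementally picks up new Parquet files
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      # Keep the supplied schema; new columns are ignored rather than added
      .option("cloudFiles.schemaEvolutionMode", "none")
      .schema(vcp_analytes_schema)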
      .load(vcp_analytes_data_path)
  )&lt;/LI-CODE&gt;&lt;P&gt;This works: the hidden "_apply_changes_storage_" table is populated with data from the Parquet files, and the resulting "gold" view returns the expected number of records.&lt;/P&gt;&lt;P&gt;However, when I open the Delta Live Tables dashboard where the streaming tables are rendered (see attached file), the numbers of "upserted" and "deleted" records are not shown, even though 2000 records have been ingested.&lt;/P&gt;&lt;P&gt;Is that a "feature" of working with Parquet files, a known bug, or something I have to enable elsewhere? If this is expected behavior, is there anywhere else to look for good record-ingestion performance statistics?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 13 Nov 2023 12:49:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cdc-scd-1-pipeline-not-showing-stats-when-reading-from/m-p/51043#M28942</guid>
      <dc:creator>JVesely</dc:creator>
      <dc:date>2023-11-13T12:49:45Z</dc:date>
    </item>
    <item>
      <title>Re: DLT CDC SCD-1 pipeline not showing stats when reading from parquet file</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cdc-scd-1-pipeline-not-showing-stats-when-reading-from/m-p/51648#M29224</link>
      <description>&lt;P&gt;My bad: after waiting a bit and doing a proper screen refresh, the numbers do show up.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Nov 2023 16:46:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cdc-scd-1-pipeline-not-showing-stats-when-reading-from/m-p/51648#M29224</guid>
      <dc:creator>JVesely</dc:creator>
      <dc:date>2023-11-14T16:46:33Z</dc:date>
    </item>
  </channel>
</rss>

