<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Streaming with Delta table source- definition of "File"? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14877#M9292</link>
    <description>&lt;P&gt;Hi @Michael Galli&amp;nbsp;,&lt;/P&gt;&lt;P&gt;The &lt;B&gt;maxFilesPerTrigger&lt;/B&gt; option controls how many new files are considered in every micro-batch. The default is 1000. These are the files associated with your Delta table, so technically they are the underlying Parquet data files.&lt;/P&gt;</description>
    <pubDate>Tue, 05 Jul 2022 16:59:14 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2022-07-05T16:59:14Z</dc:date>
    <item>
      <title>Streaming with Delta table source- definition of "File"?</title>
      <link>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14876#M9291</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I have a Delta table as a Spark Structured Streaming source.&lt;/P&gt;&lt;P&gt;This table contains signals at row level: each signal is one append to the source table, which creates a new version in the Delta transaction history.&lt;/P&gt;&lt;P&gt;I am not really sure how Spark streaming behaves if I define:&lt;/P&gt;&lt;PRE&gt;spark
    .readStream
    .format("delta")
    .option("startingVersion", "latest")
    .option("maxFilesPerTrigger", 100)&lt;/PRE&gt;&lt;P&gt;Are those the last 100 transactions from the Delta transaction history, or the last 100 Parquet files?&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;Michael&lt;/P&gt;</description>
      <pubDate>Mon, 04 Jul 2022 09:11:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14876#M9291</guid>
      <dc:creator>Michael_Galli</dc:creator>
      <dc:date>2022-07-04T09:11:21Z</dc:date>
    </item>
    <item>
      <title>Re: Streaming with Delta table source- definition of "File"?</title>
      <link>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14877#M9292</link>
      <description>&lt;P&gt;Hi @Michael Galli&amp;nbsp;,&lt;/P&gt;&lt;P&gt;The &lt;B&gt;maxFilesPerTrigger&lt;/B&gt; option controls how many new files are considered in every micro-batch. The default is 1000. These are the files associated with your Delta table, so technically they are the underlying Parquet data files.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jul 2022 16:59:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14877#M9292</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-07-05T16:59:14Z</dc:date>
    </item>
    <item>
      <title>Re: Streaming with Delta table source- definition of "File"?</title>
      <link>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14878#M9293</link>
      <description>&lt;P&gt;Thanks @Jose Gonzalez&amp;nbsp;, this makes sense.&lt;/P&gt;&lt;P&gt;What I do not fully understand is the role of the Delta table transaction log in this matter.&lt;/P&gt;&lt;P&gt;E.g. maxFilesPerTrigger&amp;nbsp;is set to 100 files for each micro-batch.&lt;/P&gt;&lt;P&gt;If the Delta transactions of the streaming source look somewhat like this:&lt;span class="lia-inline-image-display-wrapper" image-alt="Unbenannt"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1741i2C83732F9755C801/image-size/large?v=v2&amp;amp;px=999" role="button" title="Unbenannt" alt="Unbenannt" /&gt;&lt;/span&gt;E.g. there are 70 files per transaction. Will micro-batch 1 contain files from versions 0 and 1, micro-batch 2 contain files from versions 1 and 2, and so on? So the Delta table version is not really relevant for sizing the streaming micro-batches?&lt;/P&gt;</description>
      <pubDate>Wed, 06 Jul 2022 05:58:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14878#M9293</guid>
      <dc:creator>Michael_Galli</dc:creator>
      <dc:date>2022-07-06T05:58:17Z</dc:date>
    </item>
    <item>
      <title>Re: Streaming with Delta table source- definition of "File"?</title>
      <link>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14879#M9294</link>
      <description>&lt;P&gt;Hey there @Michael Galli&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 08:56:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/streaming-with-delta-table-source-definition-of-quot-file-quot/m-p/14879#M9294</guid>
      <dc:creator>Vidula</dc:creator>
      <dc:date>2022-08-31T08:56:50Z</dc:date>
    </item>
  </channel>
</rss>

