<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta file question in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2725#M47</link>
    <description>&lt;P&gt;But I don't understand. For example, I have 3 files:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/51iD9A2BC374868B664/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;When I load the files using Auto Loader, 3 files are generated:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/49i4D9CA747D6D2E638/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Why doesn't Databricks put them all into 1 file?&lt;/P&gt;</description>
    <pubDate>Thu, 22 Jun 2023 13:36:08 GMT</pubDate>
    <dc:creator>apiury</dc:creator>
    <dc:date>2023-06-22T13:36:08Z</dc:date>
    <item>
      <title>Delta file question</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2723#M45</link>
      <description>&lt;P&gt;Hi! I'm using Auto Loader to ingest binary files into Delta format. I have 7 binary files, but Delta generates 3 files named part-0000, part-0001, and so on. Why are these files generated with the part-000... naming format?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/56iA8A6AB4E34EAD520/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jun 2023 10:39:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2723#M45</guid>
      <dc:creator>apiury</dc:creator>
      <dc:date>2023-06-22T10:39:47Z</dc:date>
    </item>
    <item>
      <title>Re: Delta file question</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2724#M46</link>
      <description>&lt;P&gt;Hi @Alejandro Piury Pinzón, the Delta table manages the size of the files being written to it. The number of files written to the Delta table depends on the total volume of data being written, not on the number of files at the source location.&lt;/P&gt;&lt;P&gt;The part-000... file names come from Spark: each write task writes its partition of the rows to a separate file, and the number in the name is that task's partition index.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jun 2023 13:13:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2724#M46</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-06-22T13:13:43Z</dc:date>
    </item>
    <item>
      <title>Re: Delta file question</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2725#M47</link>
      <description>&lt;P&gt;But I don't understand. For example, I have 3 files:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/51iD9A2BC374868B664/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;When I load the files using Auto Loader, 3 files are generated:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/49i4D9CA747D6D2E638/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Why doesn't Databricks put them all into 1 file?&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jun 2023 13:36:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2725#M47</guid>
      <dc:creator>apiury</dc:creator>
      <dc:date>2023-06-22T13:36:08Z</dc:date>
    </item>
    <item>
      <title>Re: Delta file question</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2726#M48</link>
      <description>&lt;P&gt;Spark processes data by dividing it into multiple partitions, so when the data is written, the number of part files created equals the number of partitions. If you are doing this outside Auto Loader, you can use coalesce to control the number of partitions, but in Auto Loader I am not sure whether coalesce can be used.&lt;/P&gt;&lt;P&gt;However, you can run the OPTIMIZE command on the Delta table to compact the files.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jun 2023 18:05:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2726#M48</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-06-22T18:05:03Z</dc:date>
    </item>
    <item>
      <title>Re: Delta file question</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2727#M49</link>
      <description>&lt;P&gt;Hi @Alejandro Piury Pinzón&lt;/P&gt;&lt;P&gt;We haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if their suggestions helped you.&lt;/P&gt;&lt;P&gt;Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.&lt;/P&gt;&lt;P&gt;Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2023 05:24:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-file-question/m-p/2727#M49</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-23T05:24:32Z</dc:date>
    </item>
  </channel>
</rss>

