<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to get the count of files/partition for a Delta table? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-files-partition-for-a-delta-table/m-p/21048#M14286</link>
    <description>&lt;P&gt;I have a Delta table and run the OPTIMIZE command regularly, but the table still has a large number of files. I want a breakdown of the file count per partition so I can identify which partitions have the most files. What is the easiest way to get this information?&lt;/P&gt;</description>
    <pubDate>Wed, 23 Jun 2021 23:23:30 GMT</pubDate>
    <dc:creator>brickster_2018</dc:creator>
    <dc:date>2021-06-23T23:23:30Z</dc:date>
    <item>
      <title>How to get the count of files/partition for a Delta table?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-files-partition-for-a-delta-table/m-p/21048#M14286</link>
      <description>&lt;P&gt;I have a Delta table and run the OPTIMIZE command regularly, but the table still has a large number of files. I want a breakdown of the file count per partition so I can identify which partitions have the most files. What is the easiest way to get this information?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 23:23:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-files-partition-for-a-delta-table/m-p/21048#M14286</guid>
      <dc:creator>brickster_2018</dc:creator>
      <dc:date>2021-06-23T23:23:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to get the count of files/partition for a Delta table?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-files-partition-for-a-delta-table/m-p/21049#M14287</link>
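      <description>&lt;P&gt;The following snippet counts the data files in each partition of the table's current version:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;// Databricks Runtime: com.databricks.sql.transaction.tahoe is the Databricks-internal
// package for Delta Lake. With open-source Delta Lake, import
// org.apache.spark.sql.delta.DeltaLog instead.
import com.databricks.sql.transaction.tahoe.DeltaLog

val deltaPath = "&amp;lt;table_path&amp;gt;"  // table root directory, not its _delta_log folder
val deltaLog = DeltaLog.forTable(spark, deltaPath)
// allFiles holds one AddFile row per data file in the latest snapshot.
// partitionValues maps each partition column name to its value, so in
// "partitionValues.col" below, replace col with your partition column name.
val currentFiles = deltaLog.update().allFiles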
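// Optional extension (a sketch, not part of the original answer): each AddFile
// row also records the file's size in bytes, so aggregating "size" per partition
// shows whether a partition's many files are small ones. "col" is a placeholder
// for your partition column name.
import org.apache.spark.sql.functions.{avg, count, sum}
val filesPerPartitionWithSize = currentFiles
  .groupBy("partitionValues.col")
  .agg(count("*").as("numFiles"), sum("size").as("totalBytes"), avg("size").as("avgFileBytes"))
// To inspect it: display(filesPerPartitionWithSize.orderBy($"numFiles".desc))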
display(currentFiles.groupBy("partitionValues.col").count().orderBy($"count".desc))&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 23 Jun 2021 23:57:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-files-partition-for-a-delta-table/m-p/21049#M14287</guid>
      <dc:creator>brickster_2018</dc:creator>
      <dc:date>2021-06-23T23:57:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to get the count of files/partition for a Delta table?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-files-partition-for-a-delta-table/m-p/21050#M14288</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;How do I install the library needed for 'import com.databricks.sql.transaction.tahoe.DeltaLog' on a Databricks cluster? I am getting a module-not-found error.&lt;/P&gt;&lt;P&gt;Thanks and best regards,&lt;/P&gt;&lt;P&gt;Saurabh&lt;/P&gt;</description>
      <pubDate>Wed, 07 Dec 2022 11:03:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-files-partition-for-a-delta-table/m-p/21050#M14288</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2022-12-07T11:03:35Z</dc:date>
    </item>
  </channel>
</rss>

