<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I get the size of files cleaned up by a vacuum for a Delta table. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-do-i-get-the-size-of-files-cleaned-up-by-a-vacuum-for-a/m-p/22395#M15330</link>
    <description>&lt;P&gt;The output of the OPTIMIZE command includes the following metrics:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;number of files added&lt;/LI&gt;&lt;LI&gt;number of files removed&lt;/LI&gt;&lt;LI&gt;min, max, avg, total files, and total size of files added&lt;/LI&gt;&lt;LI&gt;min, max, avg, total files, and total size of files removed&lt;/LI&gt;&lt;LI&gt;number of partitions optimized&lt;/LI&gt;&lt;LI&gt;Z-order stats&lt;/LI&gt;&lt;LI&gt;number of batches&lt;/LI&gt;&lt;LI&gt;total files considered&lt;/LI&gt;&lt;LI&gt;total files skipped&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If those metrics do not provide the detail you need, you would have to scan the file system before and after running the command, then collect and analyze the data yourself.&lt;/P&gt;</description>
    <pubDate>Mon, 21 Jun 2021 18:15:26 GMT</pubDate>
    <dc:creator>Ryan_Chynoweth</dc:creator>
    <dc:date>2021-06-21T18:15:26Z</dc:date>
    <item>
      <title>How do I get the size of files cleaned up by a vacuum for a Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-get-the-size-of-files-cleaned-up-by-a-vacuum-for-a/m-p/22394#M15329</link>
      <description />
      <pubDate>Fri, 18 Jun 2021 21:07:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-get-the-size-of-files-cleaned-up-by-a-vacuum-for-a/m-p/22394#M15329</guid>
      <dc:creator>User16826987838</dc:creator>
      <dc:date>2021-06-18T21:07:31Z</dc:date>
    </item>
    <item>
      <title>Re: How do I get the size of files cleaned up by a vacuum for a Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-get-the-size-of-files-cleaned-up-by-a-vacuum-for-a/m-p/22395#M15330</link>
      <description>&lt;P&gt;The output of the OPTIMIZE command includes the following metrics:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;number of files added&lt;/LI&gt;&lt;LI&gt;number of files removed&lt;/LI&gt;&lt;LI&gt;min, max, avg, total files, and total size of files added&lt;/LI&gt;&lt;LI&gt;min, max, avg, total files, and total size of files removed&lt;/LI&gt;&lt;LI&gt;number of partitions optimized&lt;/LI&gt;&lt;LI&gt;Z-order stats&lt;/LI&gt;&lt;LI&gt;number of batches&lt;/LI&gt;&lt;LI&gt;total files considered&lt;/LI&gt;&lt;LI&gt;total files skipped&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If those metrics do not provide the detail you need, you would have to scan the file system before and after running the command, then collect and analyze the data yourself.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Jun 2021 18:15:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-get-the-size-of-files-cleaned-up-by-a-vacuum-for-a/m-p/22395#M15330</guid>
      <dc:creator>Ryan_Chynoweth</dc:creator>
      <dc:date>2021-06-21T18:15:26Z</dc:date>
    </item>
    <item>
      <title>Re: How do I get the size of files cleaned up by a vacuum for a Delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-get-the-size-of-files-cleaned-up-by-a-vacuum-for-a/m-p/22396#M15331</link>
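      <description>&lt;PRE&gt;&lt;CODE&gt;// Estimates the total size, in bytes, of the files a VACUUM on `table` would delete.
def getVacuumSize(table: String): Long = {
  // DRY RUN lists the paths VACUUM would delete without deleting anything.
  val paths = spark.sql(s"VACUUM $table DRY RUN").select("path").collect().map(_.getString(0))
  // Look up each listed path and add up the sizes reported by the file system.
  paths.map(p =&amp;gt; dbutils.fs.ls(p).map(_.size).sum).sum
}

getVacuumSize("&amp;lt;your-table-name&amp;gt;")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;You could use this function to get the total size, in bytes, of the files that VACUUM would clean up. Run it before the actual VACUUM, since the dry run only lists files that have not been deleted yet. Note that the Databricks documentation describes DRY RUN as returning a list of up to 1000 files, so the result can undercount on large tables.&lt;/P&gt;</description>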
      <pubDate>Mon, 21 Jun 2021 21:06:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-get-the-size-of-files-cleaned-up-by-a-vacuum-for-a/m-p/22396#M15331</guid>
      <dc:creator>sajith_appukutt</dc:creator>
      <dc:date>2021-06-21T21:06:16Z</dc:date>
    </item>
  </channel>
</rss>

