<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merge version data files of Delta table in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/merge-version-data-files-of-delta-table/m-p/49305#M1574</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;The version 256 data files I am referring to are the change data feed (CDF) files. I query the data for this version with the following code:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;display(spark.read.format("delta") \
  .option("readChangeFeed", "true") \
  .option("startingVersion", 256) \
  .option("endingVersion", 256) \
  .table("catalogName.schemaName.tableName"))&lt;/LI-CODE&gt;&lt;P&gt;The current version of the table is 300, but I want to see the updates made in version 256. I found that this version has 50 data files, which my code reads to display the data. Because there are 50 files, the query takes a long time to return the result. How can I optimize version 256 of the table when the current table version is 300?&lt;/P&gt;&lt;P&gt;The OPTIMIZE command optimizes the current version (snapshot) of the table and creates a new version with fewer files.&lt;/P&gt;</description>
    <pubDate>Mon, 16 Oct 2023 13:22:31 GMT</pubDate>
    <dc:creator>Data_Analytics1</dc:creator>
    <dc:date>2023-10-16T13:22:31Z</dc:date>
    <item>
      <title>Merge version data files of Delta table</title>
      <link>https://community.databricks.com/t5/get-started-discussions/merge-version-data-files-of-delta-table/m-p/49012#M1544</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a CDC-enabled Delta table. In version 256, the table has 50 data files. I want to merge them all into a single file. How can I merge all 50 data files so that when I query version 256, I get 1 data file? Is there a command that can optimize the file size?&lt;/P&gt;</description>
      <pubDate>Thu, 12 Oct 2023 09:27:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/merge-version-data-files-of-delta-table/m-p/49012#M1544</guid>
      <dc:creator>Data_Analytics1</dc:creator>
      <dc:date>2023-10-12T09:27:11Z</dc:date>
    </item>
    <item>
      <title>Re: Merge version data files of Delta table</title>
      <link>https://community.databricks.com/t5/get-started-discussions/merge-version-data-files-of-delta-table/m-p/49305#M1574</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;The version 256 data files I am referring to are the change data feed (CDF) files. I query the data for this version with the following code:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;display(spark.read.format("delta") \
  .option("readChangeFeed", "true") \
  .option("startingVersion", 256) \
  .option("endingVersion", 256) \
  .table("catalogName.schemaName.tableName"))&lt;/LI-CODE&gt;&lt;P&gt;The current version of the table is 300, but I want to see the updates made in version 256. I found that this version has 50 data files, which my code reads to display the data. Because there are 50 files, the query takes a long time to return the result. How can I optimize version 256 of the table when the current table version is 300?&lt;/P&gt;&lt;P&gt;The OPTIMIZE command optimizes the current version (snapshot) of the table and creates a new version with fewer files.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Oct 2023 13:22:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/merge-version-data-files-of-delta-table/m-p/49305#M1574</guid>
      <dc:creator>Data_Analytics1</dc:creator>
      <dc:date>2023-10-16T13:22:31Z</dc:date>
    </item>
    <item>
      <title>Re: Merge version data files of Delta table</title>
      <link>https://community.databricks.com/t5/get-started-discussions/merge-version-data-files-of-delta-table/m-p/49338#M1581</link>
      <description>&lt;P&gt;Hi, are you talking about merging CSV files?&amp;nbsp;&lt;A href="https://community.databricks.com/t5/machine-learning/merge-12-csv-files-in-databricks/td-p/3551#:~:text=Use%20Union()%20method%20to,from%20the%20specified%20set%2Fs." target="_blank"&gt;https://community.databricks.com/t5/machine-learning/merge-12-csv-files-in-databricks/td-p/3551#:~:text=Use%20Union()%20method%20to,from%20the%20specified%20set%2Fs.&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2023 04:44:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/merge-version-data-files-of-delta-table/m-p/49338#M1581</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-10-17T04:44:38Z</dc:date>
    </item>
  </channel>
</rss>

