<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Operations applied when running fs.write_table to overwrite existing feature table in hive metastore in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/operations-applied-when-running-fs-write-table-to-overwrite/m-p/6237#M2434</link>
    <description>&lt;P&gt;@Direo:&lt;/P&gt;&lt;P&gt;When you write a DataFrame into a Delta table (for example with df.write.format("delta")), Delta performs a write operation internally. This operation has two actions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;It writes the new data to disk in the Delta format, and&lt;/LI&gt;&lt;LI&gt;It atomically updates the table metadata in the transaction log.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The CREATE OR REPLACE TABLE AS SELECT statement creates a table, or replaces an existing one, with the data returned by a query. In Delta Lake, the result is a Delta table.&lt;/P&gt;&lt;P&gt;The WRITE operation that you see in the Delta table history corresponds to the first action of the Delta write operation: writing the new data to disk. This operation is recorded in the transaction log, which can be replayed to reconstruct the table state.&lt;/P&gt;&lt;P&gt;So, the WRITE operation records the actual data being written to the Delta table, while the CREATE OR REPLACE TABLE AS SELECT operation records the metadata update for the Delta table.&lt;/P&gt;&lt;P&gt;In summary, when you write to a Delta table, two operations are triggered: WRITE, which writes the actual data to disk, and CREATE OR REPLACE TABLE AS SELECT, which updates the table metadata in the transaction log.&lt;/P&gt;</description>
    <pubDate>Mon, 10 Apr 2023 12:58:47 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-04-10T12:58:47Z</dc:date>
    <item>
      <title>Operations applied when running fs.write_table to overwrite existing feature table in hive metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/operations-applied-when-running-fs-write-table-to-overwrite/m-p/6236#M2433</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I needed to query an older snapshot of a table, so I ran:&lt;/P&gt;&lt;P&gt;deltaTable = DeltaTable.forPath(spark, 'dbfs:/&amp;lt;path&amp;gt;')&lt;/P&gt;&lt;P&gt;display(deltaTable.history())&lt;/P&gt;&lt;P&gt;I noticed that every fs.write_table run triggers two operations: WRITE and CREATE OR REPLACE TABLE AS SELECT. In both cases the operation mode is "append".&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/391iA849C3F5C4365A68/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Why are two operations triggered, and what does the WRITE operation do?&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2023 12:38:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/operations-applied-when-running-fs-write-table-to-overwrite/m-p/6236#M2433</guid>
      <dc:creator>Direo</dc:creator>
      <dc:date>2023-04-07T12:38:07Z</dc:date>
    </item>
    <item>
      <title>Re: Operations applied when running fs.write_table to overwrite existing feature table in hive metastore</title>
      <link>https://community.databricks.com/t5/data-engineering/operations-applied-when-running-fs-write-table-to-overwrite/m-p/6237#M2434</link>
      <description>&lt;P&gt;@Direo:&lt;/P&gt;&lt;P&gt;When you write a DataFrame into a Delta table (for example with df.write.format("delta")), Delta performs a write operation internally. This operation has two actions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;It writes the new data to disk in the Delta format, and&lt;/LI&gt;&lt;LI&gt;It atomically updates the table metadata in the transaction log.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The CREATE OR REPLACE TABLE AS SELECT statement creates a table, or replaces an existing one, with the data returned by a query. In Delta Lake, the result is a Delta table.&lt;/P&gt;&lt;P&gt;The WRITE operation that you see in the Delta table history corresponds to the first action of the Delta write operation: writing the new data to disk. This operation is recorded in the transaction log, which can be replayed to reconstruct the table state.&lt;/P&gt;&lt;P&gt;So, the WRITE operation records the actual data being written to the Delta table, while the CREATE OR REPLACE TABLE AS SELECT operation records the metadata update for the Delta table.&lt;/P&gt;&lt;P&gt;In summary, when you write to a Delta table, two operations are triggered: WRITE, which writes the actual data to disk, and CREATE OR REPLACE TABLE AS SELECT, which updates the table metadata in the transaction log.&lt;/P&gt;</description>
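      <content:encoded><![CDATA[
To make the two actions of a write more concrete, here is a rough toy model in plain Python. It is only a sketch, not how Delta Lake or fs.write_table is actually implemented: the helper names commit and history are hypothetical, the data files are JSON rather than Parquet, and the log entries are heavily simplified. The one detail borrowed from Delta is the _delta_log directory with zero-padded, versioned JSON files. The idea it illustrates is that the data file is written first, the commit is then published as a single log file, and the history is simply the list of those log entries.

```python
import json
import os
import tempfile


def commit(table_dir, operation, rows):
    """Toy two-step write: data file first, then one atomic log entry."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    # The next version is the number of commits already published.
    version = len([f for f in os.listdir(log_dir) if f.endswith(".json")])

    # Action 1: write the data file. Nothing references it yet,
    # so a reader of the log would not see it.
    data_name = f"part-{version:05d}.json"  # hypothetical naming
    with open(os.path.join(table_dir, data_name), "w") as f:
        json.dump(rows, f)

    # Action 2: publish the commit by moving a finished temp file into place,
    # so the log entry appears all at once or not at all.
    entry = {"version": version, "operation": operation, "add": data_name}
    tmp_path = os.path.join(log_dir, f".{version:020d}.tmp")
    with open(tmp_path, "w") as f:
        json.dump(entry, f)
    os.replace(tmp_path, os.path.join(log_dir, f"{version:020d}.json"))
    return version


def history(table_dir):
    """Return (version, operation) pairs from the log, newest first."""
    log_dir = os.path.join(table_dir, "_delta_log")
    entries = []
    for name in sorted(os.listdir(log_dir), reverse=True):
        if name.endswith(".json"):
            with open(os.path.join(log_dir, name)) as f:
                entry = json.load(f)
            entries.append((entry["version"], entry["operation"]))
    return entries


table_dir = tempfile.mkdtemp()
commit(table_dir, "CREATE OR REPLACE TABLE AS SELECT", [])
commit(table_dir, "WRITE", [{"id": 1}])
print(history(table_dir))
# [(1, 'WRITE'), (0, 'CREATE OR REPLACE TABLE AS SELECT')]
```

In this toy model every call to commit produces exactly one history row, so two rows mean two commits were made. The order of the two calls above is arbitrary and says nothing about the order fs.write_table uses.
]]></content:encoded>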
      <pubDate>Mon, 10 Apr 2023 12:58:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/operations-applied-when-running-fs-write-table-to-overwrite/m-p/6237#M2434</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-10T12:58:47Z</dc:date>
    </item>
  </channel>
</rss>

