<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How should we think about physical data storage when using Delta Lake? Will data be duplicated or saved within AWS ? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24604#M17125</link>
    <description>&lt;P&gt;The data will be saved only in S3 on AWS. Delta itself doesn't store anything; it is just a supporting format that keeps an additional transaction log to provide ACID transactions, as in a traditional SQL database.&lt;/P&gt;</description>
    <pubDate>Fri, 18 Jun 2021 12:55:18 GMT</pubDate>
    <dc:creator>User16826994223</dc:creator>
    <dc:date>2021-06-18T12:55:18Z</dc:date>
    <item>
      <title>How should we think about physical data storage when using Delta Lake? Will data be duplicated or saved within AWS ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24603#M17124</link>
      <description />
      <pubDate>Tue, 15 Jun 2021 02:46:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24603#M17124</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-06-15T02:46:59Z</dc:date>
    </item>
    <item>
      <title>Re: How should we think about physical data storage when using Delta Lake? Will data be duplicated or saved within AWS ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24604#M17125</link>
      <description>&lt;P&gt;The data will be saved only in S3 on AWS. Delta itself doesn't store anything; it is just a supporting format that keeps an additional transaction log to provide ACID transactions, as in a traditional SQL database.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 12:55:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24604#M17125</guid>
      <dc:creator>User16826994223</dc:creator>
      <dc:date>2021-06-18T12:55:18Z</dc:date>
    </item>
    <item>
      <title>Re: How should we think about physical data storage when using Delta Lake? Will data be duplicated or saved within AWS ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24605#M17126</link>
      <description>&lt;P&gt;&lt;A href="https://delta.io" alt="https://delta.io" target="_blank"&gt;Delta&lt;/A&gt; itself is a file format, consisting of Parquet files for the actual data and a JSON transaction log that provides ACID transactions, among other benefits. It will live in whatever store (e.g. S3) you choose in AWS.&lt;/P&gt;&lt;P&gt;Data is only duplicated if you choose to write another table, generally as part of data engineering best practices. For example, as you clean and enrich data, you might write another table so that the clean data is readily available and the transformations don't need to be rerun.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jun 2021 17:26:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24605#M17126</guid>
      <dc:creator>KBarlow</dc:creator>
      <dc:date>2021-06-18T17:26:47Z</dc:date>
    </item>
    <item>
      <title>Re: How should we think about physical data storage when using Delta Lake? Will data be duplicated or saved within AWS ?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24606#M17127</link>
      <description>&lt;P&gt;Regarding the earlier comment about Delta being an extension of Parquet: you can start with a dataset in Parquet format in S3 and convert it to Delta in place, without having to duplicate the data. See &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-convert-to-delta.html" target="test_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-convert-to-delta.html&lt;/A&gt; for details.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 04:03:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-should-we-think-about-physical-data-storage-when-using-delta/m-p/24606#M17127</guid>
      <dc:creator>aladda</dc:creator>
      <dc:date>2021-06-23T04:03:41Z</dc:date>
    </item>
  </channel>
</rss>

