<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Access historical injected data of COPY INTO command in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/access-historical-injected-data-of-copy-into-command/m-p/75293#M34917</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your detailed answer. Unfortunately, it doesn't solve my issue, since what you posted is about Snowflake's COPY INTO, not the Databricks one.&lt;/P&gt;&lt;P&gt;Unless it can also be used in the Databricks version and I missed it.&lt;/P&gt;&lt;P&gt;Moreover, regarding the retention period, I don't quite understand what you wrote. Is Snowflake's retention period 64 days or 14 days?&lt;/P&gt;</description>
    <pubDate>Fri, 21 Jun 2024 08:57:56 GMT</pubDate>
    <dc:creator>N_M</dc:creator>
    <dc:date>2024-06-21T08:57:56Z</dc:date>
    <item>
      <title>Access historical injected data of COPY INTO command</title>
      <link>https://community.databricks.com/t5/data-engineering/access-historical-injected-data-of-copy-into-command/m-p/54838#M30172</link>
      <description>&lt;P&gt;Dear Community,&lt;/P&gt;&lt;P&gt;I'm using the COPY INTO command to automate the staging of files that arrive in an S3 bucket into specific Delta tables (with some transformation on the fly).&lt;/P&gt;&lt;P&gt;The command works smoothly, and files are indeed inserted only once (write idempotency works as expected). The documentation says that the filenames are stored as key:value pairs in a RocksDB store.&lt;/P&gt;&lt;P&gt;I need to access the (newly) staged filenames in the workflow, and my idea is to look them up in the metadata or transaction logs rather than in the table itself (which is huge). Unfortunately, the table history does not contain this information. So my questions are:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Is it possible to access the history metadata of the inserted filenames?&lt;/LI&gt;&lt;LI&gt;How long is the retention period for this information? (I'm asking because Snowflake apparently has the same COPY INTO command with identical features, but its documentation clearly says that historical information is stored for 64 days; after that, it is forgotten and COPY INTO will re-stage the files if they are found...)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Can you help me?&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2023 08:32:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/access-historical-injected-data-of-copy-into-command/m-p/54838#M30172</guid>
      <dc:creator>N_M</dc:creator>
      <dc:date>2023-12-07T08:32:50Z</dc:date>
    </item>
    <item>
      <title>Re: Access historical injected data of COPY INTO command</title>
      <link>https://community.databricks.com/t5/data-engineering/access-historical-injected-data-of-copy-into-command/m-p/75293#M34917</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your detailed answer. Unfortunately, it doesn't solve my issue, since what you posted is about Snowflake's COPY INTO, not the Databricks one.&lt;/P&gt;&lt;P&gt;Unless it can also be used in the Databricks version and I missed it.&lt;/P&gt;&lt;P&gt;Moreover, regarding the retention period, I don't quite understand what you wrote. Is Snowflake's retention period 64 days or 14 days?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Jun 2024 08:57:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/access-historical-injected-data-of-copy-into-command/m-p/75293#M34917</guid>
      <dc:creator>N_M</dc:creator>
      <dc:date>2024-06-21T08:57:56Z</dc:date>
    </item>
  </channel>
</rss>

