cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Access historical injected data of COPY INTO command

N_M
New Contributor III

Dear Community,

I'm using the COPY INTO command to automate the staging of files that I get in an S3 bucket into specific delta tables (with some transformation on the fly).

The command works smoothly, and files are indeed inserted only once (writing idempotency works fine). The documentation says that filenames are written in a form of key:value in some RockDB.

The fact is that I need to access the (new) staged filenames in the workflow, and the idea is to look into the metadata or transactional logs rather than in the table itself (that is huge). Unfortunately, the table history does not contain this information. So my questions are:

  • is it possible to access the inserted filenames history metadata?
  • how long is the retention period of such information? (I'm asking because apparently SNOWFLAKE has the same COPY INTO command with identical features, but the documentation clearly says that historical information is stored for 64 days, after that, they are forgotten and COPY INTO will re-stage the files, if found...)

Can you help me?

1 REPLY 1

Kaniz
Community Manager
Community Manager

Hi @N_M Let’s dive into your questions regarding the COPY INTO command and the associated metadata in Snowflake:

  1. Accessing Inserted Filenames History Metadata:

    • When using the COPY INTO command in Snowflake, the filenames of staged files are indeed stored in metadata. However, accessing this information directly from the metadata or transactional logs can be a bit tricky.
    • The primary way to access the filenames is by querying the table itself after the data has been loaded. Unfortunately, as you mentioned, querying a large table might not be ideal.
    • Snowflake does not provide direct access to historical filenames metadata beyond what’s available in the table history.
  2. Retention Period for Filenames Information:

  3. Comparison with SNOWFLAKE:

    • It’s interesting to note that SNOWFLAKE (the COPY INTO command) retains historical information for 64 days before forgetting it. During this period, if the same files are encountered, they won’t be re-staged.
    • In contrast, Snowflake’s retention period is shorter (14 days), but it still provides a window for managing the loading process.

In summary, while direct access to filenames history metadata isn’t readily available, querying the table or using the LIST command within the 14-day retention period should help you monitor and manage your data loading workflow. If you have any further questions or need assistance, feel free to ask! 😊

 
Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.