652852
New Contributor III

According to the documentation, COPY INTO should not be inserting duplicated records, it should load a file only once. A caveat seems to be that you have to have loaded all initial data with COPY INTO, rather than CREATE the table with SELECT on some initial batch of files. I at least tried to update an existing table with new parquet files from the same S3 storage location, and the first run of COPY INTO duplicated everything.

Further, MERGE INTO does not appear to support merging from parquet files, so if I want to use that, I likely have to create a staging table in Delta. Sigh.