09-15-2025 11:36 PM
Hi All,
I have been using COPY INTO to ingest data from managed volumes, and my destination is a managed Delta table. I would like to know where it stores the metadata or checkpoint location it uses to stay idempotent. Note that I am pointing to a directory, not individual files. Please help me understand this. I have searched everywhere for the checkpoint file but cannot find it.
COPY INTO dev.final_test.invoice
FROM "/Volumes/workspace/default/hustest/invoice/"
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true')
COPY_OPTIONS ('mergeSchema'='true')
Regards,
Husna
09-16-2025 12:19 AM - edited 09-16-2025 12:20 AM
Hello @HW413
Good day!
Let's say I run COPY INTO to load a table from a volume. Once that is done,
I run DESCRIBE TABLE.
I can then follow the path associated with the table. If you follow that path, you can check the Delta logs for metadata such as file sizes and file modification dates.
09-16-2025 12:43 AM
@Khaja_Zaffer my Delta table is a managed table; I did not specify a location path when creating it.
When I run your query, I get the result below.
09-16-2025 04:24 AM
Hello @HW413
That is expected behavior, because Unity Catalog fully manages and abstracts the underlying storage for these tables and handles reads, writes, storage, and optimization automatically.
This design ensures centralized governance, security enforcement (e.g., preventing direct file system access that could bypass access controls), and lifecycle management, such as automatic data deletion after DROP TABLE (with a 7-day soft-delete retention for recovery) and built-in optimizations like auto-compaction, without exposing internal paths.
You can also refer to this community thread on the same topic.
https://community.databricks.com/t5/data-engineering/viewing-managed-delta-table-files/td-p/125741
09-16-2025 12:37 AM
Hi @HW413,
You won't find a checkpoint. COPY INTO does not use a checkpoint the way Auto Loader or Spark Structured Streaming do.
The COPY INTO command retrieves metadata about all files in the specified source directory/prefix. So every time you run COPY INTO, the command first creates an in-memory index of all files. You can see this yourself in the Spark UI.
It then compares the in-memory index built in the previous step against the Delta log to determine which files have already been loaded.
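The two steps described above (list everything, then skip what was already loaded) can be modeled with a small Python sketch. This is only a conceptual illustration, not Databricks' implementation: `list_source_files`, `copy_into_sketch`, and the `loaded_files` set are hypothetical names, and a plain in-memory set stands in for whatever state the table actually keeps about previously loaded files.

```python
import os
import tempfile


def list_source_files(source_dir):
    """Step 1: build an in-memory index (path -> size) of every file in the source directory."""
    index = {}
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            index[path] = os.path.getsize(path)
    return index


def copy_into_sketch(source_dir, loaded_files):
    """Step 2: compare the index with already-loaded files and 'load' only the new ones.

    loaded_files is a hypothetical stand-in for the state kept alongside the target table.
    Returns the sorted list of paths loaded by this run.
    """
    index = list_source_files(source_dir)
    new_files = sorted(path for path in index if path not in loaded_files)
    loaded_files.update(new_files)  # remember them so the next run skips them
    return new_files


def demo():
    """Run the sketch three times against a temporary directory."""
    with tempfile.TemporaryDirectory() as src:
        for name in ("a.csv", "b.csv"):
            with open(os.path.join(src, name), "w") as f:
                f.write("id\n1\n")
        loaded = set()
        first = copy_into_sketch(src, loaded)   # both files are new
        second = copy_into_sketch(src, loaded)  # nothing new: the rerun is a no-op
        with open(os.path.join(src, "c.csv"), "w") as f:
            f.write("id\n2\n")
        third = copy_into_sketch(src, loaded)   # only the newly arrived file
        return (
            [os.path.basename(p) for p in first],
            [os.path.basename(p) for p in second],
            [os.path.basename(p) for p in third],
        )
```

Calling `demo()` shows why rerunning the command against the same directory does not duplicate data: the second run finds no unseen files, and the third picks up only the one that arrived in between.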