Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

COPY INTO: unable to find the checkpoint location

HW413
New Contributor

Hi All,

 

I have been using COPY INTO to ingest data from managed volumes into a managed Delta table. I would like to know where it stores the metadata or checkpoint information it uses to stay idempotent. I am pointing it at a directory, not at individual files. I have searched everywhere for a checkpoint file but cannot find one. Please help me understand how this works.

COPY INTO dev.final_test.invoice

FROM "/Volumes/workspace/default/hustest/invoice/"

FILEFORMAT = CSV

FORMAT_OPTIONS ('header' = 'true')

COPY_OPTIONS ('mergeSchema'='true')

Regards,

Husna

 
4 REPLIES

Khaja_Zaffer
Contributor

Hello @HW413 

Good day!

Khaja_Zaffer_0-1758006491809.png

Let's say I load a table from a volume with COPY INTO. After doing that, I ran DESCRIBE TABLE:

Khaja_Zaffer_1-1758006563838.png

I could follow the path associated with this table. If you follow that path, you can check the Delta logs for metadata such as file sizes and file modification dates.
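As a rough illustration of what those Delta log entries contain: each commit file under `_delta_log` is newline-delimited JSON, one action per line, where `add` actions carry a `path`, `size`, and `modificationTime`. The sketch below parses the text of one commit. It assumes you are able to read the log files at all; the helper name `summarize_commit` and the sample commit content are made up for illustration. Note that `add` actions describe the Parquet data files written to the table, not the source CSV files.

```python
import json


def summarize_commit(commit_text):
    """Collect the operation name and added data files from one Delta commit.

    Each non-empty line of a commit file is a JSON object holding a single
    action, such as "commitInfo" or "add".
    """
    operation = None
    added = []
    for line in commit_text.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "commitInfo" in action:
            operation = action["commitInfo"].get("operation")
        elif "add" in action:
            add = action["add"]
            added.append((add["path"], add["size"], add["modificationTime"]))
    return operation, added


# Made-up commit content, for illustration only (not taken from a real table).
sample_commit = "\n".join([
    json.dumps({"commitInfo": {"operation": "COPY INTO"}}),
    json.dumps({"add": {"path": "part-00000.snappy.parquet", "size": 1234,
                        "modificationTime": 1700000000000, "dataChange": True}}),
])

operation, added = summarize_commit(sample_commit)
print(operation, added)
```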

 

@Khaja_Zaffer My Delta table is a managed table; I did not specify a location path when creating it.

When I run your query, I get the following:

 

Hello @HW413 

That is expected behavior because Unity Catalog fully manages and abstracts the underlying storage for these tables, handling all aspects of read, write, storage, and optimization automatically.

 

This design ensures centralized governance, security enforcement (e.g., preventing direct file system access that could bypass access controls), and lifecycle management, such as automatic data deletion after DROP TABLE (with a 7-day soft-delete retention for recovery) and built-in optimizations like auto-compaction, without exposing internal paths.

You can also refer to this community thread on the same topic.

https://community.databricks.com/t5/data-engineering/viewing-managed-delta-table-files/td-p/125741

szymon_dybczak
Esteemed Contributor III

Hi @HW413 ,

You won't find a checkpoint. COPY INTO does not use a checkpoint the way Auto Loader or Spark Structured Streaming does.

The COPY INTO command retrieves metadata about all files in the specified source directory/prefix. So every time you run COPY INTO, the command first builds an in-memory index of all files. You can see this yourself in the Spark UI:

szymon_dybczak_0-1758007836681.png

Then it determines which files have already been loaded by comparing the Delta log against the in-memory index of files built in the previous step.
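Conceptually, the comparison described in this reply behaves like a set difference between the current source listing and the files already recorded as loaded. The sketch below is only a mental model in plain Python, not Databricks' actual implementation; the function name `files_to_load` and the file names `a.csv` and `b.csv` are hypothetical.

```python
def files_to_load(source_listing, already_loaded):
    """Return source files not yet recorded as loaded, in listing order."""
    loaded = set(already_loaded)
    return [path for path in source_listing if path not in loaded]


# Hypothetical file names under the source directory from the question.
source = [
    "/Volumes/workspace/default/hustest/invoice/a.csv",
    "/Volumes/workspace/default/hustest/invoice/b.csv",
]
loaded = []  # stands in for the record of previously loaded files

# First run: nothing has been loaded yet, so every listed file qualifies.
first_run = files_to_load(source, loaded)
loaded.extend(first_run)

# Second run over the same directory: every file is already recorded.
second_run = files_to_load(source, loaded)

print(first_run)
print(second_run)
```

Rerunning over an unchanged directory yields nothing new to load, which is the idempotent behavior the question asks about; only files that appear in the listing but not in the loaded record would be picked up.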

 
