Palash01
Valued Contributor

Hey @jcozar

Thanks for bringing up your concerns, always happy to help 😁

Let's take a look at your concerns: 

1. External Locations and Data Deletion:

  • If you drop an external table (one whose data lives in an external location, e.g. an Azure Storage account), the data files are NOT deleted. For external tables, Unity Catalog manages only the metadata, not the underlying data. The files stay in the storage location until you delete them yourself, for example through the Azure portal or another storage tool.
  • If you don't use an external location and the table is a managed table (its data lives in Unity Catalog's managed storage), dropping the table does delete the data. The deletion is asynchronous: Unity Catalog removes the underlying files from cloud storage within 30 days of the DROP. This is part of how Unity Catalog cleans up dropped managed tables, not a timeout based on inactivity.

In a nutshell, for a managed table the metadata is removed immediately, and the underlying data is deleted asynchronously and permanently within 30 days. (You can find more details on this topic in the answer by @Retired_mod, our community manager, here: Databricks Community - Permanently delete dropped table (Unity Catalog))
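To make the managed vs. external distinction concrete, here is a minimal sketch in Python that builds the two kinds of `CREATE TABLE` statements you would run with `spark.sql(...)` in a notebook. The helper `create_table_sql`, the catalog/schema/table names and the `abfss://` URL are hypothetical placeholders. The only thing that matters is the `LOCATION` clause: with it the table is external, without it the table is managed.

```python
from typing import Optional


def create_table_sql(full_name: str, columns: str, location: Optional[str] = None) -> str:
    """Build a Unity Catalog CREATE TABLE statement (hypothetical helper).

    No location  -> managed table: DROP TABLE removes the metadata right away
                    and the data files are deleted asynchronously (within 30 days).
    With location -> external table: DROP TABLE removes only the metadata;
                    the files stay in the storage account.
    """
    sql = f"CREATE TABLE IF NOT EXISTS {full_name} ({columns}) USING DELTA"
    if location is not None:
        # The LOCATION clause is what makes the table external.
        sql += f" LOCATION '{location}'"
    return sql


# Hypothetical names and storage path, for illustration only.
managed = create_table_sql("main.bronze.events_managed", "id BIGINT, payload STRING")
external = create_table_sql(
    "main.bronze.events_external",
    "id BIGINT, payload STRING",
    location="abfss://raw@mystorageaccount.dfs.core.windows.net/events",
)

print(managed)
print(external)
```

In a notebook you would pass each string to `spark.sql(...)`; the `abfss://` path has to fall under an external location you have been granted access to. Dropping `events_external` afterwards would leave the files under `.../events` untouched.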

2. Schema Paths and Management:

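  • Letting Unity Catalog manage the paths inside your external locations (for example, by setting a managed location on the catalog or schema) can be convenient, but it's not always the best practice. The advantages are that you don't have to choose and maintain a path for every table, and metadata management stays centralized.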
  • Specifying custom paths provides more control and flexibility. You can define paths based on your specific needs and data organization logic. This might be more suitable if you have complex data structures or prefer manual control over file locations.
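If you go the custom-path route, generating every path from one function keeps the layout consistent. Below is a minimal sketch assuming ADLS Gen2 paths (`abfss://<container>@<account>.dfs.core.windows.net/...`) and a hypothetical `layer/source/table` convention; the helper names, account and container are made up. For comparison, it also builds the alternative: a schema with a `MANAGED LOCATION`, under which Unity Catalog picks the per-table paths itself.

```python
def table_path(account: str, container: str, layer: str, source: str, table: str) -> str:
    """Build an ADLS Gen2 path following a hypothetical layer/source/table layout."""
    parts = [p.strip("/") for p in (layer, source, table)]
    return f"abfss://{container}@{account}.dfs.core.windows.net/" + "/".join(parts)


def create_schema_sql(full_name: str, managed_location: str) -> str:
    """Schema whose managed tables Unity Catalog places under managed_location."""
    return f"CREATE SCHEMA IF NOT EXISTS {full_name} MANAGED LOCATION '{managed_location}'"


# Option A: you choose every path yourself (custom layout for external tables).
custom = table_path("mystorageaccount", "lake", "bronze", "crm", "customers")

# Option B: one location per schema; Unity Catalog manages the per-table paths.
schema_ddl = create_schema_sql(
    "main.bronze",
    "abfss://lake@mystorageaccount.dfs.core.windows.net/managed/bronze",
)

print(custom)
print(schema_ddl)
```

With option A you would use `custom` as the `LOCATION` of an external table. With option B you create tables without `LOCATION`; keep in mind those tables are managed, so dropping them deletes the data, as described in point 1.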

3. Delta Files and the cloudFiles Format:

This concern is a little unclear to me at this time, but I'll try my best to answer. Delta is not a separate file type: a Delta table stores its data as Parquet files (same .parquet extension) plus a _delta_log folder that holds the transaction log. cloudFiles is the source format used by Auto Loader, which incrementally picks up new raw files (e.g. Parquet, JSON or CSV) from cloud storage, so it's what you typically use to load bronze. Between layers in a DLT pipeline, such as bronze to silver, you don't go back to the raw files; silver reads the bronze Delta table directly, so you aren't re-reading and re-writing the raw data at every layer. Take a look at this image; if you're looking for more details on file systems in Databricks, they can be found in the documentation. Please do follow up if I misunderstood this one!!
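Since a Delta table is Parquet data files plus a `_delta_log` directory of JSON commit files, you can tell the two apart just by looking at the folder. The sketch below uses only the standard library: it builds a fake layout in a temporary folder (empty placeholder files, hypothetical names) and applies a simple heuristic. It is not a full check of the Delta protocol; real tables also contain checkpoint files and other metadata.

```python
import tempfile
from pathlib import Path


def looks_like_delta_table(path: Path) -> bool:
    """Heuristic: a Delta table directory has a _delta_log folder with JSON commits."""
    log_dir = path / "_delta_log"
    return log_dir.is_dir() and any(log_dir.glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Plain Parquet folder: data files only (empty placeholders here).
    plain = root / "plain_parquet"
    plain.mkdir()
    (plain / "part-00000.parquet").touch()

    # Delta table folder: the same kind of Parquet files plus a transaction log.
    delta = root / "delta_table"
    (delta / "_delta_log").mkdir(parents=True)
    (delta / "part-00000.parquet").touch()
    (delta / "_delta_log" / "00000000000000000000.json").touch()

    plain_is_delta = looks_like_delta_table(plain)
    delta_is_delta = looks_like_delta_table(delta)

print(plain_is_delta, delta_is_delta)  # False True
```

Both folders contain `.parquet` files; only the transaction log makes the second one a Delta table.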

Recommendation:

  • For raw data (Delta files): using an external location is generally recommended, and letting Unity Catalog manage the paths within it is usually fine. Because dropping an external table doesn't delete its files, the raw data survives even if a table is dropped or recreated.
  • Consider your specific needs and your team's preferences: if custom paths simplify your logic or give you the control you need, go for it. If Unity Catalog managing the paths fits your organization and doesn't cause issues, it's a convenient option.
  • Ensure proper access control and security for your data, both in Unity Catalog and on the external location.

Leave a like if this helps! 

Kudos,
Palash