Sidhant07
Databricks Employee

Using CTAS (CREATE TABLE AS SELECT) might be a more robust solution for your use case:

  1. Independence: CTAS creates a new, independent copy of the data, so the target table has no dependency on the source table or its data files.
  2. Simplified access control: Access rights can be managed solely within the target environment.
  3. Flexibility: You can easily modify the table structure or apply transformations during the copy process.
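As a rough illustration of the points above, here is a minimal sketch of a helper that builds a CTAS statement, optionally with a column list and a filter applied during the copy. The function name `build_ctas` and the table names `prod.sales.orders` and `dev.sales.orders` are hypothetical, not taken from your setup.

```python
from typing import Optional, Sequence


def build_ctas(
    target: str,
    source: str,
    columns: Sequence[str] = ("*",),
    where: Optional[str] = None,
) -> str:
    """Build a CREATE OR REPLACE TABLE ... AS SELECT statement.

    All table and column names are supplied by the caller; nothing here
    is specific to a real workspace.
    """
    select_list = ", ".join(columns)
    sql = f"CREATE OR REPLACE TABLE {target} AS SELECT {select_list} FROM {source}"
    if where:
        # Optional filter applied during the copy
        sql += f" WHERE {where}"
    return sql


# Hypothetical example: copy two columns of recent rows into a dev table
statement = build_ctas(
    target="dev.sales.orders",
    source="prod.sales.orders",
    columns=("order_id", "amount"),
    where="order_date >= '2024-01-01'",
)
print(statement)
```

In a Databricks notebook, where a `spark` session is already available, the resulting string could be run with `spark.sql(statement)`. The helper does no escaping, so it should only be used with trusted identifiers.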

Optimizing the Cloning Process

  1. Use Delta Lake's CLONE command for efficient copying when appropriate.
  2. Implement incremental updates to minimize data transfer and processing time for subsequent refreshes.
  3. Consider using Databricks Workflows to automate and schedule the cloning process.
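For the clone-based option, a similar sketch can build the Delta Lake `CLONE` statement. The helper name `build_clone` and the table names are hypothetical. My understanding is that re-running a `CREATE OR REPLACE TABLE ... DEEP CLONE` statement against an existing target copies only the changes since the last run, but please verify that against the documentation for your runtime before relying on it for refreshes.

```python
def build_clone(target: str, source: str, deep: bool = True) -> str:
    """Build a Delta Lake CLONE statement.

    deep=True  -> DEEP CLONE (copies the data files; independent of the source)
    deep=False -> SHALLOW CLONE (copies metadata only; still reads source files)
    """
    mode = "DEEP" if deep else "SHALLOW"
    return f"CREATE OR REPLACE TABLE {target} {mode} CLONE {source}"


# Hypothetical table names, for illustration only
refresh_statement = build_clone("dev.sales.orders", "prod.sales.orders")
print(refresh_statement)
```

A scheduled job (for example, a Databricks Workflows task running a notebook) could then call `spark.sql(refresh_statement)` on each refresh, assuming the usual `spark` session is available.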

Addressing Open Questions

  1. The proposed approach is viable, but consider using CTAS instead of shallow clones for better isolation and simpler access management.
  2. Access rights to the underlying data files are indeed a concern with shallow clones, because a shallow clone still reads the source table's files. CTAS avoids this issue by creating independent copies.