Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

CLONE not supported on delta table with Liquid Clustering

seydouHR
New Contributor III

Hello all,

We are building a data warehouse on Unity Catalog, and we use the SHALLOW CLONE command to let people spin up their own dev environments by making lightweight copies of the prod tables.
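
To make the workflow concrete, here is a minimal sketch of how the clone statements for a dev environment might be generated. The helper names and the catalog, schema, and table names are hypothetical, chosen only for illustration; the statement shape is the same CREATE OR REPLACE TABLE ... SHALLOW CLONE form used later in this thread.

```python
from typing import List, Sequence


def shallow_clone_statement(source_table: str, target_table: str) -> str:
    """Build the SHALLOW CLONE statement for one table (full three-part names)."""
    return f"CREATE OR REPLACE TABLE {target_table} SHALLOW CLONE {source_table}"


def dev_clone_statements(
    table_names: Sequence[str], prod_schema: str, dev_schema: str
) -> List[str]:
    """Build one statement per table, cloning prod_schema.<t> into dev_schema.<t>."""
    return [
        shallow_clone_statement(f"{prod_schema}.{name}", f"{dev_schema}.{name}")
        for name in table_names
    ]


# Hypothetical catalog, schema, and table names, for illustration only.
statements = dev_clone_statements(
    ["features", "labels"], "main.prod", "main.dev_alice"
)
for statement in statements:
    print(statement)
```

The sketch only builds SQL strings; running them still requires a cluster or SQL warehouse.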

We also started using Liquid Clustering on our feature tables, but we are running into an error when trying to shallow clone these tables. It seems the CLONE command is not allowed for tables that have been liquid clustered.

We'd like to know whether this is a temporary restriction, and whether it is on the roadmap to allow cloning operations on tables that have been liquid clustered.

Exception:

UnsupportedOperationException: CLONE is not supported for Delta table with Liquid clustering.

Runtime: 13.3

Thanks !

Seydou

1 ACCEPTED SOLUTION

Accepted Solutions

seydouHR
New Contributor III

Thanks Kaniz for your reply. I was able to make it work using runtime 14.0.

Regards,

 


4 REPLIES

Kaniz_Fatma
Community Manager

Hi @seydouHR, The SHALLOW CLONE command in Unity Catalog allows you to create lightweight copies of production tables for development purposes. These clones share metadata with the original tables but do not duplicate the underlying data files, which helps save storage space. It’s a handy feature for creating dev environments.

 

Now, let’s address the issue you’re encountering with Liquid Clustering. Liquid clustering is a powerful feature available in Delta Lake that simplifies data layout decisions and optimizes query performance. It allows you to redefine clustering keys without rewriting existing data, which is beneficial for evolving analytic needs over time.

 

However, there are some important considerations:

 

Liquid Clustering Benefits:

  • High Cardinality Columns: Liquid clustering is recommended for tables often filtered by high cardinality columns.
  • Data Distribution Skew: It’s useful for tables with significant skew in data distribution.
  • Fast-Growing Tables: Tables that grow quickly and require maintenance and tuning effort can benefit from liquid clustering.
  • Concurrent Writes: If your tables have concurrent write requirements, liquid clustering provides flexibility.
  • Changing Access Patterns: Tables where access patterns change over time can leverage liquid clustering.
  • Partition Key Optimization: Liquid clustering helps avoid having too many or too few partitions based on typical partition keys.

Enabling Liquid Clustering:

  • When creating a table, add the CLUSTER BY phrase to the table creation statement. For example: CREATE TABLE <catalog-name>.<schema-name>.<table-name> (col0 INT, col1 STRING) CLUSTER BY (col0)
  • Liquid clustering is not compatible with partitioning or ZORDER, and it requires that Databricks manage all layout and optimization operations for data in the table.
  • Once enabled, you can run OPTIMIZE jobs to incrementally cluster data.
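
As a rough illustration of the steps in this list, the following sketch builds the three kinds of statements involved: creating a table with CLUSTER BY, redefining the clustering keys later, and running OPTIMIZE. The helper functions and the table and column names are hypothetical; only the SQL keywords come from Databricks SQL.

```python
from typing import Sequence


def create_clustered_table(
    table: str, columns_ddl: str, cluster_keys: Sequence[str]
) -> str:
    """CREATE TABLE statement with liquid clustering enabled via CLUSTER BY."""
    keys = ", ".join(cluster_keys)
    return f"CREATE TABLE {table} ({columns_ddl}) CLUSTER BY ({keys})"


def alter_cluster_keys(table: str, cluster_keys: Sequence[str]) -> str:
    """ALTER TABLE statement that redefines the clustering keys."""
    keys = ", ".join(cluster_keys)
    return f"ALTER TABLE {table} CLUSTER BY ({keys})"


def optimize_table(table: str) -> str:
    """OPTIMIZE statement that incrementally clusters the table's data."""
    return f"OPTIMIZE {table}"


# Hypothetical table and columns, for illustration only.
table = "main.prod.features"
print(create_clustered_table(table, "user_id BIGINT, event_date DATE", ["user_id"]))
print(alter_cluster_keys(table, ["user_id", "event_date"]))
print(optimize_table(table))
```

Note that no PARTITIONED BY or ZORDER BY clause appears anywhere, in line with the incompatibility mentioned above.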

Important Notes:

  • Databricks Runtime 13.3 LTS and above is required to create, write, or optimize Delta tables with liquid clustering enabled.
  • Row-level concurrency is supported in Databricks Runtime 13.3 LTS and above for tables with liquid clustering.
  • Tables created with liquid clustering enabled have specific Delta table features enabled at creation.

Limitations:

  • CLONE Command: On Databricks Runtime 13.3, the CLONE command is not supported for Delta tables with liquid clustering, which is why shallow cloning a liquid-clustered table fails with the error above.
  • Readability: Tables with clustering enabled are not readable by Delta Lake clients that do not support all enabled Delta reader protocol table features.

 

In summary, while liquid clustering offers significant benefits, it currently restricts the use of the CLONE command. Keep an eye on updates from Databricks for any changes in this regard! 🚀🔍

seydouHR
New Contributor III

Thanks Kaniz for your reply. I was able to make it work using runtime 14.0.

Regards,

 

Wolfoflag
New Contributor II

Hi @seydouHR @Kaniz_Fatma, were you able to shallow clone or deep clone the liquid clustering table in runtime 14.0?

seydouHR
New Contributor III

Hi @Wolfoflag ,

Yes, within Databricks I was able to shallow clone using runtime 14.0.

Also, outside Databricks, it worked when using the Databricks SDK with a SQL Warehouse of version 2023.50 or above.

Example with the SDK:

from databricks.sdk import WorkspaceClient

# WORKSPACE_CLIENT is the workspace client from the Databricks SDK.
WORKSPACE_CLIENT = WorkspaceClient()

# Your statement could be something like:
statement = f"CREATE OR REPLACE TABLE {target_table_full_name} SHALLOW CLONE {source_table_full_name}"

# sql_warehouse_id is the ID of your SQL warehouse.
WORKSPACE_CLIENT.statement_execution.execute_statement(
    warehouse_id=sql_warehouse_id, statement=statement
)
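
The version findings in this thread could also be turned into a small guard that runs before issuing the clone. This is only a sketch: the thresholds (runtime 14.0, SQL Warehouse 2023.50) are assumptions taken from what worked for me, not from official documentation, and the function names are made up for illustration.

```python
import re
from typing import Tuple

# Assumption: thresholds taken from this thread only, not from official docs.
MIN_RUNTIME = (14, 0)
MIN_SQL_WAREHOUSE = (2023, 50)


def parse_version(version: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' of a version string such as '13.3 LTS'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def clone_reported_to_work(version: str, minimum: Tuple[int, int]) -> bool:
    """True if the version is at or above the minimum reported in this thread."""
    return parse_version(version) >= minimum


print(clone_reported_to_work("13.3", MIN_RUNTIME))
print(clone_reported_to_work("14.0", MIN_RUNTIME))
print(clone_reported_to_work("2023.50", MIN_SQL_WAREHOUSE))
```

A check like this only avoids a known failure early; the UnsupportedOperationException is still the authoritative signal.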