
Liquid clustering on and dynamic overwrites

radix
New Contributor II

I use the following option to write from multiple tasks to the same table with overwrite (in PySpark):

.option("partitionOverwriteMode", "dynamic")

The table was created with PARTITIONED BY, so this works as expected.
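For context, a minimal sketch of the write pattern described above, assuming each task holds a PySpark DataFrame covering only the partitions it owns. The function name `overwrite_task_partitions` and the table name in the usage note are hypothetical; only the `partitionOverwriteMode` option comes from the post.

```python
def overwrite_task_partitions(df, table_name):
    """Overwrite only the partitions present in `df`.

    `df` is expected to be a PySpark DataFrame; partitions of the target
    table that do not appear in `df` are left untouched.
    """
    (
        df.write
        .format("delta")
        .mode("overwrite")
        # Replace only the partitions that appear in df, not the whole table.
        .option("partitionOverwriteMode", "dynamic")
        .saveAsTable(table_name)
    )
```

Each task would call something like `overwrite_task_partitions(task_df, "my_schema.my_partitioned_table")` with its own slice of the data.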
I read about liquid clustering and its benefits over table partitioning.
Is it possible to do the same thing on a liquid clustered table, i.e. write from multiple tasks simultaneously to the same table with overwrite?

Thanks

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @radix, liquid clustering is an approach that simplifies data layout decisions and optimizes query performance in Delta tables.

  1. What is Liquid Clustering?

    • Liquid clustering replaces traditional table partitioning and ZORDER techniques.
    • It allows you to redefine clustering keys without rewriting existing data.
    • This flexibility enables data layout to evolve alongside analytic needs over time.
  2. Benefits of Liquid Clustering:

    • Improved Query Performance: Because data is organized by the clustering columns, queries that filter on those columns can skip files that hold no matching values and scan less data.
    • Adaptability: Liquid clustering makes it easier to adapt to changing query patterns.
    • Reduced Maintenance Effort: It simplifies data layout decisions, reducing the need for manual tuning.
    • Concurrency Support: Tables with liquid clustering enabled support row-level concurrency.
  3. How to Enable Liquid Clustering:

    • When creating a Delta table, add the CLUSTER BY phrase to the table creation statement.
    • Example:
      CREATE TABLE my_table (col1 INT, col2 STRING) USING DELTA CLUSTER BY (col1);
      
  4. Incrementally Clustering Data:

    • Run OPTIMIZE jobs as usual to incrementally cluster data.
    • Liquid clustering allows you to redefine clustering keys without rewriting existing data.
  5. Compatibility and Requirements:

    • Databricks Runtime 13.3 LTS or above is required to create, write, or optimize Delta tables with liquid clustering.
    • Tables with liquid clustering support row-level concurrency in Databricks Runtime 13.3 LTS and above.
  6. Considerations:

    • Liquid clustering requires more storage compared to traditional partitioning.
    • It's recommended for all new Delta tables.
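The steps in the list above (create with CLUSTER BY, redefine the keys, run OPTIMIZE) can be sketched as a few SQL statements built in Python. This is only an illustration: the helper names are hypothetical, the table and column names reuse the `my_table` example, and the statements are assumed to be run through `spark.sql` on a runtime that supports liquid clustering.

```python
def create_clustered_table_sql(table, columns_ddl, cluster_cols):
    """Build a CREATE TABLE statement with liquid clustering enabled."""
    keys = ", ".join(cluster_cols)
    return f"CREATE TABLE {table} ({columns_ddl}) USING DELTA CLUSTER BY ({keys})"


def change_clustering_keys_sql(table, cluster_cols):
    """Build an ALTER TABLE statement that redefines the clustering keys."""
    keys = ", ".join(cluster_cols)
    return f"ALTER TABLE {table} CLUSTER BY ({keys})"


def optimize_sql(table):
    """Build the OPTIMIZE statement that incrementally clusters the data."""
    return f"OPTIMIZE {table}"


statements = [
    create_clustered_table_sql("my_table", "col1 INT, col2 STRING", ["col1"]),
    change_clustering_keys_sql("my_table", ["col2"]),
    optimize_sql("my_table"),
]
```

In a notebook, each statement would then be executed in order, for example `for stmt in statements: spark.sql(stmt)`.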

In summary, liquid clustering provides a powerful way to optimize data layout and adapt to changing requirements. If you're working with Delta tables, consider leveraging this feature for improved performance and f...123.

 

Ranjeet1981
New Contributor II

No, liquid clustering doesn't support partition overwrite.
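One possible alternative, offered as an assumption rather than something confirmed in this thread, is Delta Lake's `replaceWhere` option, which overwrites only the rows matching a predicate. The sketch below uses a hypothetical function name and a hypothetical `region` column. Whether concurrent tasks can safely run this against a liquid clustered table depends on the runtime version and on conflict detection, so it should be tested before relying on it.

```python
def overwrite_slice(df, table_name, predicate):
    """Overwrite only the rows of `table_name` that match `predicate`.

    `df` is expected to be a PySpark DataFrame whose rows all satisfy
    `predicate`; Delta rejects the write otherwise (by default).
    """
    (
        df.write
        .format("delta")
        .mode("overwrite")
        # Selective overwrite: rows matching the predicate are replaced by df.
        .option("replaceWhere", predicate)
        .saveAsTable(table_name)
    )
```

A task responsible for one slice might call `overwrite_slice(task_df, "my_table", "region = 'EU'")`, with each task using a disjoint predicate.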
