Re: Selective overwrite on Partition and Liquid cl...

balajij8 · Saturday

Hi, the current approach is not optimal. Here is what is happening:

The INSERT query most likely ran with 43 tasks, creating 43 output files. Since the liquid clustered table has no clustering columns yet (clusterBy "[]"), the dates are scattered randomly across files.

The partitioned table did a clean partition-level swap, because all date='2026-06-02' data is isolated in its own partition.
With liquid clustering set to AUTO, the data is scattered, most likely with both dates mixed within files. REPLACE USING scanned all 86 files, identified the 43 files that contained date='2026-06-02', and then rewrote them. That is an expensive operation (numDeletedRows: 30M).
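To make the file-level difference concrete, here is a toy Python model. It is not Delta's actual implementation; the helper names, file names, and row counts are hypothetical, and only the date values are taken from this thread. Each file is treated as the set of date values it contains, and the model counts how many files a selective replace could drop outright versus how many it would have to rewrite to preserve the other date's rows.

```python
from collections import defaultdict


def unclustered_layout(rows, n_files):
    """Spread rows round-robin across files, ignoring their value
    (a hypothetical stand-in for a write with no clustering keys)."""
    layout = defaultdict(set)
    for i, value in enumerate(rows):
        layout[f"part-{i % n_files:05d}"].add(value)
    return dict(layout)


def partitioned_layout(rows, files_per_value):
    """Spread rows round-robin across files inside one directory per value
    (a hypothetical stand-in for PARTITIONED BY (date))."""
    layout = defaultdict(set)
    seen_per_value = defaultdict(int)
    for value in rows:
        i = seen_per_value[value]
        seen_per_value[value] += 1
        layout[f"date={value}/part-{i % files_per_value:05d}"].add(value)
    return dict(layout)


def replace_cost(layout, target):
    """Return (files that can be dropped outright, files that must be rewritten).

    A file holding only target rows can simply be removed; a file mixing the
    target with other values has to be rewritten to keep the other rows.
    """
    dropped = sum(1 for values in layout.values() if values == {target})
    rewritten = sum(
        1 for values in layout.values() if target in values and len(values) > 1
    )
    return dropped, rewritten


# Hypothetical data: two dates, 1,000 rows each.
rows = ["2026-06-01"] * 1000 + ["2026-06-02"] * 1000
print(replace_cost(unclustered_layout(rows, n_files=20), "2026-06-02"))
print(replace_cost(partitioned_layout(rows, files_per_value=10), "2026-06-02"))
```

In the unclustered layout every file mixes both dates, so the model reports (0, 20): nothing can be dropped and all 20 files need a rewrite. In the partitioned layout it reports (10, 0): the 10 files under date=2026-06-02 are removed and replaced without touching the other date, which corresponds to the partition-level swap.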

To reduce the file count during writes, you can use either of the following:

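1. Control it through configuration

SET spark.sql.shuffle.partitions = 16;

INSERT INTO <target> REPLACE USING (BED)
SELECT <cols> FROM <table>;

This setting only applies to stages that follow a shuffle (for example a join or aggregation). A plain scan-and-insert gets its task count from the input splits, so the REPARTITION hint is the more direct control.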

2. Control it with a REPARTITION hint

INSERT INTO <target> REPLACE USING (BED)
SELECT /*+ REPARTITION(16) */ <cols> FROM <table>;

In this case (two date values and full replacements), partitioning is simpler and faster than liquid clustering. Liquid clustering wins when you have high-cardinality columns or queries that filter on multiple dimensions.
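The cardinality trade-off can be illustrated with another toy model. It assumes, roughly as a Spark partitioned write behaves, that each write task emits one file per partition value it sees, and it ignores file-size limits and optimized writes; the `partitioned_write` helper, row count, and task count are hypothetical.

```python
def partitioned_write(n_rows, n_distinct, n_tasks):
    """Return (files written, average rows per file) for a partitioned write.

    Toy model: row i has partition value i % n_distinct, rows are split into
    contiguous chunks across n_tasks write tasks, and each task writes one
    file per distinct partition value it sees.
    """
    seen = [set() for _ in range(n_tasks)]
    for i in range(n_rows):
        task = i * n_tasks // n_rows  # contiguous chunk this row falls into
        seen[task].add(i % n_distinct)
    files = sum(len(values) for values in seen)
    return files, n_rows / files


for n_distinct in (2, 1000):
    files, rows_per_file = partitioned_write(100_000, n_distinct, n_tasks=16)
    print(f"{n_distinct:>5} values -> {files:>6} files, ~{rows_per_file:g} rows/file")
```

With 16 tasks over 100,000 rows, two values give 32 files of about 3,125 rows each, while 1,000 values give 16,000 files of about 6 rows each. That small-file blow-up is one reason partitioning suits a low-cardinality column like the date here, while liquid clustering tends to be preferred for high-cardinality keys.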