Saturday
Hi, the current approach is not optimal. Here is what is happening and how to improve it.
The INSERT query ran with roughly 43 tasks, creating 43 output files. Since the liquid clustered table has no clustering columns selected yet (clusterBy "[]"), dates are randomly scattered across files.
- Partitioned table: it did a clean partition-level swap, because all date='2026-06-02' data is isolated in its own partition.
- Liquid clustering with AUTO: data is scattered, with both dates most likely mixed across files. REPLACE USING scanned all 86 files, identified the 43 files that contained date='2026-06-02', and then rewrote them. That is an expensive operation (numDeletedRows: 30M).
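If you want to confirm this on your own table, a minimal sketch is below. It assumes `<target>` is the same placeholder used in the statements in this reply, and it only uses the standard Delta commands `DESCRIBE DETAIL` and `DESCRIBE HISTORY`. The exact metric names you see can vary by runtime version, so treat the comments as a guide rather than a guarantee.

```sql
-- Current layout: check numFiles, partitionColumns and clusteringColumns.
-- An empty clusteringColumns list means no clustering keys are in effect yet.
DESCRIBE DETAIL <target>;

-- Recent writes: the operationMetrics column of each row shows what the
-- operation touched (for example files added/removed and numDeletedRows).
DESCRIBE HISTORY <target> LIMIT 5;
```

Running both before and after a load makes it easy to compare how many files each write produced and how many had to be rewritten.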
You can use either of the following options to reduce the file count during writes:
1. Control via configuration
SET spark.sql.shuffle.partitions = 16;
INSERT INTO <target> REPLACE USING (BED)
SELECT <cols> FROM <table>;

2. Control via a repartition hint
INSERT INTO <target> REPLACE USING (BED)
SELECT /*+ REPARTITION(16) */ <cols> FROM <table>;

Partitioning is simpler and faster than liquid clustering in this case (only 2 date values and full replacements). Liquid clustering wins when you have high-cardinality columns or queries that filter on multiple dimensions.