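Hi @data-engineer-d,

A quick word on liquid clustering first: it replaces partitioning and ZORDER, and lays the data out incrementally according to the clustering columns you choose.

Now, looking at your problem: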
- You mentioned that after enabling liquid clustering, the number of files increased from around 4300 to 7900, even though the table size remained similar.
- This behaviour is expected given how liquid clustering works. When you run OPTIMIZE on the table, the data is reorganized into ZCubes, and existing files may be split or merged to form them.
- Since the table size is about the same, the average file is now smaller, but that on its own is not a problem. The higher file count reflects the data being reorganized into clusters that are more efficient to query.
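
One quick way to put numbers on this is to compare the average file size before and after. A minimal sketch, assuming you read `numFiles` and `sizeInBytes` from `DESCRIBE DETAIL` on your table. The 500 GiB table size below is a made-up figure (your post only says the size stayed similar), and `avg_file_size_mib` is just an illustrative helper, not a Databricks API:

```python
def avg_file_size_mib(size_in_bytes: int, num_files: int) -> float:
    """Return the average data file size in MiB."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    return size_in_bytes / num_files / (1024 ** 2)


# Hypothetical table size; replace with sizeInBytes from DESCRIBE DETAIL.
table_bytes = 500 * 1024 ** 3

# File counts taken from the question: ~4300 before, ~7900 after.
before = avg_file_size_mib(table_bytes, 4300)
after = avg_file_size_mib(table_bytes, 7900)

print(f"average file size before: {before:.1f} MiB")
print(f"average file size after:  {after:.1f} MiB")
```

If the "after" figure is still a reasonable size for your workload, the extra files are unlikely to hurt by themselves.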
To judge whether this increase is reasonable, consider the following factors:
- If your data has skewed distributions (e.g., some values are more frequent than others), liquid clustering might create more files to evenly distribute the data.
- The clustering columns matter. If the two columns used for clustering have high cardinality (many distinct values), it could lead to more files.
- Despite the increased file count, performance should improve for queries that filter on the clustering columns, thanks to better data skipping and locality.
- Check if the new files are compressed efficiently. Sometimes, smaller files can still hold a significant amount of data due to better compression.
- Monitor query performance after liquid clustering. If it’s improved, the increase in files is likely beneficial.
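
For the monitoring point, a small timing helper is often enough to compare runs. This is only a sketch: `median_runtime_s` is a hypothetical helper, and the stand-in workload exists so the snippet runs anywhere. In a notebook you would pass a callable that executes your real query, preferably one that filters on the clustering columns:

```python
import statistics
import time
from typing import Callable


def median_runtime_s(run_query: Callable[[], object], runs: int = 5) -> float:
    """Run the callable `runs` times and return the median wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload (assumption); swap in a callable that runs your query.
elapsed = median_runtime_s(lambda: sum(range(100_000)), runs=3)
print(f"median runtime: {elapsed:.4f} s")
```

The median is used rather than the mean so that a single slow run, for example a cold cache on the first execution, does not dominate the comparison.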
Remember that liquid clustering optimizes query efficiency, and the increase in files is a trade-off for better performance. If your queries are faster, it’s a sign that the approach is working as intended!