I'm currently working on optimizing a Delta table in Databricks. As part of this, I’ve increased the target file size used by the OPTIMIZE command from roughly 33MB to 100MB. However, after running OPTIMIZE, I still see a large number of small files (e.g., 5KB, 10KB, 100KB, 3MB) within certain partitions.
I'm trying to understand the possible reasons why these small files are not being merged into larger files, despite the new target file size. Specifically:
Why are small files still present after optimization?
What conditions or limitations might prevent these files from being compacted into larger ones?
Shouldn’t even small partitions be compacted into a single file, even if they don’t reach the target size?
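To make the last question concrete, here is a minimal sketch of how the leftover files could be grouped per partition. It assumes a hypothetical list of (partition value, file size in bytes) pairs that contains only files referenced by the current table version; how that listing is obtained is left out. The function name, the sample data, and the 100MB threshold are illustrative assumptions, not part of any Databricks API. Since OPTIMIZE compacts files within a partition rather than across partitions, the sketch reports only partitions that still hold more than one file below the threshold.

```python
from collections import defaultdict

# Assumed threshold: the 100MB target file size mentioned above.
SMALL_FILE_BYTES = 100 * 1024 * 1024


def partitions_with_multiple_small_files(files, threshold=SMALL_FILE_BYTES):
    """Return {partition: sorted small-file sizes} for partitions that
    still contain more than one file below the threshold.

    `files` is a hypothetical iterable of (partition_value, size_in_bytes).
    """
    small = defaultdict(list)
    for partition, size in files:
        if size < threshold:
            small[partition].append(size)
    # A partition with a single small file has nothing to be merged with,
    # so only partitions with two or more small files are reported.
    return {p: sorted(sizes) for p, sizes in small.items() if len(sizes) > 1}


# Illustrative data only, loosely based on the sizes quoted in the question.
sample = [
    ("2024-01-01", 5 * 1024),
    ("2024-01-01", 10 * 1024),
    ("2024-01-01", 3 * 1024 * 1024),
    ("2024-01-02", 100 * 1024),
    ("2024-01-03", 120 * 1024 * 1024),
]

result = partitions_with_multiple_small_files(sample)
print(result)
```

In this sample only the partition "2024-01-01" is reported; "2024-01-02" has a single small file and "2024-01-03" already exceeds the threshold. Partitions flagged this way are the ones where the behavior I describe above is actually unexpected.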
I would appreciate any insights or clarifications.
Thanks in advance for your support!