Brahmareddy
Esteemed Contributor II

Hi pooja_bhumandla,

Great question! How are you doing today? Even after running OPTIMIZE with a higher target file size such as 100 MB, it's common to still see some small files in a Delta table, especially in partitions that hold very little data.

The main reason is that OPTIMIZE compacts files only within the same partition. It never merges files across partitions. If a partition contains only a few megabytes in total, the best OPTIMIZE can produce is a single file of that size, so it will never reach the 100 MB target. A partition that already consists of a single file has nothing to merge at all.

Also, OPTIMIZE works on the table snapshot as of when it starts. Files added after that point (for example, by a streaming job that keeps writing) are not part of the run, so new small files can show up right after it finishes. Another factor: if you ran OPTIMIZE with a WHERE clause, only the partitions matching that predicate were compacted, and everything else was left as-is. Lastly, if change data feed is enabled on the table, the change files written under the _change_data directory are stored separately from the table's data files and are not compacted by OPTIMIZE, so they can still appear as small files when you list the table's storage location.

If small files are spread across many partitions and are hurting query performance, consider scheduling OPTIMIZE for only the most recent, actively written partitions (for example, the last 7 days) using a WHERE clause on the partition column. That keeps the job fast and cheap while still compacting the data that changes most. Let me know if you'd like a sample maintenance script.
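To make the per-partition point concrete, here is a toy Python model. It is my own simplification, not how Databricks actually plans OPTIMIZE: it assumes files are merged only with files in the same partition, files already at or above the target are left alone, and small files are packed greedily (largest first). The helper simulate_compaction and the partition names and sizes are made up for illustration.

```python
from typing import Dict, List

MB = 1024 * 1024


def simulate_compaction(partitions: Dict[str, List[int]],
                        target_bytes: int) -> Dict[str, List[int]]:
    """Toy model of bin-packing compaction: returns file sizes per partition afterwards.

    Assumptions (a simplification, not Databricks' actual algorithm):
    files are only merged with files in the same partition, files already at or
    above the target are left alone, and small files are packed greedily
    (largest first) into output files of at most target_bytes.
    """
    result: Dict[str, List[int]] = {}
    for partition, sizes in partitions.items():
        large = [s for s in sizes if s >= target_bytes]  # already big enough, untouched
        small = sorted((s for s in sizes if s < target_bytes), reverse=True)
        packed: List[int] = []
        current = 0
        for size in small:
            # Close the current output file if adding this one would exceed the target.
            if current and current + size > target_bytes:
                packed.append(current)
                current = 0
            current += size
        if current:
            packed.append(current)
        result[partition] = large + packed
    return result


files = {
    "date=2024-01-01": [30 * MB, 40 * MB, 50 * MB, 20 * MB],  # 140 MB in total
    "date=2024-01-02": [2 * MB, 3 * MB],                       # only 5 MB in total
}
after = simulate_compaction(files, target_bytes=100 * MB)
for partition, sizes in after.items():
    print(partition, [s // MB for s in sizes])
# date=2024-01-01 [90, 50]
# date=2024-01-02 [5]
```

Even with a 100 MB target, the 5 MB partition can only become one 5 MB file in this model, because there is nothing else in that partition to merge it with.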

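As a starting point for such a maintenance job, here is a minimal sketch, assuming a table partitioned by a DATE column. The table name main.sales.events, the column event_date and the helper build_optimize_sql are hypothetical placeholders. The sketch only builds the SQL string, so it can be checked without a cluster.

```python
from datetime import date, timedelta
from typing import Optional


def build_optimize_sql(table: str, partition_col: str, days: int = 7,
                       today: Optional[date] = None) -> str:
    """Build an OPTIMIZE statement limited to recent daily partitions.

    Assumes partition_col is a DATE partition column (hypothetical names);
    the WHERE clause keeps partitions dated from today - days onward.
    """
    if days < 1:
        raise ValueError("days must be >= 1")
    today = today or date.today()
    start = today - timedelta(days=days)
    return f"OPTIMIZE {table} WHERE {partition_col} >= DATE '{start.isoformat()}'"


print(build_optimize_sql("main.sales.events", "event_date", days=7,
                         today=date(2024, 1, 10)))
# OPTIMIZE main.sales.events WHERE event_date >= DATE '2024-01-03'
```

On Databricks you could pass the resulting string to spark.sql(...) from a scheduled job. Keep the predicate on the partition column, since OPTIMIZE's WHERE clause only accepts partition-column filters.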
Regards,

Brahma
