ilir_nuredini

Hello Pooja

Target File Size (TFS) is a Delta Lake table property (delta.targetFileSize) that lets you specify the desired size of the table's data files. Delta Lake tries to write files of approximately that size, so the setting definitely matters. However, Delta Lake does not guarantee that every file produced by OPTIMIZE will be exactly targetFileSize. It also aims to:

1. Avoid small files
2. Avoid splitting rows or complex data types mid-record
3. ...
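As a rough illustration of why whole-row boundaries pull file sizes away from the target, here is a toy greedy packer. It is a simplified model and an assumption, not Delta Lake's actual OPTIMIZE implementation. `pack_rows` is a hypothetical helper, and sizes are in arbitrary units.

```python
from typing import List


def pack_rows(row_sizes: List[int], target: int) -> List[int]:
    """Toy model (not Delta Lake's real algorithm): append whole rows to the
    current file until the next row would push it past `target`.
    Rows are never split, so files land near, not exactly on, the target."""
    files: List[int] = []
    current = 0
    for size in row_sizes:
        # Close the current file if this row would overflow a non-empty file.
        if current > 0 and current + size > target:
            files.append(current)
            current = 0
        # An oversized row still goes into a file by itself (it can't be split).
        current += size
    if current > 0:
        files.append(current)  # leftover tail, usually smaller than target
    return files
```

For example, `pack_rows([30] * 10, 100)` gives `[90, 90, 90, 30]`: no file hits exactly 100, and the tail file is much smaller. A single 250-unit row would produce a file larger than the target, because the row cannot be split.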

That's why you see variation in the min and max file sizes, and in the percentile stats.
The resulting minFileSize and maxFileSize depend on several factors, including (but not limited to):

1. targetFileSize (as a guideline)
2. Partition Size & Skew
3. Row and schema characteristics
4. ...
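Partition skew can be sketched the same way. Assuming files are only packed within a partition (OPTIMIZE does not merge files across partitions) and ignoring row boundaries, a small partition always yields a small file, whatever the target. `pack_partition` and `file_size_stats` are hypothetical helpers, and the numbers are illustrative MB values, not real OPTIMIZE output.

```python
import statistics
from typing import Dict, List


def pack_partition(total: int, target: int) -> List[int]:
    """Toy model: split one partition's data into files of at most `target`,
    treating the data as perfectly divisible (no row boundaries)."""
    full, rest = divmod(total, target)
    return [target] * full + ([rest] if rest else [])


def file_size_stats(partitions: Dict[str, int], target: int) -> Dict[str, float]:
    """Pack each partition independently, then summarize the file sizes.
    Needs at least two resulting files for statistics.quantiles."""
    # Files never span partitions, so each partition is packed on its own.
    sizes = [s for total in partitions.values() for s in pack_partition(total, target)]
    p25, p50, p75 = statistics.quantiles(sizes, n=4)  # quartile cut points
    return {
        "count": len(sizes),
        "min": min(sizes),
        "p25": p25,
        "p50": p50,
        "p75": p75,
        "max": max(sizes),
    }
```

With partitions `{"2024-01": 1000, "2024-02": 250, "2024-03": 30}` and a target of 128, this yields 11 files: the min is 30 (the tiny partition's only file) and the max is 128, with the percentiles sitting near the target.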

Best, Ilir