Databricks Community

William_Scardua · ‎10-06-2021

How do you work to fixing the small/big file problem ? what you suggest ?

jose_gonzalez · ‎10-07-2021

Hi @William Scardua ,

I will recommend to use Delta to avoid having small/big files issues. For example, Auto Optimize is an optional set of features that automatically compact small files during individual writes to a Delta table. Paying a small cost during writes offers significant benefits for tables that are queried actively. For more details and examples please check the following link

Auto optimize will create files of 128 MB each. If you would like to compress and optimize further, then I will recommend to use "Optimize" command on your Delta tables. It will compress and create files of 1 GB in size, by default. For more details on this optimize feature, please check the following link

Thank you.

View solution in original post

jose_gonzalez · ‎10-07-2021

Hi @William Scardua ,

I will recommend to use Delta to avoid having small/big files issues. For example, Auto Optimize is an optional set of features that automatically compact small files during individual writes to a Delta table. Paying a small cost during writes offers significant benefits for tables that are queried actively. For more details and examples please check the following link

Auto optimize will create files of 128 MB each. If you would like to compress and optimize further, then I will recommend to use "Optimize" command on your Delta tables. It will compress and create files of 1 GB in size, by default. For more details on this optimize feature, please check the following link

Thank you.