Hi @Fnazar
When dealing with streaming data, you can end up with many small files, which hurts read performance. Use Delta Lake's OPTIMIZE command to compact them into larger files, and ZORDER to colocate related information in the same set of files. This is particularly useful for columns that are frequently used as query filters.
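As a rough sketch (the table name and column are placeholders for your own), the compaction step could look like this in a notebook:

# Compact small files and Z-order by a column commonly used in filters
# ("my_table" and "event_id" are illustrative names, not from your setup)
spark.sql("OPTIMIZE my_table ZORDER BY (event_id)")

Note that the ZORDER column should not be the partition column; pick a high-cardinality column you filter on often.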
For partitioning, select a column that results in evenly distributed data. Common choices are a date column (for time-based data) or a low-cardinality categorical column with a balanced distribution.
When creating or writing to a Delta table, you can specify partitioning with the partitionBy() method in the DataFrame API (or PARTITIONED BY in SQL). For instance, to partition by a date column:
df.write.format("delta").partitionBy("date_column").save("/mnt/delta/my_table")
This creates one partition in the Delta table for each unique value of date_column.
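For illustration only (the path and date value are placeholders), queries that filter on the partition column can then skip whole partitions instead of scanning the full table:

# Partition pruning: only the matching date_column partition is read
df = spark.read.format("delta").load("/mnt/delta/my_table")
df.where("date_column = '2024-06-01'").show()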
If you're ingesting streaming data into Delta Lake, consider using Auto Loader for efficient and incremental processing of new data.
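A minimal sketch of that pattern, assuming JSON files land in a cloud storage path (the paths and file format below are placeholders for your environment):

# Incrementally discover new files with Auto Loader (cloudFiles) and append them to a Delta table
stream = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/delta/_schemas/my_table")
    .load("/mnt/raw/incoming/"))

(stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/delta/_checkpoints/my_table")
    .outputMode("append")
    .start("/mnt/delta/my_table"))

The checkpoint and schema locations let Auto Loader track which files have already been processed, so only new data is ingested on each run.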
https://docs.delta.io/latest/best-practices.html
https://docs.databricks.com/en/sql/language-manual/delta-optimize.html