Hello experts. We are trying to clarify how to clean up the large number of files accumulating in the _delta_log folder (JSON, CRC, and checkpoint files). We went through the related posts in the forum and followed the below: SET spark.da...
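For context, Delta Lake prunes old log entries automatically whenever a checkpoint is written, governed by two table properties. A minimal sketch, assuming a hypothetical table named `events` (the property names are real Delta table properties; the retention values here are illustrative, not recommendations):

```sql
-- Keep commit JSON files for 7 days and checkpoint files for 2 days;
-- Delta removes expired log entries the next time a checkpoint is written.
ALTER TABLE events SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 7 days',
  'delta.checkpointRetentionDuration' = 'interval 2 days'
);
```

Note that shortening `delta.logRetentionDuration` limits how far back time travel can go, so pick a window that still covers any audit or rollback needs.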
Hi @Lakshmi devi, the .crc file is basically a checksum file that contains the stats for the respective version file. It is used for snapshot verification in the backend.
Is there an upper bound on the number I can assign to delta.dataSkippingNumIndexedCols for computing statistics? Is there a trade-off benchmark available for increasing this number beyond 32?
@Chhavi Bansal: The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns that Delta Lake will collect statistics on for data skipping. By default, this value is set to 32. There is no hard upper bound on the value itself, but raising it increases the cost of computing and storing statistics at write time, so it is best kept to the columns you actually filter on.
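A minimal sketch of raising the property, assuming a hypothetical table named `events`. Keep in mind the setting only affects files written after the change; existing files keep the statistics they were written with:

```sql
-- Collect statistics on the first 40 columns of newly written files
-- (files already in the table keep whatever stats they were written with).
ALTER TABLE events SET TBLPROPERTIES (
  'delta.dataSkippingNumIndexedCols' = '40'
);
```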
Hello team! As per the documentation, I understand that table statistics (e.g. min, max, count) can be fetched through the delta log, so that the underlying data of a Delta table does not have to be read. This is the case for numerical types, and timestamp is sup...
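A minimal sketch of pulling those per-file statistics straight out of a commit file, assuming a hypothetical table path `/tmp/delta/events`. The `add.stats` field is a JSON string holding `minValues`, `maxValues`, `nullCount`, and `numRecords`:

```sql
-- Read one commit file from the transaction log; rows with a non-null
-- `add` field describe data files, and add.stats carries their statistics.
SELECT add.path, add.stats
FROM json.`/tmp/delta/events/_delta_log/00000000000000000000.json`
WHERE add IS NOT NULL;
```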
Deleting the _delta_log directory would cause you to lose the transaction history of the Delta table, along with the Delta-specific optimizations that depend on it. In effect, the table would be reduced to a plain Parquet table at that point.
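If that does happen, the remaining Parquet files can be re-registered as a Delta table, which rebuilds a fresh log (the old version history is not recoverable). A minimal sketch, assuming a hypothetical path `/tmp/delta/events`:

```sql
-- Rebuild a _delta_log from the existing Parquet files; prior version
-- history is gone, so time travel to older versions is no longer possible.
CONVERT TO DELTA parquet.`/tmp/delta/events`;
```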