Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

elgeo
by Valued Contributor II
  • 2790 Views
  • 6 replies
  • 8 kudos

Clean up _delta_log files

Hello experts. We are trying to clarify how to clean up the large number of files that are accumulating in the _delta_log folder (json, crc and checkpoint files). We went through the related posts in the forum and followed the below: SET spark.da...

Latest Reply
Brad
Contributor II
  • 8 kudos

Awesome, thanks for the response.

5 More Replies
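For context, a minimal sketch of the retention settings that govern _delta_log cleanup (the table name and intervals are illustrative assumptions taken from Delta Lake documentation, not values from the truncated post):

# Sketch (PySpark): shorten how long Delta keeps _delta_log entries.
# `my_table` and the interval values are illustrative assumptions.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 7 days',        -- commit JSON/crc files
        'delta.checkpointRetentionDuration' = 'interval 2 days'  -- old checkpoint files
    )
""")
# Expired log entries are only physically deleted when Delta writes a new checkpoint,
# so cleanup happens on later commits, not immediately after setting the properties.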
nlakshmidevi125
by New Contributor
  • 2637 Views
  • 2 replies
  • 1 kudos

About .crc files in the Delta transaction log

Why is a .crc file created along with the delta log files?

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @Lakshmi devi, the .crc file is basically a checksum file that contains the stats for the respective version file. It is used for snapshot verification in the backend.

1 More Replies
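To see these paired files per commit, one can simply list the _delta_log directory from a Databricks notebook; the path below is a hypothetical example:

# Sketch: list a table's _delta_log folder to see each version's JSON commit file,
# its companion .crc checksum file, and any checkpoint Parquet files.
# The path is a hypothetical example; dbutils is available in Databricks notebooks.
for f in dbutils.fs.ls("dbfs:/user/hive/warehouse/my_table/_delta_log/"):
    print(f.name, f.size)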
chhavibansal
by New Contributor III
  • 838 Views
  • 1 reply
  • 0 kudos

What is the upper bound for dataSkippingNumIndexedCols when keeping stats in the delta log file?

Is there an upper bound on the number that I can assign to delta.dataSkippingNumIndexedCols for computing statistics? Is there a trade-off benchmark available for increasing this number beyond 32?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Chhavi Bansal: The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns that Delta Lake will collect statistics on for data skipping. By default, this value is set to 32. There is no hard upper bound on th...

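For reference, the property is set per table via TBLPROPERTIES; the table name and value below are illustrative:

# Sketch: raise the number of leading columns Delta collects statistics for.
# `events` is a hypothetical table name; 40 is an arbitrary illustrative value.
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.dataSkippingNumIndexedCols' = '40'
    )
""")
# Statistics cover the first N columns of the schema and only apply to newly written
# files, so frequently filtered columns should come before wide string columns.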
pantelis_mare
by Contributor III
  • 5229 Views
  • 6 replies
  • 2 kudos

Resolved! Delta log statistics - timestamp type not working

Hello team! As per the documentation, I understand that the table statistics can be fetched through the delta log (e.g. min, max, count) in order to avoid reading the underlying data of a delta table. This is the case for numerical types, and timestamp is sup...

[Attachment: max value image.png]
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Are you sure the timestamp column is a valid Spark timestamp type?

5 More Replies
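One way to check both points raised in this thread is to confirm the column's Spark type and then peek at the min/max values recorded in the commit files; the table name, column name, and path below are assumptions for illustration:

# Sketch: verify the column is a real Spark TimestampType and inspect the stats
# written to the Delta log. `my_table`, `event_time`, and the path are hypothetical.
from pyspark.sql.functions import col, get_json_object

print(spark.table("my_table").schema["event_time"].dataType)  # expect TimestampType()

log = spark.read.json("dbfs:/user/hive/warehouse/my_table/_delta_log/*.json")
(log.where(col("add").isNotNull())
    .select(get_json_object("add.stats", "$.minValues.event_time").alias("min_ts"),
            get_json_object("add.stats", "$.maxValues.event_time").alias("max_ts"))
    .show(truncate=False))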
brickster_2018
by Databricks Employee
  • 1937 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Deleting the Delta log directory would cause you to lose the underlying transaction history on the Delta table and other Delta-related optimizations. In effect, the table would be converted to a Parquet table at that point.

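If the transaction log has already been lost and only the Parquet data files remain, a fresh log can be generated over them; the path below is a hypothetical example, and the original history and time travel are not recoverable:

# Sketch: rebuild a Delta log over existing Parquet files after _delta_log was deleted.
# This starts a brand-new log; prior versions cannot be recovered. Path is hypothetical.
spark.sql("CONVERT TO DELTA parquet.`dbfs:/data/my_table`")

For a partitioned table, the statement would also need a PARTITIONED BY clause listing the partition columns.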