cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

nlakshmidevi125
by New Contributor
  • 1198 Views
  • 2 replies
  • 1 kudos

about .crc file in delta transaction log

why .crc file will create along with delta log files

  • 1198 Views
  • 2 replies
  • 1 kudos
Latest Reply
Lakshay
Esteemed Contributor
  • 1 kudos

Hi @Lakshmi devi​ , crc file is basically a checksum file that contains the stats for the respective version file. It is used for snapshot verification in the backend.

  • 1 kudos
1 More Replies
chhavibansal
by New Contributor II
  • 454 Views
  • 1 replies
  • 0 kudos

What is the upper bound limit for dataSkippingNumIndexedCols, to keeps stats in delta log file?

Is there an upper bound of number that i can assign to delta.dataSkippingNumIndexedCols for computing statistics. Is there some tradeoff benchmark available for increasing this number beyond 32.

  • 454 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Chhavi Bansal​ :The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns that Delta Lake will build statistics on during data skipping. By default, this value is set to 32. There is no hard upper bound on th...

  • 0 kudos
elgeo
by Valued Contributor II
  • 1361 Views
  • 0 replies
  • 4 kudos

Clean up _delta_log files

Hello experts. We are trying to clarify how to clean up the large amount of files that are being accumulated in the _delta_log folder (json, crc and checkpoint files). We went through the related posts in the forum and followed the below:SET spark.da...

  • 1361 Views
  • 0 replies
  • 4 kudos
pantelis_mare
by Contributor III
  • 2841 Views
  • 6 replies
  • 2 kudos

Resolved! Delta log statistics - timestamp type not working

Hello team!As per the documentation, I understand that the table statistics can be fetched through the delta log (eg min, max, count) in order to not read the underlying data of a delta table.This is the case for numerical types, and timestamp is sup...

max value image.png
  • 2841 Views
  • 6 replies
  • 2 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

are you sure the timestamp column is a valid spark-timestamp-type?

  • 2 kudos
5 More Replies
User16869510359
by Esteemed Contributor
  • 1085 Views
  • 1 replies
  • 0 kudos
  • 1085 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anand_Ladda
Honored Contributor II
  • 0 kudos

Deleting the Delta log directory would cause you to lose the underlying transaction history on the delta table and other delta related optimizations. In effect the table would be converted to a Parquet table at that point

  • 0 kudos
Labels