How to get the count of files per partition for a Delta table?

User16869510359
Esteemed Contributor

I have a Delta table and I run the OPTIMIZE command regularly. However, I still see a large number of files in the table. I want a breakdown of the file count for each partition so I can identify which partitions have the most files. What is the easiest way to get this information?

1 ACCEPTED SOLUTION


User16869510359
Esteemed Contributor

The code snippet below gives the file count for each partition of the table:

// DeltaLog here comes from the Databricks Runtime's internal Delta implementation ("tahoe");
// it ships with the runtime and is not a separately installable library.
import com.databricks.sql.transaction.tahoe.DeltaLog
import org.apache.hadoop.fs.Path

// Replace <table_path> with the location of the Delta table
val deltaPath = "<table_path>"

// Load the table's transaction log and take its current snapshot
val deltaLog = DeltaLog(spark, new Path(deltaPath + "/_delta_log"))
val currentFiles = deltaLog.snapshot.allFiles

// Replace "col" with the name of the partition column; this counts the data files
// referenced by the current snapshot in each partition, largest partitions first
display(currentFiles.groupBy("partitionValues.col").count().orderBy($"count".desc))
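
If the internal DeltaLog import is not available in your environment, a rough cross-check can be built from public Spark APIs alone. The sketch below is not part of the original answer: it assumes the table lives at <table_path> and is partitioned by a hypothetical column named part_col, and it derives the per-partition file count by scanning rows and counting distinct source files.

// Sketch using only public Spark APIs; slower than reading the transaction log,
// but it avoids the Databricks-internal import. `part_col` is a placeholder
// for the table's partition column.
import org.apache.spark.sql.functions.{col, countDistinct, input_file_name}

val df = spark.read.format("delta").load("<table_path>")

display(
  df.groupBy(col("part_col"))
    .agg(countDistinct(input_file_name()).as("file_count"))
    .orderBy(col("file_count").desc)
)

Because this counts only files that contribute at least one row, the result can differ slightly from the transaction-log view; treat it as a cross-check rather than a replacement.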


2 REPLIES


Hi,

How do I install the library referenced by 'import com.databricks.sql.transaction.tahoe.DeltaLog' on a Databricks cluster? I am getting a module-not-found error.

Thank you,

BR

Saurabh
