Data Engineering

I have several thousand Delta tables in production; what is the best way to get counts?

Srikanth_Gupta_
Valued Contributor

I might need a dashboard to see the day-to-day increase in the number of rows, and also a dashboard that shows the size of the Parquet/Delta files in my lake.

2 REPLIES

sajith_appukutt
Honored Contributor II

The history operation on a Delta table returns a collection of operation metrics, including the number of rows and files added or removed by each operation. You could query this from your SQL endpoint workspace and build a dashboard that displays the desired info.

DESCRIBE HISTORY delta.`/data/events/`
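If it helps, here is a minimal sketch of pulling the row metrics out of that history with Spark; the path /data/events/ is the same example path as above, and the column names (operationMetrics, numOutputRows) come from the standard DESCRIBE HISTORY output:

import org.apache.spark.sql.functions._

// operationMetrics is a map<string,string> column in the history output;
// numOutputRows is populated for write operations.
val history = spark.sql("DESCRIBE HISTORY delta.`/data/events/`")

history
  .select(
    col("version"),
    col("timestamp"),
    col("operation"),
    col("operationMetrics").getItem("numOutputRows").cast("long").as("numOutputRows"))
  .show(truncate = false)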

User16869510359
Esteemed Contributor
val db = "database_name"

// List every table in the database, fetch its catalog metadata, and keep
// only the tables backed by the Delta provider.
val deltaTables = spark.sessionState.catalog
  .listTables(db)
  .map(t => spark.sessionState.catalog.externalCatalog.getTable(t.database.getOrElse(db), t.table))
  .filter(_.provider.exists(_.equalsIgnoreCase("delta")))

The above code snippet gives the names of all the Delta tables in a database. If the intention is to create a dashboard, then:

  1. Iterate over all the databases.
  2. Identify the Delta tables in each database.
  3. Run the "DESCRIBE HISTORY" command on each Delta table.
  4. Store the details of all the tables, including row counts, in a Delta table.
  5. Have the dashboard query the Delta table used to store these details in step 4.

A sketch of these steps follows the list.
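Here is a rough sketch of steps 1-5; the target table name delta_table_stats is illustrative, not prescribed anywhere above:

import org.apache.spark.sql.functions._

val catalog = spark.sessionState.catalog

// Steps 1 and 2: iterate over all databases and collect the
// fully qualified names of the Delta tables in each.
val deltaTableNames = catalog.listDatabases().flatMap { db =>
  catalog.listTables(db)
    .map(t => catalog.externalCatalog.getTable(t.database.getOrElse(db), t.table))
    .filter(_.provider.exists(_.equalsIgnoreCase("delta")))
    .map(t => s"${t.identifier.database.getOrElse(db)}.${t.identifier.table}")
}

// Steps 3 and 4: run DESCRIBE HISTORY on each table and append
// the table name plus row metrics to a Delta table.
deltaTableNames.foreach { name =>
  spark.sql(s"DESCRIBE HISTORY $name")
    .select(
      lit(name).as("tableName"),
      col("version"),
      col("timestamp"),
      col("operationMetrics").getItem("numOutputRows").cast("long").as("numOutputRows"))
    .write.format("delta").mode("append").saveAsTable("delta_table_stats")
}

// Step 5: the dashboard queries delta_table_stats.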