Data Engineering

I have several thousand Delta tables in production; what is the best way to get row counts?

Srikanth_Gupta_
Databricks Employee

I might need a dashboard to see the increase in the number of rows on a day-to-day basis, and also a dashboard that shows the size of the Parquet/Delta files in my lake.

2 REPLIES

sajith_appukutt
Honored Contributor II

The history operation on a Delta table returns a collection of operation metrics, including the number of rows and files added or removed by each operation. You could run this from a SQL endpoint in your workspace and build a dashboard that displays the desired info.

DESCRIBE HISTORY delta.`/data/events/`
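
For example, here is a minimal sketch of pulling per-version row counts out of that history via the DeltaTable API. The path is the same one as above; the numOutputRows key in operationMetrics is populated for write operations, and the available metrics vary by operation type:

import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.col

// Read the transaction history for the Delta table at /data/events/.
val history = DeltaTable.forPath(spark, "/data/events/").history()

// operationMetrics is a map column of string metrics keyed by metric name.
history
  .select(
    col("version"),
    col("timestamp"),
    col("operation"),
    col("operationMetrics").getItem("numOutputRows").cast("long").as("numOutputRows"))
  .orderBy(col("version").desc)
  .show(truncate = false)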

brickster_2018
Databricks Employee
val db = "database_name"

// Fetch catalog metadata for every table in the database and keep only the Delta ones.
val deltaTables = spark.sessionState.catalog.listTables(db)
  .map(t => spark.sessionState.catalog.externalCatalog.getTable(t.database.getOrElse(db), t.table))
  .filter(_.provider.exists(_.equalsIgnoreCase("delta")))

The above code snippet returns all the Delta tables in a database. If the intention is to create a dashboard, then (a sketch of these steps follows the list):

  1. Iterate over all the databases.
  2. Identify the Delta tables in each database.
  3. Run the DESCRIBE HISTORY command on each Delta table.
  4. Store the details and row counts of all the tables in a Delta table.
  5. Have the dashboard query the Delta table used to store these details in step 4.
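
Here is a minimal sketch of those steps, assuming a hypothetical tracking table named ops.delta_table_metrics for step 4. A production job would also skip versions already recorded, rather than appending the full history on every run:

import org.apache.spark.sql.functions.{col, lit}

// Hypothetical Delta table that stores the collected metrics (step 4).
val metricsTable = "ops.delta_table_metrics"

for (db <- spark.sessionState.catalog.listDatabases()) {
  // Steps 1-2: find the Delta tables in this database.
  val deltaTables = spark.sessionState.catalog.listTables(db)
    .map(t => spark.sessionState.catalog.externalCatalog.getTable(t.database.getOrElse(db), t.table))
    .filter(_.provider.exists(_.equalsIgnoreCase("delta")))

  for (t <- deltaTables) {
    val fqn = s"${t.identifier.database.getOrElse(db)}.${t.identifier.table}"
    // Step 3: collect per-version row counts from the table history.
    spark.sql(s"DESCRIBE HISTORY $fqn")
      .select(
        lit(fqn).as("table_name"),
        col("version"),
        col("timestamp"),
        col("operation"),
        col("operationMetrics").getItem("numOutputRows").cast("long").as("numOutputRows"))
      // Step 4: append into the tracking table that the dashboard queries (step 5).
      .write.format("delta").mode("append").saveAsTable(metricsTable)
  }
}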
