Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I have several thousand Delta tables in production; what is the best way to get row counts?

Srikanth_Gupta_
Valued Contributor

What if I need a dashboard that shows the day-to-day increase in row counts, and also a dashboard that shows the size of the Parquet/Delta files in my lake?

2 REPLIES

sajith_appukutt
Honored Contributor II

The history operation on a Delta table returns a collection of operation metrics, including the number of rows and files added/removed by each operation. You could run this from a SQL endpoint in your workspace and build a dashboard that displays the desired info.

DESCRIBE HISTORY delta.`/data/events/`
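
A minimal sketch (assuming the same table path as above) of pulling the per-operation row counts out of the operationMetrics map that DESCRIBE HISTORY returns:

import org.apache.spark.sql.functions._

// operationMetrics is a map column; its numOutputRows entry is
// populated for write operations.
val history = spark.sql("DESCRIBE HISTORY delta.`/data/events/`")
history
  .select(col("version"), col("timestamp"), col("operation"),
          col("operationMetrics").getItem("numOutputRows").alias("numOutputRows"))
  .show()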

brickster_2018
Databricks Employee
val db = "database_name"

// List the tables in the database and keep only those whose provider is Delta.
// getOrElse(db) avoids the NoSuchElementException that .get would throw when
// the identifier carries no database.
val deltaTables = spark.sessionState.catalog.listTables(db)
  .map(t => spark.sessionState.catalog.externalCatalog
    .getTable(t.database.getOrElse(db), t.table))
  .filter(t => t.provider.exists(_.toLowerCase.contains("delta")))
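
For instance, the identifiers of the matched tables can be printed with:

deltaTables.foreach(t => println(t.identifier))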

The above code snippet lists all the Delta tables in a database. If the intention is to create a dashboard, the steps are as follows (a combined sketch appears after the list):

  1. Iterate over all the databases.
  2. Identify the Delta tables in each database.
  3. Run the "DESCRIBE HISTORY" command on each of the Delta tables.
  4. Store the details and row counts of all the tables in a Delta table.
  5. Point the dashboard at the Delta table from step 4.
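
A minimal sketch of steps 1 through 5, assuming a Hive metastore reachable through spark.catalog, at least one Delta table, and a hypothetical target table named table_stats:

import org.apache.spark.sql.functions._

// Steps 1-2: find every Delta table across all databases.
val deltaTableNames = spark.catalog.listDatabases().collect().flatMap { database =>
  spark.sessionState.catalog.listTables(database.name)
    .map(t => spark.sessionState.catalog.externalCatalog
      .getTable(t.database.getOrElse(database.name), t.table))
    .filter(t => t.provider.exists(_.toLowerCase.contains("delta")))
    .map(t => s"${database.name}.${t.identifier.table}")
}

// Step 3: pull each table's history, keeping the per-operation row counts.
val stats = deltaTableNames.map { name =>
  spark.sql(s"DESCRIBE HISTORY $name")
    .select(lit(name).alias("table_name"), col("version"), col("timestamp"),
            col("operation"),
            col("operationMetrics").getItem("numOutputRows").alias("numOutputRows"))
}.reduce(_ unionByName _)

// Steps 4-5: persist the details so the dashboard can query them.
stats.write.format("delta").mode("overwrite").saveAsTable("table_stats")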
