Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Measure size of all tables in Azure databricks

Devsql
New Contributor III

Hi Team,

Currently I am trying to find the size of all tables in my Azure Databricks workspace, to get an idea of current data-loading trends so I can forecast future growth (e.g., if roughly 100 GB came in over the last 2 months, I can expect around 150 GB over the next 2-3 months).

My production Azure databricks environment is using Unity Catalog, which hosts:

a- All Bronze Tables

b- All Silver Tables

c- All Gold Tables.

d- Some extra Delta Live Tables, acting as temp tables that hold intermediate calculation results.

e- Some tables created from Excel sheet data.

All of the above tables are Delta Live Tables, created via DLT-based pipelines/jobs.

So I am looking for a script/code/solution that gives me the total size in GB of all tables in a given database.

A SQL-based solution would be ideal, but an answer based on Python/Scala would be fine too.
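Something like the sketch below is what I have in mind (catalog/schema names are placeholders, and I am assuming `DESCRIBE DETAIL` reports `sizeInBytes` for Delta tables; it may not work for every DLT-managed table):

```python
# Sketch: sum the size of every table in one schema of a Unity Catalog
# workspace. Catalog/schema names are placeholders.

def total_size_gb(sizes_in_bytes):
    """Convert a list of per-table byte counts to a total in GB (GiB)."""
    return sum(sizes_in_bytes) / (1024 ** 3)

def schema_size_gb(spark, catalog, schema):
    """Collect sizeInBytes via DESCRIBE DETAIL for each table in the schema.
    Requires SELECT on the tables; some DLT tables may not report a size."""
    tables = [
        row.tableName
        for row in spark.sql(f"SHOW TABLES IN {catalog}.{schema}").collect()
    ]
    sizes = []
    for t in tables:
        detail = spark.sql(f"DESCRIBE DETAIL {catalog}.{schema}.{t}").collect()[0]
        if detail.sizeInBytes is not None:
            sizes.append(detail.sizeInBytes)
    return total_size_gb(sizes)

# Example with known byte counts: two tables of 1 GiB each
print(total_size_gb([1024 ** 3, 1024 ** 3]))  # → 2.0
```

Run weekly and stored in a small tracking table, this would give the loading trend I am after.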

Also, in the traditional relational-DBMS world there were several built-in reports that showed data-loading trends via charts or graphs. Do we have such a built-in feature in Azure Databricks?

Thanks in Advance

Devsql

1 REPLY 1

Devsql
New Contributor III

Hi @Retired_mod,

1-  Regarding this issue, I found the link below:

https://kb.databricks.com/sql/find-size-of-table

To try the approach in that link, I need to determine whether each table is a Delta table or a non-Delta table.

A few of my tables are Materialized Views and some are Streaming Tables, so I am not sure which approach applies: Delta table or non-Delta table.
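If it helps, I believe one could check this per table via the `format` column of `DESCRIBE DETAIL` (a sketch; table names are placeholders, and I am not certain it works for Materialized Views / Streaming Tables):

```python
# Sketch: check whether a registered table is Delta-backed by reading the
# `format` column of DESCRIBE DETAIL. Note: DESCRIBE DETAIL may fail on
# DLT materialized views / streaming tables, in which case the KB article's
# per-table approach does not apply directly.

def is_delta(fmt):
    """True when the reported storage format is Delta."""
    return fmt is not None and fmt.lower() == "delta"

def table_format(spark, fq_table):
    """Return the storage format of a table, e.g. 'delta' or 'parquet'."""
    return spark.sql(f"DESCRIBE DETAIL {fq_table}").collect()[0].format

print(is_delta("delta"), is_delta("parquet"))  # → True False
```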

Can you please help me with that?

2-  Also, in the above link, for Delta tables, for the following line of code:

DeltaLog.forTable(spark, "dbfs:/<path-to-delta-table>")

I do not know the path "dbfs:/<path-to-delta-table>".

Where can I get this path from, the UI or a command?
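From what I have read, `DESCRIBE DETAIL` also returns a `location` column holding the table's storage path, so something like this sketch might work (the table name is a placeholder):

```python
# Sketch: fetch the storage location of a registered Delta table. DESCRIBE
# DETAIL returns a `location` column with the path (dbfs:/... or abfss://...).
# The fully qualified table name below is a placeholder.

def detail_sql(fq_table):
    """Build the DESCRIBE DETAIL statement for a fully qualified table name."""
    return f"DESCRIBE DETAIL {fq_table}"

def table_location(spark, fq_table):
    """Return the storage path reported by DESCRIBE DETAIL."""
    return spark.sql(detail_sql(fq_table)).collect()[0].location

print(detail_sql("main.bronze.orders"))  # → DESCRIBE DETAIL main.bronze.orders
```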

As I work for a corporate company, the admin has granted me only the minimum necessary access. If you can guide me, I can ask the admin for the required permissions.

3- When I opened the link above and tried the solution, I got the error below:

Cannot query a Streaming Table `DB`.`Schema`.`Table_append_raw` from an Assigned or No isolation shared cluster, please use a SHARED cluster or a Databricks SQL warehouse instead.

So from the above I understood that we need to use a SHARED cluster. Is there any other option?

Thank you

Devsql
