Measure size of all tables in Azure Databricks
05-20-2024 05:27 AM
Hi Team,
Currently I am trying to find the size of all tables in my Azure Databricks workspace, to get an idea of current data-loading trends so I can forecast future growth (e.g. roughly 100 GB came in over the last 2 months, so I expect around 150 GB in the next 2-3 months).
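Roughly the kind of back-of-the-envelope extrapolation I have in mind (all numbers are just the illustrative figures from above):

```python
# Simple linear growth extrapolation from observed ingest volume.
# All numbers are illustrative placeholders from the example above.
observed_gb = 100      # data that arrived over the last 2 months
observed_months = 2
forecast_months = 3    # planning horizon

monthly_rate = observed_gb / observed_months   # 50 GB per month
forecast_gb = monthly_rate * forecast_months
print(forecast_gb)     # 150.0
```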
My production Azure databricks environment is using Unity Catalog, which hosts:
a- All Bronze Tables
b- All Silver Tables
c- All Gold Tables.
d- Some extra Delta Live Tables, acting as temp tables that hold results of intermediate calculations.
e- Some tables loaded from Excel sheet data.
The tables above are Delta Live Tables, created via DLT-based pipelines/jobs.
So I am looking for a script/code/solution that gives me the total size in GB of all tables in a given database.
A SQL-based solution would be ideal, but an answer in Python or Scala would also be fine.
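To illustrate, this is the shape of the aggregation I am after. In a real notebook the per-table byte counts would come from something like `DESCRIBE DETAIL` on each Delta table; here a hard-coded dict stands in for those results (all table names and sizes below are made-up placeholders):

```python
# Sum per-table sizes (in bytes) and convert to GB.
# In Databricks, sizeInBytes for a Delta table can be read from
# DESCRIBE DETAIL <catalog>.<schema>.<table>; this dict is a
# placeholder for those per-table results.
table_sizes_bytes = {
    "bronze_orders": 12 * 1024**3,        # 12 GB (placeholder)
    "silver_orders": 6 * 1024**3,         # 6 GB (placeholder)
    "gold_orders_summary": 1 * 1024**3,   # 1 GB (placeholder)
}
total_gb = sum(table_sizes_bytes.values()) / 1024**3
print(f"Total: {total_gb:.2f} GB")        # Total: 19.00 GB
```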
Also, in the traditional relational-DBMS world there used to be several built-in reports that showed data-loading trends via charts or graphs. Do we have such a built-in feature in Azure Databricks?
Thanks in Advance
Devsql
06-21-2024 02:47 AM
Hi @Retired_mod,
1- Regarding this issue, I found the link below:
Now, to try the approach in that link, I need to decide between: Delta table vs. non-Delta table.
A few of my tables are materialized views while some are streaming tables, so I am not sure which case applies: Delta table or non-Delta table.
Can you please help me with that?
2- Also, in the above link, for Delta tables, there is this line of code:
DeltaLog.forTable(spark, "dbfs:/<path-to-delta-table>")
I do not know the path "dbfs:/<path-to-delta-table>".
Where can I get this path from, the UI or a command?
As I work for a corporate company, the admin has given me only the necessary (minimum) access, so if you guide me, I can ask the admin for the relevant permission.
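From what I can tell, something like `DESCRIBE DETAIL` may expose both the storage path and the size, e.g. (catalog/schema/table names are placeholders):

```sql
-- DESCRIBE DETAIL on a Delta table returns (among other columns)
-- `location` (the underlying storage path) and `sizeInBytes`.
-- Catalog/schema/table names below are placeholders.
DESCRIBE DETAIL my_catalog.my_schema.my_table;
```

Is that the right way to get the table path, or should I ask the admin for something else?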
3- When I opened the link you gave above and tried your solution, I got the error below:
Cannot query a Streaming Table `DB`.`Schema`.`Table_append_raw` from an Assigned or No isolation shared cluster, please use a SHARED cluster or a Databricks SQL warehouse instead.
So from the above I understood that we need to use a SHARED cluster. Is there any other option?
Thank you
Devsql

