- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Friday
Hi @RGSLCA
DESCRIBE DETAIL is the best starting point if you're comparing Delta table sizes, but it's important to understand what it reports. The sizeInBytes value represents only the latest active snapshot of the table, not the total storage consumed by the table on cloud storage.
If you're looking at the backing storage (ADLS, S3, GCS), the actual footprint can be significantly larger because it may also contain:
Historical Parquet files retained for time travel
Tombstoned files awaiting VACUUM
Delta transaction log files (_delta_log)
Change Data Feed files (_change_data) when CDF is enabled
There isn't currently a built-in SQL command or Information Schema view that directly reports Delta log size or CDF overhead. To measure those, you'll need to inspect the table's storage location and calculate the sizes of the _delta_log and _change_data folders separately.
One thing to keep in mind: CDF overhead is workload-dependent. Some changes are materialized as files under _change_data, while others can be derived from the transaction log, so there isn't a simple fixed percentage overhead.
In practice, I usually use DESCRIBE DETAIL for comparing the active table size and then validate the actual storage consumption at the cloud storage level, especially for tables with frequent updates/merges or where VACUUM has not been run recently.