Hey @notwarte,
Using the __databricks_internal catalog to trace the underlying storage location is a solid approach for investigating the storage footprint of your materialized views.
Regarding your question about storage duplication: yes, materialized views in Databricks do store a physical representation of the data independently from the source tables. Even if a materialized view is built on top of a Delta Live Table (like your silver_raw), Databricks does not re-use the same storage files. Instead, it maintains its own optimized version of the result set defined by the materialized view.
So if your silver_raw table is 30 GB and the silver_publish materialized view takes up almost 1 TB, that suggests one or both of the following:
The transformation logic in the view significantly expands the data (e.g., joins, explodes, denormalization).
The view is accumulating data due to the way it is refreshed or retained.
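Keep in mind that materialized views are optimized for query performance, not necessarily for storage efficiency. Each refresh can write new data files, and because the backing storage is Delta, files that belong to older versions stay in the storage location until they are vacuumed. If the view is refreshed often and has no suitable partitioning or cleanup in place, the footprint can keep growing.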
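One way to tell the two causes apart is to compare the size of the view's current version with the total size of everything in its storage location. Here is a minimal sketch of that arithmetic. The function name, the parameters, and the 100 GB "live" figure are assumptions for illustration only; plug in the numbers you actually measure. For a regular Delta table, DESCRIBE DETAIL reports the current-version size as sizeInBytes, but it is worth verifying how that behaves for the materialized view's backing table in your workspace.

```python
GB = 10**9  # decimal gigabytes, to keep the arithmetic simple


def diagnose(source_bytes, live_bytes, total_bytes):
    """Split a view's footprint into data expansion vs. retained files.

    source_bytes: size of the source table (e.g. silver_raw)
    live_bytes:   size of the files in the view's current version
    total_bytes:  size of everything under the view's storage location
    """
    if source_bytes <= 0 or live_bytes <= 0:
        raise ValueError("sizes must be positive")
    if total_bytes < live_bytes:
        raise ValueError("total_bytes cannot be smaller than live_bytes")
    retained = total_bytes - live_bytes
    return {
        # How much the transformation logic inflates the data
        "expansion_factor": live_bytes / source_bytes,
        # Bytes held by files that are no longer part of the current version
        "retained_bytes": retained,
        "retained_share": retained / total_bytes,
    }


# Illustrative numbers only: 30 GB source, 1 TB on disk, 100 GB live (assumed).
report = diagnose(30 * GB, 100 * GB, 1000 * GB)
print(report)
```

A high expansion_factor points at the transformation logic (joins, explodes), while a high retained_share points at files accumulating across refreshes.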
Recommendations:
Review whether the materialized view is necessary, or if the same goal could be achieved via an incremental table maintained with DLT or MERGE.
If you keep it, make sure it is partitioned appropriately and has some lifecycle or retention management in place.
Monitor the storage usage and refresh logic regularly, especially if you’re on S3 or ADLS where storage costs matter.
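For the monitoring part, something as simple as recording the storage size once a day and flagging unusual jumps can already help. This is a rough sketch, not a Databricks API: growth_alerts, the 5% threshold, and the sample values are all made up for illustration, and the growth rate is a plain linear average per day.

```python
from datetime import date


def growth_alerts(samples, max_daily_growth=0.05):
    """Return (date, daily_growth) pairs where the size grew too fast.

    samples: list of (date, size_bytes) tuples, sorted by date.
    max_daily_growth: allowed relative growth per day (0.05 means 5%).
    """
    alerts = []
    for (d0, s0), (d1, s1) in zip(samples, samples[1:]):
        days = (d1 - d0).days
        if days <= 0 or s0 <= 0:
            continue  # skip duplicate dates or empty measurements
        daily_growth = (s1 - s0) / s0 / days
        if daily_growth > max_daily_growth:
            alerts.append((d1, daily_growth))
    return alerts


# Hypothetical measurements of the view's storage location.
samples = [
    (date(2024, 1, 1), 400 * 10**9),
    (date(2024, 1, 2), 410 * 10**9),
    (date(2024, 1, 3), 500 * 10**9),
]
alerts = growth_alerts(samples)
for day, rate in alerts:
    print(f"{day}: grew {rate:.1%} per day")
```

With these sample values only the jump into the third measurement crosses the threshold. How you collect the sizes (cloud storage listing, table metadata, billing exports) depends on your setup.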
If you believe this answer is correct, please mark it as the solution for future users.
Hope this helps 🙂
Isi