Databricks Community

Faisal · ‎10-11-2023

I am trying to ingest incremental parquet files data to bronze streaming table, how much history data should be retained ideally in bronze layer as a general best practise considering I will be only using bronze to ingest source data and move it to silver streaming tables using APPLY_CHANGES_INTO?

MuthuLakshmi · ‎11-08-2023

The amount of history data that should be retained in the bronze layer depends on your specific use case and requirements. As a general best practice, you should retain enough history data to support your downstream analytics and machine learning workloads, while also considering the cost and performance implications of storing and processing large amounts of data.

One approach to managing historical data in the bronze layer is to use partitioning and time-based data retention policies. For example, you can partition your data by date or time, and then use a retention policy to automatically delete or archive old partitions after a certain period of time. This can help you manage the size of your data lake and reduce storage costs, while still retaining enough historical data to support your use cases.

Databricks Community

DLT bronze tables

Connect with Databricks Users in Your Area

Databricks Learning Festival (Virtual): 15 January - 31 January 2025

Milestone: DatabricksTV Reaches 100 Videos!

Announcing the new Meta Llama 3.3 model on Databricks

Databricks Community Champion - December 2024 - Sujesh Menon

Dotmatics and Databricks Partner to Advance Scientific Intelligence in Life Sciences