Hello @ArjunGopinath96,
Greetings!
Change Data Feed (CDF) in Delta Lake provides an efficient way to track changes in a table, including appends. It works by recording row-level changes between versions of a Delta table, capturing both the row data and metadata to indicate whether a row was inserted, deleted, or updated. However, please note that CDF is forward-looking and only records changes that occur after it is enabled.
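As a rough sketch of how this could look in practice, the snippet below builds the statement that enables CDF on an existing table and reads back only the inserted rows. The helper names and the table name `main.sales.orders` are made up for illustration, and `spark` is assumed to be the session that Databricks notebooks provide. Adapt it to your own table before relying on it.

```python
def enable_cdf_sql(table_name: str) -> str:
    """Build the ALTER TABLE statement that turns on CDF for an existing table."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def read_appends(spark, table_name: str, starting_version: int):
    """Return only the rows inserted since `starting_version`.

    `starting_version` should be at or after the table version in which
    CDF was enabled, because earlier changes were never recorded.
    """
    changes = (
        spark.read
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table_name)
    )
    # _change_type is one of: insert, delete, update_preimage, update_postimage
    return changes.filter("_change_type = 'insert'")


# Example usage in a notebook (hypothetical table name and version):
# spark.sql(enable_cdf_sql("main.sales.orders"))
# read_appends(spark, "main.sales.orders", starting_version=5).show()
```

Each returned row also carries `_commit_version` and `_commit_timestamp`, so you can see which commit added it.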
A table of around 1 million rows with approximately 20,000 appends per week is well within what CDF can handle. If your primary interest is in tracking appends, you might also want to consider the APPLY CHANGES API in Delta Live Tables. This API simplifies change data capture (CDC): it can update records in a target table directly while optionally retaining history for the updated records.
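For reference, here is a minimal sketch of what an APPLY CHANGES definition might look like in a Python Delta Live Tables pipeline. The dataset names (`orders_updates`, `orders`) and the columns (`order_id`, `updated_at`) are hypothetical, and the source is assumed to be a streaming dataset already defined elsewhere in the same pipeline. The `dlt` module is only importable inside a pipeline, so the import sits inside the function.

```python
def define_orders_cdc(source_name: str = "orders_updates",
                      target_name: str = "orders") -> None:
    """Declare a CDC flow from a source dataset into a streaming target table."""
    import dlt  # available only when this code runs inside a DLT pipeline

    # The target must exist as a streaming table before changes are applied.
    dlt.create_streaming_table(name=target_name)

    dlt.apply_changes(
        target=target_name,
        source=source_name,
        keys=["order_id"],         # hypothetical primary key column
        sequence_by="updated_at",  # hypothetical column that orders the changes
        stored_as_scd_type=2,      # SCD type 2 keeps history for updated records
    )


# In the pipeline notebook you would call this once at the top level:
# define_orders_cdc()
```

Setting `stored_as_scd_type=1` instead would overwrite records in place without keeping history.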
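Regarding costs, enabling CDF can lead to a slight increase in storage costs for the table. The change data records are generated as the query runs and are generally much smaller than the total size of the rewritten files. For an append-only workload the overhead should be smaller still, because insert-only operations typically do not write separate change data files; the feed is computed from the transaction log instead. The exact cost depends on your specific usage and the pricing details of your Databricks plan.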
Docs:
https://docs.databricks.com/en/delta/delta-change-data-feed.html
https://docs.databricks.com/en/delta-live-tables/cdc.html