07-12-2024 02:04 AM - edited 07-12-2024 03:11 AM
Hi Team,
I would like to know the difference between the _RAW tables and the _APPEND_RAW tables of the Bronze layer.
As both are streaming tables, why do we need two separate tables?
Note: we are following the Medallion Architecture. Also, the above tables are created via a Delta Live Tables pipeline, so they are basically DLT tables.
Thanks
Devsql
07-15-2024 09:42 AM - edited 07-15-2024 10:05 AM
Hi @Devsql,
_RAW Tables:
_RAW tables represent the raw, unprocessed data ingested into your system. They typically contain the original data as it arrives, without any transformations or modifications.
_RAW tables are useful for auditing, lineage tracking, and maintaining a historical record of the raw data.
_APPEND_RAW Tables:
_APPEND_RAW tables also store raw data, but they allow for both inserts and updates. In addition to new records, they can capture changes to existing records (updates).
Unlike _RAW tables, _APPEND_RAW tables can handle both insert and update operations.
When changes to existing records need to be captured, _APPEND_RAW tables are a better fit.
Why Separate Tables?
The choice between _RAW and _APPEND_RAW tables depends on your use case and architecture.
To keep the original data exactly as it arrived, use _RAW tables.
To capture updates to existing records as well, use _APPEND_RAW tables.
Both types of tables are created via the Delta Live Tables pipeline, making them part of the Delta Lake ecosystem.
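To make the distinction described above concrete, here is a toy in-memory model in plain Python. It is not DLT code and does not show how the pipeline in question is actually defined. The event data and both function names are made up for illustration. It only contrasts a load that keeps every incoming record as a new row with a load that also applies updates to existing records.

```python
from typing import Dict, List

# Hypothetical change events; the record with id=1 arrives twice.
events: List[dict] = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "new"},
    {"id": 1, "status": "shipped"},
]


def load_insert_only(rows: List[dict]) -> List[dict]:
    """Keep every incoming record as its own row (inserts only)."""
    table: List[dict] = []
    for row in rows:
        table.append(dict(row))
    return table


def load_with_updates(rows: List[dict], key: str = "id") -> Dict[int, dict]:
    """Insert new keys and overwrite existing ones (inserts and updates)."""
    table: Dict[int, dict] = {}
    for row in rows:
        table[row[key]] = dict(row)
    return table


history = load_insert_only(events)
current = load_with_updates(events)

print(len(history))          # every event is kept as a row
print(len(current))          # one row per key
print(current[1]["status"])  # latest state for id=1
```

The insert-only load preserves the full arrival history, which is what makes it suitable for auditing. The second load keeps only the latest state per key.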
Feel free to ask if you need further clarification or have additional questions!
07-12-2024 02:30 AM
I don't fully understand your question, so let me give you a generic answer. You don't need to do anything: if you're fine working with one table, just go with one.
An append-only table, as the name suggests, will only contain insert operations. You can also enforce this with the table property "delta.appendOnly".
A table that is not append-only may, in addition to inserts, also contain updates and deletes.
In general, there's no right or wrong answer, as it highly depends on your use case and architecture.
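As a rough sketch of the "delta.appendOnly" property mentioned above: on a plain Delta table it can be set with an `ALTER TABLE ... SET TBLPROPERTIES` statement. The helper name and the table name below are hypothetical. For tables managed by a Delta Live Tables pipeline, the property would normally be declared in the pipeline's table definition instead of being altered afterwards, so treat this as an illustration only.

```python
def append_only_statement(table_name: str, enabled: bool = True) -> str:
    """Build the SQL that toggles the delta.appendOnly table property."""
    value = "true" if enabled else "false"
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.appendOnly' = '{value}')"
    )


# Hypothetical table name, used only for illustration.
statement = append_only_statement("bronze.orders_raw")
print(statement)

# With an active SparkSession (for example in a Databricks notebook),
# the statement would be run as:
# spark.sql(statement)
```

Once the property is enabled, Delta rejects updates and deletes on that table, so only appends succeed.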
07-12-2024 03:12 AM
Hi @Witold, I updated the last line of my post above and added a note that the tables are created via a Delta Live Tables pipeline, so they are basically DLT tables. I hope this gives you a better idea.
07-12-2024 04:00 AM
Hi @Kaniz_Fatma, I saw your replies to other posts, so I thought I'd ask you: would you be able to help me with this?
07-16-2024 07:01 AM
Thank you very much @Kaniz_Fatma for this excellent answer.