What is the difference between _RAW tables and _APPEND_RAW tables in the Bronze layer of Azure Databricks?

Devsql
New Contributor III

Hi Team,

I would like to know the difference between _RAW tables and _APPEND_RAW tables in the Bronze layer.

Since both are streaming tables, why do we need two separate tables?

Note: we are following the Medallion Architecture. Also, the above tables are created via a Delta Live Tables pipeline, so they are basically DLT tables.

Thanks
Devsql

1 ACCEPTED SOLUTION

Hi @Devsql

  1. _RAW Tables:

    • _RAW tables represent the raw, unprocessed data ingested into your system. They typically contain the original data as it arrives, without any transformations or modifications.
    • These tables are append-only, meaning that new records are continuously added to them as data streams in. Updates or deletions are not allowed.
    • _RAW tables are useful for auditing, lineage tracking, and maintaining a historical record of the raw data.
    • In Delta Lake, you can enforce the append-only behavior by setting the table property delta.appendOnly to true on these tables.
  2. _APPEND_RAW Tables:

    • _APPEND_RAW tables also store raw data, but unlike _RAW tables they allow both inserts and updates: in addition to new records, they can capture changes to existing records.
    • These tables are suitable when you need to track changes over time, such as capturing incremental updates from streaming sources.
    • In scenarios where you want to maintain a historical record of changes, _APPEND_RAW tables are a better fit.
  3. Why Separate Tables?

    • The decision to use separate _RAW and _APPEND_RAW tables depends on your use case and architecture.
    • If you only need the raw data without any modifications, stick with _RAW tables.
    • If you want to capture changes over time (including updates), use _APPEND_RAW tables.
    • Separating them allows you to manage data differently based on your specific requirements.

Both types of tables are created via the Delta Live Tables pipeline, making them part of the Delta Lake ecosystem. 
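
To make the distinction described above concrete, here is a minimal sketch in plain Python. It is a toy model only and does not use the Databricks or DLT API. The class names, the `id` key, and the table names `orders_raw` and `orders_append_raw` are hypothetical and chosen for illustration. It assumes the reading given in this answer: the _RAW table only ever appends, while the _APPEND_RAW table also accepts updates to existing records.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyTable:
    """Toy model of an append-only table: rows can be added, never changed."""
    rows: list = field(default_factory=list)

    def append(self, row: dict) -> None:
        self.rows.append(dict(row))

    def update(self, key: int, changes: dict) -> None:
        # Mirrors the idea of an enforced append-only table rejecting updates.
        raise PermissionError("append-only table: updates are not allowed")


@dataclass
class UpsertTable:
    """Toy model of a table that accepts inserts and updates, keyed by 'id'."""
    rows: dict = field(default_factory=dict)

    def upsert(self, row: dict) -> None:
        # Insert the row if the key is new, otherwise overwrite the existing row.
        self.rows[row["id"]] = dict(row)


# Hypothetical incoming stream: the third event changes record 1.
events = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "new"},
    {"id": 1, "status": "shipped"},
]

orders_raw = AppendOnlyTable()      # stands in for a _RAW table
orders_append_raw = UpsertTable()   # stands in for an _APPEND_RAW table

for event in events:
    orders_raw.append(event)
    orders_append_raw.upsert(event)

print(len(orders_raw.rows))                    # 3: every arriving event is kept
print(len(orders_append_raw.rows))             # 2: one row per id
print(orders_append_raw.rows[1]["status"])     # shipped: latest value for id 1
```

In this sketch the append-only table keeps every event exactly as it arrived, while the second table holds one current row per key. Which behavior a given _RAW or _APPEND_RAW table actually has depends on how your pipeline defines it, so check the pipeline code rather than relying on the suffix alone.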

Feel free to ask if you need further clarification or have additional questions! 😊

5 REPLIES

Witold
Contributor III

I don't exactly understand your question, so let me try to give you a generic answer. You don't need to do anything: if you're fine with working with one table, then just go with one.

An append-only table, as the name suggests, will only contain insert operations. You can also enforce this with the table property "delta.appendOnly".
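
As a rough sketch of how that property might be set, the snippet below builds the Delta Lake SQL statement for it. The helper name `append_only_statement` and the table name `bronze.orders_raw` are hypothetical. For tables managed by a Delta Live Tables pipeline, the property would more likely be declared in the table definition (for example through the `table_properties` argument of `@dlt.table`) than applied with `ALTER TABLE`, so treat this as an illustration of the property syntax, not a recipe for DLT.

```python
def append_only_statement(table_name: str, enabled: bool = True) -> str:
    """Build the SQL that sets the delta.appendOnly property on a Delta table."""
    value = "true" if enabled else "false"
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.appendOnly' = '{value}')"
    )


# The same setting as a properties mapping, e.g. for use in a table definition.
APPEND_ONLY_PROPERTIES = {"delta.appendOnly": "true"}

stmt = append_only_statement("bronze.orders_raw")  # hypothetical table name
print(stmt)
# Where a Spark session with Delta Lake is available, this could be run as:
#   spark.sql(stmt)
```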

And a table that is not append-only might, obviously, contain updates and deletes in addition to inserts.

In general, there's no right or wrong answer, as it highly depends on your use case/architecture.

Devsql
New Contributor III

Hi @Witold, I updated the last line of my post above and added the following: "Also, the above tables are created via a Delta Live Tables pipeline, so they are basically DLT tables." I hope this gives you a better idea.

Devsql
New Contributor III

Hi @Kaniz_Fatma, I saw your replies to other posts, so I thought I would ask you: would you like to help me with this?

Devsql
New Contributor III

Thank you very much @Kaniz_Fatma for this excellent answer.
