Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Iceberg native table streaming in Databricks

xwu
New Contributor II

Hi!

I've been exploring the new managed Iceberg tables integration and noticed a potential discrepancy between the documentation and actual behavior regarding streaming/incremental workloads.

According to the official limitations, managed Iceberg tables do not currently support streaming or incremental batch processing. This restriction seems particularly tied to declarative Lakeflow streaming pipelines.

 
 

[Screenshot: xwu_5-1773939524300.png]

[Screenshot: Capture d'écran 2026-03-19 175915.png]

However, during my testing, I found a workaround:

  1. Pre-declare the table in Unity Catalog (UC).

  2. Execute the pipeline using pure Structured Streaming in Spark (decoupled from SQL Iceberg table creation syntax).

By following a specific syntax (similar to the one discussed in this community thread), the streaming process actually works.
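The two steps above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the exact syntax from the referenced thread: the catalog/schema/table names (`main.demo.events_source`, `main.demo.events_iceberg`) and checkpoint path are made up, and `run_workaround` assumes a Databricks cluster with Unity Catalog.

```python
# Hypothetical sketch of the two-step workaround; all table, schema, and
# checkpoint names are illustrative, not from the original thread.

def fq_name(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified Unity Catalog table name."""
    return f"{catalog}.{schema}.{table}"

TARGET = fq_name("main", "demo", "events_iceberg")
SOURCE = fq_name("main", "demo", "events_source")

def run_workaround(spark):
    """Expects a pyspark.sql.SparkSession on a Databricks cluster."""
    # Step 1: pre-declare the managed Iceberg table in Unity Catalog.
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {TARGET} (id BIGINT, ts TIMESTAMP) "
        "USING ICEBERG"
    )

    # Step 2: write with plain Structured Streaming, fully decoupled from
    # declarative (Lakeflow) pipeline SQL syntax.
    (spark.readStream
          .table(SOURCE)
          .writeStream
          .option("checkpointLocation", "/Volumes/main/demo/chk/events_iceberg")
          .trigger(availableNow=True)   # incremental-batch style trigger
          .toTable(TARGET))
```

The key point is that the table creation and the streaming write are decoupled: the `CREATE TABLE ... USING ICEBERG` happens independently, and the stream then targets the pre-declared table by name.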

My questions for the community and the Databricks team:

  • Is this "pure Structured Streaming" approach considered a supported workaround, or is it unintended behavior (a loophole) that might be restricted in future updates?

  • Given that the documentation explicitly lists streaming as a limitation, should we avoid this pattern for production workloads?

Thanks!

#Lakeflow #DeclarativePipelines #Streaming

1 ACCEPTED SOLUTION

Ashwin_DSA
Databricks Employee

Hi @xwu,

Given that managed Iceberg and many of its features are still in Public Preview and explicitly "subject to change," you should treat this as a preview or advanced usage, not as a contractually supported workaround. In other words, it is not exactly a loophole, but also not something you can rely on long-term without revalidating it with each runtime upgrade.

For production workloads, the conservative and officially documented choice remains Delta + CDF as the upstream source, where Structured Streaming + CDF is explicitly recommended for incremental processing. Managed Iceberg lacks CDF, and the documentation highlights that this is why it cannot currently be used as an incremental streaming source.
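For reference, the documented Delta + CDF pattern looks roughly like the helper below. The table name and starting version are illustrative assumptions; the `readChangeFeed` and `startingVersion` options require the source Delta table to have Change Data Feed enabled.

```python
# Minimal sketch of the documented Delta + CDF pattern: read a Delta
# table's Change Data Feed as a streaming source. Names are illustrative.

def read_cdf_stream(spark, source_table: str, starting_version: int = 0):
    """Return a streaming DataFrame over `source_table`'s change feed.

    Assumes the source Delta table was created (or altered) with
    delta.enableChangeDataFeed = true.
    """
    return (spark.readStream
                 .option("readChangeFeed", "true")
                 .option("startingVersion", str(starting_version))
                 .table(source_table))
```

The resulting streaming DataFrame carries the CDF metadata columns (`_change_type`, `_commit_version`, `_commit_timestamp`), which is what makes incremental processing of inserts, updates, and deletes straightforward downstream.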

If you decide to run your "pure Structured Streaming to/from managed Iceberg" pattern in production anyway, pin and test specific DBR versions before upgrading, and validate behaviour across schema evolution, deletes/overwrites, and compaction. Also be prepared for a future runtime or documentation update to change the official stance, semantics, or options around Iceberg streaming, which could break your production loads. This is not a pattern I would personally recommend.

Hope this gives you some clarity.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

