Greetings @SaugatMukherjee, I did some research and this is what I found.
You're running into a real (and documented) Databricks limitation here: managed Iceberg tables cannot be used as a streaming source today. That's true even though upstream Apache Iceberg documents a Spark Structured Streaming read API.
Let's unpack what's going on and why your code behaves the way it does.
What the docs actually say
Upstream Apache Iceberg documentation shows support for Spark Structured Streaming reads using:
spark.readStream.format("iceberg")
This includes options like stream-from-timestamp, along with guardrails such as skipping overwrite or delete snapshots. In open-source Spark + Iceberg, this works as documented.
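For reference, the upstream OSS read looks roughly like the sketch below (the table name and timestamp are placeholders, not from your pipeline); it is the same call shape that Databricks rejects for managed Iceberg tables:

# Sketch of the upstream OSS Spark + Iceberg streaming read.
# Table name and timestamp are placeholders.
stream_df = (
    spark.readStream
    .format("iceberg")
    # start reading snapshots committed after this point (epoch milliseconds)
    .option("stream-from-timestamp", "1719792000000")
    # skip snapshots produced by overwrites or deletes instead of failing
    .option("streaming-skip-overwrite-snapshots", "true")
    .option("streaming-skip-delete-snapshots", "true")
    .load("my_catalog.db.events")
)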
However, Databricks' own Managed Iceberg table limitations documentation is explicit about the gap:
Iceberg does not support Change Data Feed (CDF). As a result, incremental processing is not supported when reading managed Iceberg tables as a source for materialized views and streaming tables.
That single sentence explains the behavior you're seeing.
Even though upstream Iceberg supports streaming reads, the absence of CDF means Databricks does not allow managed Iceberg tables to participate in incremental or streaming reads as a source.
Why you see "data source iceberg does not support streamed reading"
On Databricks, the Iceberg connector shipped with the runtime does not expose a streaming source implementation.
Databricks' streaming and incremental features lean heavily on Change Data Feed as the underlying mechanism. Because OSS Iceberg doesn't have CDF today, Databricks intentionally disables streaming reads from managed Iceberg tables.
Operationally, that shows up exactly as the error you hit:
"data source iceberg does not support streamed reading"
Even though the options you set (stream-from-timestamp, streaming-skip-overwrite-snapshots, etc.) are valid in upstream Iceberg, Databricks will reject the read regardless. This is a platform constraint, not a configuration issue.
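Concretely, against your source table the call looks roughly like this, and it fails the same way no matter which of those options you supply:

# On Databricks, this is rejected for managed Iceberg tables even though the
# options are valid in upstream Apache Iceberg.
df = (
    spark.readStream
    .format("iceberg")
    .option("stream-from-timestamp", "1719792000000")
    .load("engagement.`sandbox-client-feedback`.dummy_iceberg_source_stream")
)
# Fails with: "data source iceberg does not support streamed reading"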
What works vs. what doesnโt on Databricks today
Here's the clean line in the sand:
- Streaming reads from managed Iceberg tables as a source are not supported.
- Incremental features like streaming tables, materialized views, and similar services cannot read managed Iceberg input today.
- Upstream Iceberg examples showing streaming reads and writes do not apply to Databricks-managed Iceberg yet.
The sink side (writing to Iceberg) is not your problem. The read side is.
Practical options and workarounds
If you need true streaming or CDC-style incremental processing today, the supported path is to land the data in Delta and read it as a streaming source, enabling Change Data Feed (CDF) when you need row-level changes. That pattern is battle-tested and fully supported on Databricks.
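As a rough sketch of that path (the table names, checkpoint location, and destination below are placeholders, not from your environment):

# Supported pattern: stream row-level changes from a Delta source that has
# Change Data Feed enabled (delta.enableChangeDataFeed = true).
changes = (
    spark.readStream
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)      # or "startingTimestamp"
    .table("catalog.schema.delta_source")
)

(changes.writeStream
    .format("delta")
    .option("checkpointLocation", "/Volumes/catalog/schema/chk/delta_source_cdf")
    .outputMode("append")
    .toTable("catalog.schema.delta_destination"))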
If your source must remain Iceberg, the viable alternative is a scheduled batch (micro-batch) pattern:
- Read the Iceberg table in batch: spark.read.table("catalog.schema.table")
- Transform and append to your destination (Iceberg or Delta) using batch writes
- Persist a watermark (timestamp or snapshot ID) in a control table
- Filter "new" data on each run
This gives you near-real-time behavior via frequent Jobs runs, but it is still batch, not streaming. There is no supported Databricks-managed Iceberg streaming source today.
If your downstream targets include materialized views, streaming tables, Lakehouse Monitoring, or other incremental services, the practical guidance is to land or replicate into Delta so those services can leverage CDF and row tracking.
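If you go that route, a minimal one-off replication into Delta could look like the sketch below (the dummy_delta_replica table name is hypothetical):

# Sketch: batch-copy the managed Iceberg source into a Delta table so downstream
# incremental services can rely on CDF and row tracking. Replica name is made up.
(spark.read.table("engagement.`sandbox-client-feedback`.dummy_iceberg_source_stream")
    .write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("engagement.`sandbox-client-feedback`.dummy_delta_replica"))

# Enable Change Data Feed on the replica so it can serve as an incremental source
spark.sql("""
    ALTER TABLE engagement.`sandbox-client-feedback`.dummy_delta_replica
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")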
Notes on your code specifically
The failure is happening right here:
spark.readStream.format("iceberg").load(source_table)
On Databricks, that call will error for managed Iceberg sources every time. Swapping options won't change the outcome.
Your choices are:
- Replace this with a batch read and schedule it, or
- Change the source to Delta with CDF if you need a true streaming pipeline
Again, your sink choice (Iceberg vs Delta) is not the issue; the source read is.
Minimal batch example: Iceberg โ Iceberg on Databricks
Here's the simplest supported pattern:
from pyspark.sql import functions as F

SOURCE_TABLE = "engagement.`sandbox-client-feedback`.dummy_iceberg_source_stream"
TARGET_TABLE = "engagement.`sandbox-client-feedback`.dummy_iceberg_destination"

# Batch read of the Iceberg source, stamped with processing metadata
df = (
    spark.read.table(SOURCE_TABLE)
    .withColumn("processing_time", F.current_timestamp())
    .withColumn("processing_date", F.current_date())
)

# Batch append to the Iceberg destination
(df.write
    .format("iceberg")
    .mode("append")
    .saveAsTable(TARGET_TABLE))
Schedule this with Jobs every N minutes, and persist a "last processed" watermark to avoid reprocessing. It's batch, but it's the closest operational equivalent available today.
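One possible way to wire in that watermark, assuming a control table and an event_time column on the source that are not part of your original pipeline:

from pyspark.sql import functions as F

SOURCE_TABLE  = "engagement.`sandbox-client-feedback`.dummy_iceberg_source_stream"
TARGET_TABLE  = "engagement.`sandbox-client-feedback`.dummy_iceberg_destination"
CONTROL_TABLE = "engagement.`sandbox-client-feedback`.ingestion_watermarks"  # hypothetical

# Read the last processed event time for this source; fall back to the epoch on
# the first run (assumes the control table already exists with columns
# source STRING, last_event_time TIMESTAMP).
wm_row = spark.sql(
    f"SELECT max(last_event_time) AS wm FROM {CONTROL_TABLE} WHERE source = '{SOURCE_TABLE}'"
).collect()[0]
watermark = wm_row["wm"] or "1970-01-01 00:00:00"

# Only pick up rows newer than the watermark (assumes an event_time column)
new_rows = (
    spark.read.table(SOURCE_TABLE)
    .where(F.col("event_time") > F.lit(watermark))
    .withColumn("processing_time", F.current_timestamp())
)

max_seen = new_rows.agg(F.max("event_time")).collect()[0][0]
if max_seen is not None:
    # Append the new slice, then advance the watermark for the next Jobs run
    new_rows.write.format("iceberg").mode("append").saveAsTable(TARGET_TABLE)
    spark.sql(
        f"INSERT INTO {CONTROL_TABLE} VALUES ('{SOURCE_TABLE}', TIMESTAMP '{max_seen}')"
    )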
Bottom line
If you need streaming today, Delta + CDF is the supported answer on Databricks.
If you need to stay on Iceberg, the path forward is scheduled batch with explicit state management, until Databricks enables an Iceberg-compatible incremental source.
Hope this provides some guidance.
Regards, Louis.