yesterday
Hi!
I'm facing an error related to checkpoints whenever I try to display a DataFrame using Auto Loader in the Databricks Free Edition. Please refer to the screenshot. To work around it, I have to delete the checkpoint folder and then execute the display or writeStream command. Can someone help me understand the root cause and how I can overcome this?
yesterday
Hi @AanchalSoni,
I can’t see the full history of your notebook, so I’m not sure of the exact cause. But the behaviour strongly suggests that an earlier version of the stream used complete mode against the same checkpointLocation, and that configuration is what’s causing the error now.
Your current call is: display(accounts_df, output_mode="append", checkpointLocation=".../Checkpoint/")
The error, however, says Invalid streaming output mode: complete. This output mode is not supported for no streaming aggregations...
For a non‑aggregated stream in append mode, Spark wouldn’t complain about complete unless it was reading that mode from somewhere else. In Structured Streaming, the only source of this is the checkpoint metadata. The checkpoint stores the original query plan, including the output mode. When you reuse the same checkpoint path with a changed query (no agg + append), Spark detects a mismatch between the stored configuration (complete) and the new query, and throws STREAMING_OUTPUT_MODE.UNSUPPORTED_OPERATION. When you delete the checkpoint, you erase that metadata, and the stream starts clean, which is why it fixes the issue.
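To make that failure mode concrete, here is a toy Python model of the mechanism described above. It is not Spark's actual checkpoint format or error class; the metadata.json file name and the function names are assumptions for illustration only. The point is simply that the configuration is persisted on the first run and validated, not re-read from your code, on every restart.

```python
import json
import os
import tempfile

# Toy model of the behaviour described above: the checkpoint directory
# persists the original query configuration (including output mode), and
# every restart validates the new query against that stored configuration.
# This is an illustration of the mechanism, not Spark's checkpoint format.

def start_stream(checkpoint_dir, output_mode, has_aggregation):
    meta_path = os.path.join(checkpoint_dir, "metadata.json")
    if os.path.exists(meta_path):
        # Restart: compare the new query against the stored configuration.
        with open(meta_path) as f:
            stored = json.load(f)
        if stored["output_mode"] != output_mode:
            raise ValueError(
                f"Invalid streaming output mode: {stored['output_mode']} "
                "(stored in checkpoint) does not match the restarted query"
            )
    else:
        # First run: persist the query configuration in the checkpoint.
        os.makedirs(checkpoint_dir, exist_ok=True)
        with open(meta_path, "w") as f:
            json.dump({"output_mode": output_mode,
                       "has_aggregation": has_aggregation}, f)
    return "stream started"

ckpt = os.path.join(tempfile.mkdtemp(), "Checkpoint")
start_stream(ckpt, "complete", has_aggregation=True)     # first version of the query
try:
    start_stream(ckpt, "append", has_aggregation=False)  # changed query, same path
except ValueError as e:
    print(e)  # the mismatch only surfaces on the second run
```

Deleting the checkpoint folder removes the stored metadata, so the next run takes the "first run" branch again, which is exactly why your workaround makes the error disappear.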
The recommendation is not to reuse a checkpoint path across different query shapes or output modes. Give each logical stream (and each output mode) its own checkpointLocation.
Can you confirm whether you have any other processing steps before the cell shown in the snapshot?
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.
11 hours ago
Hi Ashwin!
Thanks for your response. Before the screenshot step, I'm just reading the file with an explicit schema.
When you say 'The recommendation is not to reuse a checkpoint path across different query shapes or output modes. Give each logical stream (and each output mode) its own checkpointLocation', does this mean that even if I'm only validating the output, I should create a new checkpoint location? Wouldn't this be an overhead while working on multiple transformations?
Please bear with me; my questions come from a beginner's background, and I'm trying my best to understand these showstoppers.
10 hours ago
Hi @AanchalSoni,
No problem asking questions. That's what this forum is for.
You don’t need a brand‑new checkpoint for every tiny code change, but you should treat a checkpoint as belonging to one specific logical stream configuration.
A more precise rule of thumb: it is safe to reuse the same checkpointLocation when the query is logically the same, i.e. the same input, the same stateful operators (aggregation/join/dedup), and the same output mode, keys, and watermarks. It is also safe when you are just restarting the cluster or rerunning the same notebook cell.
Use a new checkpointLocation (or delete the old one) when you change output mode (append ↔ complete/update) or add/remove stateful operations (aggregations, stream‑stream joins, mapGroupsWithState, dedup with watermark), or when you significantly alter the query shape in a way that affects state.
In your specific use case ("I'm just validating transformations, trying out different versions"), a practical pattern to keep it manageable is:
/Volumes/.../checkpoints/accounts/ # base
display_v1/
display_v2_with_agg/
write_to_delta_append/
write_to_delta_complete/
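One hypothetical way to keep that layout consistent is a tiny helper that derives one checkpoint directory per logical stream and output mode. The base path, the helper name checkpoint_for, and the stream names below are illustrative, not from this thread:

```python
from pathlib import PurePosixPath

# Illustrative base path; substitute your own Volumes path here.
BASE = PurePosixPath("/Volumes/my_catalog/my_schema/checkpoints/accounts")

def checkpoint_for(stream_name: str, output_mode: str) -> str:
    """One checkpoint directory per (stream, output mode) pair,
    so experiments with different query shapes never share state."""
    return str(BASE / f"{stream_name}_{output_mode}")

print(checkpoint_for("write_to_delta", "append"))
# /Volumes/my_catalog/my_schema/checkpoints/accounts/write_to_delta_append
```

In a real job you would pass the result to the sink, e.g. .option("checkpointLocation", checkpoint_for("write_to_delta", "append")) on the writeStream, so switching output modes automatically switches to a fresh checkpoint directory.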
8 hours ago
Thanks Ashwin! And yes, your explanation about checkpoints was clear; I now understand why they matter.