02-18-2026 05:00 AM
We have two Databricks workspaces (staging and production) attached to the same Unity Catalog metastore. Both workspaces run DLT pipelines that use Auto Loader with cloudFiles.useManagedFileEvents = "true" to ingest from the same
external location (same S3 path).
Each pipeline has its own separate checkpoint location.
The documentation states that managed file events uses "a single file notification queue for all streams that process files from a given external location" and that streams discover new files by "reading directly from cache using
stored read position."
Our concern: If the staging pipeline runs first and reads new files from the file events cache, will the production pipeline still see those same files when it runs later? Or does one pipeline's read advance a shared cursor that
causes the other to miss files?
Specifically, we'd like to clarify:
1. Is the "stored read position" scoped per pipeline/stream (each pipeline independently tracks its own position in the cache) or is it shared across all consumers of the external location?
2. Is the file events cache designed to support multiple independent consumers reading the same file events without interference — similar to how Kafka consumer groups each maintain their own offset?
Our current understanding is that each pipeline maintains its own read position via its checkpoint, making this safe. But we couldn't find explicit documentation confirming this for cross-workspace, same-metastore scenarios.
Any clarification would be appreciated. Thanks!
02-26-2026 02:01 PM
Hi @raimundovidal,
You’re safe to run both staging and production Lakeflow Spark Declarative Pipelines with cloudFiles.useManagedFileEvents = "true" against the same external location (same S3 path) and same Unity Catalog metastore, as long as each pipeline uses its own checkpoint location.
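To make the "same source path, separate checkpoints" pattern concrete, here is a small illustrative sketch. It is not the actual pipeline source: bucket names, paths, and the file format are placeholders, and the Spark/DLT calls appear only in comments (in a real DLT pipeline the checkpoint location is managed for you).

```python
# Hypothetical configuration sketch; bucket names and paths are placeholders.
# Both pipelines point at the SAME source path; only the checkpoint differs.

SOURCE_PATH = "s3://my-bucket/landing/events/"  # shared external location

def pipeline_config(checkpoint_path):
    # In plain Structured Streaming, these settings would map to:
    #   spark.readStream.format("cloudFiles")
    #        .option("cloudFiles.format", "json")
    #        .option("cloudFiles.useManagedFileEvents", "true")
    #        .load(SOURCE_PATH)
    #        .writeStream.option("checkpointLocation", checkpoint_path)...
    # In DLT, the checkpoint location is managed per pipeline automatically.
    return {
        "source_path": SOURCE_PATH,
        "cloudFiles.format": "json",
        "cloudFiles.useManagedFileEvents": "true",
        "checkpointLocation": checkpoint_path,
    }

staging_cfg = pipeline_config("s3://my-bucket/checkpoints/staging/events/")
prod_cfg = pipeline_config("s3://my-bucket/checkpoints/prod/events/")

# Same source, independent checkpoints: the safe pattern described above.
assert staging_cfg["source_path"] == prod_cfg["source_path"]
assert staging_cfg["checkpointLocation"] != prod_cfg["checkpointLocation"]
```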
A few key points:

1. Is the "stored read position" scoped per stream, or shared across all consumers?

It is scoped per stream, not shared across all consumers of the external location. Each Auto Loader stream (including each DLT pipeline) keeps its own position in the file events cache inside its own checkpoint. That position is established on the stream's initial run and then reused on subsequent runs of that specific stream.
So in your scenario, the staging pipeline reading new files from the cache does not advance any cursor that the production pipeline depends on; each pipeline's checkpoint tracks its own position independently.
2. Can multiple independent consumers read the same file events without interfering (Kafka‑like semantics)?
Yes, that is the intent: each consumer maintains its own offset into the shared event cache, much like Kafka consumer groups each commit their own offsets against a shared topic.
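Under the stated assumption that the cache behaves like an append-only log with per-consumer cursors, these semantics can be illustrated with a toy simulation. This is not Databricks internals; all class and variable names are hypothetical.

```python
# Toy model of the documented behavior: one shared file-events cache per
# external location, with each consumer (pipeline) tracking its own read
# position, like Kafka consumer-group offsets. Not Databricks internals.

class FileEventsCache:
    """Append-only log of file events for one external location."""
    def __init__(self):
        self.events = []

    def append(self, file_path):
        self.events.append(file_path)

    def read_from(self, position):
        """Return events at or after `position`; mutates no cursor."""
        return self.events[position:]

class PipelineConsumer:
    """Each pipeline stores its own position in its own checkpoint."""
    def __init__(self, cache):
        self.cache = cache
        self.position = 0  # persisted in this pipeline's checkpoint

    def poll(self):
        new_files = self.cache.read_from(self.position)
        self.position += len(new_files)  # advance only this consumer's cursor
        return new_files

cache = FileEventsCache()
staging = PipelineConsumer(cache)
production = PipelineConsumer(cache)

cache.append("s3://bucket/path/file1.json")
cache.append("s3://bucket/path/file2.json")

# Staging runs first and reads both files...
assert staging.poll() == ["s3://bucket/path/file1.json",
                          "s3://bucket/path/file2.json"]
# ...but production still sees both, because its position is independent.
assert production.poll() == ["s3://bucket/path/file1.json",
                             "s3://bucket/path/file2.json"]
```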
This design is explicitly supported when multiple independent pipelines consume from the same external location. So from a correctness perspective, having both a staging and a production pipeline consume from the same external location and path is supported and will not cause one to steal events from the other.
One note on caveats: there are operational considerations with managed file events worth being aware of, but none of them change the "no shared cursor" answer above.
To answer your questions directly: your current understanding, that each pipeline maintains its own read position via its checkpoint, making this safe even across workspaces, is correct.
Hope this helps!
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.
Regards,