Hi @AanchalSoni,
This is a well-known Unity Catalog constraint. Let me explain in detail.
The error `INVALID_PARAMETER_VALUE.LOCATION_OVERLAP` is thrown because Auto Loader's checkpoint/schema location overlaps with UC-managed storage. Specifically:

- UC Volumes are backed by managed S3 paths under Databricks' internal storage (`dbstorage-prod-*/uc/.../`).
- Auto Loader writes its `_dlt_metadata`/`_autoloader` checkpoint directory into that same managed path space.
- UC's `CheckPathAccess` guard explicitly blocks any process from writing into managed storage paths it doesn't own, including Auto Loader's internal bookkeeping.
This is not a permissions issue you can grant your way out of. It's a hard architectural constraint in Unity Catalog.
**The Fix: Separate the Checkpoint Location**
You don't need to move your source files to an external location. You just need to point the checkpoint and schema location somewhere outside UC-managed storage.
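Before wiring up the stream, it can help to fail fast on a bad path. Here's a minimal sketch of a pre-flight check (a hypothetical helper, not a Databricks API; the assumption is that any path under `/Volumes/` is UC-managed):

```python
# Hypothetical guard: fail early with a clear message instead of hitting
# LOCATION_OVERLAP at stream start. Assumes /Volumes/ prefixes mean UC-managed storage.
MANAGED_PREFIXES = ("/Volumes/", "dbfs:/Volumes/")

def assert_external_checkpoint(path: str) -> str:
    """Return the path unchanged if it lies outside UC-managed storage, else raise."""
    if path.startswith(MANAGED_PREFIXES):
        raise ValueError(
            f"checkpointLocation '{path}' is inside UC-managed storage; "
            "point it at a registered external location (e.g. s3://...)"
        )
    return path

checkpoint = assert_external_checkpoint("s3://your-external-bucket/checkpoints/pipeline_x")
```

You can call this on both `checkpointLocation` and `cloudFiles.schemaLocation` at the top of the notebook.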
**Option 1: External Location (Recommended for Production)**
```python
# .table() on a writeStream starts the query, so the result is a StreamingQuery,
# not a DataFrame.
query = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")  # or json, csv, etc.
        .option("cloudFiles.schemaLocation",
                "s3://your-external-bucket/checkpoints/schema/pipeline_x")
        .load("/Volumes/your_catalog/<schema>/<volume>/raw/")  # UC Volume path is fine here
        .writeStream
        .option("checkpointLocation", "s3://your-external-bucket/checkpoints/pipeline_x")
        .table("your_catalog.<schema>.target_table")
)
```
The external bucket must be registered as a UC External Location via `CREATE EXTERNAL LOCATION`, backed by an appropriate storage credential.
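If the bucket isn't registered yet, the registration is a one-time SQL step. A sketch, where the location name, credential name, and grantee group are all placeholders, assuming a storage credential already exists for the bucket's IAM role:

```sql
-- One-time setup (typically run by a metastore admin); names are placeholders.
CREATE EXTERNAL LOCATION IF NOT EXISTS pipeline_checkpoints
  URL 's3://your-external-bucket/checkpoints/'
  WITH (STORAGE CREDENTIAL your_storage_credential);

-- Allow the pipeline's principal to read and write checkpoint files there.
GRANT READ FILES, WRITE FILES
  ON EXTERNAL LOCATION pipeline_checkpoints
  TO `data_engineers`;
```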
**Option 2: Use DLT (Cleanest for UC)**
DLT manages its own checkpoint state internally, outside any path you specify, so you never hit this conflict:
```python
import dlt

@dlt.table
def bronze_raw():
    return (
        spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .option("cloudFiles.schemaLocation",
                    "/Volumes/your_catalog/<schema>/<volume>/autoloader_schema/")
            .load("/Volumes/your_catalog/<schema>/<volume>/raw/")
    )
```
Note that with DLT, the `schemaLocation` can live inside the Volume: it's the checkpoint that conflicts, not (in all cases) the schema inference directory, though keeping it external is cleaner.
**Summary Recommendation**
Keeping your source files in the UC Volume is perfectly fine and correct. The only change needed is routing your `checkpointLocation` and `schemaLocation` to a registered UC External Location on S3. If this pipeline is already in a DLT context (given the medallion setup in your catalog), the DLT option is the cleanest path, with zero checkpoint management overhead.
LR