6 hours ago
Hi!
I'm trying to create an ETL pipeline that reads data from a UC volume, but Databricks is not allowing me to do so. The following error is generated:
AnalysisException: [RequestId=a11e017b-61db-4c30-a03a-d7cce55e5aea ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 's3://dbstorage-prod-6ubki/uc/670643ac-88ac-4f51-8bb0-2311c001fab6/6b491f6f-d67e-44fe-9e04-bad30ec7a8cc/__unitystorage/catalogs/5f4192b5-79f2-415f-bfe8-729b201e40b9/tables/ea03463f-90af-4941-b2a6-47782054b3c9/_dlt_metadata/_autoloader' overlaps with managed storage within 'CheckPathAccess' call. .
Is it not possible to read directly from a volume using Auto Loader? Should the raw files be read from an external location only? Please guide.
4 hours ago
You can absolutely use Auto Loader to read files from a UC volume. The issue in your case is a path conflict: Auto Loader is trying to write its metadata into the managed storage area of a table or volume, and Unity Catalog blocks writes into managed storage to protect the data integrity and governance it enforces.
schema = "id INT"
query = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/default/sys/schema")
    .load("/Volumes/workspace/dev/input/")  # UC Volume path
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/Volumes/workspace/default/sys/checkpoint")
    .option("mergeSchema", "true")
    .trigger(availableNow=True)
    .toTable("uc.default.json_files"))
4 hours ago
Hi @AanchalSoni,
This is a well-known Unity Catalog constraint. Let me explain in detail.
The error INVALID_PARAMETER_VALUE.LOCATION_OVERLAP is thrown because Auto Loader's checkpoint/schema location overlaps with UC-managed storage. Specifically:
UC Volumes are backed by managed S3 paths under Databricks' internal storage (dbstorage-prod-*/uc/.../).
Auto Loader writes its _dlt_metadata/_autoloader checkpoint directory into that same managed path space.
UC's CheckPathAccess guard explicitly blocks any process from writing into managed storage paths it doesn't own — including Auto Loader's internal bookkeeping.
This is not a permissions issue you can grant your way out of. It's a hard architectural constraint in Unity Catalog.
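To make the constraint concrete, here is a minimal, purely illustrative Python guard (this is not a Databricks API; the prefixes and markers are assumptions based on the error message above) that flags checkpoint paths which would land in UC-managed storage before you start the stream:

```python
# Illustrative sketch only: a pre-flight check for Auto Loader checkpoint
# paths. The prefixes/markers below are assumptions inferred from the
# LOCATION_OVERLAP error in this thread, not an official Databricks list.

MANAGED_PREFIXES = ("/Volumes/",)       # UC Volume paths resolve to managed storage
MANAGED_MARKERS = ("__unitystorage",)   # internal UC managed-storage layout marker


def is_safe_checkpoint_path(path: str) -> bool:
    """Return True if `path` does not look like UC-managed storage."""
    if any(path.startswith(prefix) for prefix in MANAGED_PREFIXES):
        return False
    if any(marker in path for marker in MANAGED_MARKERS):
        return False
    return True
```

Running the paths from this thread through it: an external S3 checkpoint bucket passes, while a `/Volumes/...` checkpoint or anything under `__unitystorage` is rejected.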
The Fix: Separate the Checkpoint Location
You don't need to move your source files to an external location. You just need to point the checkpoint and schema location somewhere outside UC-managed storage.
Option 1 — External Location (Recommended for Production)
query = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")  # or json, csv, etc.
        .option("cloudFiles.schemaLocation", "s3://your-external-bucket/checkpoints/schema/pipeline_x")
        .load("/Volumes/your_catalog/<schema>/<volume>/raw/")  # UC Volume path is fine here
        .writeStream
        .option("checkpointLocation", "s3://your-external-bucket/checkpoints/pipeline_x")
        .toTable("your_catalog.<schema>.target_table")
)
The external bucket must be registered as a UC External Location (via CREATE EXTERNAL LOCATION) with appropriate storage credentials.
Option 2 — Use DLT (Cleanest for UC)
DLT manages its own checkpoint state completely outside your control path, so you never hit this conflict:
import dlt

@dlt.table
def bronze_raw():
    return (
        spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .option("cloudFiles.schemaLocation",
                    "/Volumes/your_catalog/<schema>/<volume>/autoloader_schema/")
            .load("/Volumes/your_catalog/<schema>/<volume>/raw/")
    )
Note that with DLT, the schemaLocation can live inside the Volume: it is typically the checkpoint that conflicts, not the schema inference directory, though keeping both external is cleaner.
Summary Recommendation
It is perfectly fine and correct for your source files to stay in the UC Volume. The only change needed is routing your checkpointLocation and schemaLocation to a registered UC External Location on S3. If this pipeline is already in a DLT context (given the medallion setup in your catalog), the DLT option is the cleanest path, with zero checkpoint management overhead.
2 hours ago
Thanks @BalaS @lingareddy_Alva for your quick responses.
I've updated the schema location to:
2 hours ago
Hi,
You need to set schemaLocation in the following way (don't omit the cloudFiles prefix):
.option("cloudFiles.schemaLocation", "<path-to-schema>")
an hour ago
Hello @szymon_dybczak ,
That's the root cause right there: Databricks Free Edition.
Even with corrected schemaLocation and checkpointLocation paths, the Free Edition has a fundamental constraint: all Volume storage is UC-managed, and you cannot register your own external locations.
So no matter where inside a Volume you point your checkpoint, it still lands in UC-managed storage, and the CheckPathAccess guard fires.
Only the checkpointLocation needs to go to DBFS on Free Edition. schemaLocation can stay in your Volume.
query = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", "/Volumes/workspace/capstone/schema/")  # Volume is fine
        .load("/Volumes/workspace/capstone/raw/")
        .writeStream
        .option("checkpointLocation", "dbfs:/tmp/checkpoints/capstone")  # DBFS needed
        .toTable("workspace.capstone.target_table")
)