<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unable to read files using Auto Loader in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153639#M53981</link>
    <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/89873"&gt;@BalaS&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;for your quick responses.&lt;/P&gt;&lt;P&gt;I've updated the schema location to:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"schemaLocation"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"/Volumes/workspace/capstone/schema"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;and checkpoint location to:&amp;nbsp;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;/Volumes/workspace/capstone/checkpoint/1/&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;however, I'm still getting the same error. I'm using Databricks free version to develop a test pipeline.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Tue, 07 Apr 2026 16:08:58 GMT</pubDate>
    <dc:creator>AanchalSoni</dc:creator>
    <dc:date>2026-04-07T16:08:58Z</dc:date>
    <item>
      <title>Unable to read files using Auto Loader</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153616#M53977</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I'm trying to create an ETL pipeline. It reads data from a UC volume, however, Databricks is not allowing me to do so. The following error is generated:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;AnalysisException: [RequestId=a11e017b-61db-4c30-a03a-d7cce55e5aea ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 's3://dbstorage-prod-6ubki/uc/670643ac-88ac-4f51-8bb0-2311c001fab6/6b491f6f-d67e-44fe-9e04-bad30ec7a8cc/__unitystorage/catalogs/5f4192b5-79f2-415f-bfe8-729b201e40b9/tables/ea03463f-90af-4941-b2a6-47782054b3c9/_dlt_metadata/_autoloader' overlaps with managed storage within 'CheckPathAccess' call. .&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Is it not possible to read directly from a volume using Auto Loader? Should the raw files be read from an external location only? Please guide.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 12:14:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153616#M53977</guid>
      <dc:creator>AanchalSoni</dc:creator>
      <dc:date>2026-04-07T12:14:08Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read files using Auto Loader</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153627#M53978</link>
      <description>&lt;P&gt;You can absolutely use Auto Loader with files from volume. The issue is a path conflict in your case. Managed areas of a table&amp;nbsp;or volume are not to be touched to ensure data integrity and security governed by UC.&lt;/P&gt;&lt;H3&gt;&lt;FONT size="3"&gt;&lt;SPAN&gt;You can use the Unity Catalog Volume path in the Auto Loader.&amp;nbsp;&lt;SPAN&gt;Here is the Auto Loader implementation using the recommended Volume path. This ensures the conflicts are avoided.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/H3&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;# Defined Schema (Ensure this matches your JSON structure)&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;PRE&gt;schema = &lt;SPAN class=""&gt;"id INT"

df = (spark.readStream
    .&lt;SPAN class=""&gt;format(&lt;SPAN class=""&gt;"cloudFiles")
    .option(&lt;SPAN class=""&gt;"cloudFiles.format", &lt;SPAN class=""&gt;"json")
    .option(&lt;SPAN class=""&gt;"cloudFiles.schemaLocation", &lt;SPAN class=""&gt;"/Volumes/workspace/default/sys/schema")
    .load(&lt;SPAN class=""&gt;"/Volumes/workspace/dev/input/") &lt;SPAN class=""&gt;# UC Volume Path
    .writeStream
    .&lt;SPAN class=""&gt;format(&lt;SPAN class=""&gt;"delta")
    .option(&lt;SPAN class=""&gt;"checkpointLocation", &lt;SPAN class=""&gt;"/Volumes/workspace/default/sys/checkpoint")
    .option(&lt;SPAN class=""&gt;"mergeSchema", &lt;SPAN class=""&gt;"true")
    .trigger(availableNow=&lt;SPAN class=""&gt;True)
    .toTable(&lt;SPAN class=""&gt;"uc.default.json_files"))&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 07 Apr 2026 14:48:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153627#M53978</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-04-07T14:48:08Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read files using Auto Loader</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153628#M53979</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/184264"&gt;@AanchalSoni&lt;/a&gt;&amp;nbsp;.&lt;/P&gt;&lt;P&gt;This is a well-known Unity Catalog constraint. Let me explain in detail.&lt;/P&gt;&lt;P&gt;The error &lt;STRONG&gt;INVALID_PARAMETER_VALUE.LOCATION_OVERLAP&lt;/STRONG&gt; is thrown because &lt;STRONG&gt;Auto Loader's checkpoint/schema location overlaps with UC-managed storage&lt;/STRONG&gt;. Specifically:&lt;BR /&gt;&amp;nbsp; &amp;nbsp; UC Volumes are backed by managed S3 paths under Databricks' internal storage &lt;STRONG&gt;(dbstorage-prod-*/uc/.../)&lt;/STRONG&gt;.&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Auto Loader writes its &lt;STRONG&gt;_dlt_metadata/_autoloader&lt;/STRONG&gt; checkpoint directory into that same managed path space.&lt;BR /&gt;&amp;nbsp; &amp;nbsp; UC's &lt;STRONG&gt;CheckPathAccess&lt;/STRONG&gt; guard explicitly blocks any process from writing into managed storage paths it doesn't own — including Auto Loader's internal bookkeeping.&lt;/P&gt;&lt;P&gt;This is not a permissions issue you can grant your way out of. It's a &lt;STRONG&gt;hard architectural constraint&lt;/STRONG&gt; in Unity Catalog.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The Fix: Separate the Checkpoint Location&lt;/STRONG&gt;&lt;BR /&gt;You don't need to move your source files to an external location. You just need to point the &lt;STRONG&gt;checkpoint and schema location&lt;/STRONG&gt; somewhere outside UC-managed storage.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Option 1 — External Location (Recommended for Production)&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")           # or json, csv, etc.
    .option("cloudFiles.schemaLocation", "s3://your-external-bucket/checkpoints/schema/pipeline_x")
    .load("/Volumes/your_catalog/&amp;lt;schema&amp;gt;/&amp;lt;volume&amp;gt;/raw/")  # UC Volume path — fine here
    .writeStream
    .option("checkpointLocation", "s3://your-external-bucket/checkpoints/pipeline_x")
    .table("your_catalog.&amp;lt;schema&amp;gt;.target_table")
)&lt;/LI-CODE&gt;&lt;P&gt;The external bucket must be registered as a &lt;STRONG&gt;UC External Location&lt;/STRONG&gt; with &lt;STRONG&gt;CREATE EXTERNAL LOCATION&lt;/STRONG&gt; and appropriate storage credentials.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Option 2 — Use DLT (Cleanest for UC)&lt;/STRONG&gt;&lt;BR /&gt;DLT manages its own checkpoint state completely outside your control path, so you never hit this conflict:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt

@dlt.table
def bronze_raw():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.schemaLocation",
                "/Volumes/your_catalog/&amp;lt;schema&amp;gt;/&amp;lt;volume&amp;gt;/autoloader_schema/")
        .load("/Volumes/your_catalog/&amp;lt;schema&amp;gt;/&amp;lt;volume&amp;gt;/raw/")
    )&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;Note that with DLT, the &lt;STRONG&gt;schemaLocation&lt;/STRONG&gt; can live inside the Volume (it's only the checkpoint that conflicts, not the schema inference directory in all cases — though keeping it external is cleaner).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Summary Recommendation&lt;/STRONG&gt;&lt;BR /&gt;Your source files staying in the UC Volume is perfectly fine and correct. The only change needed is routing your &lt;STRONG&gt;checkpointLocation&lt;/STRONG&gt; and &lt;STRONG&gt;schemaLocation&lt;/STRONG&gt; to a registered UC External Location on S3. If this pipeline is already in a DLT context (given your medallion setup in your catalog), the DLT option is the cleanest path with zero checkpoint management overhead.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 15:03:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153628#M53979</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2026-04-07T15:03:02Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read files using Auto Loader</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153639#M53981</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/89873"&gt;@BalaS&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;for your quick responses.&lt;/P&gt;&lt;P&gt;I've updated the schema location to:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;option&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"schemaLocation"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"/Volumes/workspace/capstone/schema"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;and checkpoint location to:&amp;nbsp;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;/Volumes/workspace/capstone/checkpoint/1/&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;however, I'm still getting the same error. I'm using Databricks free version to develop a test pipeline.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 07 Apr 2026 16:08:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153639#M53981</guid>
      <dc:creator>AanchalSoni</dc:creator>
      <dc:date>2026-04-07T16:08:58Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read files using Auto Loader</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153640#M53982</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;You need to set schemaLocation in the following way (don't omit the cloudFiles prefix)&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;option&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"cloudFiles.schemaLocation"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt; &lt;SPAN class=""&gt;"&amp;lt;path-to-schema&amp;gt;"&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 16:37:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153640#M53982</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-04-07T16:37:36Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read files using Auto Loader</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153645#M53985</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;That's the root cause right there — Databricks Free Edition.&lt;BR /&gt;Even with corrected schemaLocation and checkpointLocation paths, the Free Edition has a fundamental constraint:&lt;BR /&gt;So no matter where inside a Volume you point your checkpoint, it still lands in UC-managed storage, and the CheckPathAccess guard fires.&lt;/P&gt;&lt;P&gt;Only the checkpointLocation needs to go to DBFS on Free Edition. schemaLocation can stay in your Volume.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/capstone/schema/")  # Volume is fine
    .load("/Volumes/workspace/capstone/raw/")
    .writeStream
    .option("checkpointLocation", "dbfs:/tmp/checkpoints/capstone")              # DBFS needed
    .toTable("workspace.capstone.target_table")
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 17:05:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-files-using-auto-loader/m-p/153645#M53985</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2026-04-07T17:05:34Z</dc:date>
    </item>
  </channel>
</rss>

