Friday
Hello,
I am trying to use pipelines in Databricks to ingest data from an external location into the data lake using Auto Loader, and I am running into the issue below. I have noticed other posts with similar errors, but in those posts the error was related to the destination table already being registered as managed.
In my case, the error appears to be related to the event log table associated with the Auto Loader stream. I tried re-creating the pipeline, but it didn't help. Any idea how to resolve this?
Error:
AnalysisException: Traceback (most recent call last):
File "/Users/name.surname@domain.se/.bundle/Testproject_2/dev/files/src/notebook", cell 4, line 11
2 csv_file_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/dummy.csv"
3 schema_location = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/_schema8/"
4 df = (
5 session.readStream
6 .format("cloudFiles")
7 .option("cloudFiles.format", "csv")
8 .option("header", "true")
9 .option("inferSchema", "true")
10 .option("cloudFiles.schemaLocation", schema_location)
---> 11 .load(csv_file_path)
12 )
AnalysisException: [RequestId=3ef8b745-48dc-4ae1-b2f6-9afaaf442c3b ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://unity-catalog-storage@devdomaindatalakesc01.dfs.core.windows.net/dev-data-domain/__unitystorage/catalogs/cf3123b2-b661-48d9-9baa-a0b0214d5a29/tables/3775a194-3db0-48a6-8c0e-cce43c26c9e7/_dlt_metadata/_autoloader' overlaps with managed storage within 'CheckPathAccess' call. .
Relevant code:
from pyspark.sql.functions import *

csv_file_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/dummy.csv"
schema_location = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/_schema8/"

df = (
    session.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .option("cloudFiles.schemaLocation", schema_location)
    .load(csv_file_path)
)

checkpoint_path = "/Volumes/dev-data-domain/bronze/test/_checkpoint5"

query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .trigger(once=True)
    .toTable("`dev-data-domain`.bronze.delta_table_pipeline3")
)
Sunday
Hello @mattstyl-ff
As you can see from the error:
ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP
Databricks automatically manages the storage location under the UC catalog's storage root. So either you don't need to (and shouldn't) set schemaLocation or checkpointLocation at all, or you must explicitly set them to an external ADLS path (outside the UC-managed storage), like below:
schema_location = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/autoloader/schema/testproject"
checkpoint_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/autoloader/checkpoints/testproject"

df = (
    session.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .option("cloudFiles.schemaLocation", schema_location)
    .load(csv_file_path)
)

query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .trigger(once=True)
    .toTable("`dev-data-domain`.bronze.delta_table_pipeline3")
)
Try updating the code and cleaning up the existing artifacts (old table, schema location, and checkpoint); a rough sketch of that cleanup is below.
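This is only a sketch, assuming it runs in a Databricks notebook where spark and dbutils are available; the table name is the one from your snippet and the schema/checkpoint paths are the ones from the original post, so replace them with whatever you actually used:

# Drop the (possibly half-created) target table registered in Unity Catalog
spark.sql("DROP TABLE IF EXISTS `dev-data-domain`.bronze.delta_table_pipeline3")

# Remove the old Auto Loader schema-tracking directory and the old checkpoint
# so the next run starts from a clean state
dbutils.fs.rm("abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/_schema8/", True)
dbutils.fs.rm("/Volumes/dev-data-domain/bronze/test/_checkpoint5", True)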
I hope this will help you.
yesterday
I tried removing the paths completely, but I still get the same error.
I also ensured that both the checkpoint and the schema path are on an external storage and set them explicitly, but I still get the same error. I have tested reading from the same path without AutoLoader, and that works without any issue.
The following example with the same container name and storage account name works:
df = spark.read.format("csv").option("header", "true").load(f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/")
yesterday
Hello @mattstyl-ff
Before going further, test by dropping the table and deleting its physical files as well, then run:
df = (
    session.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(csv_file_path)
)

query = (
    df.writeStream
    .format("delta")
    .outputMode("append")
    .trigger(once=True)
    .toTable("`dev-data-domain`.bronze.delta_table_pipeline3")
)
yesterday
I am open to solutions from other contributors on this.
yesterday
There is no table created yet. I tried deleting the pipeline and creating a new one with new file names, but it still fails.
I noticed that the same error happens if I try to read from the event log location using spark.read.
Example:
path = "abfss://unity-catalog-storage@devdmdatalakesc01.dfs.core.windows.net/dev-data-dm/__unitystorage/catalogs/cf3123b2-b661-48d9-9baa-a0b0214d5a29/tables/3775a194-3db0-48a6-8c0e-cce43c26c9e7/part-00000-00805a51-0fde-44e7-bdea-c6125cec5796-c000.snappy.parquet"
spark.read.format("parquet").load(path).display()
This gives me the same exact LOCATION OVERLAP error as the one in the original post above.
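For reference, I assume the direct path read is expected to fail because that location sits under Unity Catalog managed storage, and that the supported way to query the pipeline event log would be the event_log() table-valued function rather than reading the files by path; a rough sketch (the <pipeline-id> placeholder is mine, to be replaced with the actual pipeline ID):

# Query the pipeline event log via the event_log() TVF instead of reading the
# managed-storage files directly; <pipeline-id> is a placeholder for the real ID
spark.sql("SELECT * FROM event_log('<pipeline-id>')").display()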
yesterday
If you are available, we can get on a call in an hour, @mattstyl-ff.