
Auto Loader on UC Volumes stopped resolving wildcards

yuta666
New Contributor

The following spark.readStream / cloudFiles configuration was confirmed working on
2026-04-30, but stopped working on 2026-05-26. No code or config changes were made
between these dates, so I assume something was changed implicitly on the Databricks side.

Environment

  • Job Compute Serverless v5 (reproduced on v4 as well)
  • Source: UC Volume, parquet
  • Mode: directory listing (`useNotifications=false`)

Code

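# Partitioned source path; the `*` and `{...}` segments are the part that stopped resolving.
load_path = "/Volumes/<catalog>/<schema>/<volume>/<id>/<sub>/<log_source>/<version>/userId=*/groupId=*/day={20260210,20260211}/"

# _source_schema, checkpoint_base_path and bad_records_path are defined elsewhere (not shown in this post).
df = (
    spark.readStream.format("cloudFiles")
    .schema(_source_schema)
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.useNotifications", "false")  # directory listing mode
    .option("cloudFiles.includeExistingFiles", "true")  # also ingest files already present
    .option("pathGlobFilter", "*.gz.parquet")  # only pick up files with this suffix
    .option("cloudFiles.schemaLocation", checkpoint_base_path)
    .option("ignoreCorruptFiles", "true")
    .option("ignoreMissingFiles", "true")
    .option("badRecordsPath", bad_records_path)
    .load(load_path)
)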

 

Ask

If anyone has experienced the same issue, please let me know how you addressed it.

Auto Loader does work when the path is fully literal (no wildcards, no brace expansion).

1 ACCEPTED SOLUTION

saravjeet
Databricks Partner

We are facing a similar issue. It is not limited to Auto Loader; it also affects DLT pipelines and classic ETL jobs. The behavior is intermittent: jobs run fine and then fail unexpectedly, though they typically succeed on retry if retries are enabled. We tested with both absolute and relative paths, and the issue persists either way.

I escalated this to our Databricks contact, and the suggested solutions are:

  • Switch the channel from "Preview" to "Current" in the Databricks configuration, or
  • Raise a support ticket with Microsoft or Databricks (depending on your hyperscaler), providing your workspace ID.

In our case, the issue appears to be specific to the West Europe region, so it seems Databricks is already aware of it and a fix is expected in an upcoming release.


2 REPLIES

Ashwin_DSA
Databricks Employee

Hi @yuta666,

Thanks for sharing the details. Since the same cloudFiles configuration worked for you previously, and you made no code or config changes between 2026-04-30 and 2026-05-26, this looks like a regression rather than expected behaviour.

The fact that Auto Loader still works when the path is fully literal, but stops working when the source path uses wildcard/brace expansion, points in the same direction.

As a temporary workaround, if you need to keep moving, you could switch to fully literal paths for now. But I would not position that as the real fix here.
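One way to produce the fully literal paths mentioned above is to resolve the pattern yourself before handing anything to Auto Loader. The sketch below is only an illustration, not an official workaround: `expand_braces` and `literal_dirs` are hypothetical helper names, and it assumes the Volume is readable through ordinary local file APIs under `/Volumes/...` on your compute. Python's `glob` does not understand `{a,b}` groups, so those are expanded first.

```python
import glob
import itertools
import re

# Matches one non-nested brace group such as {20260210,20260211}.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> list[str]:
    """Expand every non-nested {a,b} group into all literal combinations."""
    # With one capture group, re.split alternates: literal, group, literal, ...
    parts = _BRACE.split(pattern)
    choices = [
        part.split(",") if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


def literal_dirs(pattern: str) -> list[str]:
    """Resolve braces and * wildcards to a sorted list of existing paths."""
    found = set()
    for expanded in expand_braces(pattern):
        # Assumption: the path is visible to local file APIs on this compute.
        found.update(glob.glob(expanded))
    return sorted(found)
```

Each returned path is wildcard-free, so it could be passed to `.load()` on its own, for example one stream per directory with its own checkpoint. Whether that avoids the behaviour described in this thread has not been verified here, and listing a large partition tree this way may be slow.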

I would recommend opening a Databricks support ticket so the team can investigate this as a possible regression. It would help to include:

  • the exact load_path
  • whether this reproduces consistently
  • workspace/cloud/region
  • Job Compute Serverless version
  • a run URL or query ID if available
  • the first failing date and the last known-good date

If you do raise a ticket, please feel free to share the case number here as well. That may help others who run into the same issue. I can also use it to escalate internally.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***
