Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Auto Loader on UC Volumes stopped resolving wildcards

yuta666
Visitor

The following spark.readStream / cloudFiles configuration was confirmed working on
2026-04-30, but stopped working on 2026-05-26. No code or config changes were made
between these dates, so I assume something was changed implicitly on the Databricks side.

Environment

  • Job Compute Serverless v5 (reproduced on v4 as well)
  • Source: UC Volume, parquet
  • Mode: directory listing (`useNotifications=false`)

Code

load_path = "/Volumes/<catalog>/<schema>/<volume>/<id>/<sub>/<log_source>/<version>/userId=*/groupId=*/day={20260210,20260211}/"


df = (
    spark.readStream.format("cloudFiles")
    .schema(_source_schema)
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.useNotifications", "false")
    .option("cloudFiles.includeExistingFiles", "true")
    .option("pathGlobFilter", "*.gz.parquet")
    .option("cloudFiles.schemaLocation", checkpoint_base_path)
    .option("ignoreCorruptFiles", "true")
    .option("ignoreMissingFiles", "true")
    .option("badRecordsPath", bad_records_path)
    .load(load_path)
)

 

Ask

If anyone has experienced the same issue, please let me know, along with how you addressed it.

Auto Loader does work when the path is fully literal (no wildcards, no brace expansion).

1 REPLY

Ashwin_DSA
Databricks Employee

Hi @yuta666,

Thanks for sharing the details. Since the same cloudFiles configuration worked for you previously, and you did not make any code or config changes between 2026-04-30 and 2026-05-26, this looks like a regression rather than expected behaviour.

The fact that Auto Loader still works when the path is fully literal, but stops working when the source path uses wildcard/brace expansion, points in the same direction.

As a temporary workaround, if you need to keep moving, you could switch to fully literal paths for now. But I would not position that as the real fix here.
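
For anyone taking the literal-path route in the meantime, here is a minimal sketch of how the glob could be turned into a list of literal directories. It is not an official fix. The helper names `expand_braces` and `resolve_literal_dirs` are hypothetical, the code handles only non-nested `{a,b}` groups, and it assumes the `/Volumes/...` path is readable through Python's standard file APIs on your compute, which is worth verifying first.

```python
import glob
import itertools
import os
import re

# Matches one non-nested brace group such as {20260210,20260211}.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> list[str]:
    """Expand non-nested {a,b} groups into one pattern per combination."""
    parts = _BRACE.split(pattern)
    literals = parts[0::2]                             # text between brace groups
    choices = [body.split(",") for body in parts[1::2]]  # alternatives per group
    expanded = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        expanded.append("".join(pieces))
    return expanded


def resolve_literal_dirs(pattern: str) -> list[str]:
    """Resolve braces and * wildcards to the directories that currently exist."""
    found = set()
    for candidate in expand_braces(pattern):
        # Python's glob handles * but not braces, hence the expansion above.
        for path in glob.glob(candidate.rstrip("/")):
            if os.path.isdir(path):
                found.add(path)
    return sorted(found)


# Hypothetical usage with the pattern from the question:
# literal_dirs = resolve_literal_dirs(load_path)
```

Each returned directory can then be passed to `.load()` as a literal path. Whether you run one stream per directory or restructure the job some other way depends on your pipeline, and directories created later will not be picked up unless the list is rebuilt.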

I would recommend opening a Databricks support ticket so the team can investigate this as a possible regression. It would help to include:

  • the exact load_path
  • whether this reproduces consistently
  • workspace/cloud/region
  • Job Compute Serverless version
  • a run URL or query ID if available
  • the first failing date and the last known-good date

If you do raise a ticket, please feel free to share the case number here as well. That may help others who run into the same issue. I can also use it to escalate internally.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***