Re: Databricks Standard SharePoint Connector Perfo...

Yogasathyandrun · 06-22-2026

I think your diagnosis is likely correct.

One thing that stands out is that you’re only reading A1:Z2 from each workbook. That range is tiny, yet the job still takes 40+ minutes, so the bottleneck is probably not the Excel parsing itself but file discovery.

With ~5,000 directories and a multi-level wildcard (ABC*/files/ABC*.xlsm), the connector may be spending most of its time resolving the matching paths, listing the contents of thousands of folders, before it ever starts reading data.
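To put a rough number on that, here’s a small Python model (not the connector’s actual code) that expands the pattern one path level at a time and counts directory listings. The in-memory tree, `build_tree`, and `resolve_glob` are hypothetical, and it assumes a literal component like `files` can be appended without a listing while every wildcard component requires listing its parent folder.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = set("*?[")

def build_tree(n_dirs):
    """Hypothetical in-memory stand-in for the site: directory path -> child names."""
    tree = {"": [f"ABC{i:04d}" for i in range(n_dirs)]}
    for i in range(n_dirs):
        top = f"ABC{i:04d}"
        tree[top] = ["files"]
        tree[f"{top}/files"] = [f"{top}.xlsm", "readme.txt"]
    return tree

def resolve_glob(tree, pattern):
    """Expand a slash-separated glob level by level, counting listing calls."""
    calls = 0
    frontier = [""]
    for part in pattern.split("/"):
        next_frontier = []
        for parent in frontier:
            prefix = f"{parent}/" if parent else ""
            if not GLOB_CHARS & set(part):
                # Literal component: no listing needed, just extend the path.
                next_frontier.append(prefix + part)
                continue
            calls += 1  # wildcard component: this directory must be listed
            for name in tree.get(parent, []):
                if fnmatchcase(name, part):
                    next_frontier.append(prefix + name)
        frontier = next_frontier
    return frontier, calls

files, calls = resolve_glob(build_tree(5000), "ABC*/files/ABC*.xlsm")
print(len(files), calls)  # prints: 5000 5001
```

In this model, finding 5,000 workbooks costs 5,001 listing calls, and if each of those is a round trip to SharePoint (an assumption on my part), that alone could account for much of the runtime. A list of explicit file paths needs no listing at all.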

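I’d also be cautious about relying on pathGlobFilter here. If the connector treats it the way Spark’s file sources do, it only filters file names after the directories have been listed, so even if it narrows which files get read, it doesn’t reduce the cost of discovering them, and discovery appears to be the expensive part.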
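Here’s a minimal Python model of a name filter applied after recursive listing, which is how I understand pathGlobFilter to behave in Spark’s file sources; I haven’t confirmed the SharePoint connector does the same. The tree layout and `walk_then_filter` are hypothetical.

```python
from fnmatch import fnmatchcase

def build_tree(n_dirs):
    """Hypothetical tree: directory path -> list of (name, is_dir) children."""
    tree = {"": [(f"ABC{i:04d}", True) for i in range(n_dirs)]}
    for i in range(n_dirs):
        top = f"ABC{i:04d}"
        tree[top] = [("files", True)]
        tree[f"{top}/files"] = [(f"{top}.xlsm", False), ("readme.txt", False)]
    return tree

def walk_then_filter(tree, name_glob):
    """List every directory recursively, then keep files whose name matches."""
    calls = 0
    matched = []
    stack = [""]
    while stack:
        parent = stack.pop()
        calls += 1  # every directory is listed, regardless of name_glob
        for name, is_dir in tree.get(parent, []):
            path = f"{parent}/{name}" if parent else name
            if is_dir:
                stack.append(path)
            elif fnmatchcase(name, name_glob):
                matched.append(path)
    return matched, calls

tree = build_tree(5000)
for glob in ("*.xlsm", "no-such-file*"):
    matched, calls = walk_then_filter(tree, glob)
    print(glob, len(matched), calls)  # both patterns cost 10001 listing calls
```

The filter changes how many files come back, but not how many directories get listed, which is why a tighter glob alone is unlikely to fix a discovery-bound job.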

As a quick validation, I’d try reading a few known paths explicitly and compare the runtime. If that drops significantly, then wildcard resolution is likely the dominant cost, and a manifest-driven or staged ingestion pattern may be a better long-term approach.
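A minimal Python sketch of both ideas: a rough timing wrapper for the comparison, and a manifest of known workbook paths that later runs read directly instead of resolving a wildcard. The JSON manifest format and every function name here are my own assumptions; `read_one` is a hypothetical placeholder for whatever reads a single workbook (for example, the connector pointed at one explicit path).

```python
import json
import time
from pathlib import Path

def timed(label, fn, *args):
    """Run fn(*args) once and print wall-clock time; a rough check, not a benchmark."""
    start = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

def save_manifest(paths, manifest_file):
    """Persist the known workbook paths (hypothetical JSON list format)."""
    Path(manifest_file).write_text(json.dumps(sorted(paths)))

def load_manifest(manifest_file):
    """Load the previously saved list of workbook paths."""
    return json.loads(Path(manifest_file).read_text())

def read_explicit(paths, read_one):
    """Read each known path directly, with no wildcard resolution."""
    return {p: read_one(p) for p in paths}
```

Usage would be something like `timed("explicit", read_explicit, load_manifest("manifest.json"), read_one)` against the same number of files read through the wildcard. The manifest only needs rebuilding when folders are added, so the expensive discovery step runs occasionally rather than on every ingestion.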

Data Engineer | Apache Spark | Delta Lake | Databricks