04-02-2026 07:44 AM
Hi @mits1
Since you're using Databricks Free Edition with Serverless and reading from a Unity Catalog Volume (/Volumes/workspace/dev/input/), the issue is likely:
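Volumes Directory Scan: Autoloader reads the directory, not just the file
When Autoloader scans /Volumes/workspace/dev/input/, it picks up files from the whole directory, not only the one you uploaded. Any additional (possibly hidden) files sitting there may be ingested alongside your data and show up as null rows.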
Run this in your Databricks notebook:
```python
# Check exactly what files Autoloader sees
dbutils.fs.ls("/Volumes/workspace/dev/input/")
```
Also check for hidden files:
```
%sh ls -la /Volumes/workspace/dev/input/
```
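If you'd rather check from Python, here is a minimal sketch that walks the same directory through the local /Volumes path. It assumes that path is readable from your compute, as the %sh check above also assumes. It flags two kinds of files that could turn into null rows: hidden or non-.json names, and .json files containing a line that doesn't parse as JSON. It assumes the files are JSON Lines (one record per line, which is what the JSON reader expects unless multiLine is enabled). `find_suspect_files` is a hypothetical helper, not a Databricks API.

```python
import json
import os


def find_suspect_files(directory, suffix=".json"):
    """Return {file_name: reason} for files that might produce null rows.

    Hypothetical helper: first checks file names, then tries to parse
    every non-empty line of the remaining files as JSON (JSON Lines layout).
    """
    suspects = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        if name.startswith("."):
            suspects[name] = "hidden file"
            continue
        if not name.endswith(suffix):
            suspects[name] = f"does not end with {suffix}"
            continue
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line_no, line in enumerate(fh, start=1):
                if not line.strip():
                    continue  # blank lines are not checked by this sketch
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    suspects[name] = f"line {line_no} is not valid JSON"
                    break  # one bad line is enough to flag the file
    return suspects
```

In a notebook cell, call `find_suspect_files("/Volumes/workspace/dev/input/")`. An empty dict means every file has a .json name and parses line by line. Note that a pretty-printed file (one object spread over several lines) is also flagged, because each of its lines fails to parse on its own.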
If Extra Files Are Found: Fix
```python
# Parentheses allow per-line comments; a comment after a trailing "\" is a SyntaxError
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "...")
    .option("pathGlobFilter", "*.json")  # <-- ONLY pick .json files
    .load("/Volumes/workspace/dev/input/")
)
```
pathGlobFilter makes Autoloader load only files whose names match *.json, so any non-JSON files in the directory are skipped. If those files are the source of the null rows, this should eliminate them.
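To preview which names a `*.json` filter would keep before restarting the stream, here is a rough sketch using Python's `fnmatch`. It is only an approximation: `pathGlobFilter` applies Hadoop-style glob matching to the file name, not `fnmatch`, although for a simple pattern like `*.json` the two should behave the same. `preview_glob` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def preview_glob(file_names, pattern="*.json"):
    """Split file names into (kept, skipped) lists for a glob pattern.

    Hypothetical helper approximating pathGlobFilter, which matches the
    file name only. Matching is case-sensitive in this sketch.
    """
    kept = [n for n in file_names if fnmatchcase(n, pattern)]
    skipped = [n for n in file_names if not fnmatchcase(n, pattern)]
    return kept, skipped
```

For example, `preview_glob([f.name for f in dbutils.fs.ls("/Volumes/workspace/dev/input/")])` splits the listing into the files the filter keeps and the ones it drops. Because this sketch matches case-sensitively, a name like `DATA.JSON` lands in the skipped list.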
Could you run dbutils.fs.ls("/Volumes/workspace/dev/input/") and share what it returns? That should pinpoint the exact cause.