lingareddy_Alva
Esteemed Contributor

Hi @mits1 

Since you're using Databricks Free Edition with Serverless and reading from a Unity Catalog Volume (/Volumes/workspace/dev/input/), the issue is likely:
Volumes directory scan: Auto Loader reads the whole directory, not just one file
When Auto Loader scans /Volumes/workspace/dev/input/, it ingests every file it finds there, so any extra or hidden files in that directory may be parsed as JSON and come back as rows of nulls.
Run this in your Databricks notebook:
# Check exactly what files Autoloader sees
dbutils.fs.ls("/Volumes/workspace/dev/input/")

Also check for hidden files:
%sh ls -la /Volumes/workspace/dev/input/
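
If the listing is long, a small helper can flag the entries that do not look like JSON files. This is only a sketch: find_unexpected_files is a hypothetical name, and it assumes the Volume path can be read with ordinary Python file APIs from your notebook.

```python
from pathlib import Path


def find_unexpected_files(directory, expected_suffix=".json"):
    """Return the names of entries in `directory` that do not end with `expected_suffix`.

    Hypothetical helper for illustration; the match is case-sensitive and
    subdirectories are reported too, since they are not .json files either.
    """
    unexpected = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.name.endswith(expected_suffix):
            unexpected.append(entry.name)
    return unexpected


# Example call in a notebook (path taken from the question):
# print(find_unexpected_files("/Volumes/workspace/dev/input/"))
```

An empty list would suggest the directory contains only .json files, in which case the null rows probably have a different cause.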

If extra files are found, restrict the stream to .json files:
# Parentheses replace the backslashes, because a comment after "\" is a Python syntax error
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "...")
    .option("pathGlobFilter", "*.json")  # <-- ONLY pick .json files
    .load("/Volumes/workspace/dev/input/"))

pathGlobFilter makes Auto Loader skip every file whose name does not match *.json, so if stray files are the cause, the null rows should disappear.

Could you run dbutils.fs.ls("/Volumes/workspace/dev/input/") and share what it returns? That should pinpoint the exact cause.

LR
