<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Autoloader inserts null rows in delta table while reading json file in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153059#M53923</link>
<description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since you're using Databricks Free Edition with Serverless and reading from a Unity Catalog Volume (/Volumes/workspace/dev/input/), the likely issue is the directory scan: Autoloader reads the whole directory, not just your file. When it scans /Volumes/workspace/dev/input/, it may be picking up additional or hidden files.&lt;/P&gt;&lt;P&gt;Run this in your Databricks notebook:&lt;BR /&gt;# Check exactly which files Autoloader sees&lt;BR /&gt;dbutils.fs.ls("/Volumes/workspace/dev/input/")&lt;/P&gt;&lt;P&gt;Also check for hidden files:&lt;BR /&gt;%sh ls -la /Volumes/workspace/dev/input/&lt;/P&gt;&lt;P&gt;If extra files are found, restrict the stream to JSON files:&lt;BR /&gt;(spark.readStream&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .format("cloudFiles")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.format", "json")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.schemaLocation", "...")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .option("pathGlobFilter", "*.json")&amp;nbsp; # ONLY pick .json files&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .load('/Volumes/workspace/dev/input/'))&lt;/P&gt;&lt;P&gt;pathGlobFilter makes Autoloader ignore all non-JSON files in the directory, which should eliminate the null rows.&lt;/P&gt;&lt;P&gt;Could you run dbutils.fs.ls("/Volumes/workspace/dev/input/") and share what it returns? That should pinpoint the exact cause.&lt;/P&gt;</description>
    <pubDate>Thu, 02 Apr 2026 14:44:45 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2026-04-02T14:44:45Z</dc:date>
    <item>
      <title>Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152914#M53898</link>
<description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am exploring schema inference and schema evolution using Autoloader. I am reading a single-line JSON file and writing to a Delta table that does not already exist (it is created on the fly), using PySpark (code below).&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;STRONG&gt;Code:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;PRE&gt;(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/default/sys/schema1")
    .load('/Volumes/workspace/dev/input/')
    .writeStream
    .format("delta")
    .option("mergeSchema", "true")
    .option("checkpointLocation", "/Volumes/workspace/default/sys/checkpoint1")
    .trigger(once=True)
    .toTable("workspace.dev.infer_json_new1"))&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Input file:&lt;/STRONG&gt; user.json contains only the 2 records below, WITHOUT blank lines. A screenshot is attached.&lt;/P&gt;&lt;PRE&gt;{"Name":"Alfred","Gender":"M","Age":14}
{"Name":"John","Gender":"M","Age":12}&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Target table:&lt;/STRONG&gt; Screenshot is attached.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Problem:&lt;/STRONG&gt; On the very first run, it loads 33 null records along with the actual data rows, so 35 rows in total get inserted. However, after re-running the code with a new file, it does not insert any null rows.&lt;/P&gt;&lt;P&gt;I don't understand WHY there are 33 null rows on the first run only.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Apr 2026 13:54:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152914#M53898</guid>
      <dc:creator>mits1</dc:creator>
      <dc:date>2026-04-01T13:54:08Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152920#M53899</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;This looks like the same issue I covered recently. Please see &lt;A href="https://community.databricks.com/t5/data-engineering/null-rows-getting-inserted-in-delta-table-schema-mismatch/m-p/151750#M53703" target="_blank"&gt;here&lt;/A&gt;.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The issue is that Autoloader is ingesting your checkpoint files as data.&amp;nbsp;Because Checkpoint/ lives inside the data directory, Autoloader picks up those checkpoint JSONs. They don’t match your explicit schema, so all your business columns (and _metadata after cast) become NULL, and their content goes into _rescued_data.&lt;/P&gt;
&lt;P&gt;To fix this, consider moving the checkpoint location outside the source path.&lt;/P&gt;
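To make the failure mode above concrete, here is a quick, Spark-free way to sanity-check that a checkpoint location does not sit under the streaming source path (a minimal sketch; the is_inside helper is illustrative and the paths are taken from this thread):

```python
from pathlib import PurePosixPath

def is_inside(child, parent):
    # True when `child` is located under `parent` in the path hierarchy.
    child_parts = PurePosixPath(child).parts
    parent_parts = PurePosixPath(parent).parts
    return child_parts[:len(parent_parts)] == parent_parts

source = "/Volumes/workspace/dev/input"
ok_checkpoint = "/Volumes/workspace/default/sys/checkpoint1"
bad_checkpoint = "/Volumes/workspace/dev/input/_checkpoint"

# A checkpoint under the source directory would be scanned and re-ingested as data.
assert not is_inside(ok_checkpoint, source)
assert is_inside(bad_checkpoint, source)
```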
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Apr 2026 15:16:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152920#M53899</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-01T15:16:25Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152927#M53900</link>
<description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is a classic Autoloader schema inference artifact. Here's what I believe is happening:&lt;BR /&gt;Why 33 null rows?&lt;BR /&gt;When Autoloader runs for the first time with cloudFiles.format = "json", it performs a two-phase operation:&lt;/P&gt;&lt;P&gt;Phase 1 — Schema Inference (Sampling)&lt;BR /&gt;Autoloader samples the input file(s) to infer the schema before actually reading the data. Internally, Spark reads the JSON file using a default byte-range or row-sampling mechanism. For a tiny file like yours, the sampler reads the raw bytes and creates placeholder/empty partitions — these manifest as null rows.&lt;BR /&gt;The number 33 is not random — it comes from Spark's default minimum partition count. Spark uses spark.default.parallelism (or the number of cores times a multiplier) to determine how many tasks to create. With a very small file, most partitions are empty byte ranges, but they still get written as null rows in the first commit. The schema file is created in schemaLocation during this pass.&lt;/P&gt;&lt;P&gt;Phase 2 — Actual Data Read (Stream Processing)&lt;BR /&gt;Autoloader now uses the inferred schema from Phase 1 to actually read and process the data as a stream → 2 real rows written (Alfred &amp;amp; John).&lt;/P&gt;&lt;P&gt;The 33 null rows are Spark's empty-partition artifacts from the schema inference sampling pass on the first run. Once the schema is saved to schemaLocation, subsequent runs skip inference entirely, which is why you only see it once. The cleanest long-term fix is to provide the schema explicitly.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# The Fix

# Option 1 — Pre-define the schema (recommended for production)

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("Name", StringType()),
    StructField("Gender", StringType()),
    StructField("Age", IntegerType())
])

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/default/sys/schema1")
    .schema(schema)  # explicitly provide schema
    .load('/Volumes/workspace/dev/input/')
    .writeStream
    ...)

# Option 2 — Use cloudFiles.inferColumnTypes
#     .option("cloudFiles.inferColumnTypes", "true")
# This makes inference more precise and avoids the ghost-partition issue.

# Option 3 — Filter nulls at write time (quick workaround)
#     .load('/Volumes/workspace/dev/input/')
#     .filter("Name IS NOT NULL")  # drop null rows
#     .writeStream&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 01 Apr 2026 15:58:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152927#M53900</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2026-04-01T15:58:42Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152949#M53903</link>
<description>&lt;P&gt;Hi Ashwin_DSA,&lt;/P&gt;&lt;P&gt;Thank you for your response.&lt;/P&gt;&lt;P&gt;As you can see, the input and checkpoint locations are different, so this could not be the reason.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Apr 2026 18:01:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152949#M53903</guid>
      <dc:creator>mits1</dc:creator>
      <dc:date>2026-04-01T18:01:53Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152952#M53904</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Sorry. I jumped to conclusions based on the post header and its relation to the other one. However, I don't think &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;'s reason is accurate either. This is because there is a distinction between schema inference and the actual stream processing. During schema inference, Auto Loader samples up to 50 GB or 1,000 files and generates a schema JSON file, which is stored in _schemas under cloudFiles.schemaLocation. In the actual stream processing phase, it uses the inferred schema to read files and write to your Delta table. During schema inference, Auto Loader does not write any data rows to your target table. It only inspects files and saves the inferred schema. Nothing about that phase creates "placeholder partitions" that get materialized as NULL rows. Empty partitions in Spark simply produce zero output rows. They don’t generate rows full of NULLs.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;In contrast, Spark's JSON reader, including Auto Loader, operates in permissive mode, treating each line as a single JSON record. In this mode, any malformed, blank, or non-JSON lines result in records with actual columns set to null. Also, the raw text from these lines is stored in the _corrupt_record or rescued-data column. I'm guessing these are the rows you’re seeing.&lt;/P&gt;
&lt;P&gt;So the more likely explanation is that on the first run, Auto Loader processes all pre‑existing files in /Volumes/workspace/dev/input/. Some of those lines/files are empty, whitespace‑only, or otherwise invalid JSON --&amp;gt; 33 NULL rows.&amp;nbsp;Those files are now marked processed in the checkpoint and schema location, so subsequent runs never re‑read them, hence no more null rows.&lt;/P&gt;
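The permissive-mode behaviour described above can be illustrated without Spark. This is a minimal sketch: the parse_permissive helper is hypothetical, the stray CSV header line is an invented example, and under Auto Loader such raw text would land in _rescued_data rather than _corrupt_record.

```python
import json

def parse_permissive(line, columns=("Name", "Gender", "Age")):
    # Mimics Spark's PERMISSIVE mode for a line-delimited JSON source:
    # a line that is not a valid JSON object yields a record whose data
    # columns are all null (None) and whose raw text lands in _corrupt_record.
    try:
        obj = json.loads(line)
        if not isinstance(obj, dict):
            raise ValueError("not a JSON object")
        row = {c: obj.get(c) for c in columns}
        row["_corrupt_record"] = None
        return row
    except ValueError:
        row = {c: None for c in columns}
        row["_corrupt_record"] = line
        return row

rows = [parse_permissive(l) for l in [
    '{"Name":"Alfred","Gender":"M","Age":14}',
    'Name,Gender,Age',  # hypothetical stray CSV header swept up from the same directory
]]
assert rows[0]["Name"] == "Alfred" and rows[0]["_corrupt_record"] is None
assert rows[1]["Name"] is None and rows[1]["_corrupt_record"] == "Name,Gender,Age"
```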
&lt;P&gt;Just to narrow this down, can you run a batch read against the exact same path and share what you see?&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python p8i6j0e hljs language-python _12n1b832"&gt;df = (spark.read
      .&lt;SPAN class="hljs-built_in"&gt;format&lt;/SPAN&gt;(&lt;SPAN class="hljs-string"&gt;"json"&lt;/SPAN&gt;)
      .option(&lt;SPAN class="hljs-string"&gt;"columnNameOfCorruptRecord"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"_corrupt_record"&lt;/SPAN&gt;)
      .load(&lt;SPAN class="hljs-string"&gt;"/Volumes/workspace/dev/input/"&lt;/SPAN&gt;))

df.select(&lt;SPAN class="hljs-string"&gt;"*"&lt;/SPAN&gt;).where(&lt;SPAN class="hljs-string"&gt;"_corrupt_record IS NOT NULL"&lt;/SPAN&gt;).show(truncate=&lt;SPAN class="hljs-literal"&gt;False&lt;/SPAN&gt;)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;P class="p8i6j01 paragraph"&gt;Can you also double‑check&amp;nbsp;to ensure there aren’t extra or zero‑byte files in that directory.&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python p8i6j0e hljs language-python _12n1b832"&gt;display(dbutils.fs.ls(&lt;SPAN class="hljs-string"&gt;"/Volumes/workspace/dev/input/"&lt;/SPAN&gt;))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
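If stray non-JSON files do turn up in that listing, a glob filter on file names can exclude them (Spark's pathGlobFilter read option accepts such a glob). The matching semantics can be sketched with Python's fnmatch; the file names below are hypothetical, and this is not how Auto Loader implements it internally:

```python
from fnmatch import fnmatch

# Hypothetical directory listing; only *.json should be ingested.
files = ["user.json", "extra.csv", "_committed_123", "notes.txt"]

picked = [f for f in files if fnmatch(f, "*.json")]
assert picked == ["user.json"]
```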
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Apr 2026 18:56:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152952#M53904</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-01T18:56:47Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152954#M53905</link>
<description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thank you for your response.&lt;/P&gt;&lt;P&gt;Just to inform you:&lt;/P&gt;&lt;P&gt;1. I am using Databricks Free Edition, executing code on Serverless, which doesn't let me get the partition numbers.&lt;/P&gt;&lt;P&gt;2. I intentionally did not want to use/specify a schema, so that I could observe the schema inference behaviour.&lt;/P&gt;&lt;P&gt;3. As mentioned in your reply (Option 2 — Use cloudFiles.inferColumnTypes), I have configured this property too, but with no luck.&lt;/P&gt;&lt;P&gt;4. I did try Option 1, but it looks like it still produces the 33 null rows.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;My code:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("Name", StringType()),
    StructField("Gender", StringType()),
    StructField("Age", IntegerType())
])

df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/default/sys/schema5")
    .schema(schema)
    .load('/Volumes/workspace/dev/input/')
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/Volumes/workspace/default/sys/checkpoint5")
    .option("mergeSchema", "true")
    .trigger(availableNow=True)
    .toTable("workspace.default.json_null"))&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Table output: attached&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;I don't find the Google answers helpful either.&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Apr 2026 19:16:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152954#M53905</guid>
      <dc:creator>mits1</dc:creator>
      <dc:date>2026-04-01T19:16:05Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152989#M53909</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;&amp;nbsp;can you try adding this option as well:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;"multiLine"&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 07:22:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/152989#M53909</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2026-04-02T07:22:54Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153016#M53913</link>
<description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thank you for your reply. I did try this.&lt;/P&gt;&lt;P&gt;It inserts only 1 row with nulls; however, it doesn't load all 2 records — only the 1st row gets inserted. My JSON file is not in multiline format anyway.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Code:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("multiLine", "true")
    .option("cloudFiles.schemaLocation", "/Volumes/workspace/default/sys/schema")
    .schema(schema)
    .load('/Volumes/workspace/dev/input/')
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/Volumes/workspace/default/sys/checkpoint")
    .option("mergeSchema", "true")
    .trigger(availableNow=True)
    .toTable("workspace.default.json_null_ml"))&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Output:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 506px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/25616i045DFDC326FB8A0B/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 09:57:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153016#M53913</guid>
      <dc:creator>mits1</dc:creator>
      <dc:date>2026-04-02T09:57:12Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153059#M53923</link>
<description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since you're using Databricks Free Edition with Serverless and reading from a Unity Catalog Volume (/Volumes/workspace/dev/input/), the likely issue is the directory scan: Autoloader reads the whole directory, not just your file. When it scans /Volumes/workspace/dev/input/, it may be picking up additional or hidden files.&lt;/P&gt;&lt;P&gt;Run this in your Databricks notebook:&lt;BR /&gt;# Check exactly which files Autoloader sees&lt;BR /&gt;dbutils.fs.ls("/Volumes/workspace/dev/input/")&lt;/P&gt;&lt;P&gt;Also check for hidden files:&lt;BR /&gt;%sh ls -la /Volumes/workspace/dev/input/&lt;/P&gt;&lt;P&gt;If extra files are found, restrict the stream to JSON files:&lt;BR /&gt;(spark.readStream&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .format("cloudFiles")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.format", "json")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.schemaLocation", "...")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .option("pathGlobFilter", "*.json")&amp;nbsp; # ONLY pick .json files&lt;BR /&gt;&amp;nbsp; &amp;nbsp; .load('/Volumes/workspace/dev/input/'))&lt;/P&gt;&lt;P&gt;pathGlobFilter makes Autoloader ignore all non-JSON files in the directory, which should eliminate the null rows.&lt;/P&gt;&lt;P&gt;Could you run dbutils.fs.ls("/Volumes/workspace/dev/input/") and share what it returns? That should pinpoint the exact cause.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 14:44:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153059#M53923</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2026-04-02T14:44:45Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153246#M53949</link>
<description>&lt;P&gt;Hi&amp;nbsp;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Bingooo!!!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;.option("pathGlobFilter", "*.json")&amp;nbsp;&lt;FONT color="#FF0000"&gt;WORKED FOR ME.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I read the documentation thoroughly and now understand what ("cloudFiles.format", "json") actually does: it tells Autoloader to parse the incoming files as JSON, while pathGlobFilter picks up only the specified format out of all formats (csv, xml, etc.). In the input directory I have a .csv file with 33 records.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Now it inserts only the 2 records (without nulls).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks a lot for your time and effort in solving this issue.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Apr 2026 22:01:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153246#M53949</guid>
      <dc:creator>mits1</dc:creator>
      <dc:date>2026-04-03T22:01:45Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153254#M53950</link>
      <description>&lt;P&gt;Hi&amp;nbsp;,&lt;/P&gt;&lt;P&gt;The extra rows could have been caused by various reasons:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Extra files in the directory&lt;/LI&gt;&lt;LI&gt;Empty or corrupt records&lt;/LI&gt;&lt;LI&gt;Non-JSON content being picked up on the first run&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You could make sure that your input path contains only valid JSON files or you could modify your script to include only JSON files.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Apr 2026 23:55:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153254#M53950</guid>
      <dc:creator>karthickrs</dc:creator>
      <dc:date>2026-04-03T23:55:56Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader inserts null rows in delta table while reading json file</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153268#M53951</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/198217"&gt;@mits1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's my absolute pleasure! — That distinction between cloudFiles.format and pathGlobFilter trips up a lot of people. format tells Autoloader *how to parse*, while pathGlobFilter controls *what gets picked up* in the first place. Two very different layers. Happy ingesting!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 04 Apr 2026 03:44:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-inserts-null-rows-in-delta-table-while-reading-json/m-p/153268#M53951</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2026-04-04T03:44:04Z</dc:date>
    </item>
  </channel>
</rss>

