Hi @KristiLogos,
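First, try adding `.trigger(availableNow=True)` to the write. With this trigger, the query processes all the data that is available when it starts (in one or more micro-batches) and then stops on its own.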
Without an explicit trigger, per the documentation, the query runs micro-batches as fast as possible, which is equivalent to setting `processingTime='0 seconds'`. A query like that never finishes by itself; it keeps running until it is stopped.
If you run the streaming query in a notebook and the cell's execution ends before all the data has been processed, the query may not have enough time to ingest all of your files. That would explain why only a fraction of your data (e.g., 200 rows) ends up in your Delta table.
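```python
# Process everything currently available in the source, then stop.
query = (
    df_autoloader.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", checkpoint_dir)
    .trigger(availableNow=True)
    .toTable("tablename")  # DataStreamWriter uses toTable(), not table()
)
query.awaitTermination()  # block the cell until the backlog is fully ingested
```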
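To see why stopping early leaves only part of the data in the table, here is a toy Python model. It is not Spark or Auto Loader code, and the function names and per-file row counts are hypothetical. It mimics the availableNow idea: snapshot the files present when the query starts, split them into micro-batches, and count how many rows reach the sink if only some of those batches complete. In Auto Loader the batch size is governed by options such as `cloudFiles.maxFilesPerTrigger`; here it is just a plain parameter.

```python
# Toy model (NOT Spark's implementation): each "file" is represented by its row count.

def plan_batches(files_at_start, max_files_per_trigger):
    """availableNow-style planning: snapshot the backlog once, then split it into micro-batches."""
    snapshot = list(files_at_start)  # files arriving after this point are not included
    return [
        snapshot[i:i + max_files_per_trigger]
        for i in range(0, len(snapshot), max_files_per_trigger)
    ]


def rows_written(batches, batches_completed=None):
    """Rows in the sink after `batches_completed` micro-batches finish (None means all of them)."""
    done = batches if batches_completed is None else batches[:batches_completed]
    return sum(sum(batch) for batch in done)


if __name__ == "__main__":
    files = [100] * 10  # hypothetical backlog: 10 files of 100 rows each
    batches = plan_batches(files, max_files_per_trigger=2)
    print(len(batches))                                  # 5
    print(rows_written(batches, batches_completed=1))   # 200: the run was cut off early
    print(rows_written(batches))                         # 1000: the whole backlog was drained
```

In this model, a run that is interrupted after one micro-batch writes only 200 of the 1000 rows, while draining every planned batch (which is what availableNow does before stopping) writes all of them.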
Check this setting and let us know if it works.