01-03-2023 09:51 AM
I have been exploring Autoloader to ingest gzipped JSON files from an S3 source.
The notebook fails on the first run due to a schema mismatch. After I re-run the notebook, the schema evolves and the ingestion runs successfully.
On analysing the schema of the Delta table created by the ingestion, I found two new columns, `id` and `optionsDefaults`.
These columns do not exist in the original data, and they contain only nulls.
Is there something I might be missing?
Labels: Autoloader, Columns, Delta table
Accepted Solutions
01-13-2023 02:26 AM
Hi @Debayan Mukherjee , @Kaniz Fatma
Thank you for replying to my question.
I was able to figure out the issue. I was creating the schema and checkpoint folders under the same path as the source location for the autoloader. As a result, the source data now included the schema and checkpoint metadata files as well, so the schema changed every time the autoloader notebook ran.
I fixed this by providing a location for schema and checkpoint different from the source location.
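
For anyone hitting the same thing, here is a minimal sketch of that fix, not the exact notebook from this thread. It assumes a Databricks runtime where the `cloudFiles` source and `trigger(availableNow=True)` are available; the function names and the bucket paths in the usage note are hypothetical. The small guard refuses to start the stream if the schema or checkpoint folder sits inside the source folder.

```python
def _as_prefix(path: str) -> str:
    """Normalize a path so prefix checks respect folder boundaries."""
    return path.rstrip("/") + "/"


def is_inside(path: str, root: str) -> bool:
    """Return True if `path` is `root` itself or a folder beneath it."""
    return _as_prefix(path).startswith(_as_prefix(root))


def check_locations(source_path: str, schema_path: str, checkpoint_path: str) -> None:
    """Raise if Auto Loader metadata would be written under the source path."""
    for name, path in (("schema", schema_path), ("checkpoint", checkpoint_path)):
        if is_inside(path, source_path):
            raise ValueError(
                f"{name} location {path!r} is inside source {source_path!r}; "
                "Auto Loader would pick up its own metadata files as data"
            )


def start_ingest(spark, source_path, schema_path, checkpoint_path, target_table):
    """Start an Auto Loader stream with metadata kept outside the source.

    `spark` is the notebook's SparkSession. All names here are illustrative.
    """
    check_locations(source_path, schema_path, checkpoint_path)
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Inferred schema is tracked here, away from the source files.
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
    )
    return (
        stream.writeStream
        # Streaming progress is tracked here, also away from the source.
        .option("checkpointLocation", checkpoint_path)
        .option("mergeSchema", "true")
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

With hypothetical paths, a call such as `start_ingest(spark, "s3://my-bucket/events/", "s3://my-bucket/_autoloader/schema", "s3://my-bucket/_autoloader/checkpoint", "events_bronze")` passes the guard, while pointing the schema location at `s3://my-bucket/events/_schema` raises a `ValueError` before any stream starts.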
01-09-2023 12:05 PM
Hi, could you please provide screenshots (before and after) and, if possible, the notebook content?

