Schema inference with auto loader (non-DLT and DLT)
11-21-2023 03:19 PM - edited 11-21-2023 03:27 PM
Hi.
Another question, this time about schema inference and column types. I have dabbled with both DLT and structured streaming with auto loader (as in, not DLT). My source data is JSON files that contain nested structures.
I noticed that in the resulting streaming DLT table, all columns were strings. In the resulting delta table from the structured streaming + auto loader approach, the nested columns are structs.
- Is this the option cloudFiles.inferColumnTypes at work?
- As I understand it from the doc, if I set it to false in the non-DLT structured streaming approach, the columns would all be strings, correct?
- It doesn't look like I set anything for that option in the DLT declaration, so is false the default for DLT? Based on the doc, I assume DLT is using the default of false:
cloudFiles.inferColumnTypes
Type: Boolean
Whether to infer exact column types when leveraging schema inference. By default, columns are inferred as strings when inferring JSON and CSV datasets. See schema inference for more details.
Default value: false
- If I set inferColumnTypes to false in the structured streaming approach, would schema changes inside those nested columns no longer cause schema evolution failures, because the columns are just strings rather than structs?
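For reference, here is a minimal sketch of how the option in question could be set explicitly in the non-DLT approach, so the two behaviours can be compared side by side. The helper names `auto_loader_options` and `read_json_stream` and the paths in the usage comment are hypothetical and only for illustration; the `cloudFiles.*` keys are the Auto Loader option names, and `spark` is assumed to be an active session on Databricks.

```python
from typing import Dict, Optional


def auto_loader_options(
    infer_column_types: bool,
    schema_location: Optional[str] = None,
    file_format: str = "json",
) -> Dict[str, str]:
    """Build Auto Loader reader options (hypothetical helper, not a Databricks API)."""
    options = {
        "cloudFiles.format": file_format,
        # Pass the flag as a lowercase string: "true" or "false".
        "cloudFiles.inferColumnTypes": str(infer_column_types).lower(),
    }
    # Only set a schema location when one is supplied by the caller.
    if schema_location is not None:
        options["cloudFiles.schemaLocation"] = schema_location
    return options


def read_json_stream(spark, source_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame read through Auto Loader (needs a Databricks runtime)."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)


# Usage sketch (paths are placeholders, assumed for illustration):
# opts = auto_loader_options(False, schema_location="/tmp/schemas/events")
# df = read_json_stream(spark, "/tmp/landing/events", opts)
# df.printSchema()  # compare the column types with the True variant
```

Running the same read once with `True` and once with `False` and comparing `printSchema()` output should show directly whether this option accounts for the difference in column types.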
Cheers.
01-23-2024 12:45 PM
A late thank you for your reply, Kaniz. From my experience on the platform so far, I do like what schema inference does and I prefer to use it.

