Schema inference with Auto Loader (non-DLT and DLT)

ilarsen
Contributor

Hi.

 

Another question, this time about schema inference and column types.  I have dabbled with DLT and with structured streaming using Auto Loader (as in, not DLT).  My data source use case is JSON files containing nested structures.

 

I noticed that in the resulting streaming DLT table, all columns were strings.  In the resulting Delta table from the structured streaming + Auto Loader approach, the nested columns are structs.

 

  • Is this the option cloudFiles.inferColumnTypes at work?
  • As I understand it from the doc, if I were to set it to false in the non-DLT structured streaming approach (roughly like the sketch below), the columns would all be strings, correct?
  • It doesn't look like I set anything for that option in the DLT declaration, so is false the default for DLT?  Based on the doc, I assume false is what DLT uses:
cloudFiles.inferColumnTypes
Type: Boolean
Whether to infer exact column types when leveraging schema inference. By default, columns are inferred as strings when inferring JSON and CSV datasets. See schema inference for more details.
Default value: false
  • If I set inferColumnTypes to false in the structured streaming approach, would schema changes in those nested struct columns then not cause failures due to schema evolution, because they're just strings instead?
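
For reference, my non-DLT read looks roughly like this; the paths and table name below are placeholders rather than my real ones:

# `spark` is the SparkSession provided by the Databricks runtime.
# Auto Loader (cloudFiles) source over nested JSON files.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # The option in question: "true" infers typed columns (structs, bigint, ...),
    # "false" (the documented default outside DLT) keeps everything as strings.
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/my_source")
    .load("/mnt/landing/my_source/")
)

# Write to a bronze Delta table with its own checkpoint location.
(
    df.writeStream
    .option("checkpointLocation", "/mnt/landing/_checkpoints/my_source")
    .trigger(availableNow=True)
    .toTable("bronze.my_source")
)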


Cheers.

 

Accepted Solution

Kaniz
Community Manager

Hi @ilarsen, certainly! Let’s delve into the nuances of schema inference and column types in the context of Delta Live Tables (DLT) and Structured Streaming with Auto Loader.

 

DLT vs. Structured Streaming:

  • DLT (Delta Live Tables) is a managed service from Databricks that simplifies streaming data processing and ETL tasks. It offers a domain-specific language (DSL) that lets you write streaming code in fewer lines (see the sketch after this list).
  • Structured Streaming, on the other hand, is a core feature of Apache Spark. It allows you to process streaming data using structured APIs and SQL expressions.
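
To make the contrast concrete, here is a minimal, illustrative DLT declaration over an Auto Loader source (the table name and path are placeholders, not from your pipeline); the plain Structured Streaming equivalent additionally has to manage its own writeStream, checkpoint, and target table, as in the sketch in your question:

import dlt

# Minimal DLT declaration: the pipeline manages the write, checkpointing
# and table lifecycle; you only declare the transformation.
@dlt.table(name="bronze_events", comment="Raw JSON ingested with Auto Loader")
def bronze_events():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/events/")
    )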

Schema Inference and Column Types:

  • When dealing with JSON files containing nested structures, schema inference plays a crucial role in determining column types.
  • By default, when inferring schema from JSON datasets, all columns are treated as strings. This behavior applies to both DLT and stock Spark Structured Streaming.
  • However, you can control this behavior using the cloudFiles.inferColumnTypes option.

cloudFiles.inferColumnTypes Option:

  • This option determines whether to infer exact column types during schema inference.
  • When set to true, the system attempts to infer more precise data types based on the sample data. For example, it may recognize integers, floats, or nested structures.
  • When set to false, all columns are inferred as strings.
  • In your case, the DLT declaration does not explicitly set this option, so it defaults to false (i.e., inferring columns as strings). You can also set it explicitly in the declaration, as sketched below.
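
For illustration, a sketch of setting the option explicitly inside a DLT declaration (again with placeholder names):

import dlt

@dlt.table(name="bronze_events_typed")
def bronze_events_typed():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Explicitly request typed columns (structs, bigint, double, ...)
        # instead of relying on the default described above.
        .option("cloudFiles.inferColumnTypes", "true")
        .load("/mnt/landing/events/")
    )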

Schema Evolution and Nested Struct Columns:

  • If you use false for schema inference (treating all columns as strings), schema changes in nested struct columns will not cause failures due to schema evolution.
  • However, keep in mind that treating everything as strings may not be ideal for complex nested structures. You won’t benefit from the precision of data types.
  • If you expect schema changes (e.g., adding new fields or modifying nested structures), consider setting cloudFiles.inferColumnTypes to true. This way, the system will adapt to evolving schemas (see the sketch below).
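
On the non-DLT side, schema-evolution behaviour can also be tuned separately from type inference via cloudFiles.schemaEvolutionMode; here is a sketch using the same placeholder paths as above:

# `spark` is the SparkSession provided by the Databricks runtime.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.inferColumnTypes", "true")
    # "addNewColumns" (the default) stops the stream when new fields appear
    # so they can be added to the schema on restart; "rescue" keeps the
    # schema fixed and routes unexpected fields into _rescued_data instead.
    .option("cloudFiles.schemaEvolutionMode", "rescue")
    .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/my_source")
    .load("/mnt/landing/my_source/")
)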

Decision Considerations:

  • DLT provides convenience and simplification but comes at an additional cost. Evaluate whether the benefits align with your use case.
  • Structured Streaming remains a robust and widely used feature. It’s part of open-source Apache Spark and will continue to evolve.
  • Understand the trade-offs, costs, and benefits before choosing between DLT and stock Spark Structured Streaming.

Remember that both DLT and Structured Streaming have their merits, and your choice should align with your specific requirements and constraints. 

 

Happy streaming! 😊



ilarsen
Contributor

A late thank you for your reply, Kaniz.  From my experience with the platform so far, I do like what schema inference does and I prefer to use it.
