06-10-2025 02:39 AM
Hi, I'm trying to read YAML files using PyYAML and convert them into a Spark DataFrame with createDataFrame, without specifying a schema, so that the code stays flexible if the YAML schema changes over time. This approach worked as expected on Databricks runtime 13.3, but fails on runtime 15.4. Any suggestions?
My YAML schema is as below. I can read it fine on the 13.3 runtime, but on the 15.4 runtime I get '[CANNOT_INFER_TYPE_FOR_FIELD] Unable to infer the type of the field `dataset`'.
06-10-2025 05:01 PM - edited 06-10-2025 05:01 PM
Hi @SatyaKoduri
This is a known issue with the newer Spark version (3.5) that ships with Databricks Runtime 15.4.
Schema inference has become stricter there and struggles with deeply nested structures like the nested maps in your YAML.
Here are a few solutions:
Option 1: Flatten the structure before creating DataFrame
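A minimal sketch of the flattening idea, not the exact code from the original reply. The `config` dict is a hypothetical stand-in for what `yaml.safe_load` would return, and the helper names `flatten` and `to_key_value_rows` are illustrative. Every leaf is stringified and emitted as a (key, value) row, so the DataFrame schema stays fixed no matter how the YAML changes.

```python
from typing import Any, Dict, List, Optional, Tuple


def flatten(obj: Any, parent_key: str = "", sep: str = ".") -> Dict[str, Optional[str]]:
    """Flatten nested dicts/lists into one level with dotted keys.

    Leaf values are converted to strings (None stays None).
    Empty dicts and lists produce no entries.
    """
    flat: Dict[str, Optional[str]] = {}
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        flat[parent_key] = None if obj is None else str(obj)
        return flat
    for key, value in children:
        child_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(value, child_key, sep))
    return flat


def to_key_value_rows(config: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Turn a parsed YAML document into (key, value) tuples."""
    return list(flatten(config).items())


# Hypothetical parsed YAML (what yaml.safe_load might return).
config = {"dataset": {"name": "sales", "columns": ["id", "amount"], "owner": None}}
rows = to_key_value_rows(config)

# On a cluster where `spark` is available, the fixed two-column schema
# means Spark never has to infer a type from the YAML content:
# df = spark.createDataFrame(rows, schema="key string, value string")
```

The trade-off is that all values arrive as strings, so any numeric or boolean columns need an explicit cast downstream.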
Option 2: Convert nested structures to JSON strings
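One way this could look, as a sketch under assumptions: the `config` dict stands in for the output of `yaml.safe_load`, and `stringify_nested` is an illustrative helper name. Nested dicts and lists are serialized with `json.dumps`, so Spark only sees scalars and strings at the top level.

```python
import json
from typing import Any, Dict


def stringify_nested(record: Dict[str, Any]) -> Dict[str, Any]:
    """Replace nested dict/list values with JSON strings; keep scalars as-is.

    default=str covers values such as dates that PyYAML may produce
    and that json.dumps cannot serialize on its own.
    """
    return {
        key: json.dumps(value, sort_keys=True, default=str)
        if isinstance(value, (dict, list))
        else value
        for key, value in record.items()
    }


# Hypothetical parsed YAML (what yaml.safe_load might return).
config = {"name": "sales", "dataset": {"path": "/mnt/raw", "columns": ["id", "amount"]}}
row = stringify_nested(config)

# On a cluster where `spark` is available:
# df = spark.createDataFrame([row])
# The nested part can later be queried with functions such as
# pyspark.sql.functions.get_json_object or from_json.
```

Note that a top-level scalar that is `None` in every row can still fail inference, since there is no value to infer a type from.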
Option 3: Use a more explicit schema (flexible but structured)
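A hedged sketch of a loose explicit schema. Only the `dataset` field name comes from the error message above; the `name` field, the `SCHEMA_DDL` constant, and the helper names are assumptions for illustration. Declaring `dataset` as `map<string,string>` fixes the column type while still allowing arbitrary keys inside it.

```python
import json
from typing import Any, Dict, Optional, Tuple

# DDL-formatted schema string, accepted by createDataFrame's `schema` argument.
SCHEMA_DDL = "name string, dataset map<string,string>"


def _as_text(value: Any) -> Optional[str]:
    """Render a value as a string so it fits a map<string,string> column."""
    if value is None:
        return None
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True, default=str)
    return str(value)


def to_row(record: Dict[str, Any]) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    """Build a (name, dataset) tuple matching SCHEMA_DDL."""
    dataset = record.get("dataset") or {}
    return (
        _as_text(record.get("name")),
        {str(key): _as_text(value) for key, value in dataset.items()},
    )


# Hypothetical parsed YAML (what yaml.safe_load might return).
config = {"name": "sales", "dataset": {"path": "/mnt/raw", "partitions": 4, "tags": ["a"]}}
row = to_row(config)

# On a cluster where `spark` is available:
# df = spark.createDataFrame([row], schema=SCHEMA_DDL)
```

New keys under `dataset` need no schema change; only a new top-level field would require extending `SCHEMA_DDL`.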
Option 4: Force schema inference with RDD approach
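The original snippet for this option is not shown, so the following is only one plausible reading: serialize each parsed document to a JSON string, put the strings in an RDD, and let `spark.read.json` infer the schema. The helper names and the sample record are hypothetical. `spark.sparkContext` may not be available on every compute type (for example, Spark Connect based sessions), so treat this as an assumption to check on your cluster.

```python
import json
from typing import Any, Dict, List


def to_json_lines(records: List[Dict[str, Any]]) -> List[str]:
    """Serialize each parsed YAML document to a single JSON string."""
    return [json.dumps(record, default=str) for record in records]


def infer_from_json(spark: Any, records: List[Dict[str, Any]]) -> Any:
    """Let Spark's JSON reader infer the schema from an RDD of JSON strings.

    `spark` is an existing SparkSession; it is passed in so this module
    can be imported without a running Spark environment.
    """
    rdd = spark.sparkContext.parallelize(to_json_lines(records))
    return spark.read.json(rdd)


# Hypothetical parsed YAML (what yaml.safe_load might return).
records = [{"dataset": {"name": "sales", "columns": []}}]
json_lines = to_json_lines(records)

# On a cluster where `spark` is available:
# df = infer_from_json(spark, records)
```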
The flattening approach (Option 1) is probably your best bet if you want to maintain the flexibility you had in 13.3 while working with the stricter schema inference in 15.4. It converts your nested structure into a flat key-value format that Spark can easily handle.