Yaml file to Dataframe

SatyaKoduri — Tue, 10 Jun 2025 09:39:04 GMT

Hi, I'm trying to read YAML files with PyYAML and convert them into a Spark DataFrame using createDataFrame without specifying a schema, so the code stays flexible if the YAML schema changes over time. This worked as expected on Databricks Runtime 13.3, but it does not seem to work on Runtime 15.4. Any suggestions?

My YAML schema is as below. I can read it fine on Runtime 13.3, but on Runtime 15.4 I get '[CANNOT_INFER_TYPE_FOR_FIELD] Unable to infer the type of the field `dataset`'.

Re: Yaml file to Dataframe

lingareddy_Alva — Wed, 11 Jun 2025 00:01:36 GMT

Hi @SatyaKoduri

This is a known issue with the newer Spark versions (3.5+) that ship with Databricks Runtime 15.4.
Schema inference has become stricter and struggles with deeply nested structures such as the nested maps in your YAML.

Here are a few solutions:
Option 1: Flatten the structure before creating DataFrame
Option 2: Convert nested structures to JSON strings
Option 3: Use a more explicit schema (flexible but structured)
Option 4: Force schema inference with RDD approach
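Options 2 and 4 can be sketched roughly as below. This is a minimal sketch, not a tested fix for your exact file: it assumes each YAML file holds either one mapping or a list of mappings, and all function names (`load_yaml_records`, `nested_to_json_strings`, `yaml_to_json_string_df`, `yaml_via_json_reader`) are hypothetical. For Option 2, every nested dict or list becomes a JSON string, so Spark only has to infer simple types. For Option 4, this shows one possible reading of "RDD approach": re-serialise the YAML as JSON and let Spark's JSON reader infer the nested structs.

```python
import json
from typing import Any, Dict, List


def load_yaml_records(path: str) -> List[Dict[str, Any]]:
    """Parse a YAML file with PyYAML; a top-level list becomes several records."""
    import yaml  # PyYAML, imported lazily so the pure helpers work without it

    with open(path) as fh:
        doc = yaml.safe_load(fh)
    return doc if isinstance(doc, list) else [doc]


def nested_to_json_strings(record: Dict[str, Any]) -> Dict[str, Any]:
    """Option 2: keep top-level scalars, serialise nested dicts/lists to JSON text."""
    out: Dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            # default=str covers YAML values json cannot encode (e.g. datetime.date)
            out[key] = json.dumps(value, default=str, sort_keys=True)
        else:
            out[key] = value
    return out


def yaml_to_json_string_df(spark, path: str):
    """Option 2: nested fields arrive as STRING columns, so inference stays simple."""
    rows = [nested_to_json_strings(r) for r in load_yaml_records(path)]
    return spark.createDataFrame(rows)


def yaml_via_json_reader(spark, path: str):
    """Option 4 (one possible reading): let Spark's JSON reader infer nested structs."""
    lines = [json.dumps(r, default=str) for r in load_yaml_records(path)]
    # spark.read.json accepts an RDD of JSON strings; note that RDD APIs such as
    # spark.sparkContext may be unavailable on Spark Connect-based clusters.
    return spark.read.json(spark.sparkContext.parallelize(lines))
```

With Option 2, individual fields can later be pulled out of the string columns with `pyspark.sql.functions.get_json_object` (JSON path such as `$.name`) or parsed with `from_json` once you know the shape you need. A top-level field that is null in every record will still fail inference, because Spark cannot determine a type for an all-null column.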

The flattening approach (Option 1) is probably your best bet if you want to maintain the flexibility you had in 13.3 while working with the stricter schema inference in 15.4. It converts your nested structure into a flat key-value format that Spark can easily handle.
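A rough sketch of the flattening idea, assuming the YAML parses to nested dicts and lists (the helper names `flatten`, `to_key_value_rows` and `yaml_to_flat_df` are hypothetical). Each leaf becomes one `(key, value)` row with a dotted path as the key, and the DataFrame uses a fixed two-column string schema. New or renamed YAML keys simply become new rows, so nothing has to be inferred and schema drift does not break the load.

```python
from typing import Any, Dict, List, Optional, Tuple


def flatten(obj: Any, prefix: str = "", sep: str = ".") -> Dict[str, Optional[str]]:
    """Recursively flatten nested dicts/lists into {"a.b.0": "value"} pairs."""
    if isinstance(obj, dict):
        items = [(str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(str(i), v) for i, v in enumerate(obj)]  # list index becomes a path part
    else:
        # Leaf: stored as a string (or None) so every row has the same value type
        return {prefix: None if obj is None else str(obj)}
    flat: Dict[str, Optional[str]] = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else key
        flat.update(flatten(value, path, sep))
    return flat


def to_key_value_rows(doc: Any) -> List[Tuple[str, Optional[str]]]:
    """Sorted (key, value) tuples, ready for createDataFrame."""
    return sorted(flatten(doc).items())


def yaml_to_flat_df(spark, path: str):
    import yaml  # PyYAML, imported lazily so the pure helpers work without it
    from pyspark.sql.types import StringType, StructField, StructType

    with open(path) as fh:
        doc = yaml.safe_load(fh)
    # Generic schema: it never changes, whatever keys the YAML contains
    schema = StructType([
        StructField("key", StringType(), False),
        StructField("value", StringType(), True),
    ])
    return spark.createDataFrame(to_key_value_rows(doc), schema)
```

The trade-offs of this design: all values become strings (original YAML types are lost), and empty dicts or lists produce no rows. If a wide layout is needed later, something like `df.groupBy().pivot("key").agg(F.first("value"))` can turn the key/value rows back into columns.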
