Data Engineering

YAML file to DataFrame

SatyaKoduri
New Contributor II

Hi, I'm trying to read YAML files using pyyaml and convert them into a Spark DataFrame with createDataFrame, without specifying a schema, so the code stays flexible if the YAML schema changes over time. This approach worked as expected on Databricks Runtime 13.3, but it does not seem to work on Runtime 15.4. Any suggestions?

My YAML schema is shown below. It reads fine on the 13.3 runtime, but on 15.4 I get '[CANNOT_INFER_TYPE_FOR_FIELD] Unable to infer the type of the field `dataset`'.

[Screenshot of the nested YAML schema: Screenshot 2025-06-10 at 10.36.20.png]
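For reference, the approach looks roughly like this (the path is a placeholder, not my real config):

```python
import yaml
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load the YAML into nested Python dicts/lists.
with open("/dbfs/path/to/config.yaml") as f:  # placeholder path
    data = yaml.safe_load(f)

# No schema supplied, so Spark has to infer types from the nested objects.
# This worked on DBR 13.3 but raises CANNOT_INFER_TYPE_FOR_FIELD on 15.4.
df = spark.createDataFrame([data])
df.printSchema()
```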

1 ACCEPTED SOLUTION


lingareddy_Alva
Honored Contributor II

Hi @SatyaKoduri 

This is a known issue with the newer Spark version (3.5+) that ships with Databricks Runtime 15.4: schema inference has become stricter and struggles with deeply nested structures like your YAML's nested maps.

Here are a few solutions:
Option 1: Flatten the structure before creating DataFrame
Option 2: Convert nested structures to JSON strings
Option 3: Use a more explicit schema (flexible but structured)
Option 4: Force schema inference with RDD approach


The flattening approach (Option 1) is probably your best bet if you want to maintain the flexibility you had in 13.3 while working with the stricter schema inference in 15.4. It converts your nested structure into a flat key-value format that Spark can easily handle.
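Here's a minimal sketch of Option 1, assuming the YAML loads into nested dicts via pyyaml (the path and any field names are placeholders, not your actual schema):

```python
import yaml
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def flatten(node, parent_key="", sep="."):
    """Recursively flatten nested dicts into a single level of dotted keys."""
    flat = {}
    for key, value in node.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            # Cast leaves (including lists) to string so every column
            # has a single, unambiguously inferable type.
            flat[full_key] = str(value)
    return flat

with open("/dbfs/path/to/config.yaml") as f:  # placeholder path
    raw = yaml.safe_load(f)

# One flat row; Spark infers a simple all-string schema without ambiguity.
df = spark.createDataFrame([flatten(raw)])
df.show(truncate=False)
```

Option 2 is similar in spirit: keep the top-level keys but json.dumps any nested value into a string column, then parse it back with from_json later once you know which parts of the schema you actually need.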

 

LR


