Data Engineering

YAML file to DataFrame

SatyaKoduri
New Contributor II

Hi, I'm trying to read YAML files using pyyaml and convert them into a Spark DataFrame with createDataFrame, without specifying a schema, so the code stays flexible if the YAML schema changes over time. This approach worked as expected on Databricks Runtime 13.3, but it does not seem to work on Runtime 15.4. Any suggestions?

My YAML schema is shown below. It reads fine on the 13.3 runtime, but on 15.4 I get '[CANNOT_INFER_TYPE_FOR_FIELD] Unable to infer the type of the field `dataset`'.

[Screenshot of the nested YAML schema: Screenshot 2025-06-10 at 10.36.20.png]
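For reference, the approach looks roughly like this (the path is a placeholder, not my real config):

```python
import yaml
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load the YAML into nested Python dicts/lists.
with open("/dbfs/path/to/config.yaml") as f:  # placeholder path
    data = yaml.safe_load(f)

# No schema supplied, so Spark has to infer types from the nested objects.
# This worked on DBR 13.3 but raises CANNOT_INFER_TYPE_FOR_FIELD on 15.4.
df = spark.createDataFrame([data])
df.printSchema()
```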

1 ACCEPTED SOLUTION


lingareddy_Alva
Honored Contributor II

Hi @SatyaKoduri 

This is a known issue with the newer Spark version (3.5+) that ships with Databricks Runtime 15.4: schema inference has become stricter and struggles with deeply nested structures like your YAML's nested maps.

Here are a few solutions:
Option 1: Flatten the structure before creating DataFrame
Option 2: Convert nested structures to JSON strings
Option 3: Use a more explicit schema (flexible but structured)
Option 4: Force schema inference with RDD approach


The flattening approach (Option 1) is probably your best bet if you want to maintain the flexibility you had in 13.3 while working with the stricter schema inference in 15.4. It converts your nested structure into a flat key-value format that Spark can easily handle.
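Here's a minimal sketch of Option 1, assuming the YAML loads into nested dicts via pyyaml (the path and any field names are placeholders, not your actual schema):

```python
import yaml
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def flatten(node, parent_key="", sep="."):
    """Recursively flatten nested dicts into a single level of dotted keys."""
    flat = {}
    for key, value in node.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            # Cast leaves (including lists) to string so every column
            # has a single, unambiguously inferable type.
            flat[full_key] = str(value)
    return flat

with open("/dbfs/path/to/config.yaml") as f:  # placeholder path
    raw = yaml.safe_load(f)

# One flat row; Spark infers a simple all-string schema without ambiguity.
df = spark.createDataFrame([flatten(raw)])
df.show(truncate=False)
```

Option 2 is similar in spirit: keep the top-level keys but json.dumps any nested value into a string column, then parse it back with from_json later once you know which parts of the schema you actually need.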

 

LR


