<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Yaml file to Dataframe in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/yaml-file-to-dataframe/m-p/121415#M46447</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/167842"&gt;@SatyaKoduri&lt;/a&gt;&lt;/P&gt;&lt;P&gt;This is a known issue with the newer Spark version (3.5) that ships with Databricks Runtime 15.4.&lt;BR /&gt;Schema inference has become stricter and struggles with deeply nested structures such as the nested maps in your YAML.&lt;/P&gt;&lt;P&gt;Here are a few possible solutions:&lt;BR /&gt;Option 1: Flatten the structure before creating the DataFrame&lt;BR /&gt;Option 2: Convert nested structures to JSON strings&lt;BR /&gt;Option 3: Use a more explicit schema (flexible but structured)&lt;BR /&gt;Option 4: Force schema inference with an RDD approach&lt;/P&gt;&lt;P&gt;The flattening approach (Option 1) is probably your best bet if you want to keep the flexibility you had in 13.3 while working with the stricter schema inference in 15.4. It converts your nested structure into a flat key-value format that Spark can easily handle.&lt;/P&gt;</description>
    <pubDate>Wed, 11 Jun 2025 00:01:36 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2025-06-11T00:01:36Z</dc:date>
    <item>
      <title>Yaml file to Dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/yaml-file-to-dataframe/m-p/121316#M46420</link>
      <description>&lt;P&gt;Hi, I'm trying to read YAML files using PyYAML and convert them into a Spark DataFrame with createDataFrame, without specifying a schema, so that the code stays flexible if the YAML schema changes over time. This approach worked as expected on Databricks Runtime 13.3, but it fails on Runtime 15.4. Any suggestions?&lt;/P&gt;&lt;P&gt;My YAML schema is shown below. I can read it without problems on the 13.3 runtime, but on the 15.4 runtime I get '&lt;SPAN class=""&gt;[&lt;A class="" href="https://docs.microsoft.com/azure/databricks/error-messages/error-classes#cannot_infer_type_for_field" target="_blank" rel="noopener noreferrer"&gt;CANNOT_INFER_TYPE_FOR_FIELD&lt;/A&gt;] Unable to infer the type of the field `dataset`&lt;/SPAN&gt;'.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2025-06-10 at 10.36.20.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17421i0772B101E6E4638C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2025-06-10 at 10.36.20.png" alt="Screenshot 2025-06-10 at 10.36.20.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jun 2025 09:39:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/yaml-file-to-dataframe/m-p/121316#M46420</guid>
      <dc:creator>SatyaKoduri</dc:creator>
      <dc:date>2025-06-10T09:39:04Z</dc:date>
    </item>
    <item>
      <title>Re: Yaml file to Dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/yaml-file-to-dataframe/m-p/121415#M46447</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/167842"&gt;@SatyaKoduri&lt;/a&gt;&lt;/P&gt;&lt;P&gt;This is a known issue with the newer Spark version (3.5) that ships with Databricks Runtime 15.4.&lt;BR /&gt;Schema inference has become stricter and struggles with deeply nested structures such as the nested maps in your YAML.&lt;/P&gt;&lt;P&gt;Here are a few possible solutions:&lt;BR /&gt;Option 1: Flatten the structure before creating the DataFrame&lt;BR /&gt;Option 2: Convert nested structures to JSON strings&lt;BR /&gt;Option 3: Use a more explicit schema (flexible but structured)&lt;BR /&gt;Option 4: Force schema inference with an RDD approach&lt;/P&gt;&lt;P&gt;The flattening approach (Option 1) is probably your best bet if you want to keep the flexibility you had in 13.3 while working with the stricter schema inference in 15.4. It converts your nested structure into a flat key-value format that Spark can easily handle.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jun 2025 00:01:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/yaml-file-to-dataframe/m-p/121415#M46447</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-06-11T00:01:36Z</dc:date>
    </item>
  </channel>
</rss>

