<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Can schemaHints dynamically handle nested json structures? (Part 2) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures-part-2/m-p/130605#M48849</link>
    <description>&lt;P&gt;I am not aware of schemaHints supporting wildcards for now. It would be awesome to have though, I agree.&lt;BR /&gt;So I think you are stuck with what was already proposed in your previous post, or with exploding the JSON or applying other transformations.&lt;/P&gt;</description>
    <pubDate>Wed, 03 Sep 2025 06:33:42 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2025-09-03T06:33:42Z</dc:date>
    <item>
      <title>Can schemaHints dynamically handle nested json structures? (Part 2)</title>
      <link>https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures-part-2/m-p/130577#M48844</link>
      <description>&lt;P&gt;Hi there, I'd like to follow up on a prior post:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures/m-p/130209/highlight/true#M48731" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures/m-p/130209/highlight/true#M48731&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What is the best way to set &lt;STRONG&gt;both&lt;/STRONG&gt; &lt;U&gt;&lt;STRONG&gt;dataPoint&lt;/STRONG&gt;&lt;/U&gt; and &lt;U&gt;&lt;STRONG&gt;values&lt;/STRONG&gt;&lt;/U&gt; to string via schemaHints?&lt;/P&gt;&lt;P&gt;For example, although I've heard that this feature isn't currently supported, it would be great if I could set schemaHints dynamically as follows:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;schemaHints =&amp;gt; 'elementData.element.data.&lt;STRONG&gt;*.&lt;U&gt;dataPoint&lt;/U&gt;&lt;/STRONG&gt; string, elementData.element.data.&lt;STRONG&gt;*.&lt;U&gt;values&lt;/U&gt;&lt;/STRONG&gt; string'&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The reason is that the field name `&lt;U&gt;&lt;STRONG&gt;NESTED_DATA_NAME_X&lt;/STRONG&gt;&lt;/U&gt;` is variable and I cannot reliably account for it (there may be a few such fields, or there may be 100, and they may change over time), whereas the nested `dataPoint` and `values` fields are consistent.&lt;/P&gt;&lt;P&gt;Below is a simpler code snippet than in my last post (it provides json_text inline, but in practice I would be using schemaHints via read_files or Auto Loader):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;json_text = """
{
  "packageHeader": {
    "transactionId": "5c3661ac-abdd-480c-88b6-c3c128bce7bd",
    "rootName": "rootName",
    "endpointName": "report",
    "elementType": "Generator",
    "elementName": "rootName",
    "intervalSize": 5,
    "environment": "production",
    "startDate": "2025-07-10T04:00:00",
    "endDate": "2025-07-10T04:10:00"
  },
  "elementData": [
    {
      "elementName": "elementName1",
      "elementIdentifier": "ae5f1a94-33b4-4001-b926-499dc0425bf1",
      "elementDefinitionIdentifier": 2,
      "metaData": {},
      "data": {}
    },
    {
      "elementName": "elementName2",
      "elementIdentifier": "c647ffb1-b8ba-4b34-8b45-8590fed273ef",
      "elementDefinitionIdentifier": 2,
      "metaData": {},
      "data": {
        "NESTED_DATA_NAME_1": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_1",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "Decimal",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": 0.975200119018556
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": 0.21553290049235
            }
          ]
        },
        "NESTED_DATA_NAME_2": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_2",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "String",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": "ON"
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": "ON"
            }
          ]
        },
        "NESTED_DATA_NAME_3": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_3",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "Decimal",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": 0
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": 0
            }
          ]
        },
        "NESTED_DATA_NAME_5": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_5",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "Decimal",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": 0
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": 0
            }
          ]
        },
        "NESTED_DATA_NAME_6": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_6",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "String",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": "Off"
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": "Off"
            }
          ]
        },
        "NESTED_DATA_NAME_7": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_7",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "Decimal",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": 0.00011502826237119734
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": 0.00011502826237119734
            }
          ]
        }
      }
    }
  ]
}
"""

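# Pass the whole document as a single record; json_text contains no ', ' so the
# original split(', ') already produced a one-element list, and a list literal
# states that intent directly without risking a split inside the JSON.
rdd = spark.sparkContext.parallelize([json_text])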

df = spark.read.json(rdd)

df.printSchema()&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 03 Sep 2025 03:53:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures-part-2/m-p/130577#M48844</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2025-09-03T03:53:03Z</dc:date>
    </item>
    <item>
      <title>Re: Can schemaHints dynamically handle nested json structures? (Part 2)</title>
      <link>https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures-part-2/m-p/130605#M48849</link>
      <description>&lt;P&gt;I am not aware of schemaHints supporting wildcards for now. It would be awesome to have though, I agree.&lt;BR /&gt;So I think you are stuck with what was already proposed in your previous post, or with exploding the JSON or applying other transformations.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Sep 2025 06:33:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures-part-2/m-p/130605#M48849</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2025-09-03T06:33:42Z</dc:date>
    </item>
  </channel>
</rss>

