Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can schemaHints dynamically handle nested json structures? (Part 2)

ChristianRRL
Valued Contributor III

Hi there, I'd like to follow up on a prior post:

Basically I'm wondering what's the best way to set *both* dataPoint and values to string via schemaHints?

For example, although I've heard that this feature isn't supported currently, it would be great if I could dynamically set schemaHints as follows:

  • schemaHints => 'elementData.element.data.*.dataPoint string, elementData.element.data.*.values string'

The reason is that `NESTED_DATA_NAME_X` is a variable key I cannot reliably enumerate (there may be a few, or a hundred, and they may change over time), while the nested `dataPoint` and `values` fields are consistent.

Below is a simpler code snippet than in my last post (it provides json_text inline, but in practice I would be trying to use schemaHints via read_files or Auto Loader):

json_text = """
{
  "packageHeader": {
    "transactionId": "5c3661ac-abdd-480c-88b6-c3c128bce7bd",
    "rootName": "rootName",
    "endpointName": "report",
    "elementType": "Generator",
    "elementName": "rootName",
    "intervalSize": 5,
    "environment": "production",
    "startDate": "2025-07-10T04:00:00",
    "endDate": "2025-07-10T04:10:00"
  },
  "elementData": [
    {
      "elementName": "elementName1",
      "elementIdentifier": "ae5f1a94-33b4-4001-b926-499dc0425bf1",
      "elementDefinitionIdentifier": 2,
      "metaData": {},
      "data": {}
    },
    {
      "elementName": "elementName2",
      "elementIdentifier": "c647ffb1-b8ba-4b34-8b45-8590fed273ef",
      "elementDefinitionIdentifier": 2,
      "metaData": {},
      "data": {
        "NESTED_DATA_NAME_1": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_1",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "Decimal",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": 0.975200119018556
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": 0.21553290049235
            }
          ]
        },
        "NESTED_DATA_NAME_2": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_2",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "String",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": "ON"
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": "ON"
            }
          ]
        },
        "NESTED_DATA_NAME_3": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_3",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "Decimal",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": 0
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": 0
            }
          ]
        },
        "NESTED_DATA_NAME_5": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_5",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "Decimal",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": 0
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": 0
            }
          ]
        },
        "NESTED_DATA_NAME_6": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_6",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "String",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": "Off"
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": "Off"
            }
          ]
        },
        "NESTED_DATA_NAME_7": {
          "dataPoint": {
            "name": "NESTED_DATA_NAME_7",
            "friendlyName": "friendlyName",
            "keyName": "keyName",
            "dataType": "Decimal",
            "sequence": null
          },
          "values": [
            {
              "intervalLocal": "2025-07-10T04:05:00-05:00",
              "value": 0.00011502826237119734
            },
            {
              "intervalLocal": "2025-07-10T04:10:00-05:00",
              "value": 0.00011502826237119734
            }
          ]
        }
      }
    }
  ]
}
"""

# Parse the whole document as a single JSON record (splitting the text on
# ', ' would break the JSON mid-object).
rdd = spark.sparkContext.parallelize([json_text])

df = spark.read.json(rdd)

df.printSchema()
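For context, this is roughly how the hints would be supplied in practice with Auto Loader (a configuration sketch only; the paths are placeholders, and the wildcard `*` syntax from above is exactly what is not accepted today, so a hint can only target a concrete path):

```python
# Sketch (not runnable as-is): schema hints with Auto Loader.
# "/tmp/schema" and "/tmp/input" are placeholder paths.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schema")
    # Without wildcard support, each concrete key must be hinted explicitly:
    .option(
        "cloudFiles.schemaHints",
        "elementData.element.data.NESTED_DATA_NAME_1.dataPoint string",
    )
    .load("/tmp/input")
)
```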

1 ACCEPTED SOLUTION


-werners-
Esteemed Contributor III

I am not aware of schemaHints supporting wildcards for now. It would be awesome to have, though, I agree.
So I think you are stuck with what was already proposed in your previous post, or with exploding the JSON or other transformations.


