<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic schema is not enforced when using autoloader in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/schema-is-not-enforced-when-using-autoloader/m-p/88130#M37486</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I am currently trying to enforce the following schema:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.types import StructType, StructField, StringType, DoubleType

json_schema_silver = StructType([
    StructField("site", StringType(), True),
    StructField("meter", StringType(), True),
    StructField("device_time", StringType(), True),
    StructField("data", StructType([
        StructField("energy", StructType([
            StructField("cumulative", StructType([
                StructField("active", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("apparent", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("reactive", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True)
            ]), True)
        ]), True),
        StructField("power", StructType([
            StructField("instantaneous", StructType([
                StructField("active", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("apparent", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("reactive", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True)
            ]), True),
            StructField("average", StructType([
                StructField("active", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("apparent", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("reactive", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True)
            ]), True)
        ]), True)
    ]), True)
])&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to do it using Auto Loader:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;dfSilver = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .schema(json_schema_silver)
      .load("myVolumePathForJsonFilesFromS3"))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But for some reason the schema is not enforced.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="sakuraDev_0-1725389159389.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/10905i4E4273283C361D2A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="sakuraDev_0-1725389159389.png" alt="sakuraDev_0-1725389159389.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Here's one example of the JSON files I am ingesting:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[
    {
        "site": "SiteA",
        "meter": "M1",
        "device_time": "2024-08-25T00:00:01.276Z",
        "data": {
            "energy": {
                "cumulative": {
                    "active": {
                        "value": 60000.000000000000,
                        "unit": "kWh"
                    },
                    "apparent": {
                        "value": 61000.000000000000,
                        "unit": "kVAh"
                    },
                    "reactive": {
                        "value": 420.000000000000,
                        "unit": "kVArH"
                    }
                }
            },
            "power": {
                "instantaneous": {
                    "active": {
                        "value": 9100000.000000000000,
                        "unit": "watt"
                    },
                    "apparent": {
                        "value": 9200000.000000000000,
                        "unit": "VA"
                    },
                    "reactive": {
                        "value": 64000.000000000000,
                        "unit": "var"
                    }
                },
                "average": {
                    "active": {
                        "value": 9100000.000000000000,
                        "unit": "watt"
                    },
                    "apparent": {
                        "value": 9200000.000000000000,
                        "unit": "VA"
                    },
                    "reactive": {
                        "value": 64000.000000000000,
                        "unit": "var"
                    }
                }
            }
        }
    },
    {
        "site": "SiteB",
        "meter": "M2",
        "device_time": "2024-08-25T00:30:31.306Z",
        "data": {
            "energy": {
                "cumulative": {
                    "active": {
                        "value": 61000.000000000000,
                        "unit": "kWh"
                    },
                    "apparent": {
                        "value": 62000.000000000000,
                        "unit": "kVAh"
                    },
                    "reactive": {
                        "value": 430.000000000000,
                        "unit": "kVArH"
                    }
                }
            },
            "power": {
                "instantaneous": {
                    "active": {
                        "value": 10200000.000000000000,
                        "unit": "watt"
                    },
                    "apparent": {
                        "value": 10300000.000000000000,
                        "unit": "VA"
                    },
                    "reactive": {
                        "value": 65000.000000000000,
                        "unit": "var"
                    }
                },
                "average": {
                    "active": {
                        "value": 10200000.000000000000,
                        "unit": "watt"
                    },
                    "apparent": {
                        "value": 10300000.000000000000,&lt;/LI-CODE&gt;&lt;P&gt;My understanding is that enforcing the schema should show each field of the nested objects as its own column in a tabular format, as it does for site, rather than nested inside data. Am I wrong to expect this?&lt;/P&gt;</description>
    <pubDate>Tue, 03 Sep 2024 18:51:01 GMT</pubDate>
    <dc:creator>sakuraDev</dc:creator>
    <dc:date>2024-09-03T18:51:01Z</dc:date>
    <item>
      <title>schema is not enforced when using autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-is-not-enforced-when-using-autoloader/m-p/88130#M37486</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I am currently trying to enforce the following schema:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.types import StructType, StructField, StringType, DoubleType

json_schema_silver = StructType([
    StructField("site", StringType(), True),
    StructField("meter", StringType(), True),
    StructField("device_time", StringType(), True),
    StructField("data", StructType([
        StructField("energy", StructType([
            StructField("cumulative", StructType([
                StructField("active", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("apparent", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("reactive", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True)
            ]), True)
        ]), True),
        StructField("power", StructType([
            StructField("instantaneous", StructType([
                StructField("active", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("apparent", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("reactive", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True)
            ]), True),
            StructField("average", StructType([
                StructField("active", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("apparent", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True),
                StructField("reactive", StructType([
                    StructField("value", DoubleType(), True),
                    StructField("unit", StringType(), True)
                ]), True)
            ]), True)
        ]), True)
    ]), True)
])&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to do it using Auto Loader:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;dfSilver = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .schema(json_schema_silver)
      .load("myVolumePathForJsonFilesFromS3"))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But for some reason the schema is not enforced.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="sakuraDev_0-1725389159389.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/10905i4E4273283C361D2A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="sakuraDev_0-1725389159389.png" alt="sakuraDev_0-1725389159389.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Here's one example of the JSON files I am ingesting:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[
    {
        "site": "SiteA",
        "meter": "M1",
        "device_time": "2024-08-25T00:00:01.276Z",
        "data": {
            "energy": {
                "cumulative": {
                    "active": {
                        "value": 60000.000000000000,
                        "unit": "kWh"
                    },
                    "apparent": {
                        "value": 61000.000000000000,
                        "unit": "kVAh"
                    },
                    "reactive": {
                        "value": 420.000000000000,
                        "unit": "kVArH"
                    }
                }
            },
            "power": {
                "instantaneous": {
                    "active": {
                        "value": 9100000.000000000000,
                        "unit": "watt"
                    },
                    "apparent": {
                        "value": 9200000.000000000000,
                        "unit": "VA"
                    },
                    "reactive": {
                        "value": 64000.000000000000,
                        "unit": "var"
                    }
                },
                "average": {
                    "active": {
                        "value": 9100000.000000000000,
                        "unit": "watt"
                    },
                    "apparent": {
                        "value": 9200000.000000000000,
                        "unit": "VA"
                    },
                    "reactive": {
                        "value": 64000.000000000000,
                        "unit": "var"
                    }
                }
            }
        }
    },
    {
        "site": "SiteB",
        "meter": "M2",
        "device_time": "2024-08-25T00:30:31.306Z",
        "data": {
            "energy": {
                "cumulative": {
                    "active": {
                        "value": 61000.000000000000,
                        "unit": "kWh"
                    },
                    "apparent": {
                        "value": 62000.000000000000,
                        "unit": "kVAh"
                    },
                    "reactive": {
                        "value": 430.000000000000,
                        "unit": "kVArH"
                    }
                }
            },
            "power": {
                "instantaneous": {
                    "active": {
                        "value": 10200000.000000000000,
                        "unit": "watt"
                    },
                    "apparent": {
                        "value": 10300000.000000000000,
                        "unit": "VA"
                    },
                    "reactive": {
                        "value": 65000.000000000000,
                        "unit": "var"
                    }
                },
                "average": {
                    "active": {
                        "value": 10200000.000000000000,
                        "unit": "watt"
                    },
                    "apparent": {
                        "value": 10300000.000000000000,&lt;/LI-CODE&gt;&lt;P&gt;My understanding is that enforcing the schema should show each field of the nested objects as its own column in a tabular format, as it does for site, rather than nested inside data. Am I wrong to expect this?&lt;/P&gt;</description>
      <pubDate>Tue, 03 Sep 2024 18:51:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-is-not-enforced-when-using-autoloader/m-p/88130#M37486</guid>
      <dc:creator>sakuraDev</dc:creator>
      <dc:date>2024-09-03T18:51:01Z</dc:date>
    </item>
    <item>
      <title>Re: schema is not enforced when using autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-is-not-enforced-when-using-autoloader/m-p/88133#M37488</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119002"&gt;@sakuraDev&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I'm afraid your assumption is wrong. Your schema defines the data field as a struct type, so the result is as expected: data is loaded as a single struct column. Once the column is a struct type, you can refer to its nested fields using dot notation. For example, to get the energy field you would use:&amp;nbsp;&lt;/P&gt;&lt;P&gt;data.energy&lt;/P&gt;&lt;P&gt;To get this data into a tabular format, you need to perform a transformation that pulls the nested fields out into top-level columns.&lt;/P&gt;</description>
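The dot notation and the flattening step mentioned in the reply can be sketched in plain Python. This is only an illustration on an ordinary dict, not code from the thread: the helper names leaf_paths and flatten are hypothetical, and the sample record is a trimmed-down version of the JSON in the question.

```python
from typing import Any, Dict, Iterator, Tuple


def leaf_paths(record: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: recurse and extend the dotted path.
            yield from leaf_paths(value, path)
        else:
            yield path, value


def flatten(record: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a nested record into one flat row, using underscores in column names."""
    return {path.replace(".", "_"): value for path, value in leaf_paths(record)}


# Trimmed-down sample based on the JSON shown in the question.
sample = {
    "site": "SiteA",
    "meter": "M1",
    "data": {"energy": {"cumulative": {"active": {"value": 60000.0, "unit": "kWh"}}}},
}

paths = [path for path, _ in leaf_paths(sample)]
flat = flatten(sample)
print(paths)
print(flat)
```

On a Spark DataFrame, the same dotted paths could presumably be used in a select, for example dfSilver.select(*[col(p).alias(p.replace(".", "_")) for p in paths]) with col imported from pyspark.sql.functions. That usage is an assumption added for illustration and has not been run against the data in this thread.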
      <pubDate>Tue, 03 Sep 2024 19:19:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-is-not-enforced-when-using-autoloader/m-p/88133#M37488</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-03T19:19:04Z</dc:date>
    </item>
  </channel>
</rss>

