<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to handle schema validation for Json file. Using Databricks Autoloader? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29760#M21467</link>
    <description>&lt;P&gt;Hi @Swapnil Kamle​&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Sat, 29 Oct 2022 06:16:31 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2022-10-29T06:16:31Z</dc:date>
    <item>
      <title>How to handle schema validation for Json file. Using Databricks Autoloader?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29757#M21464</link>
      <description>&lt;P&gt;Following are the details of the requirement:&lt;/P&gt;&lt;P&gt;1. I am using a Databricks notebook to read data from a Kafka topic and write it into an ADLS Gen2 container, i.e., my landing layer.&lt;/P&gt;&lt;P&gt;2. I am using Spark code to read the data from Kafka and write it into the landing layer.&lt;/P&gt;&lt;P&gt;3. The next step is reading the JSON files from the landing layer and moving them to the bronze layer, which is another container in my ADLS Gen2. For this, I am using Autoloader with a Delta Live Table to create the table.&lt;/P&gt;&lt;P&gt;&lt;I&gt;&lt;U&gt;Here is the code for the same:&lt;/U&gt;&lt;/I&gt;&lt;/P&gt;&lt;PRE&gt;@dlt.table(
  name = tablename,
  comment = "Create Bronze Table",
  table_properties = {
    "quality": "bronze"
  }
)
def Bronze_Table_Create():
  return (
    spark
      .readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", schemalocation)
      .option("cloudFiles.inferColumnTypes", "true")
      .option("cloudFiles.schemaEvolutionMode", "rescue")
      .load(sourcelocation)
  )&lt;/PRE&gt;&lt;P&gt;4. This code works fine for me, and it infers the schema as well. However, there is one scenario I am trying to handle, described step by step below:&lt;/P&gt;&lt;P&gt;i. I want to validate the schema, so that if there is any change in the schema, I get notified and the job fails. I can handle that through &lt;B&gt;SchemaEvolutionMode&lt;/B&gt;. However, my scenario is different: I have one column, &lt;B&gt;RawData&lt;/B&gt;, which is of type object and has no specified schema. It receives dynamic values, so if I infer the schema and apply schema validation, it will produce a new schema every time and throw a schema-mismatch error.&lt;/P&gt;&lt;P&gt;ii. Is there any solution by which I can exclude the &lt;B&gt;RawData&lt;/B&gt; column from schema validation, so that this column is allowed to hold any type of data?&lt;/P&gt;&lt;P&gt;I have been struggling with this for a long time; any help is appreciated. Please let me know if any additional details are required.&lt;/P&gt;&lt;P&gt;Sample JSON:&lt;/P&gt;&lt;PRE&gt;{
  "Header": {
    "SchemaVersion": "1.0",
    "EventId": "123",
    "EventTime_UTC": "2022-09-22 16:18:16",
    "Environment": "dev"
  },
  "Payload": {
    "RawData": {
      "CusID": "12345",
      "Status": "Pending",
      "LastModifiedAt": "2022-09-22 16:18:12",
      "ContainerName": "default",
      "CreatedAt": "2022-09-22 16:18:11"
    },
    "Data": {
      "CustID": "12345",
      "ArrayKeys": [
        {
          "ArrayName": "WorkHistory",
          "ArrayKeyName": "SampleId"
        }
      ]
    }
  }
}&lt;/PRE&gt;&lt;P&gt;&lt;I&gt;*The data in RawData is inconsistent; it can have different columns.*&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 01 Oct 2022 10:15:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29757#M21464</guid>
      <dc:creator>SRK</dc:creator>
      <dc:date>2022-10-01T10:15:10Z</dc:date>
    </item>
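The exclusion the question asks about can be made concrete with a plain-Python sketch (not part of the thread, and not how Autoloader enforces schemas internally; `validate`, `EXPECTED`, and `SKIP` are illustrative names): validate every key path against an expected schema, but exempt the `Payload.RawData` subtree so its dynamic fields never trigger a mismatch.

```python
import json

# Expected nested key structure; leaf values (str, list) are only markers.
# All names here are illustrative, not Databricks APIs.
EXPECTED = {
    "Header": {"SchemaVersion": str, "EventId": str,
               "EventTime_UTC": str, "Environment": str},
    "Payload": {"RawData": {}, "Data": {"CustID": str, "ArrayKeys": list}},
}
SKIP = {("Payload", "RawData")}  # subtrees exempt from validation

def validate(doc, schema, path=(), skip=frozenset()):
    """Return the unexpected key paths in doc, ignoring skipped subtrees."""
    errors = []
    for key, value in doc.items():
        here = path + (key,)
        if here in skip:
            continue  # dynamic column: accept any content
        if key not in schema:
            errors.append(".".join(here))
        elif isinstance(schema[key], dict) and isinstance(value, dict):
            errors.extend(validate(value, schema[key], here, skip))
    return errors

sample = json.loads("""
{"Header": {"SchemaVersion": "1.0", "EventId": "123",
            "EventTime_UTC": "2022-09-22 16:18:16", "Environment": "dev"},
 "Payload": {"RawData": {"AnyField": 1, "Other": true},
             "Data": {"CustID": "12345"}}}
""")
print(validate(sample, EXPECTED, skip=SKIP))  # → []
```

Within Autoloader itself, one way to get a similar effect is to supply a defined schema that types `RawData` as a plain string and parse it downstream with `from_json`; depending on runtime version, `cloudFiles.schemaHints` may also help pin a column's type.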
    <item>
      <title>Re: How to handle schema validation for Json file. Using Databricks Autoloader?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29758#M21465</link>
      <description>&lt;P&gt;Maybe skip schema validation, and then in a next step use a DLT expectation to check whether all the required fields are present in the Data struct type.&lt;/P&gt;</description>
      <pubDate>Sun, 02 Oct 2022 18:37:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29758#M21465</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-10-02T18:37:01Z</dc:date>
    </item>
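The expectation approach suggested above can stay manageable even with many columns by generating the condition string rather than writing one expectation per field. A hedged sketch (the field list is illustrative; `dlt.expect_or_fail` is the DLT decorator such a string would feed):

```python
# Build one SQL boolean expression asserting that every required field is
# present, so a single DLT expectation covers many columns at once.
# The field list below is illustrative.
REQUIRED = ["Header.SchemaVersion", "Header.EventId", "Data.CustID"]

def required_fields_condition(fields):
    """Compose 'f1 IS NOT NULL AND f2 IS NOT NULL AND ...' for DLT."""
    return " AND ".join(f"{f} IS NOT NULL" for f in fields)

condition = required_fields_condition(REQUIRED)
print(condition)
# → Header.SchemaVersion IS NOT NULL AND Header.EventId IS NOT NULL AND Data.CustID IS NOT NULL
#
# In a DLT pipeline this string would be attached to the table, e.g.:
#   @dlt.expect_or_fail("required_fields", condition)
# which fails the update when any required field is missing or null.
```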
    <item>
      <title>Re: How to handle schema validation for Json file. Using Databricks Autoloader?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29759#M21466</link>
      <description>&lt;P&gt;Hi @Hubert Dudek​, actually I want to validate the schema so that I know whether any fields beyond the defined schema have been added to the data. If I do the expectation check at the next level, I need to apply a check for each individual column, and there are many columns, so it will be difficult to manage. Is there any way to exclude a particular column, RawData in my case, from schema enforcement, so that enforcement is not applied to the RawData column, which receives unspecified or dynamic data?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Oct 2022 04:29:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29759#M21466</guid>
      <dc:creator>SRK</dc:creator>
      <dc:date>2022-10-03T04:29:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle schema validation for Json file. Using Databricks Autoloader?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29760#M21467</link>
      <description>&lt;P&gt;Hi @Swapnil Kamle​&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 29 Oct 2022 06:16:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29760#M21467</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-10-29T06:16:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle schema validation for Json file. Using Databricks Autoloader?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29761#M21468</link>
      <description>&lt;P&gt;Sorry for the delay in replying. I didn't get an exact answer.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 00:56:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/29761#M21468</guid>
      <dc:creator>SRK</dc:creator>
      <dc:date>2022-12-02T00:56:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle schema validation for Json file. Using Databricks Autoloader?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/96072#M39206</link>
      <description>&lt;P&gt;Just to clarify: are you reading from Kafka and writing into ADLS as &lt;STRONG&gt;json&lt;/STRONG&gt; files, i.e., is each Kafka message one JSON file in ADLS?&lt;/P&gt;</description>
      <pubDate>Fri, 25 Oct 2024 05:01:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-handle-schema-validation-for-json-file-using-databricks/m-p/96072#M39206</guid>
      <dc:creator>maddy08</dc:creator>
      <dc:date>2024-10-25T05:01:27Z</dc:date>
    </item>
  </channel>
</rss>