<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: 'from_json' spark function not parsing value column from Confluent Kafka topic in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/107946#M42954</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36707"&gt;@Sidhant07&lt;/a&gt;,&lt;BR /&gt;Thanks for responding. I will try the Spark config&amp;nbsp;&lt;EM&gt;spark.sql.json.enablePartialResults&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;On the other hand, the Python json library is able to parse the raw JSON string without any additional configuration.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards!&lt;/P&gt;</description>
    <pubDate>Fri, 31 Jan 2025 03:48:34 GMT</pubDate>
    <dc:creator>hari-prasad</dc:creator>
    <dc:date>2025-01-31T03:48:34Z</dc:date>
    <item>
      <title>'from_json' spark function not parsing value column from Confluent Kafka topic</title>
      <link>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/106608#M42530</link>
      <description>&lt;P&gt;To complete one of the badges, I had to finish a&amp;nbsp;&lt;EM&gt;Spark Streaming Demo Practice.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Because the demo requires a Kafka broker and none was set up, I configured a Confluent Kafka cluster and modified the Spark script provided by DBDemos &lt;EM&gt;&lt;STRONG&gt;streaming-sessionization&lt;/STRONG&gt;&lt;/EM&gt; to make it compatible with that cluster. With those changes, I successfully ingested data from the Kafka topic into the bronze table.&lt;/P&gt;&lt;P&gt;When I started loading data from the bronze table into the silver table, the code ran successfully and the Delta table's transaction log showed a 'Streaming_update' entry, but the table contained no data. Further investigation showed that the from_json function was not parsing the values correctly, even though the schema I passed to the function matched the one expected by the script.&lt;/P&gt;&lt;H4&gt;Step-by-Step Resolution&lt;/H4&gt;&lt;P&gt;1. Understand the schema of the value column from the Kafka topic. The&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;schema_of_json&lt;/STRONG&gt;&lt;/EM&gt; function can dynamically extract the schema from a stringified JSON value.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hariprasad_0-1737534905518.png" style="width: 1113px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14273i86E5A9A9318214A8/image-dimensions/1113x398?v=v2" width="1113" height="398" role="button" title="hariprasad_0-1737534905518.png" alt="hariprasad_0-1737534905518.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2. 
Next, I tried&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;from_json&lt;/STRONG&gt;&lt;/EM&gt; with &lt;EM&gt;json_schema&lt;/EM&gt; to convert the value column into individual key columns. The keys were parsed, but the values came back as &lt;EM&gt;&lt;STRONG&gt;null&lt;/STRONG&gt;&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hariprasad_1-1737534936401.png" style="width: 896px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14274iA4C8B8794BC06825/image-dimensions/896x303?v=v2" width="896" height="303" role="button" title="hariprasad_1-1737534936401.png" alt="hariprasad_1-1737534936401.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hariprasad_0-1737533122673.png" style="width: 935px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14271i63014C4488878397/image-dimensions/935x388?v=v2" width="935" height="388" role="button" title="hariprasad_0-1737533122673.png" alt="hariprasad_0-1737533122673.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also checked whether the JSON is valid using the Python json library, which parsed it without any error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hariprasad_2-1737534973740.png" style="width: 945px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14275iF50D4DDD20767B0D/image-dimensions/945x469?v=v2" width="945" height="469" role="button" title="hariprasad_2-1737534973740.png" alt="hariprasad_2-1737534973740.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;3. 
Finally, I resolved the issue with a few simple string-cleaning steps, listed below, and &lt;FONT color="#008000"&gt;&lt;STRONG&gt;it worked&lt;/STRONG&gt;&lt;/FONT&gt;.&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;a) Removing the leading and trailing double quotes from the string value&lt;BR /&gt;b) Replacing each backslash with an empty string&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hariprasad_3-1737534995490.png" style="width: 951px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14276i63816225EB5AEF7A/image-dimensions/951x421?v=v2" width="951" height="421" role="button" title="hariprasad_3-1737534995490.png" alt="hariprasad_3-1737534995490.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hariprasad_1-1737533469275.png" style="width: 872px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14272i819BE029094A9192/image-dimensions/872x317?v=v2" width="872" height="317" role="button" title="hariprasad_1-1737533469275.png" alt="hariprasad_1-1737533469275.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not sure whether this is the expected behavior of the&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;from_json&lt;/STRONG&gt;&lt;/EM&gt; function or whether it should be fixed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;#DataEngineering&amp;nbsp;&lt;SPAN&gt;#StreamingSessionization&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;#dbdemos&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;#SparkStructuredStreaming&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Hari Prasad&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jan 2025 08:40:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/106608#M42530</guid>
      <dc:creator>hari-prasad</dc:creator>
      <dc:date>2025-01-22T08:40:23Z</dc:date>
    </item>
    <item>
      <title>Re: 'from_json' spark function not parsing value column from Confluent Kafka topic</title>
      <link>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/107730#M42904</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/98469"&gt;@hari-prasad&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;We have an ES ticket that mentions that JSON parsing for structs, maps, and arrays was fixed so that when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. This behavior is optional and can be enabled by setting spark.sql.json.enablePartialResults to true. By default, this flag is disabled to preserve the original behavior.&lt;BR /&gt;This suggests that the default behavior of from_json might not handle certain discrepancies in the JSON data gracefully, leading to null values. The fact that you had to clean the string values by removing the leading and trailing double quotes and the backslashes indicates that the input data might not have been in the expected format, which could cause parsing issues.&lt;BR /&gt;Therefore, the behavior you encountered might be expected under certain conditions, especially if the input data format does not align perfectly with the expected schema. It may not necessarily be a bug but rather a limitation or characteristic of the default parsing behavior. You can consider enabling the spark.sql.json.enablePartialResults option to see if it improves the parsing behavior in your case.&lt;/P&gt;
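&lt;P&gt;As a rough illustration of why that kind of cleanup can help, here is a minimal plain-Python sketch. It assumes the Kafka value is a JSON string literal wrapping an escaped JSON object (double-encoded JSON), which is what the quote and backslash cleanup suggests but which has not been confirmed against the actual topic. The field names and values are hypothetical.&lt;/P&gt;

```python
import json

# Hypothetical payload: a JSON object that was serialized twice, so the
# message value is a JSON string literal wrapping an escaped JSON object.
event = {"user_id": "u-1", "platform": "ios", "event_id": "e-1"}
raw_value = json.dumps(json.dumps(event))

# The first parse succeeds without error, but it yields a str, not a dict.
first_pass = json.loads(raw_value)
print(type(first_pass).__name__)  # str

# A second parse recovers the actual object with its keys and values.
second_pass = json.loads(first_pass)
print(second_pass["user_id"])  # u-1
```

&lt;P&gt;If that assumption holds, the outer string is itself valid JSON, so a single parse reports no error even though it does not produce an object with keys. Decoding twice is generally safer than deleting every backslash, because a blanket replace would also corrupt values that legitimately contain escaped characters.&lt;/P&gt;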
&lt;P&gt;Thanks!!&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jan 2025 08:26:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/107730#M42904</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2025-01-30T08:26:29Z</dc:date>
    </item>
    <item>
      <title>Re: 'from_json' spark function not parsing value column from Confluent Kafka topic</title>
      <link>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/107946#M42954</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36707"&gt;@Sidhant07&lt;/a&gt;,&lt;BR /&gt;Thanks for responding. I will try the Spark config&amp;nbsp;&lt;EM&gt;spark.sql.json.enablePartialResults&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;On the other hand, the Python json library is able to parse the raw JSON string without any additional configuration.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards!&lt;/P&gt;</description>
      <pubDate>Fri, 31 Jan 2025 03:48:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/107946#M42954</guid>
      <dc:creator>hari-prasad</dc:creator>
      <dc:date>2025-01-31T03:48:34Z</dc:date>
    </item>
    <item>
      <title>Re: 'from_json' spark function not parsing value column from Confluent Kafka topic</title>
      <link>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/108141#M42982</link>
      <description>&lt;P&gt;I am not sure I read the full explanation, but how about this:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import functions as F

# df and json_schema are assumed to be defined earlier in the notebook
(df
    .withColumn('value_str', F.decode(F.col('value'), 'utf-8'))
    .withColumn('value_json', F.explode(F.from_json(F.col('value_str'), json_schema)))
    .select('value_json.*')
    .select(F.col('user_id').alias('userid'),
            F.col('platform').alias('platform'),
            F.col('event_id').alias('eventid'),
            F.col('event_date').alias('eventdate')))&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Jan 2025 16:48:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/108141#M42982</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-01-31T16:48:40Z</dc:date>
    </item>
    <item>
      <title>Re: 'from_json' spark function not parsing value column from Confluent Kafka topic</title>
      <link>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/108341#M43041</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;, decode won't help because the value column is not a binary string and is not encoded with UTF-8 or any other Unicode encoding. The value is available as stringified JSON without encoding.&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Hari Prasad&lt;/P&gt;</description>
      <pubDate>Sun, 02 Feb 2025 07:01:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/108341#M43041</guid>
      <dc:creator>hari-prasad</dc:creator>
      <dc:date>2025-02-02T07:01:54Z</dc:date>
    </item>
    <item>
      <title>Re: 'from_json' spark function not parsing value column from Confluent Kafka topic</title>
      <link>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/108342#M43042</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36707"&gt;@Sidhant07&lt;/a&gt;,&amp;nbsp;the Spark config&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;spark.conf.set("spark.sql.json.enablePartialResults", True)&lt;/STRONG&gt;&lt;/EM&gt; is not helping. I assume this is an exceptional case that has to be handled by removing those characters from the string before converting it.&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Hari Prasad&lt;/P&gt;</description>
      <pubDate>Sun, 02 Feb 2025 07:01:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/108342#M43042</guid>
      <dc:creator>hari-prasad</dc:creator>
      <dc:date>2025-02-02T07:01:23Z</dc:date>
    </item>
    <item>
      <title>Re: 'from_json' spark function not parsing value column from Confluent Kafka topic</title>
      <link>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/108554#M43083</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/98469"&gt;@hari-prasad&lt;/a&gt;&amp;nbsp;thanks. In that case, I think pre-processing the JSON string with a regexp, as you are already doing, is the right thing to do.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 11:56:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-json-spark-function-not-parsing-value-column-from-confluent/m-p/108554#M43083</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-02-03T11:56:42Z</dc:date>
    </item>
  </channel>
</rss>

