Sidhant07
Databricks Employee

Hi @hari-prasad ,

We have an ES ticket noting that JSON parsing for structs, maps, and arrays was fixed so that, when part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. This behavior is optional: you enable it by setting spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.
This suggests that, by default, from_json might not handle certain discrepancies in the JSON data gracefully and can return null values as a result. The need to clean the string values by replacing leading and trailing double quotes and backslashes indicates that the input data might not have been in the expected format, which could cause parsing issues.
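To illustrate the kind of format mismatch I mean, here is a minimal plain-Python sketch. It assumes the input was a JSON object that had been wrapped in an extra pair of double quotes with its inner quotes backslash-escaped; the sample string and the helper name clean_json_string are hypothetical and not taken from your data.

```python
import json


def clean_json_string(raw: str) -> str:
    """Strip one pair of wrapping double quotes and unescape \\" sequences.

    Hypothetical helper: it only covers the simple double-encoded case
    described above, not arbitrary escaping.
    """
    s = raw.strip()
    if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
        s = s[1:-1]
    return s.replace('\\"', '"')


# Assumed sample: a JSON object that was serialized a second time as a string.
raw = '"{\\"id\\": 1, \\"name\\": \\"a\\"}"'

cleaned = clean_json_string(raw)
parsed = json.loads(cleaned)  # parsing succeeds once the wrapper is removed
```

If your raw values look like this, the uncleaned string is a JSON string literal rather than a JSON object, so it would not match a struct schema.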
Therefore, the behavior you encountered might be expected under certain conditions, especially if the input data does not align exactly with the expected schema. It is not necessarily a bug; it may be a limitation or characteristic of the default parsing behavior. Consider enabling the spark.sql.json.enablePartialResults option to see whether it improves parsing in your case.
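A rough PySpark sketch of how you could try this is below. It assumes a runtime where spark.sql.json.enablePartialResults is available; the sample record, column names, and schema are made up for illustration and should be replaced with your own.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json

# On Databricks a session already exists; getOrCreate() simply returns it.
spark = SparkSession.builder.getOrCreate()

# Opt in to partial results (disabled by default, per the note above).
spark.conf.set("spark.sql.json.enablePartialResults", "true")

# Hypothetical record: field b.c is a string although the schema expects INT.
df = spark.createDataFrame(
    [('{"a": 1, "b": {"c": "not-an-int"}}',)],
    ["json_str"],
)

# Hypothetical schema, written as a DDL string.
schema = "a INT, b STRUCT<c: INT>"

parsed = df.withColumn("parsed", from_json(col("json_str"), schema))
parsed.select("parsed.*").show(truncate=False)
```

Running the same query with the flag set to false and comparing the two outputs should show whether the setting changes anything for your data.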

Thanks!!