<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {&amp;quot;hydra:xxxx&amp;quot;: {&amp;quot;hydra:value&amp;quot;:&amp;quot;yyyy&amp;quot;,
&amp;quot;hydra:value1&amp;quot;:&amp;quot;zzzzz&amp;quot;} in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26901#M18894</link>
    <description>&lt;P&gt;The best is just define schema manually. There is nice article from person who had exactly the same problem &lt;A href="https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-names-505f215eb5bf" target="test_blank"&gt;https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-names-505f215eb5bf&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sat, 26 Feb 2022 18:05:47 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2022-02-26T18:05:47Z</dc:date>
    <item>
      <title>Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy",
"hydra:value1":"zzzzz"}</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26899#M18892</link>
      <description>&lt;P&gt;Azure DBR - Have to load list of json files into data frame and then from DF to  data bricks table but the column has special character and getting below error.&lt;/P&gt;&lt;P&gt;&lt;B&gt;Both column(key) and value (as json record) has special characters in the json file. &lt;/B&gt;&lt;/P&gt;&lt;P&gt;# Cannot create a table having a nested column whose name contains invalid characters (',', ':', ';') in Hive metastore. Table: `db`.`tablename`; Column: hydra:xxxx&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Also needs to convert the struct type into string.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Ex:&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;	"@context": "/contexts/xxxxx",&lt;/P&gt;&lt;P&gt;	"@id": "/xxxxx",&lt;/P&gt;&lt;P&gt;	"@type": "hydra:Collection",&lt;/P&gt;&lt;P&gt;	&lt;B&gt;"hydra:xxxx"&lt;/B&gt;: [&lt;/P&gt;&lt;P&gt;		{&lt;/P&gt;&lt;P&gt;			"@id": "/xxxxx/1",&lt;/P&gt;&lt;P&gt;			"@type": "product",&lt;/P&gt;&lt;P&gt;			"type": "update",&lt;/P&gt;&lt;P&gt;			"id": null,&lt;/P&gt;&lt;P&gt;			"pid": 2,&lt;/P&gt;&lt;P&gt;			"pCode": null,&lt;/P&gt;&lt;P&gt;			"details": "abcdefgh",&lt;/P&gt;&lt;P&gt;			"updt": "2019-01-09T03:04:27+01:00",&lt;/P&gt;&lt;P&gt;			"codes": [&lt;/P&gt;&lt;P&gt;				"123456789101"&lt;/P&gt;&lt;P&gt;			]&lt;/P&gt;&lt;P&gt;		},&lt;/P&gt;&lt;P&gt;				{&lt;/P&gt;&lt;P&gt;			"@id": "/xxxxx/2",&lt;/P&gt;&lt;P&gt;			"@type": "product",&lt;/P&gt;&lt;P&gt;			"type": "update",&lt;/P&gt;&lt;P&gt;			"id": null,&lt;/P&gt;&lt;P&gt;			"pid": 2,&lt;/P&gt;&lt;P&gt;			"pCode": null,&lt;/P&gt;&lt;P&gt;			"details": "abcdefgh",&lt;/P&gt;&lt;P&gt;			"updt": "2019-01-09T03:04:27+01:00",&lt;/P&gt;&lt;P&gt;			"codes": [&lt;/P&gt;&lt;P&gt;				"123456789101"&lt;/P&gt;&lt;P&gt;			]&lt;/P&gt;&lt;P&gt;		}&lt;/P&gt;&lt;P&gt;		&lt;/P&gt;&lt;P&gt;	],&lt;/P&gt;&lt;P&gt;	"hydra:totalItems": 2,&lt;/P&gt;&lt;P&gt;	&lt;B&gt;"hydra:view"&lt;/B&gt;: {&lt;/P&gt;&lt;P&gt;		"@id": "/xxxxx?page=1",&lt;/P&gt;&lt;P&gt;		"@type": "hydra:vw",&lt;/P&gt;&lt;P&gt;		&lt;B&gt;"hydra:first&lt;/B&gt;": "/xxxxx?page=1",&lt;/P&gt;&lt;P&gt;		&lt;B&gt;"hydra:last"&lt;/B&gt;: "/xxxxx?page=12",&lt;/P&gt;&lt;P&gt;		&lt;B&gt;"hydra:next"&lt;/B&gt;: "/xxxxx?page=245"&lt;/P&gt;&lt;P&gt;	},&lt;/P&gt;&lt;P&gt;	"hydra:search": {&lt;/P&gt;&lt;P&gt;		"@type": "hydra:IriTemplate",&lt;/P&gt;&lt;P&gt;		"hydra:template": "xxxxxxxxxxxxxxxxx",&lt;/P&gt;&lt;P&gt;		"hydra:variableRepresentation": "BasicRepresentation",&lt;/P&gt;&lt;P&gt;		&lt;B&gt;"hydra:mapping"&lt;/B&gt;: [&lt;/P&gt;&lt;P&gt;			{&lt;/P&gt;&lt;P&gt;                                &lt;B&gt;"hydra:value"&lt;/B&gt;: "zzzzz"&lt;/P&gt;&lt;P&gt;				&lt;B&gt;"@type"&lt;/B&gt;: "xx",&lt;/P&gt;&lt;P&gt;				"variable": "yy[before]",&lt;/P&gt;&lt;P&gt;				"property": "yy",&lt;/P&gt;&lt;P&gt;				"required": false&lt;/P&gt;&lt;P&gt;			},&lt;/P&gt;&lt;P&gt;			{&lt;/P&gt;&lt;P&gt;                                "hydra:value": "ssssss"&lt;/P&gt;&lt;P&gt;				"@type": "sf",&lt;/P&gt;&lt;P&gt;				"variable": "asf",&lt;/P&gt;&lt;P&gt;				"property": "sfs",&lt;/P&gt;&lt;P&gt;				"required": false&lt;/P&gt;&lt;P&gt;			}&lt;/P&gt;&lt;P&gt;		]&lt;/P&gt;&lt;P&gt;	}&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please provide your suggestion to resolve it. I am using pyspark in azure databricks. &lt;/P&gt;</description>
      <pubDate>Sat, 26 Feb 2022 05:52:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26899#M18892</guid>
      <dc:creator>hare</dc:creator>
      <dc:date>2022-02-26T05:52:19Z</dc:date>
    </item>
    <item>
      <title>Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy",
"hydra:value1":"zzzzz"}</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26900#M18893</link>
      <description>&lt;P&gt;You can rename the columns like &lt;A href="https://stackoverflow.com/questions/62441435/pyspark-remove-special-characters-in-all-column-names-for-all-special-characters" alt="https://stackoverflow.com/questions/62441435/pyspark-remove-special-characters-in-all-column-names-for-all-special-characters" target="_blank"&gt;here on SO&lt;/A&gt; (pyspark) or &lt;A href="https://stackoverflow.com/questions/36018072/replacing-whitespace-in-all-column-names-in-spark-dataframe" alt="https://stackoverflow.com/questions/36018072/replacing-whitespace-in-all-column-names-in-spark-dataframe" target="_blank"&gt;here in scala&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;You can also define your schema manually.&lt;/P&gt;&lt;P&gt;To flatten the json struct, there is for example the explode function.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/kb/scala/flatten-nested-columns-dynamically" alt="https://docs.microsoft.com/en-us/azure/databricks/kb/scala/flatten-nested-columns-dynamically" target="_blank"&gt;Interesting read.&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 26 Feb 2022 07:43:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26900#M18893</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-02-26T07:43:26Z</dc:date>
    </item>
    <item>
      <title>Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy",
"hydra:value1":"zzzzz"}</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26901#M18894</link>
      <description>&lt;P&gt;The best is just define schema manually. There is nice article from person who had exactly the same problem &lt;A href="https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-names-505f215eb5bf" target="test_blank"&gt;https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-names-505f215eb5bf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 26 Feb 2022 18:05:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26901#M18894</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-02-26T18:05:47Z</dc:date>
    </item>
    <item>
      <title>Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy",
"hydra:value1":"zzzzz"}</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26902#M18895</link>
      <description>&lt;P&gt;Thanks for your solution werners. Yes,we can able to rename the column as you suggested. But the problem here is the value of the column is also a json (Array of struct). &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For an instance;&lt;/P&gt;&lt;P&gt;&lt;B&gt;"hydra:mapping" - &lt;/B&gt;is a column which has the value called &lt;B&gt;"hydra:value".  &lt;/B&gt; Here, we have to rename the  &lt;B&gt;"hydra:value" as &lt;/B&gt; &lt;B&gt;"hydra_value".&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;"hydra:mapping"&lt;/B&gt;: [&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;&lt;B&gt;"hydra:value"&lt;/B&gt;: "zzzzz"&lt;/P&gt;&lt;P&gt;&lt;B&gt;"@type"&lt;/B&gt;: "xx",&lt;/P&gt;&lt;P&gt;"variable": "yy[before]",&lt;/P&gt;&lt;P&gt;"property": "yy",&lt;/P&gt;&lt;P&gt;"required": false&lt;/P&gt;&lt;P&gt;},&lt;/P&gt;</description>
      <pubDate>Mon, 28 Feb 2022 16:37:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26902#M18895</guid>
      <dc:creator>hare</dc:creator>
      <dc:date>2022-02-28T16:37:01Z</dc:date>
    </item>
    <item>
      <title>Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy",
"hydra:value1":"zzzzz"}</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26903#M18896</link>
      <description>&lt;P&gt;Thanks for the solution @Hubert Dudek​&amp;nbsp;. In our scenario we couldn't able to define the schema manually since schema may vary file to file. So i am thinking how to do it dynamically based on the Json schema. Please do let me know if you have any taught on this.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Feb 2022 16:39:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-dbr-have-to-load-list-of-json-files-but-the-column-has/m-p/26903#M18896</guid>
      <dc:creator>hare</dc:creator>
      <dc:date>2022-02-28T16:39:06Z</dc:date>
    </item>
  </channel>
</rss>

