topic Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"} in Data Engineering

Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"}

hare — Sat, 26 Feb 2022 05:52:19 GMT

Azure DBR - Have to load list of json files into data frame and then from DF to data bricks table but the column has special character and getting below error.

Both column(key) and value (as json record) has special characters in the json file.

# Cannot create a table having a nested column whose name contains invalid characters (',', ':', ';') in Hive metastore. Table: `db`.`tablename`; Column: hydra:xxxx

Also needs to convert the struct type into string.

Ex:

{

"@context": "/contexts/xxxxx",

"@id": "/xxxxx",

"@type": "hydra:Collection",

"hydra:xxxx": [

{

"@id": "/xxxxx/1",

"@type": "product",

"type": "update",

"id": null,

"pid": 2,

"pCode": null,

"details": "abcdefgh",

"updt": "2019-01-09T03:04:27+01:00",

"codes": [

"123456789101"

]

{

"@id": "/xxxxx/2",

"@type": "product",

"type": "update",

"id": null,

"pid": 2,

"pCode": null,

"details": "abcdefgh",

"updt": "2019-01-09T03:04:27+01:00",

"codes": [

"123456789101"

]

}

"hydra:totalItems": 2,

"hydra:view": {

"@id": "/xxxxx?page=1",

"@type": "hydra:vw",

"hydra:first": "/xxxxx?page=1",

"hydra:last": "/xxxxx?page=12",

"hydra:next": "/xxxxx?page=245"

"hydra:search": {

"@type": "hydra:IriTemplate",

"hydra:template": "xxxxxxxxxxxxxxxxx",

"hydra:variableRepresentation": "BasicRepresentation",

"hydra:mapping": [

{

"hydra:value": "zzzzz"

"@type": "xx",

"variable": "yy[before]",

"property": "yy",

"required": false

{

"hydra:value": "ssssss"

"@type": "sf",

"variable": "asf",

"property": "sfs",

"required": false

}

]

}

Please provide your suggestion to resolve it. I am using pyspark in azure databricks.

Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"}

-werners- — Sat, 26 Feb 2022 07:43:26 GMT

You can rename the columns like here on SO (pyspark) or here in scala.

You can also define your schema manually.

To flatten the json struct, there is for example the explode function.

Interesting read.

Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"}

Hubert-Dudek — Sat, 26 Feb 2022 18:05:47 GMT

The best is just define schema manually. There is nice article from person who had exactly the same problem https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-names-505f215eb5bf

Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"}

hare — Mon, 28 Feb 2022 16:37:01 GMT

Thanks for your solution werners. Yes,we can able to rename the column as you suggested. But the problem here is the value of the column is also a json (Array of struct).

For an instance;

"hydra:mapping" - is a column which has the value called "hydra:value". Here, we have to rename the "hydra:value" as "hydra_value".

"hydra:mapping": [

{

"hydra:value": "zzzzz"

"@type": "xx",

"variable": "yy[before]",

"property": "yy",

"required": false

Re: Azure DBR - Have to load list of json files but the column has special character.(ex: {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"}

hare — Mon, 28 Feb 2022 16:39:06 GMT

Thanks for the solution @Hubert Dudek . In our scenario we couldn't able to define the schema manually since schema may vary file to file. So i am thinking how to do it dynamically based on the Json schema. Please do let me know if you have any taught on this.