Data Engineering
Azure DBR - Have to load a list of JSON files, but a column name contains special characters (ex: {"hydra:xxxx": {"hydra:value": "yyyy", "hydra:value1": "zzzzz"}})

hare
New Contributor III

Azure DBR - I have to load a list of JSON files into a DataFrame, and then from the DataFrame into a Databricks table, but a column name contains special characters and I get the error below.

Both the column names (keys) and the values (nested JSON records) in the file contain special characters.

# Cannot create a table having a nested column whose name contains invalid characters (',', ':', ';') in Hive metastore. Table: `db`.`tablename`; Column: hydra:xxxx

I also need to convert the struct type into a string.

Ex:

{
  "@context": "/contexts/xxxxx",
  "@id": "/xxxxx",
  "@type": "hydra:Collection",
  "hydra:xxxx": [
    {
      "@id": "/xxxxx/1",
      "@type": "product",
      "type": "update",
      "id": null,
      "pid": 2,
      "pCode": null,
      "details": "abcdefgh",
      "updt": "2019-01-09T03:04:27+01:00",
      "codes": ["123456789101"]
    },
    {
      "@id": "/xxxxx/2",
      "@type": "product",
      "type": "update",
      "id": null,
      "pid": 2,
      "pCode": null,
      "details": "abcdefgh",
      "updt": "2019-01-09T03:04:27+01:00",
      "codes": ["123456789101"]
    }
  ],
  "hydra:totalItems": 2,
  "hydra:view": {
    "@id": "/xxxxx?page=1",
    "@type": "hydra:vw",
    "hydra:first": "/xxxxx?page=1",
    "hydra:last": "/xxxxx?page=12",
    "hydra:next": "/xxxxx?page=245"
  },
  "hydra:search": {
    "@type": "hydra:IriTemplate",
    "hydra:template": "xxxxxxxxxxxxxxxxx",
    "hydra:variableRepresentation": "BasicRepresentation",
    "hydra:mapping": [
      {
        "hydra:value": "zzzzz",
        "@type": "xx",
        "variable": "yy[before]",
        "property": "yy",
        "required": false
      },
      {
        "hydra:value": "ssssss",
        "@type": "sf",
        "variable": "asf",
        "property": "sfs",
        "required": false
      }
    ]
  }
}

Please suggest how to resolve this. I am using PySpark in Azure Databricks.

4 REPLIES

-werners-
Esteemed Contributor III

You can rename the columns; there are worked examples on Stack Overflow for both PySpark and Scala.

You can also define your schema manually.

To flatten the JSON struct, there is, for example, the explode function.

hare
New Contributor III

Thanks for your solution, werners. Yes, we are able to rename the columns as you suggested. But the problem here is that the value of the column is also JSON (an array of structs).

For instance:

"hydra:mapping" is a column whose elements contain the key "hydra:value". Here, we have to rename "hydra:value" to "hydra_value".

"hydra:mapping": [
  {
    "hydra:value": "zzzzz",
    "@type": "xx",
    "variable": "yy[before]",
    "property": "yy",
    "required": false
  },

Hubert-Dudek
Esteemed Contributor III

The best option is to define the schema manually. There is a nice article from someone who had exactly the same problem: https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-nam...

hare
New Contributor III

Thanks for the solution @Hubert Dudek​. In our scenario we can't define the schema manually, since the schema may vary from file to file. So I am thinking about how to do it dynamically based on the JSON schema. Please let me know if you have any thoughts on this.
