
Azure DBR - Have to load a list of JSON files, but the column names have special characters (ex: {"hydra:xxxx": {"hydra:value":"yyyy", "hydra:value1":"zzzzz"}})

hare
New Contributor III

Azure DBR - I have to load a list of JSON files into a data frame and then write the DF to a Databricks table, but the column names have special characters and I get the error below.

Both the column (key) and the value (as a JSON record) contain special characters in the JSON file.

# Cannot create a table having a nested column whose name contains invalid characters (',', ':', ';') in Hive metastore. Table: `db`.`tablename`; Column: hydra:xxxx

I also need to convert the struct type into a string.

Ex:

{
  "@context": "/contexts/xxxxx",
  "@id": "/xxxxx",
  "@type": "hydra:Collection",
  "hydra:xxxx": [
    {
      "@id": "/xxxxx/1",
      "@type": "product",
      "type": "update",
      "id": null,
      "pid": 2,
      "pCode": null,
      "details": "abcdefgh",
      "updt": "2019-01-09T03:04:27+01:00",
      "codes": [
        "123456789101"
      ]
    },
    {
      "@id": "/xxxxx/2",
      "@type": "product",
      "type": "update",
      "id": null,
      "pid": 2,
      "pCode": null,
      "details": "abcdefgh",
      "updt": "2019-01-09T03:04:27+01:00",
      "codes": [
        "123456789101"
      ]
    }
  ],
  "hydra:totalItems": 2,
  "hydra:view": {
    "@id": "/xxxxx?page=1",
    "@type": "hydra:vw",
    "hydra:first": "/xxxxx?page=1",
    "hydra:last": "/xxxxx?page=12",
    "hydra:next": "/xxxxx?page=245"
  },
  "hydra:search": {
    "@type": "hydra:IriTemplate",
    "hydra:template": "xxxxxxxxxxxxxxxxx",
    "hydra:variableRepresentation": "BasicRepresentation",
    "hydra:mapping": [
      {
        "hydra:value": "zzzzz",
        "@type": "xx",
        "variable": "yy[before]",
        "property": "yy",
        "required": false
      },
      {
        "hydra:value": "ssssss",
        "@type": "sf",
        "variable": "asf",
        "property": "sfs",
        "required": false
      }
    ]
  }
}

Please share your suggestions on how to resolve this. I am using PySpark in Azure Databricks.

4 REPLIES

-werners-
Esteemed Contributor III

You can rename the columns as shown in existing answers on Stack Overflow (there are examples for both PySpark and Scala).

You can also define your schema manually.

To flatten the JSON struct there is, for example, the explode function; a sketch of both steps follows below.

Interesting read.
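A minimal PySpark sketch of the rename-and-explode idea, assuming files shaped like the sample above (the path and the sanitised names are illustrative, not from the original post):

from pyspark.sql.functions import explode

# Spark infers a schema whose field names still contain ':' (e.g. "hydra:xxxx").
df = spark.read.option("multiLine", True).json("/mnt/raw/hydra/*.json")

# Rename the offending top-level columns before writing to a Hive table.
df = (df.withColumnRenamed("hydra:xxxx", "hydra_xxxx")
        .withColumnRenamed("hydra:totalItems", "hydra_totalItems")
        .withColumnRenamed("hydra:view", "hydra_view")
        .withColumnRenamed("hydra:search", "hydra_search"))

# Flatten the array of product entries into one row per element.
items = df.select(explode("hydra_xxxx").alias("item")).select("item.*")

Note that this only renames the top-level columns; nested field names such as "hydra:value" inside "hydra:search" keep their original names, which is the issue raised in the next reply.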

hare
New Contributor III

Thanks for your solution, werners. Yes, we are able to rename the columns as you suggested, but the problem here is that the value of the column is also JSON (an array of structs).

For instance:

"hydra:mapping" is a column whose value contains the field "hydra:value". Here, we have to rename "hydra:value" to "hydra_value" (see the sketch after the snippet below).

"hydra:mapping": [

{

"hydra:value": "zzzzz"

"@type": "xx",

"variable": "yy[before]",

"property": "yy",

"required": false

},
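For nested names like "hydra:value", which withColumnRenamed cannot reach, one possible approach is to rebuild the inferred schema with sanitised field names and cast each top-level column to the rebuilt type (Spark allows casting a struct to a struct of the same shape, which effectively renames its fields). A minimal sketch, assuming files like the sample above; the path and the sanitise rule are illustrative:

from pyspark.sql.types import StructType, StructField, ArrayType

def sanitize(name):
    # Replace the characters the Hive metastore rejects (',', ':', ';').
    return name.replace(":", "_").replace(",", "_").replace(";", "_")

def sanitize_type(dtype):
    # Recursively rebuild a data type with cleaned field names.
    if isinstance(dtype, StructType):
        return StructType([
            StructField(sanitize(f.name), sanitize_type(f.dataType), f.nullable)
            for f in dtype.fields
        ])
    if isinstance(dtype, ArrayType):
        return ArrayType(sanitize_type(dtype.elementType), dtype.containsNull)
    return dtype

df = spark.read.option("multiLine", True).json("/mnt/raw/hydra/*.json")

# Cast every top-level column to its sanitised type; struct casts match
# fields by position, so hydra:value is effectively renamed to hydra_value.
clean = df.select([
    df[f.name].cast(sanitize_type(f.dataType)).alias(sanitize(f.name))
    for f in df.schema.fields
])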

Hubert-Dudek
Esteemed Contributor III

The best option is to just define the schema manually. There is a nice article from someone who had exactly the same problem: https://towardsdev.com/create-a-spark-hive-meta-store-table-using-nested-json-with-invalid-field-nam...
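For illustration, a partial sketch of a hand-written schema for the sample document; when reading, the field names must match the JSON keys exactly (colons included), and the rename or cast to metastore-safe names still happens afterwards. The path and the subset of fields are assumptions based on the sample above:

from pyspark.sql.types import (StructType, StructField, StringType,
                               LongType, ArrayType)

# Only a subset of the sample document, for illustration.
item = StructType([
    StructField("@id", StringType()),
    StructField("@type", StringType()),
    StructField("type", StringType()),
    StructField("pid", LongType()),
    StructField("details", StringType()),
    StructField("updt", StringType()),
    StructField("codes", ArrayType(StringType())),
])

schema = StructType([
    StructField("@context", StringType()),
    StructField("@id", StringType()),
    StructField("@type", StringType()),
    StructField("hydra:xxxx", ArrayType(item)),      # key kept as-is for reading
    StructField("hydra:totalItems", LongType()),
])

df = spark.read.schema(schema).option("multiLine", True).json("/mnt/raw/hydra/*.json")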

hare
New Contributor III

Thanks for the solution @Hubert Dudek. In our scenario we aren't able to define the schema manually since the schema may vary from file to file, so I am thinking about how to do this dynamically based on the JSON schema. Please let me know if you have any thoughts on this.
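Since the schema varies from file to file, one dynamic option (which also covers the struct-to-string requirement from the original post) is to let Spark infer the schema and then serialise every complex column back to a JSON string, so the nested "hydra:*" names never reach the metastore; only the top-level names need sanitising. A sketch under the same assumptions as above (illustrative path, target table taken from the error message):

from pyspark.sql.functions import to_json
from pyspark.sql.types import StructType, ArrayType, MapType

def sanitize(name):
    return name.replace(":", "_").replace(",", "_").replace(";", "_")

df = spark.read.option("multiLine", True).json("/mnt/raw/hydra/*.json")

cols = []
for f in df.schema.fields:
    c = df[f.name]
    # Struct/array/map columns become JSON text, so their inner field
    # names (e.g. hydra:value) no longer matter to the Hive metastore.
    if isinstance(f.dataType, (StructType, ArrayType, MapType)):
        c = to_json(c)
    cols.append(c.alias(sanitize(f.name)))

df.select(cols).write.mode("append").saveAsTable("db.tablename")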
