How would I retrieve JSON data with namespaces using Spark SQL?
11-21-2022 10:17 PM
The file.json in the code below contains a large amount of JSON data in which every key carries a namespace prefix (the JSON file was converted from an XML file).
I am able to retrieve values when the JSON does not contain namespaces, but what approach should I use to retrieve records/values when every key has a namespace prefix?
jsondf = spark.read.json("<path>/file.json")
#jsondf.printSchema()
jsondf.createOrReplaceTempView("ramp")
elements = spark.sql("SELECT * FROM ramp")
elements.show()
Here, I want to retrieve the records under w:document/w.body/w:p. I have tried different ways, but nothing works. Any suggestions would be really helpful.
Labels: JSON, Pyspark, Pyspark Dataframe, Python, SQL, SQL Statements, Xml
11-22-2022 03:45 AM
Hi @Ramesh Bathini,
I'm not sure what you have tried, but maybe you can try it this way, quoting each field name in backticks:
select `w:document`.`w.body`.`w:p` from ramp
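As a rough sketch of why this works: Spark SQL treats a backtick-quoted name as a single identifier, so the colon (or dot) inside a field name is not parsed as syntax. The helper names `quote_path` and `select_nested` below are hypothetical and only for illustration; `spark` is assumed to be an existing SparkSession (as in a Databricks notebook), and `ramp` is the temp view from the question.

```python
def quote_path(parts):
    """Join field names into a backtick-quoted Spark SQL path.

    Each part is wrapped in backticks; a backtick inside a name is
    escaped by doubling it, which is how Spark SQL escapes it.
    """
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)


def select_nested(spark, view, parts):
    """Run a SELECT on a nested field of a temp view.

    `spark` is assumed to be an already created SparkSession and
    `view` the name of a registered temp view.
    """
    return spark.sql(f"SELECT {quote_path(parts)} FROM {view}")


# Build the same path as in the reply above.
path = quote_path(["w:document", "w.body", "w:p"])
print(path)

# Hypothetical usage in a notebook where `spark` and the view exist:
# select_nested(spark, "ramp", ["w:document", "w.body", "w:p"]).show()
```

Running `jsondf.printSchema()` first shows the exact field names to pass in, including the prefixes.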
11-22-2022 09:48 AM
Thanks a lot, @Pat Sienkiewicz, for your response. It works for me.
11-29-2022 12:45 PM
In the case of a struct, you can use dot (.) notation to extract the value.
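The same dot notation should also work through the DataFrame API, without registering a temp view. This is a minimal sketch, assuming `df` is a DataFrame such as `jsondf` from the question; the function name `select_struct_field` is hypothetical. Backticks are still needed around each part when a field name contains a colon or a dot.

```python
def select_struct_field(df, *parts):
    """Select a nested struct field using dot notation.

    `df` is assumed to be a PySpark DataFrame. Each part of the path is
    backtick-quoted so that characters such as ':' or '.' inside a field
    name are treated as part of the name, not as a path separator.
    """
    column_path = ".".join(f"`{p}`" for p in parts)
    return df.select(column_path)


# Hypothetical usage with the DataFrame from the question:
# select_struct_field(jsondf, "w:document", "w.body", "w:p").show()
```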

