How To Parse a XML Column with string data type into multiple sql columns
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-12-2024 07:25 AM
Hi,
I have a table with XML data in it which is stored in a column with STRING datatype. Can someone please help me on how to parse this XML into multiple sql columns.Below is the sample XML Table & desired output data
Select * from default.SampleData
XMLData
'Status': {
'Code': 'Ok',
'Message': None,
'Errors': None
},
'RequestReference': None,
'Response': {
'IdNumber': '295',
'SearchKeys': {
'Subject': {
'FirstName': 'John',
'MiddleName': None,
'Surname': 'Ross',
'DateOfBirth': datetime.datetime(1900, 1, 31, 0, 0),
'Gender': 'M'
},
'Address': {
'AddressLine1': '123 old street',
'AddressLine2': 'high street',
'AddressLine3': '',
'AddressLine4': None,
'AddressLine5': None,
'Postcode': '12345',
'Country': None
},
'Phone': None,
'DrivingLicence': None,
'Bank': None,
'ConsentFlag': True
},
Expected output columns
IDNumber FirstName MiddleName LastName DateOfBirth AddressLine1 Phone
295 John Ross 1900-01-01 123 Highstreet None
Can someone please provide any ideas around this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-12-2024 07:31 AM
Have you checked the parse method listed in doc https://docs.databricks.com/en/query/formats/xml.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-13-2024 12:49 AM - edited 12-13-2024 12:50 AM
Hi @LearnDB1234 ,
Are you sure that this column stores xml as a string? To me it looks more like json string 🙂
If so, you can use new VARIANT data type through parse_json function:
%sql
WITH src AS (
SELECT parse_json('{
"Status": {
"Code": "Ok",
"Message": null,
"Errors": null
},
"RequestReference": null,
"Response": {
"IdNumber": "295",
"SearchKeys": {
"Subject": {
"FirstName": "John",
"MiddleName": null,
"Surname": "Ross",
"DateOfBirth": "1900-01-31T00:00:00",
"Gender": "M"
},
"Address": {
"AddressLine1": "123 old street",
"AddressLine2": "high street",
"AddressLine3": "",
"AddressLine4": null,
"AddressLine5": null,
"Postcode": "12345",
"Country": null
},
"Phone": null,
"DrivingLicence": null,
"Bank": null,
"ConsentFlag": true
}
}
}') as json_column
)
SELECT
json_column:Response.IdNumber::INTEGER as id_number,
json_column:Response.SearchKeys.Subject.FirstName::STRING as first_name,
json_column:Response.SearchKeys.Subject.MiddleName::STRING as middle_name,
json_column:Response.SearchKeys.Subject.Surname::STRING as last_name,
json_column:Response.SearchKeys.Subject.DateOfBirth::STRING as date_of_birth,
json_column:Response.SearchKeys.Address.AddressLine1::STRING as address_line1,
json_column:Response.SearchKeys.Phone::STRING as phone
FROM src
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-13-2024 09:05 AM
Even though the data in the column seems like a JSON formatted the data type is string, so using your code and suggestions from databricks below is the code i have tried below code:

