cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How To Parse a XML Column with string data type into multiple sql columns

LearnDB1234
New Contributor

Hi,

I have a table with XML data in it which is stored in a column with STRING datatype. Can someone please help me on how to parse this XML into multiple sql columns.Below is the sample XML Table & desired output data

 

 

Select * from default.SampleData

XMLData
'Status': {
        'Code': 'Ok',
        'Message': None,
        'Errors': None
    },
    'RequestReference': None,
    'Response': {
        'IdNumber': '295',
        'SearchKeys': {
            'Subject': {
                'FirstName': 'John',
                'MiddleName': None,
                'Surname': 'Ross',
                'DateOfBirth': datetime.datetime(1900, 1, 31, 0, 0),
                'Gender': 'M'
            },
            'Address': {
                'AddressLine1': '123 old street',
                'AddressLine2': 'high street',
                'AddressLine3': '',
                'AddressLine4': None,
                'AddressLine5': None,
                'Postcode': '12345',
                'Country': None
            },
            'Phone': None,
            'DrivingLicence': None,
            'Bank': None,
            'ConsentFlag': True
        },

 

 

Expected output columns 

 

 

IDNumber   FirstName   MiddleName   LastName  DateOfBirth  AddressLine1     Phone 
295        John                      Ross      1900-01-01   123 Highstreet   None

 

 

Can someone please provide any ideas around this.

 

3 REPLIES 3

Walter_C
Databricks Employee
Databricks Employee

Have you checked the parse method listed in doc https://docs.databricks.com/en/query/formats/xml.html 

 

szymon_dybczak
Contributor III

Hi @LearnDB1234 ,

Are you sure that this column stores xml as a string? To me it looks more like json  string 🙂
If so, you can use new VARIANT data type through parse_json function:

 

 

%sql
WITH src AS (

  SELECT parse_json('{
    "Status": {
        "Code": "Ok",
        "Message": null,
        "Errors": null
    },
    "RequestReference": null,
    "Response": {
        "IdNumber": "295",
        "SearchKeys": {
            "Subject": {
                "FirstName": "John",
                "MiddleName": null,
                "Surname": "Ross",
                "DateOfBirth": "1900-01-31T00:00:00",
                "Gender": "M"
            },
            "Address": {
                "AddressLine1": "123 old street",
                "AddressLine2": "high street",
                "AddressLine3": "",
                "AddressLine4": null,
                "AddressLine5": null,
                "Postcode": "12345",
                "Country": null
            },
            "Phone": null,
            "DrivingLicence": null,
            "Bank": null,
            "ConsentFlag": true
        }
    }
}') as json_column
)
SELECT 
    json_column:Response.IdNumber::INTEGER as id_number,
    json_column:Response.SearchKeys.Subject.FirstName::STRING as first_name,
    json_column:Response.SearchKeys.Subject.MiddleName::STRING as middle_name,
    json_column:Response.SearchKeys.Subject.Surname::STRING as last_name,
    json_column:Response.SearchKeys.Subject.DateOfBirth::STRING as date_of_birth,
    json_column:Response.SearchKeys.Address.AddressLine1::STRING as address_line1,
    json_column:Response.SearchKeys.Phone::STRING as phone
FROM src

 

Even though the data in the column seems like a JSON formatted the data type is string, so using your code and suggestions from databricks below is the code i have tried below code:

 

SELECT *,
    get_json_object(XMLData, '$.Response.IdNumber') AS id_number,
    get_json_object(XMLData, '$.Status.Code:') AS Code
FROM default.sampledata;
 
I get NULL values in the output, am i missing something in here?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group