<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How To Parse a XML Column with string data type into multiple sql columns in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/102106#M40966</link>
    <description>&lt;P&gt;Even though the data in the column seems like a JSON formatted the data type is string, so using your code and suggestions from databricks below is the code i have tried below code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt; &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; get_json_object(XMLData, &lt;/SPAN&gt;&lt;SPAN&gt;'$.Response.IdNumber'&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; id_number,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; get_json_object(XMLData, &lt;/SPAN&gt;&lt;SPAN&gt;'$.Status.Code:'&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt; &lt;SPAN&gt;Code&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;FROM&lt;/SPAN&gt; &lt;SPAN&gt;default&lt;/SPAN&gt;&lt;SPAN&gt;.sampledata;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I get NULL values in the output, am i missing something in here?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Fri, 13 Dec 2024 17:05:44 GMT</pubDate>
    <dc:creator>LearnDB1234</dc:creator>
    <dc:date>2024-12-13T17:05:44Z</dc:date>
    <item>
      <title>How To Parse a XML Column with string data type into multiple sql columns</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/101947#M40905</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a table with XML data in it which is stored in a column with STRING datatype. Can someone please help me on how to parse this XML into multiple sql columns.Below is the sample XML Table &amp;amp; desired output data&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Select * from default.SampleData

XMLData
'Status': {
        'Code': 'Ok',
        'Message': None,
        'Errors': None
    },
    'RequestReference': None,
    'Response': {
        'IdNumber': '295',
        'SearchKeys': {
            'Subject': {
                'FirstName': 'John',
                'MiddleName': None,
                'Surname': 'Ross',
                'DateOfBirth': datetime.datetime(1900, 1, 31, 0, 0),
                'Gender': 'M'
            },
            'Address': {
                'AddressLine1': '123 old street',
                'AddressLine2': 'high street',
                'AddressLine3': '',
                'AddressLine4': None,
                'AddressLine5': None,
                'Postcode': '12345',
                'Country': None
            },
            'Phone': None,
            'DrivingLicence': None,
            'Bank': None,
            'ConsentFlag': True
        },&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Expected output columns&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;IDNumber   FirstName   MiddleName   LastName  DateOfBirth  AddressLine1     Phone 
295        John                      Ross      1900-01-01   123 Highstreet   None&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can someone please provide any ideas around this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 15:25:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/101947#M40905</guid>
      <dc:creator>LearnDB1234</dc:creator>
      <dc:date>2024-12-12T15:25:35Z</dc:date>
    </item>
    <item>
      <title>Re: How To Parse a XML Column with string data type into multiple sql columns</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/101950#M40906</link>
      <description>&lt;P&gt;Have you checked the parse method listed in doc&amp;nbsp;&lt;A href="https://docs.databricks.com/en/query/formats/xml.html" target="_blank"&gt;https://docs.databricks.com/en/query/formats/xml.html&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 15:31:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/101950#M40906</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-12T15:31:31Z</dc:date>
    </item>
    <item>
      <title>Re: How To Parse a XML Column with string data type into multiple sql columns</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/102018#M40934</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136231"&gt;@LearnDB1234&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Are you sure that this column stores xml as a string? To me it looks more like json&amp;nbsp; string &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;If so, you can use new VARIANT data type through parse_json function:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;%sql
WITH src AS (

  SELECT parse_json('{
    "Status": {
        "Code": "Ok",
        "Message": null,
        "Errors": null
    },
    "RequestReference": null,
    "Response": {
        "IdNumber": "295",
        "SearchKeys": {
            "Subject": {
                "FirstName": "John",
                "MiddleName": null,
                "Surname": "Ross",
                "DateOfBirth": "1900-01-31T00:00:00",
                "Gender": "M"
            },
            "Address": {
                "AddressLine1": "123 old street",
                "AddressLine2": "high street",
                "AddressLine3": "",
                "AddressLine4": null,
                "AddressLine5": null,
                "Postcode": "12345",
                "Country": null
            },
            "Phone": null,
            "DrivingLicence": null,
            "Bank": null,
            "ConsentFlag": true
        }
    }
}') as json_column
)
SELECT 
    json_column:Response.IdNumber::INTEGER as id_number,
    json_column:Response.SearchKeys.Subject.FirstName::STRING as first_name,
    json_column:Response.SearchKeys.Subject.MiddleName::STRING as middle_name,
    json_column:Response.SearchKeys.Subject.Surname::STRING as last_name,
    json_column:Response.SearchKeys.Subject.DateOfBirth::STRING as date_of_birth,
    json_column:Response.SearchKeys.Address.AddressLine1::STRING as address_line1,
    json_column:Response.SearchKeys.Phone::STRING as phone
FROM src&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2024 08:50:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/102018#M40934</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-12-13T08:50:23Z</dc:date>
    </item>
    <item>
      <title>Re: How To Parse a XML Column with string data type into multiple sql columns</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/102106#M40966</link>
      <description>&lt;P&gt;Even though the data in the column seems like a JSON formatted the data type is string, so using your code and suggestions from databricks below is the code i have tried below code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt; &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; get_json_object(XMLData, &lt;/SPAN&gt;&lt;SPAN&gt;'$.Response.IdNumber'&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; id_number,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; get_json_object(XMLData, &lt;/SPAN&gt;&lt;SPAN&gt;'$.Status.Code:'&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt; &lt;SPAN&gt;Code&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;FROM&lt;/SPAN&gt; &lt;SPAN&gt;default&lt;/SPAN&gt;&lt;SPAN&gt;.sampledata;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I get NULL values in the output, am i missing something in here?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 13 Dec 2024 17:05:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-parse-a-xml-column-with-string-data-type-into-multiple/m-p/102106#M40966</guid>
      <dc:creator>LearnDB1234</dc:creator>
      <dc:date>2024-12-13T17:05:44Z</dc:date>
    </item>
  </channel>
</rss>

