cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How to extract a full node from an xml string using sql

azl
New Contributor

I have an XML string in my table and I want to extract a full node from it using SQL functions, without needing to parse it in full i.e. from '<a><b><c>c1</c></b></a>' I want '<b><c>c1</c></b>' (preferably) or '<c>c1</c>'.

What little I can find that does not use Spark suggests it should work but I just get null running on DBSQL v 2026.10.

I've tried:
SELECT xpath('<a><b><c>c1</c></b></a>', '/a/b')
SELECT xpath('<a><b><c>c1</c></b></a>', '/a/b/.')
SELECT xpath('<a><b><c>c1</c></b></a>', '/a/b/node()')
SELECT xpath('<a><b><c>c1</c></b></a>', '//b')

Please help.

1 ACCEPTED SOLUTION

Accepted Solutions

azl
New Contributor

Thanks very much.  I had over-simplified my example perhaps, and got stuck on the idea of using xml functions, particularly because I'm converting existing SQL from another DB, and wasn't being open-minded enough.

The actual data has multiple instances of the b element and varying contents of each that I want to retrieve as an array.  Using your first suggestion but with regexp_extract_all works perfectly e.g.

SELECT regexp_extract_all('<a><b><c>c1</c><d>1</d></b><b><c>c2</c><e>x</e></b></a>', '(<b>.*?</b>)', 1) out

 

View solution in original post

2 REPLIES 2

balajij8
Contributor III

xpath returns an array of string values containing text content from matching nodes.

You can follow below

  • Reg Exp Extract
SELECT regexp_extract('<a><b><c>c1</c></b></a>', '(<b>.*?</b>)', 1) out
  • XML Functions & Reg Exp
SELECT trim(

  regexp_replace(

    regexp_replace(

      to_xml(named_struct('b', from_xml('<a><b><c>c1</c></b></a>', 'struct<b:struct<c:string>>', map('rowTag', 'a')).b)),

      '<ROW>|</ROW>',

      ''

    ),

    '>\\s+<',

    '><'

  )

) out
  • Create Python scalar UDF for XML extraction & register it as a UDF and use it from SQL

azl
New Contributor

Thanks very much.  I had over-simplified my example perhaps, and got stuck on the idea of using xml functions, particularly because I'm converting existing SQL from another DB, and wasn't being open-minded enough.

The actual data has multiple instances of the b element and varying contents of each that I want to retrieve as an array.  Using your first suggestion but with regexp_extract_all works perfectly e.g.

SELECT regexp_extract_all('<a><b><c>c1</c><d>1</d></b><b><c>c2</c><e>x</e></b></a>', '(<b>.*?</b>)', 1) out