<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: split parse_url output for the information in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/split-parse-url-output-for-the-information/m-p/134132#M50029</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/74890"&gt;@turagittech&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;Honestly, it all depends on how complex your URLs can get.&lt;/P&gt;&lt;P class=""&gt;UDFs will always be &lt;SPAN class=""&gt;&lt;STRONG&gt;more flexible&lt;/STRONG&gt;&lt;/SPAN&gt; but &lt;SPAN class=""&gt;&lt;STRONG&gt;less performant&lt;/STRONG&gt;&lt;/SPAN&gt; than native SQL functions.&lt;/P&gt;&lt;P class=""&gt;That said, if your team mainly works with SQL, trying to solve it natively in Databricks SQL is definitely a valid option.&lt;/P&gt;&lt;P class=""&gt;For the &lt;I&gt;path&lt;/I&gt; part, it really depends on how much detail you actually need to keep — whether you just need the last element or the full structure.&lt;/P&gt;&lt;P class=""&gt;But for the &lt;I&gt;query string&lt;/I&gt; part, I’d recommend building a &lt;SPAN class=""&gt;&lt;STRONG&gt;map of key-value pairs&lt;/STRONG&gt;&lt;/SPAN&gt;, which works really well in SQL:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;map_from_arrays(
  transform(split(parse_url(url, 'QUERY'), '&amp;amp;'), x -&amp;gt; split_part(x, '=', 1)),
  transform(split(parse_url(url, 'QUERY'), '&amp;amp;'), x -&amp;gt; split_part(x, '=', 2))
) AS query_map&lt;/LI-CODE&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Captura de pantalla 2025-10-08 a las 1.57.34.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20542iC08411A9C0745980/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Captura de pantalla 2025-10-08 a las 1.57.34.png" alt="Captura de pantalla 2025-10-08 a las 1.57.34.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P class=""&gt;If you could share an example of a more complex URL, we could probably build something more tailored that handles both the path and query parts more dynamically. &lt;BR /&gt;&lt;BR /&gt;Hope this helps, &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 08 Oct 2025 00:00:13 GMT</pubDate>
    <dc:creator>Isi</dc:creator>
    <dc:date>2025-10-08T00:00:13Z</dc:date>
    <item>
      <title>split parse_url output for the information</title>
      <link>https://community.databricks.com/t5/data-engineering/split-parse-url-output-for-the-information/m-p/134129#M50028</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have data in blobs which I am loading from blob store to Databricks delta tables. One of the blob types contains urls. From the Urls I want to extract knowledge from the path and query parts I can get those out easily with parse url. the problem is with extracting the bits from that. As I am writing this I am thinking I'll need a couple of udfs as things like split while useful; are then still hard and that the part I am getting stuck with I can do the split part but the only reliable one is the -1 value, I'd like all of them. Same as the output of the query is as you would expect still not clean and ends up with label=value pairs.&lt;/P&gt;&lt;P&gt;If anyone has any tips, best functions etc that would be awesome.&lt;/P&gt;&lt;P&gt;I assume I'm not the first who has chomped urls for the information. I am using sql to make it manageable by people with sql experience in my absence&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt;&lt;SPAN&gt; content, &amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;split_part(parse_url(content:&lt;/SPAN&gt;&lt;SPAN&gt;Url&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'PATH'&lt;/SPAN&gt;&lt;SPAN&gt;), &lt;/SPAN&gt;&lt;SPAN&gt;'/'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;-1&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; feature2,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;split_part(parse_url(content:&lt;/SPAN&gt;&lt;SPAN&gt;Url&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'PATH'&lt;/SPAN&gt;&lt;SPAN&gt;), &lt;/SPAN&gt;&lt;SPAN&gt;'/'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;2&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; feature1,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;parse_url(content:&lt;/SPAN&gt;&lt;SPAN&gt;Url&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'QUERY'&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt; &lt;SPAN&gt;url&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;split(parse_url(content:&lt;/SPAN&gt;&lt;SPAN&gt;Url&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'QUERY'&lt;/SPAN&gt;&lt;SPAN&gt;), &lt;/SPAN&gt;&lt;SPAN&gt;'&amp;amp;'&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; query_values&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;FROM&lt;/SPAN&gt; &amp;lt;table&amp;gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 07 Oct 2025 23:26:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/split-parse-url-output-for-the-information/m-p/134129#M50028</guid>
      <dc:creator>turagittech</dc:creator>
      <dc:date>2025-10-07T23:26:25Z</dc:date>
    </item>
    <item>
      <title>Re: split parse_url output for the information</title>
      <link>https://community.databricks.com/t5/data-engineering/split-parse-url-output-for-the-information/m-p/134132#M50029</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/74890"&gt;@turagittech&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;Honestly, it all depends on how complex your URLs can get.&lt;/P&gt;&lt;P class=""&gt;UDFs will always be &lt;SPAN class=""&gt;&lt;STRONG&gt;more flexible&lt;/STRONG&gt;&lt;/SPAN&gt; but &lt;SPAN class=""&gt;&lt;STRONG&gt;less performant&lt;/STRONG&gt;&lt;/SPAN&gt; than native SQL functions.&lt;/P&gt;&lt;P class=""&gt;That said, if your team mainly works with SQL, trying to solve it natively in Databricks SQL is definitely a valid option.&lt;/P&gt;&lt;P class=""&gt;For the &lt;I&gt;path&lt;/I&gt; part, it really depends on how much detail you actually need to keep — whether you just need the last element or the full structure.&lt;/P&gt;&lt;P class=""&gt;But for the &lt;I&gt;query string&lt;/I&gt; part, I’d recommend building a &lt;SPAN class=""&gt;&lt;STRONG&gt;map of key-value pairs&lt;/STRONG&gt;&lt;/SPAN&gt;, which works really well in SQL:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;map_from_arrays(
  transform(split(parse_url(url, 'QUERY'), '&amp;amp;'), x -&amp;gt; split_part(x, '=', 1)),
  transform(split(parse_url(url, 'QUERY'), '&amp;amp;'), x -&amp;gt; split_part(x, '=', 2))
) AS query_map&lt;/LI-CODE&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Captura de pantalla 2025-10-08 a las 1.57.34.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20542iC08411A9C0745980/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Captura de pantalla 2025-10-08 a las 1.57.34.png" alt="Captura de pantalla 2025-10-08 a las 1.57.34.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P class=""&gt;If you could share an example of a more complex URL, we could probably build something more tailored that handles both the path and query parts more dynamically. &lt;BR /&gt;&lt;BR /&gt;Hope this helps, &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Oct 2025 00:00:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/split-parse-url-output-for-the-information/m-p/134132#M50029</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-10-08T00:00:13Z</dc:date>
    </item>
  </channel>
</rss>

