<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to make backreferences in regexp_replace repl string work correctly in Databricks SQL? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-make-backreferences-in-regexp-replace-repl-string-work/m-p/33378#M24384</link>
    <description>&lt;P&gt;Hello @Stephen Wilcoxon​&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please share the expected output in Spark SQL?&lt;/P&gt;</description>
    <pubDate>Tue, 11 Oct 2022 14:04:56 GMT</pubDate>
    <dc:creator>User16764241763</dc:creator>
    <dc:date>2022-10-11T14:04:56Z</dc:date>
    <item>
      <title>How to make backreferences in regexp_replace repl string work correctly in Databricks SQL?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-backreferences-in-regexp-replace-repl-string-work/m-p/33377#M24383</link>
      <description>&lt;P&gt;Both of these work in Spark SQL:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;regexp_replace('1234567890abc', '^(?&amp;lt;one&amp;gt;\\w)(?&amp;lt;two&amp;gt;\\w)(?&amp;lt;three&amp;gt;\\w)', '$1')
regexp_replace('1234567890abc', '^(?&amp;lt;one&amp;gt;\\w)(?&amp;lt;two&amp;gt;\\w)(?&amp;lt;three&amp;gt;\\w)', '${one}')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, neither work in Databricks SQL. I found that this works for the numeric backreference but I can't find any way to make named backreferences work in Databricks. The problem seems to be around the repl string of regexp_replace.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;regexp_replace('1234567890abc', '^(?&amp;lt;one&amp;gt;\\w)(?&amp;lt;two&amp;gt;\\w)(?&amp;lt;three&amp;gt;\\w)', '\$1')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, when used in the regex itself, it only works without the backslash:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;select 'aabc' RLIKE '^(\\w)$1' dollar, 'aabc' RLIKE '^(\\w)\$1' escdollar   -- dollar = true and escdollar = false
select 'aabc' RLIKE '^(?&amp;lt;one&amp;gt;\\w)${one}' name   -- also works&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Does anyone know of a way to get named backreferences to work in repl string? Or better yet, to get Spark SQL syntax to work properly in Databricks?&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2022 15:15:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-backreferences-in-regexp-replace-repl-string-work/m-p/33377#M24383</guid>
      <dc:creator>Twilight</dc:creator>
      <dc:date>2022-08-26T15:15:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to make backreferences in regexp_replace repl string work correctly in Databricks SQL?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-backreferences-in-regexp-replace-repl-string-work/m-p/33378#M24384</link>
      <description>&lt;P&gt;Hello @Stephen Wilcoxon​&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please share the expected output in Spark SQL?&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2022 14:04:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-backreferences-in-regexp-replace-repl-string-work/m-p/33378#M24384</guid>
      <dc:creator>User16764241763</dc:creator>
      <dc:date>2022-10-11T14:04:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to make backreferences in regexp_replace repl string work correctly in Databricks SQL?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-make-backreferences-in-regexp-replace-repl-string-work/m-p/33379#M24385</link>
      <description>&lt;P&gt;I apologize - I apparently lost track of this...&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In Spark (but not in Databricks), both of these:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;regexp_replace('1234567890abc', '^(?&amp;lt;one&amp;gt;\\w)(?&amp;lt;two&amp;gt;\\w)(?&amp;lt;three&amp;gt;\\w)', '$1')
regexp_replace('1234567890abc', '^(?&amp;lt;one&amp;gt;\\w)(?&amp;lt;two&amp;gt;\\w)(?&amp;lt;three&amp;gt;\\w)', '${one}')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;result in:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;14567890abc&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;In general, I've found Databricks to work at least as well as Spark but it fails to work correctly in this situation.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In Databricks, you have to use these variations (which are not standard Spark):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;regexp_replace('1234567890abc', '^(?&amp;lt;one&amp;gt;\\w)(?&amp;lt;two&amp;gt;\\w)(?&amp;lt;three&amp;gt;\\w)', '\$1')
regexp_replace('1234567890abc', '^(?&amp;lt;one&amp;gt;\\w)(?&amp;lt;two&amp;gt;\\w)(?&amp;lt;three&amp;gt;\\w)', '\$\{one}')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;(note that the named capture requires escaping both the $ and the { )&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Upon further testing, $# and ${name} do not work in RLIKE after all (in Databricks at least - have not tested Spark). For example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;regexp_extract('aabc', '^(\\w)$1', 0)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;returns "a" (not "aa") but '^(\\w)\1' works correctly.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Feb 2023 23:09:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-make-backreferences-in-regexp-replace-repl-string-work/m-p/33379#M24385</guid>
      <dc:creator>Twilight</dc:creator>
      <dc:date>2023-02-22T23:09:40Z</dc:date>
    </item>
  </channel>
</rss>

