<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15633#M9941</link>
    <description>&lt;P&gt;Hi @Jonathan Dufault​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm glad it answered your question. From what I can see internally there is a decent amount of demand for this feature so I have a feeling it will be implemented in the future, but I don't have any timelines at this point.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It also appears that OSS Spark has two pull requests that would implement Recursive CTEs which shows promise for this feature&lt;/P&gt;</description>
    <pubDate>Wed, 21 Dec 2022 19:00:49 GMT</pubDate>
    <dc:creator>LandanG</dc:creator>
    <dc:date>2022-12-21T19:00:49Z</dc:date>
    <item>
      <title>How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15630#M9938</link>
      <description>&lt;P&gt;I have a table that looks like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;/* input */
-- | parent | child |
-- | ------ | ----- |
-- | 1      | 2     |
-- | 2      | 3     |
-- | 3      | 4     |
-- | 5      | 6     |
-- | 6      | 7     |
-- | 8      | 9     |
-- | 10     | 11    |&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;and I want create something that looks like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;/* output */
-- | person | first_ancestor |
-- | ------ | --------------- |
-- | 1      | 1               |
-- | 2      | 1               |
-- | 3      | 1               |
-- | 4      | 1               |
-- | 5      | 5               |
-- | 6      | 5               |
-- | 7      | 5               |
-- | 8      | 8               |
-- | 9      | 8               |
-- | 10     | 10              |
-- | 11     | 10              |&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;If I were using bigquery, it would be simple using a recursive CTE&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;-- recursive CTE: get the first ancestor of each record
with recursive table_data as (
  -- populate data
  select 1 as parent, 2 as child union all
  select 2,3 union all 
  select 3,4 union all
  select 5,6 union all
  select 6,7 union all
  select 8,9 union all
  select 10,11
), 
base_records as (
    -- make the base case (get all records that are not children of another record)
    -- also has the effect of filtering out cyclic groups (e.g. 1-&amp;gt;2-&amp;gt;-&amp;gt;1)
    select parent as first_ancestor,parent as person 
    from table_data 
    where parent not in (select child from table_data)
    ),
lookup_table as (
  -- first start with the base case...the first ancestor
  select person,first_ancestor from base_records
  union all 
  -- recursively add descendants, noting the first ancestor to that descendant
  select table_data.child as person,lookup_table.first_ancestor
    from lookup_table 
    join table_data 
    on lookup_table.person = table_data.parent
)
select  person,first_ancestor from lookup_table
order by first_ancestor,person&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;is there 1) some function that has recursion in databricks (spark sql or pyspark), or 2) a canonical way to do this? &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can brute force this by coding an arbitrary length join (join table_data t1 to table_data t2 to table_data t3 ...) but was wondering if there's a less inefficient/more dynamic way. I know nothing about graph databases or if the proper solution lies in them. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you so much &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2022 17:54:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15630#M9938</guid>
      <dc:creator>jonathan-dufaul</dc:creator>
      <dc:date>2022-12-20T17:54:59Z</dc:date>
    </item>
    <item>
      <title>Re: How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15631#M9939</link>
      <description>&lt;P&gt;Hi @Jonathan Dufault​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Around Recursive CTEs, at the moment they aren't really supported on Apache Spark (and Databricks). There are some workarounds that might help such as this one by my colleague &lt;A href="https://medium.com/@24chynoweth/recursive-cte-on-databricks-2ac0dff8ca06 " alt="https://medium.com/@24chynoweth/recursive-cte-on-databricks-2ac0dff8ca06 " target="_blank"&gt;https://medium.com/@24chynoweth/recursive-cte-on-databricks-2ac0dff8ca06 &lt;/A&gt;or this slightly newer post &lt;A href="https://medium.com/globant/how-to-implement-recursive-queries-in-spark-3d26f7ed3bc9 " alt="https://medium.com/globant/how-to-implement-recursive-queries-in-spark-3d26f7ed3bc9 " target="_blank"&gt;https://medium.com/globant/how-to-implement-recursive-queries-in-spark-3d26f7ed3bc9&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hopefully this helps, - LG&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2022 20:07:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15631#M9939</guid>
      <dc:creator>LandanG</dc:creator>
      <dc:date>2022-12-20T20:07:35Z</dc:date>
    </item>
    <item>
      <title>Re: How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15632#M9940</link>
      <description>&lt;P&gt;wow this completely answers my question. do you have a sense for whether that's a thing that's going to be implemented at some point/how important or not it is? just curious mainly. &lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2022 17:59:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15632#M9940</guid>
      <dc:creator>jonathan-dufaul</dc:creator>
      <dc:date>2022-12-21T17:59:39Z</dc:date>
    </item>
    <item>
      <title>Re: How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15633#M9941</link>
      <description>&lt;P&gt;Hi @Jonathan Dufault​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm glad it answered your question. From what I can see internally there is a decent amount of demand for this feature so I have a feeling it will be implemented in the future, but I don't have any timelines at this point.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It also appears that OSS Spark has two pull requests that would implement Recursive CTEs which shows promise for this feature&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2022 19:00:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15633#M9941</guid>
      <dc:creator>LandanG</dc:creator>
      <dc:date>2022-12-21T19:00:49Z</dc:date>
    </item>
    <item>
      <title>Re: How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15634#M9942</link>
      <description>&lt;P&gt;@Landan George​&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hey, I am looking into same issue, but when I execute what's suggested in the post  for CTE_Recursive &lt;A href="https://medium.com/globant/how-to-implement-recursive-queries-in-spark-3d26f7ed3bc9" target="test_blank"&gt;https://medium.com/globant/how-to-implement-recursive-queries-in-spark-3d26f7ed3bc9&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I get error&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Error in SQL statement: AnalysisException: Table or view not found: CTE_Recursive; line x pos y;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I created the same table as in the post's example and even copied the code and executed it so I am not doing something else.&lt;/P&gt;&lt;P&gt;Can you please advise ?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 22 Jan 2023 19:03:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15634#M9942</guid>
      <dc:creator>JGil</dc:creator>
      <dc:date>2023-01-22T19:03:16Z</dc:date>
    </item>
    <item>
      <title>Re: How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15635#M9943</link>
      <description>&lt;P&gt;I selected the above answer because it answered the "oh this function doesn't exist." I didn't try the workaround. @Landan George​&amp;nbsp;just letting you know this is a question above&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2023 15:26:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-look-up-the-first-ancestor-person-first-ancestor-of-a/m-p/15635#M9943</guid>
      <dc:creator>jonathan-dufaul</dc:creator>
      <dc:date>2023-01-24T15:26:23Z</dc:date>
    </item>
  </channel>
</rss>

