Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Recursive CTE in Databricks SQL

Databricks143
New Contributor III

Hi Team,

How do I write a recursive CTE in Databricks SQL?

Please let me know if anyone has a solution for this.

14 REPLIES

Databricks143
New Contributor III
I need to run the query below in Databricks SQL, but it throws an error: UserCTE table not found.

WITH UserCTE AS (
    SELECT userId, userName, managerId, 0 AS EmpLevel
    FROM Users
    WHERE managerId IS NULL
    UNION ALL
    SELECT usr.userId, usr.userName, usr.managerId, mgr.EmpLevel + 1
    FROM Users AS usr
    INNER JOIN UserCTE AS mgr ON usr.managerId = mgr.userId
    WHERE usr.managerId IS NOT NULL
)
SELECT * FROM UserCTE AS u
WHERE u.managerId = 3
ORDER BY EmpLevel

Please help me on this

-werners-
Esteemed Contributor III

Recursive CTEs are not supported in Spark, but there is a workaround using Python:

https://medium.com/@24chynoweth/recursive-cte-on-databricks-2ac0dff8ca06

Thanks for the update.

We have framework restrictions in Databricks in my organisation, and we need to use only Scala code.

-werners-
Esteemed Contributor III

The code is easily translated to Scala.
There is nothing Python-specific about it.

A_Kennedy
New Contributor II

Did you find a solution? I have an almost identical CTE query to pull employee hierarchy info, and I am receiving the same error.

-werners-
Esteemed Contributor III

It is still not supported.  Not sure when it will be (if ever).

EWhitley
New Contributor III

Weirdly, if you ask the Databricks Assistant, it tells you it does support recursive CTEs and will give you sample code. If you follow up and press it on details, it doubles down and says it supports them, but it won't tell you the runtime version information.

I'm wondering if support is in development.

Msquare
New Contributor II

Yes, Databricks does not support recursive CTEs. I got the same AI Assistant message. This feature should be enabled here.

filipniziol
New Contributor III

Hi @Databricks143,

This code works in Scala. One caveat: the number of iterations is hard-coded to 10. If your hierarchy has more levels, you need to adjust it.

import org.apache.spark.sql.functions._
import spark.implicits._  // pre-imported in Databricks notebooks; needed for toDF and the $"col" syntax

val data = Seq(
  (1, "Alice", None: Option[Int]),  // Root employee
  (2, "Bob", Some(1)),              // Reports to Alice
  (3, "Charlie", Some(2)),          // Reports to Bob
  (4, "David", Some(1)),            // Reports to Alice
  (5, "Eve", Some(3))               // Reports to Charlie
)

val df = data.toDF("userId", "userName", "managerId")

// Initialize root employee, set EmpLevel to 0
var df_levels = df.filter($"managerId".isNull).withColumn("EmpLevel", lit(0))

// Create a DataFrame to hold the results as they are built up
var final_df = df_levels

// Iteratively process subordinates
val max_iterations = 10  // Adjust based on expected maximum depth
var hasNewLevels = true

for (_ <- 1 to max_iterations if hasNewLevels) {
  // Identify subordinates of employees whose levels have already been calculated
  val df_new_levels = df.as("emp")
    .join(df_levels.as("mgr"), $"emp.managerId" === $"mgr.userId", "inner")
    .select(
      $"emp.userId",
      $"emp.userName",
      $"emp.managerId",
      ($"mgr.EmpLevel" + 1).as("EmpLevel")
    )

  // If no new levels are calculated, set the flag to false to break the loop
  if (df_new_levels.isEmpty) {
    hasNewLevels = false
  } else {
    // Append the new levels to the final result set
    final_df = final_df.union(df_new_levels)
    
    // Update df_levels to include only the newly calculated levels for the next iteration
    df_levels = df_new_levels
  }
}

// Final result
final_df.orderBy("EmpLevel", "userId").show()

The output:

[Screenshot of the resulting table: Alice at level 0; Bob and David at level 1; Charlie at level 2; Eve at level 3]

dlehmann
New Contributor III

I want to create a persistent view on a recursive CTE. Is something like that just not possible?

As far as I understand:

You need to use PySpark to be able to do a recursive CTE.

BUT

You need to use Spark SQL to create a persistent view.

Is there any workaround for this? The reason I want to do this is that my data gets updated frequently. My users want to look at the data in a way that requires a recursive CTE, but they will not access the majority of it frequently, meaning it would be a waste of resources to persist the information in a physical table.

filipniziol
New Contributor III

Could you use PySpark to calculate the recursive logic and save the result as a table?


dlehmann
New Contributor III

Hi @filipniziol, yes, I know that is an option, but that would mean I have to do this every time the source table updates, even if the data might not be used before the next update. That is why I wanted to create a view on the data: as far as I understand, a view would calculate the CTE at run time and apply filters if any are used.

filipniziol
New Contributor III

In a typical data architecture, data from source systems is ingested into the bronze layer, where it is stored as-is. Subsequent transformations are then applied in the silver layer.

One of these transformations could involve calculating the EmpLevel column. Since recursive CTEs are not supported in Databricks SQL, you have the following options:

  1. Use PySpark: Use PySpark (or Scala, as in the example shared earlier) and persist the calculated EmpLevel column. This is flexible, and you can have as many levels as you need. The table is updated on some schedule anyway, so you can treat the EmpLevel calculation as one of the steps of that update.

  2. Hardcode levels for a limited hierarchy: If the hierarchy depth is known and limited, you can manually define the levels using a series of unions, as in the sketch below. This is certainly not as elegant as a recursive CTE or the PySpark solution, but it works just as well.
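A minimal sketch of the union approach, assuming the Users table from the original post and a known maximum depth of three levels below the root (the view name user_levels is made up for illustration):

-- Hard-coded hierarchy levels via chained (non-recursive) CTEs and unions.
-- Assumes Users(userId, userName, managerId); add more levelN CTEs if the
-- hierarchy is deeper.
CREATE OR REPLACE VIEW user_levels AS
WITH level0 AS (
  SELECT userId, userName, managerId, 0 AS EmpLevel
  FROM Users
  WHERE managerId IS NULL          -- root employees
),
level1 AS (
  SELECT u.userId, u.userName, u.managerId, 1 AS EmpLevel
  FROM Users AS u
  INNER JOIN level0 AS m ON u.managerId = m.userId
),
level2 AS (
  SELECT u.userId, u.userName, u.managerId, 2 AS EmpLevel
  FROM Users AS u
  INNER JOIN level1 AS m ON u.managerId = m.userId
),
level3 AS (
  SELECT u.userId, u.userName, u.managerId, 3 AS EmpLevel
  FROM Users AS u
  INNER JOIN level2 AS m ON u.managerId = m.userId
)
SELECT * FROM level0
UNION ALL SELECT * FROM level1
UNION ALL SELECT * FROM level2
UNION ALL SELECT * FROM level3;

Each levelN CTE only references the previous one, so this stays within ordinary (non-recursive) Databricks SQL and can live inside a persistent view.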

dlehmann
New Contributor III

Hello @filipniziol, I went with your second suggestion, as I preferred to use views in this case. It works very well, as there is a limited depth and I could just write that many unions.

Thanks for your response!
