Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Recursive CTE in Databricks SQL

Databricks143
New Contributor III

Hi Team,

How do I write a recursive CTE in Databricks SQL?

Please let me know if anyone has a solution for this.

14 REPLIES

Databricks143
New Contributor III
I need to run the query below in Databricks SQL, but it fails with an error: UserCTE table not found.

WITH UserCTE AS (
    SELECT userId, userName, managerId, 0 AS EmpLevel
    FROM Users
    WHERE managerId IS NULL
    UNION ALL
    SELECT usr.userId, usr.userName, usr.managerId, mgr.EmpLevel + 1
    FROM Users AS usr
    INNER JOIN UserCTE AS mgr ON usr.managerId = mgr.userId
    WHERE usr.managerId IS NOT NULL
)
SELECT *
FROM UserCTE AS u
WHERE u.managerId = 3
ORDER BY EmpLevel

Please help me with this.

-werners-
Esteemed Contributor III

Recursive CTEs are not supported in Spark, but there is a workaround using Python:

https://medium.com/@24chynoweth/recursive-cte-on-databricks-2ac0dff8ca06

Databricks143
New Contributor III

Thanks for the update.

We have framework restrictions in Databricks in my organisation, and we need to use only Scala code.

-werners-
Esteemed Contributor III

The code is easily translated to Scala; there is nothing Python-specific about it.

A_Kennedy
New Contributor II

Did you find a solution? I have an almost identical CTE query to pull employee hierarchy info, and I am receiving the same error.

-werners-
Esteemed Contributor III

It is still not supported. Not sure when it will be (if ever).

EWhitley
New Contributor III

Weirdly, if you ask the Databricks Assistant, it tells you that Databricks does support recursive CTEs and gives you sample code for it. If you follow up and press it on the details, it doubles down and insists the feature is supported, but it cannot tell you which runtime version supports it.

I'm wondering if support is in development.

Msquare
New Contributor II

Yes, Databricks does not support recursive CTEs. I got the same AI Assistant message. This feature should be enabled here.

filipniziol
Contributor

Hi @Databricks143 ,

This code works in Scala. One thing: the number of iterations is hard-coded to 10. If there are more levels, you need to adjust it.

import org.apache.spark.sql.functions._
import spark.implicits._  // provides $-syntax and toDF; predefined in Databricks notebooks

val data = Seq(
  (1, "Alice", None: Option[Int]),  // Root employee
  (2, "Bob", Some(1)),              // Reports to Alice
  (3, "Charlie", Some(2)),          // Reports to Bob
  (4, "David", Some(1)),            // Reports to Alice
  (5, "Eve", Some(3))               // Reports to Charlie
)

val df = data.toDF("userId", "userName", "managerId")

// Initialize root employee, set EmpLevel to 0
var df_levels = df.filter($"managerId".isNull).withColumn("EmpLevel", lit(0))

// Create a DataFrame to hold the results as they are built up
var final_df = df_levels

// Iteratively process subordinates
val max_iterations = 10  // Adjust based on expected maximum depth
var hasNewLevels = true

for (_ <- 1 to max_iterations if hasNewLevels) {
  // Identify subordinates of employees whose levels have already been calculated
  val df_new_levels = df.as("emp")
    .join(df_levels.as("mgr"), $"emp.managerId" === $"mgr.userId", "inner")
    .select(
      $"emp.userId",
      $"emp.userName",
      $"emp.managerId",
      ($"mgr.EmpLevel" + 1).as("EmpLevel")
    )

  // If no new levels are calculated, set the flag to false to break the loop
  if (df_new_levels.isEmpty) {
    hasNewLevels = false
  } else {
    // Append the new levels to the final result set
    final_df = final_df.union(df_new_levels)
    
    // Update df_levels to include only the newly calculated levels for the next iteration
    df_levels = df_new_levels
  }
}

// Final result
final_df.orderBy("EmpLevel", "userId").show()

The output:

+------+--------+---------+--------+
|userId|userName|managerId|EmpLevel|
+------+--------+---------+--------+
|     1|   Alice|     null|       0|
|     2|     Bob|        1|       1|
|     4|   David|        1|       1|
|     3| Charlie|        2|       2|
|     5|     Eve|        3|       3|
+------+--------+---------+--------+

dlehmann
New Contributor III

I want to create a persistent view on a recursive CTE. Is something like that just not possible?

As far as I understand:

You need to use PySpark to be able to do a recursive CTE.

BUT

You need to use Spark SQL to create a persistent view.

Is there any workaround for this? The reason I want to do this is that my data gets updated frequently. My users want to look at the data in a way that requires a recursive CTE, but they will not access the majority of it frequently, meaning it would be a waste of resources to persist the information in a physical table.

filipniziol
Contributor

You could use PySpark to calculate the recursive logic and save the result as a table.
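For example, a minimal sketch in Scala, assuming the final_df computed in the iterative example above (the table name emp_hierarchy is hypothetical):

// Persist the calculated hierarchy so it can be queried from SQL.
// Assumes final_df from the iterative Scala example above; the table
// name "emp_hierarchy" is a placeholder for your own schema/table.
final_df.write
  .mode("overwrite")            // recompute and replace on each source update
  .saveAsTable("emp_hierarchy")

// Users can then query it like any other table:
spark.sql("SELECT * FROM emp_hierarchy WHERE managerId = 3 ORDER BY EmpLevel").show()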


dlehmann
New Contributor III

Hi @filipniziol , yes, I know that is an option, but that would mean I have to do this every time the source table updates, even if the data might not be used before the next update. That is why I wanted to create a view on the data: as far as I understand, a view would calculate the CTE at run time and apply filters if any are used.

filipniziol
Contributor

In a typical data architecture, data from source systems is ingested into the bronze layer, where it is stored as-is. Subsequent transformations are then applied in the silver layer.

One of these transformations could involve calculating the EmpLevel column. Since recursive CTEs are not supported in Databricks SQL, you have the following options:

  1. Use PySpark (or Scala): Use an iterative job, as in the example earlier in this thread, and persist the calculated EmpLevel column. This is flexible, and you can have as many levels as you need. The table is updated on some schedule anyway, so you can treat the EmpLevel calculation as one of the steps of that update.

  2. Hardcode levels for a limited hierarchy: If the hierarchy depth is known and limited, you can manually define the levels using a series of unions, as in the sketch below. This is certainly not as elegant as a recursive CTE or the PySpark solution, but it works as well.
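A minimal sketch of option 2 in Scala, assuming a Users table shaped like the one in the original question, a known maximum depth of 3 levels, and a hypothetical view name user_levels:

// Option 2: a persistent view built from chained, non-recursive CTEs,
// one per hierarchy level. Assumes a Users(userId, userName, managerId)
// table and a known maximum depth of 3; "user_levels" is a placeholder.
spark.sql("""
  CREATE OR REPLACE VIEW user_levels AS
  WITH l0 AS (
    SELECT userId, userName, managerId, 0 AS EmpLevel
    FROM Users WHERE managerId IS NULL
  ),
  l1 AS (
    SELECT u.userId, u.userName, u.managerId, 1 AS EmpLevel
    FROM Users u JOIN l0 ON u.managerId = l0.userId
  ),
  l2 AS (
    SELECT u.userId, u.userName, u.managerId, 2 AS EmpLevel
    FROM Users u JOIN l1 ON u.managerId = l1.userId
  ),
  l3 AS (
    SELECT u.userId, u.userName, u.managerId, 3 AS EmpLevel
    FROM Users u JOIN l2 ON u.managerId = l2.userId
  )
  SELECT * FROM l0
  UNION ALL SELECT * FROM l1
  UNION ALL SELECT * FROM l2
  UNION ALL SELECT * FROM l3
""")

Each CTE joins back to the previous level, so the view computes the levels at query time without recursion; supporting one more level means adding one more CTE and one more UNION ALL.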

dlehmann
New Contributor III

Hello @filipniziol , I went with your second suggestion, as I preferred to use views in this case. It works very well, as there is a limited depth and I could just write that many unions.

Thanks for your response!
