Databricks Community

shaz · ‎12-23-2021

Consider the following input:

ID          PrevID
-------- ---------
33          NULL
272       33
317        272
318        317

I need to somehow get the following result:

Result
--------
/33
/33/272
/33/272/317
/33/272/317/318

I need to do this in SQL and if REALLY not possible, then anything else goes, and Databricks does not support recursive CTE's so I am out of ideas, i have an SQL Fiddle here that illustrates the recursivce CTE way that doesn't work on databricks: http://sqlfiddle.com/#!18/101df/3/0

Any help is appreciated, thanks

Pholo · ‎01-08-2022

Hi @Shazal Gomes ,

I found out using pyspark this solution

from pyspark.sql import functions as F
from pyspark.sql.window import Window
from pyspark.sql import Row
from pyspark.sql.types import *
 
 
 
ids = ["33", "272", "317", "318"]
df_ids = spark.createDataFrame([Row(ID = i) for i in ids])
 
display(
  df_ids.withColumn(
    'PREVIOUS_ID', F.concat_ws('/',F.array_sort(F.collect_set(F.col('ID')).over(Window.orderBy(F.col('ID').cast('integer'))).cast('array<integer>')))
  )
)

Let me know if It meets your need.

View solution in original post

shaz · ‎12-24-2021

Hi @Kaniz Fatma , Thanks for following up ! Great to meet you too

Kari_05152023 · ‎06-28-2023

Hi Kaniz:

I am reaching out for assistance using SQL/T-SQL for a string function to use on a concatenated string (i.e., DATABRICKS concatenated functionality is used to create unique identification names in the data table). I have read-only access and need to validate huge data sets. Thank you in advance!

Kind regards - Karen

Pholo · ‎01-08-2022

Hi @Shazal Gomes ,

I found out using pyspark this solution

from pyspark.sql import functions as F
from pyspark.sql.window import Window
from pyspark.sql import Row
from pyspark.sql.types import *
 
 
 
ids = ["33", "272", "317", "318"]
df_ids = spark.createDataFrame([Row(ID = i) for i in ids])
 
display(
  df_ids.withColumn(
    'PREVIOUS_ID', F.concat_ws('/',F.array_sort(F.collect_set(F.col('ID')).over(Window.orderBy(F.col('ID').cast('integer'))).cast('array<integer>')))
  )
)

Let me know if It meets your need.