Concatenating strings based on previous row values

shaz
New Contributor II

Consider the following input:

ID          PrevID
-------- ---------
33          NULL
272       33
317        272
318        317

I need to somehow get the following result:

Result
--------
/33
/33/272
/33/272/317
/33/272/317/318

I need to do this in SQL and if REALLY not possible, then anything else goes, and Databricks does not support recursive CTE's so I am out of ideas, i have an SQL Fiddle here that illustrates the recursivce CTE way that doesn't work on databricks: http://sqlfiddle.com/#!18/101df/3/0

Any help is appreciated, thanks

shaz
New Contributor II

Hi @Kaniz Fatma​ , Thanks for following up ! Great to meet you too

Hi Kaniz:

I am reaching out for assistance using SQL/T-SQL for a string function to use on a concatenated string (i.e., DATABRICKS concatenated functionality is used to create unique identification names in the data table).  I have read-only access and need to validate huge data sets.  Thank you in advance!

Kind regards - Karen

 

 

 

Pholo
Contributor

Hi @Shazal Gomes​ ,

I found out using pyspark this solution

from pyspark.sql import functions as F
from pyspark.sql.window import Window
from pyspark.sql import Row
from pyspark.sql.types import *
 
 
 
ids = ["33", "272", "317", "318"]
df_ids = spark.createDataFrame([Row(ID = i) for i in ids])
 
display(
  df_ids.withColumn(
    'PREVIOUS_ID', F.concat_ws('/',F.array_sort(F.collect_set(F.col('ID')).over(Window.orderBy(F.col('ID').cast('integer'))).cast('array<integer>')))
  )
)

Let me know if It meets your need.

View solution in original post

shaz
New Contributor II

Hi, thanks for this code, it works as needed ! really would've liked to have it in SQL but this will do for now! Thank you