โ04-06-2024 07:12 AM
Hi! I would be glad to ask you some questions.
I have the following data.โ
I would like to get this kind of result. I want `move` to correspond to the order of `hist`.
Therefore, i considered the following query.
```
Tried sorting by `order by` before combining strings into an array with `array_join(array_agg())`. At first glance it seems to work, but the official documentation for `array_agg()` states the following
```
The order of elements in the array is non-deterministic.
Does it make sense to sort by `order by` before running `array_join(array_agg())`? Will it work as expected with larger data sizes?
I apologize for the inconvenience and would appreciate your response.
โ04-09-2024 04:11 PM
Hello,
Below is a solution
regexp_replace(array_join(sort_array(array_agg(concat(hist, alf))), ' -> '),'[0-9]','')
The idea in the code is to concatenate the "hist" column with the "alf" column, this way the "sort_array" can sort based on the numbers in the "hist" column and finally a "regexp_replace" is done removing the numbers from the "hist" column. hist" as they cannot be shown in the final result
Hope this helps
โ04-07-2024 08:38 AM
Hi,
I believe that for your scenario the "sort_array", function can help
https://docs.databricks.com/en/sql/language-manual/functions/sort_array.html
โ04-09-2024 05:08 AM - edited โ04-09-2024 05:14 AM
Thanks for your response and Sorry for my late reply.
I apologize that my question did not accurately describe my scenario.
It is true that the `sort_array` function works well for this sample data.
On the other hand, array values do not necessarily have rules like alphabetical or numeric.
------------------------------
with tmp as (
select * from (values
(1, 1, 'cherry'),
(1, 3, 'strawberries'),
(1, 2, 'acerola'),
(1, 5, 'banan'),
(1, 4, 'lemon')
) as tab(custid, hist, alf)
order by custid asc, hist asc
)
select
custid,
array_join(array_agg(alf), ' -> ') as move
from tmp
group by custid
order by custid asc;
------------------------------
I would like to get array like `[cherry -> acerola -> strawberries -> lemon -> banan]` for above sample data. I would like to sort and keep the array values according to the order of the `hist` columns but `array_agg()` states `The order of elements in the array is non-deterministic`.
If you have a solution, I would appreciate your response!
โ04-09-2024 04:11 PM
Hello,
Below is a solution
regexp_replace(array_join(sort_array(array_agg(concat(hist, alf))), ' -> '),'[0-9]','')
The idea in the code is to concatenate the "hist" column with the "alf" column, this way the "sort_array" can sort based on the numbers in the "hist" column and finally a "regexp_replace" is done removing the numbers from the "hist" column. hist" as they cannot be shown in the final result
Hope this helps
โ04-09-2024 05:25 PM
This is a great idea. It can solve my problem.Thank you.
โ04-09-2024 05:26 PM
This is a great idea. It can solve my problem.Thank you.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group