cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Agg items in a map

AdamIH123
New Contributor II

What is the best way to aggregate a map across rows? In the below, The agg results would be red: 4, green 7, blue: 10. This can be achieved using explode wondering if there is a better way. 

%sql 

with cte as (
select 
1 as id 
, map('red', 1, 'green' 2) as color_map
union all 
select 
1 as id 
, map('red', 3, 'green' 5, 'blue': 10) as color_map
)

select 
id
<agg color map so final output is red 4 green 7 blue 10>
from cte 

group by id 

The below works but uses an explode. 

%sql
with cte as (
select 
1 as id 
, map('red', 1, 'green', 2) as color_map
union all 
select 
1 as id 
, map('red', 3, 'green', 5, 'blue', 10) as color_map
),
exploded as (
  select 
    id,
    explode(color_map) as (key, value)
  from cte
),
aggregated as (
  select 
    id,
    key,
    sum(value) as sum_value
  from exploded
  group by id, key
)
select 
  id,
  map_from_arrays(
    collect_list(key),
    collect_list(sum_value)
  ) as aggregated_color_map
from aggregated
group by id

 

1 ACCEPTED SOLUTION

Accepted Solutions

SP_6721
Contributor III

Hi @AdamIH123 ,

The explode-based approach is widely used and remains the most reliable and readable method.

But if you're looking for an alternative without using explode, you can try the REDUCE + MAP_FILTER approach. It lets you aggregate maps across rows efficiently:

SELECT
  id,
  MAP_FILTER(
    REDUCE(
      COLLECT_LIST(color_map),
      MAP('init', 0),
      (acc, m) -> MAP_ZIP_WITH(
        acc,
        m,
        (k, v1, v2) -> COALESCE(v1, 0) + COALESCE(v2, 0)
      )
    ),
    (k, v) -> k != 'init'
  ) AS aggregated_color_map
FROM cte
GROUP BY id;

View solution in original post

1 REPLY 1

SP_6721
Contributor III

Hi @AdamIH123 ,

The explode-based approach is widely used and remains the most reliable and readable method.

But if you're looking for an alternative without using explode, you can try the REDUCE + MAP_FILTER approach. It lets you aggregate maps across rows efficiently:

SELECT
  id,
  MAP_FILTER(
    REDUCE(
      COLLECT_LIST(color_map),
      MAP('init', 0),
      (acc, m) -> MAP_ZIP_WITH(
        acc,
        m,
        (k, v1, v2) -> COALESCE(v1, 0) + COALESCE(v2, 0)
      )
    ),
    (k, v) -> k != 'init'
  ) AS aggregated_color_map
FROM cte
GROUP BY id;

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now