Data Engineering

collect_set returns weird results when Photon is enabled

danny_edm
New Contributor

Cluster: DBR 10.4 LTS with Photon

Sample schema

seq_no (decimal)

type (string)

Sample data

seq_no  type
1       A
1       A
2       A
2       B
2       B

Command: F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no")))

On the cluster with Photon, this returned weird results, such as an array size greater than 2, even though the sample data contains only two distinct types (A and B). Without Photon, the results were correct.

Currently, I have to use this workaround: F.size(F.array_distinct(F.collect_list(F.col("type")).over(Window.partitionBy("seq_no"))))
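For anyone who wants to reproduce this, here is a minimal sketch, not a confirmed diagnosis. It rebuilds the sample data, computes the expected distinct count per seq_no in plain Python, and puts the collect_set expression next to the array_distinct workaround. The names ROWS, expected_distinct_counts and compare_on_cluster are made up for illustration, and the decimal(10,0) precision is an assumption, since the post only says seq_no is a decimal.

```python
from collections import defaultdict
from decimal import Decimal

# Sample rows from the post: (seq_no, type).
ROWS = [
    (Decimal(1), "A"),
    (Decimal(1), "A"),
    (Decimal(2), "A"),
    (Decimal(2), "B"),
    (Decimal(2), "B"),
]


def expected_distinct_counts(rows):
    """Plain-Python reference: number of distinct types per seq_no."""
    seen = defaultdict(set)
    for seq_no, type_ in rows:
        seen[seq_no].add(type_)
    return {seq_no: len(types) for seq_no, types in seen.items()}


def compare_on_cluster(spark):
    """Return rows with both counts side by side (hypothetical helper).

    pyspark is imported inside the function so the reference
    calculation above can run without Spark installed.
    """
    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # decimal(10,0) is an assumed precision; the post only says "decimal".
    df = spark.createDataFrame(ROWS, "seq_no decimal(10,0), type string")
    w = Window.partitionBy("seq_no")
    return (
        df.withColumn("n_collect_set", F.size(F.collect_set("type").over(w)))
        .withColumn(
            "n_workaround",
            F.size(F.array_distinct(F.collect_list("type").over(w))),
        )
        .collect()
    )


# Reference counts that any correct engine should match.
print(expected_distinct_counts(ROWS))
```

The reference calculation gives 1 for seq_no 1 and 2 for seq_no 2, so any n_collect_set value above 2 is wrong for this data. Calling compare_on_cluster(spark) in a notebook on a Photon cluster and on a non-Photon cluster should show whether the two columns disagree.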
