collect_set returns weird results when Photon is enabled



Cluster: DBR 10.4 LTS with Photon

Sample schema:

- seq_no (decimal)
- type (string)

Sample data:

| seq_no | type |
|--------|------|
| 1 | A |
| 2 | A |
| 2 | B |
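For reference, here is a plain-Python sketch of what the window expression should return on this sample data. It assumes Spark's documented default that a window with `partitionBy` but no `orderBy` uses a frame covering the whole partition, so every row sees all distinct `type` values for its `seq_no`. The helper name `expected_set_sizes` is hypothetical and only models the expected result; it does not run on Spark.

```python
from collections import defaultdict
from decimal import Decimal

# Sample data from the post as (seq_no, type) tuples.
rows = [(Decimal(1), "A"), (Decimal(2), "A"), (Decimal(2), "B")]


def expected_set_sizes(rows):
    """Model size(collect_set(type)) over Window.partitionBy("seq_no").

    With no orderBy, the default window frame spans the whole partition, so
    each row gets the number of distinct non-null types in its seq_no group.
    """
    distinct_types = defaultdict(set)
    for seq_no, typ in rows:
        if typ is not None:  # collect_set skips nulls
            distinct_types[seq_no].add(typ)
    # One value per input row, in input order.
    return [len(distinct_types[seq_no]) for seq_no, _ in rows]


print(expected_set_sizes(rows))  # [1, 2, 2]
```

Since the data contains only two distinct `type` values, no row can legitimately get a size above 2.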

Command: `F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no")))`
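A minimal reproduction sketch follows, assuming an existing SparkSession passed in as `spark`. The post only says `seq_no` is a decimal, so `DecimalType(10, 0)` is an assumed precision, and `build_repro` / `n_types` are hypothetical names. PySpark is imported inside the function so the sample rows can be inspected without a Spark installation.

```python
from decimal import Decimal

# The three sample rows from the post: (seq_no, type).
SAMPLE_ROWS = [(Decimal(1), "A"), (Decimal(2), "A"), (Decimal(2), "B")]


def build_repro(spark):
    """Apply the post's window expression to the sample data.

    `spark` is an existing SparkSession. DecimalType(10, 0) is an assumed
    precision; the post only states that seq_no is a decimal column.
    """
    from pyspark.sql import Window
    from pyspark.sql import functions as F
    from pyspark.sql.types import DecimalType, StringType, StructField, StructType

    schema = StructType([
        StructField("seq_no", DecimalType(10, 0)),
        StructField("type", StringType()),
    ])
    df = spark.createDataFrame(SAMPLE_ROWS, schema)
    w = Window.partitionBy("seq_no")  # no orderBy: frame is the whole partition
    return df.withColumn("n_types", F.size(F.collect_set(F.col("type")).over(w)))
```

Running `build_repro(spark).orderBy("seq_no", "type").show()` on a cluster with Photon and on one without should make the difference easy to compare; the correct `n_types` values are 1, 2 and 2.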

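On the cluster with Photon enabled, the results were wrong: for example, some arrays had a size greater than 2, which should be impossible because the data contains only two distinct `type` values. Without Photon, the results were correct.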

For now, I have to use a workaround: `F.size(F.array_distinct(F.collect_list()))`.
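To check whether the two formulations diverge on real data, a diagnostic along these lines could help. It is a sketch under the assumption that the workaround applies the same window as the original expression; `compare_set_sizes`, `via_set` and `via_list` are hypothetical names. Because `collect_set` and `collect_list` both skip nulls, and `array_distinct` removes duplicates, the two sizes should always match on an engine that evaluates them correctly.

```python
def compare_set_sizes(df, key="seq_no", col="type"):
    """Return the rows where size(collect_set) and the workaround disagree.

    Hypothetical helper: `df` is a DataFrame with the post's columns, and the
    workaround is assumed to use the same window as the original expression.
    On an engine that evaluates both correctly, the result is empty.
    """
    # Imported lazily so the helper can be defined without PySpark installed.
    from pyspark.sql import Window
    from pyspark.sql import functions as F

    w = Window.partitionBy(key)  # no orderBy: frame is the whole partition
    via_set = F.size(F.collect_set(F.col(col)).over(w))
    via_list = F.size(F.array_distinct(F.collect_list(F.col(col)).over(w)))
    return (
        df.withColumn("via_set", via_set)
        .withColumn("via_list", via_list)
        .filter(F.col("via_set") != F.col("via_list"))
    )
```

For example, a nonzero `compare_set_sizes(df).count()` on the Photon cluster would confirm the discrepancy on that data.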
