09-02-2021 03:39 PM
I have set numBuckets and numBucketsArray for a group of columns to bin them into 5 buckets.
Unfortunately, the number of buckets is not respected for every column, even though the values within each column vary.
I have tried setting relativeError to 0.
Any idea why this happens, and how I can force the specified number of buckets?
Accepted Solutions
09-13-2021 06:19 PM
Thank you.
What I did was:
- Apply QuantileDiscretizer to the non-zero values and specify a very small value (the bottom 1%) to capture the lowest bucket, including the zeros.
That fixed the issue! Defining your own splits would work as well, but the splits themselves were important in this case.
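For readers who want to see the idea concretely, here is a minimal pure-Python sketch of one possible reading of the workaround above. It is not Spark code and not necessarily what was actually run: the helper names `nonzero_quantile_splits` and `assign_bucket`, the `low_fraction` parameter, and the sample data are all hypothetical. It assumes the column is non-negative, so zeros sit below every non-zero value.

```python
import bisect
import math

def nonzero_quantile_splits(values, num_buckets, low_fraction=0.01):
    """Hypothetical helper: cut points computed from the non-zero values only."""
    nonzero = sorted(v for v in values if v != 0)
    n = len(nonzero)
    # Lowest cut: the bottom `low_fraction` quantile of the non-zero values.
    # With non-negative data, every zero falls below it.
    low_cut = nonzero[int(low_fraction * n)]
    # The remaining num_buckets - 1 buckets share the non-zero values evenly.
    k = num_buckets - 1
    interior = [nonzero[(i * n) // k] for i in range(1, k)]
    cuts = sorted(set([low_cut] + interior))
    return [-math.inf] + cuts + [math.inf]

def assign_bucket(value, splits):
    """Bucket i covers the half-open interval [splits[i], splits[i + 1])."""
    return bisect.bisect_right(splits, value) - 1

# Made-up column: 70 zeros followed by the values 1..30.
data = [0.0] * 70 + [float(x) for x in range(1, 31)]
splits = nonzero_quantile_splits(data, 5)
buckets_used = sorted({assign_bucket(v, splits) for v in data})
```

On this sample the zeros land in bucket 0 and the non-zero values fill the other four buckets. In Spark, splits obtained this way could then be passed to Bucketizer.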
09-03-2021 06:11 AM
As far as I know, QuantileDiscretizer does not guarantee the number of buckets. Depending on your data, you might get fewer buckets than you asked for.
Bucketizer does guarantee it, but you have to define the splits yourself.
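To illustrate why quantile-based binning can return fewer buckets, here is a simplified pure-Python model. It is a sketch under stated assumptions, not Spark's implementation: the helper `quantile_splits` and the sample data are hypothetical, and exact nearest-rank quantiles stand in for Spark's approximate ones. The point is that when one value (here, zero) dominates a column, several quantile cut points coincide, and once duplicates are dropped fewer distinct splits remain.

```python
import math

def quantile_splits(values, num_buckets):
    """Hypothetical helper: quantile cut points with duplicates dropped."""
    ordered = sorted(values)
    n = len(ordered)
    # Interior cut points at the 1/k, 2/k, ... quantiles (nearest-rank style).
    interior = [
        ordered[min(n - 1, (i * n) // num_buckets)]
        for i in range(1, num_buckets)
    ]
    # Repeated cut points would define empty buckets, so keep distinct ones.
    distinct = sorted(set(interior))
    return [-math.inf] + distinct + [math.inf]

# Made-up column: 70 zeros followed by the values 1..30.
data = [0.0] * 70 + [float(x) for x in range(1, 31)]
splits = quantile_splits(data, 5)
num_buckets_found = len(splits) - 1
```

Here five buckets were requested, but the 20%, 40% and 60% cut points are all zero, so only three buckets survive. Explicit splits avoid this because the bucket boundaries no longer depend on how the values are distributed.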
07-13-2022 08:13 PM
Can you explain a bit more?