Sunday, March 17, 2019

Weighted random picker

I've been dabbling in Genetic Algorithms of late. There are some great videos here on the subject.
While implementing a GA in JavaScript, I came across the problem of selecting individuals from a population into a mating pool according to their fitness values. In code terms, it boils down to picking from a list of objects following a custom distribution, one typically defined by 'weights' assigned to each of them.
I first tried the most obvious and naive solution: generate a new list in which each of the original objects appears a number of times proportional to its weight. So, for example, if A, B, and C are objects with a weight distribution of 50%, 30%, and 20%, I created a new list of, say, a hundred objects in total: 50 A's, 30 B's, and 20 C's. Choosing uniformly at random from this new list should then yield roughly the desired distribution over the original objects. But this method quickly led me to a whole host of problems. For starters, with large populations the individual probabilities (normalized weights) can be arbitrarily small, so the new pool has to be correspondingly huge to reflect all the small differences between the members' weights. Ignoring this led to diminishing diversity in subsequent generations and, owing to that, a complete collapse of the mating pool after a number of iterations.
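For concreteness, here is a minimal sketch of that duplication approach (the buildPool name and the { value, weight } item shape are my own illustration, assuming integer percentage weights):

// Build a pool where each value appears in proportion to its weight.
// Assumes weights are small non-negative integers (e.g. percentages).
function buildPool(items) {
  const pool = [];
  for (const { value, weight } of items) {
    for (let i = 0; i < weight; i++) pool.push(value);
  }
  return pool;
}

const pool = buildPool([
  { value: 'A', weight: 50 },
  { value: 'B', weight: 30 },
  { value: 'C', weight: 20 },
]);
// A uniform pick from the expanded pool approximates the weighted pick.
const picked = pool[Math.floor(Math.random() * pool.length)];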
Researching the topic online turned up a couple of solutions, but most of them, while fully functional, were unintuitive. There was a lot of head-scratching and frustration-filled deliberation. I knew that for a weighted/biased true-or-false decision, the fact that a random percentage needs to fall below a given occurrence rate can be exploited as the decision criterion. For example, if the desired occurrence rate is 1%, draw a random percentage; if it happens to be less than 1%, take the outcome as true. The distribution is obviously skewed heavily towards false, and it checks out against the desired rate of 1%. However, when it comes to a bunch of different objects, each with its own probability/relative weight, the logic cannot simply forgo the interdependence that inherently exists between the objects: each member's yes/no comes out independently, so you can end up selecting several members at once, or none at all. Applying the same idea willy-nilly to each member just doesn't work.
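In code, that biased coin is just a comparison against Math.random() (a sketch; biasedBoolean is a hypothetical name):

// Returns true with probability `rate`, a number in [0, 1].
function biasedBoolean(rate) {
  return Math.random() < rate;
}

// biasedBoolean(0.01) comes up true roughly once per hundred calls.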
Then I got an idea of my own. Somehow, an object's probability score had to map to its selection, and that mapping could not be deterministic. Just because a member has a score of 80% shouldn't always mean a yes, and shouldn't always mean a no. The choice should be random, but even in randomness it should be a yes more often than not. What if I multiplied each member's probability by a random percentage? Some members would draw a large random percentage and some a small one. Since the random numbers themselves carry no particular bias, the resulting 'scaled' scores are random with a skew defined by the member-specific weights. Because of those imposed biases, members with larger weights will, on average, score higher than the others. This won't hold on every draw, but the greater a member's weight, the more often it ends up with the highest final score. Members with lower weights will occasionally be the top scorers, but rarely. Pick the highest-scoring member enough times and the resulting distribution reflects this phenomenon.
Phew! Enough rambling.
Here is the code.
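A minimal sketch of that picker (weightedPick and the item shape are illustrative names, assuming the scheme described above):

// Scale each member's weight by a fresh random number in [0, 1)
// and return the value of the top scorer.
function weightedPick(items) {
  let best = null;
  let bestScore = -Infinity;
  for (const { value, weight } of items) {
    const score = Math.random() * weight;
    if (score > bestScore) {
      bestScore = score;
      best = value;
    }
  }
  return best;
}

// Quick check: tally 10,000 picks and eyeball the skew.
const items = [
  { value: 'A', weight: 50 },
  { value: 'B', weight: 30 },
  { value: 'C', weight: 20 },
];
const counts = { A: 0, B: 0, C: 0 };
for (let i = 0; i < 10000; i++) counts[weightedPick(items)]++;
console.log(counts); // A ahead of B, B ahead of C

One caveat: the win frequencies follow the weight ordering but aren't exactly proportional to the weights; heavier members win somewhat more often than their raw percentages suggest.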
For a piece of code that small, this is quite a big explanation, isn't it?
