Is this way to select `parents`, like a `gambling roulette`, workable? - java

It occurs to me that using random numbers to select the parents, like a gambling roulette, may be workable. Let me explain it using an example: finding the maximum value of a function. The example is shown below:
1. Imagine that we have already generated n random individuals and calculated their function values. We name individual j "Xj", and its function value f(Xj). We also find the maximum function value and name it maxValue.
2. It is clear that the fitness of individual j is f(Xj)/maxValue. We can name it g(Xj). We then calculate the fitness of every individual.
3. The next step is to find the parents. (We abandon individuals whose fitness value is less than 0.) A classic way is the gambling roulette: the chance of selecting Xj and Xk is g(Xj)*g(Xk)/[g(X1)+g(X2)+...+g(Xn)]^2.
My idea is:
1. Select two random individuals Xj and Xk.
2. Generate a random number rn in the range 0~1.
3. If rn is less than both g(Xj) and g(Xk) (the fitnesses of Xj and Xk), then they are able to reproduce; crossover and mutate.
4. Judge whether we have generated enough child individuals. If so, end; else, repeat 1-3.
The chance of selecting Xj and Xk is g(Xj)*g(Xk)/n^2, which is similar to the gambling roulette. Considering that the denominators of both chances are constants, the two are equivalent in a certain sense.
double randomNumToJudge = Math.random(); // random number to compare against the parents' fitnesses
int randomMother = (int) (Math.random() * 1000);
int randomFather = (int) (Math.random() * 1000); // randomly pick the two parent indices
// If the number is less than both parents' fitnesses, they are permitted to reproduce.
if (randomNumToJudge <= individualArray[generation][randomFather].fitnessValue
        && randomNumToJudge <= individualArray[generation][randomMother].fitnessValue) {
    // Crossover and mutate to generate the child individual.
    Individual childIndividual = individualArray[generation][randomFather]
            .crossOverAndMutate(individualArray[generation][randomFather],
                                individualArray[generation][randomMother]);
    individualArray[generation + 1][counter] = childIndividual; // add the child to the array
    counter++; // count of individuals in the child generation
}
I tested this method in Java. The function is x + 10sin(5x) + 7cos(4x), x∈[0,10). I ran 100 generations with 1000 individuals per generation.
The results are correct.
In one execution, in the 100th generation, the best individual was 7.856744175554171 and the best function value was 24.855362868957645.
I have run the test 10 times. Every result is accurate to 10 decimal places by the 100th generation.
So is this way workable? Has this way already been thought of by others?
Any comments are appreciated ^#^
PS: Pardon my poor English -_-

From point 2, I am assuming your target fitness is 1. Your algorithm will likely never fully converge (find a local optimum), because your random value range (0~1) does not change even as your fitnesses do.
Note that this does not mean better fitnesses are not created; they will be. But there will be a sharp decline in the speed at which better fitnesses are created, because you keep checking fitnesses against the same fixed range (0~1).
Consider this example where all fitnesses have converged to be high:
[0.95555, 0.98888, 0.92345, 0.92366]
Here, all values are very likely to satisfy randomNumToJudge <= fitness. This means any of the values is equally likely to be chosen as a parent. You do not want this: you want the best values to have a higher chance of being chosen.
Your algorithm could be amended to converge properly if you gave randomNumToJudge a range of (median fitness in the population ~ 1), though this still would not be optimal.
Alternative Method
I recommend implementing the classic roulette wheel method.
The roulette wheel method assigns each individual a probability of being chosen as a parent based on how "fit" it is. Essentially, the greater the fitness, the bigger the slice of the wheel the individual occupies, and the higher the chance that a random number will land on its position on the wheel.
Example Java code for roulette wheel selection
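Below is a minimal sketch of what such a selection could look like, assuming an Individual class with a non-negative fitnessValue field like the one used in the question (this is an illustration, not the originally linked code):

import java.util.Random;

public class RouletteWheel {
    private static final Random RANDOM = new Random();

    // Individual is the question's class, assumed to expose fitnessValue.
    // Spin the wheel once: pick a point on [0, totalFitness) and walk
    // through the population until the cumulative fitness passes it.
    public static Individual select(Individual[] population) {
        double totalFitness = 0.0;
        for (Individual individual : population) {
            totalFitness += individual.fitnessValue;
        }
        double spin = RANDOM.nextDouble() * totalFitness;
        double cumulative = 0.0;
        for (Individual individual : population) {
            cumulative += individual.fitnessValue;
            if (spin < cumulative) {
                return individual;
            }
        }
        return population[population.length - 1]; // guard against rounding error
    }
}

Two independent spins then give the two parents, so the chance of picking Xj and Xk is g(Xj)*g(Xk)/[g(X1)+...+g(Xn)]^2, matching point 3 of the question.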


Do an action with some probability in java

In Java, I am trying to do an action with a probability p, where p is a float variable in my code. I came up with this way of doing it:
if (new Random().nextFloat() < p)
    do action
I wanted to confirm whether this is the correct way of doing it.
There is a TL;DR at the end.
From javadocs for nextFloat() (emphasis by me):
public float nextFloat()
Returns the next pseudorandom, uniformly distributed float value
between 0.0 and 1.0 from this random number generator's sequence.
If you understand what a uniform distribution is, knowing this about nextFloat() is going to be enough for you. Still, I am going to explain a little about the uniform distribution.
In the uniform distribution U(a,b), each number in the interval [a,b], and also every sub-interval of the same length within [a,b], is equally probable.
[Figure: the PDF (left) and the CDF (right) of the uniform distribution.]
For the uniform distribution on [0,1], the probability of getting a number less than or equal to n, P(x <= n), is equal to n itself (see the CDF, the right-hand graph in the figure). That is, P(x <= 0.5) = 0.5 and P(x <= 0.9) = 0.9. You can learn more about the uniform distribution from any good statistics book, or some googling.
Fitting to your situation:
Now, the probability of getting a number less than or equal to p from nextFloat() is equal to p, since nextFloat() returns uniformly distributed numbers. So, to make an action happen with probability p, all you have to do is:
if (condition that is true with a probability p) {
    do action
}
From what is discussed about nextFloat() and uniform distribution, it turns out to be:
if (randObj.nextFloat() <= p) {
    do action
}
Conclusion:
What you did is almost the right way to do what you intended. Adding the equal sign after < is all that's needed, and it doesn't hurt much to leave the equal sign out either!
P.S.: You don't need to create a new Random object in your conditional each time. You can create one, say randObj, before your loop, and then invoke its nextFloat() method whenever you want a random number, as in the snippet above.
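As a self-contained illustration of that pattern (the variable names and the value of p are arbitrary), here is a small demo that also empirically confirms the action fires with frequency close to p:

import java.util.Random;

public class ProbabilityDemo {
    public static void main(String[] args) {
        Random randObj = new Random(); // create once, reuse for every draw
        float p = 0.3f;                // probability of performing the action
        int performed = 0;
        int trials = 1_000_000;
        for (int i = 0; i < trials; i++) {
            if (randObj.nextFloat() <= p) {
                performed++;           // "do action" with probability p
            }
        }
        // The observed frequency should be close to p (about 0.3 here).
        System.out.println("Observed frequency: " + (double) performed / trials);
    }
}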
Comment by pjs:
Take a look at the comment on the question by pjs, which is very important and well said. I quote:
Do not create a new Random object each time, that's not how PRNGs are
meant to be used! A single Random object provides a sequence of values
with good distributional properties. Multiple Random objects created
in rapid succession are 1) computationally expensive, and 2) may have
highly correlated initial states, thus producing highly correlated
outcomes. Random actually works best when you create a single instance
per program and keep drawing from it, unless you really really know
what you're doing and have specific reasons for using correlation
induction strategies.
TL;DR
What you did is almost the right way to do it. Just adding the equal sign after < (to make it <=) is all that's needed, and it doesn't hurt much to leave out the equal sign either!
Yes, that is correct (from a pure probability perspective). new Random().nextFloat() generates a number between 0.0 (inclusive) and 1.0 (exclusive). So as long as your probability is a float in the range 0.0 to 1.0, this is the correct way of doing it.
You can read the full nextFloat() documentation here.

Matching Algorithm (Java)

I am writing an algorithm to match students with different groups. Each group has a limited number of spots. Each student provides their top 5 choices of groups. The students are then placed into groups in a predetermined order (older students and students with perfect attendance are given higher priority). There is no requirement for groups to be filled entirely, but they cannot be filled past capacity.
I've looked into similar marriage problems such as the Gale-Shapley stable marriage algorithm, but the problem I am having is that there are far fewer groups than students, and each group can accept multiple students.
What is the best way to implement such an algorithm so that the solution is fully optimized, i.e. there is no better arrangement of students in groups? In terms of scale, I'm placing roughly 600 students into 10-20 groups.
NB The close votes are terribly misplaced. Algorithm choice and design to solve an ambiguous problem is absolutely part of programming.
I think you'll get farther with Minimum Weight Bipartite Matching than with Stable Marriage. It is also known as the Hungarian method (or, in its maximization form, Maximum Weight Matching, which gives you a min-weight matching just by negating the weights).
You are out to match positions with students, so positions and students are the two node types in the bipartite graph.
The simplest statement of the algorithm requires a complete weighted bipartite graph with equal numbers of nodes in each set. You can think of this as a square matrix: the weights are the elements, the rows are students, and the columns are positions.
The algorithm will pick a single element from each row/column such that the sum is minimized.
#nava's proposal is basically a greedy version of MWBM, which is not optimal. The true Hungarian algorithm will give you an optimal answer.
Handling the fact that you have fewer positions than students is easy: to the "real" positions, add as many "dummy" positions as needed, and connect these to all the students with super-high-weight edges. The algorithm will only pick them after all the real positions are matched.
The trick is to pick the edge weights. Let O_i be the ordinal at which the i'th student is considered for a position, let R_ip be the ranking that the same student places on the p'th position, and let W_ip be the weight of the edge connecting the i'th student to the p'th position. You'll want something like:
W_ip = A * R_ip + B * O_i
You get to pick A and B to set the relative importance of the students' preferences versus the order in which the students are ranked. It sounds like order is quite important, so in that case you want B big enough to completely override the students' rankings, e.g.
A = 1, B = N^2, where N is the number of students.
Once you get an implementation working, it's actually fun to tweak the parameters to see how many students get which preference, etc. You might want to tweak them a bit to give up a little on the order. A sketch of the weight-matrix construction follows.
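Here is a hedged sketch of building the square cost matrix just described; the array names, the DUMMY_WEIGHT constant, and the calling convention are illustrative assumptions, and the MWBM/Hungarian solver itself is left to a library:

public class CostMatrixBuilder {
    static final double DUMMY_WEIGHT = 1e9; // "super high" weight for dummy positions

    // rank[i][p] is R_ip (student i's ranking of position p);
    // order[i] is O_i (the ordinal at which student i is considered).
    static double[][] buildCostMatrix(double[][] rank, double[] order, int numRealPositions) {
        int n = order.length;        // pad the positions with dummies up to n columns
        double a = 1.0;              // A: weight of the student's own ranking
        double b = (double) n * n;   // B = N^2 lets the order override the rankings
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int p = 0; p < n; p++) {
                w[i][p] = (p < numRealPositions)
                        ? a * rank[i][p] + b * order[i] // W_ip = A*R_ip + B*O_i
                        : DUMMY_WEIGHT;                 // dummies are matched only as a last resort
            }
        }
        return w; // feed this matrix to any min-weight bipartite matching solver
    }
}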
At the time I was working on this (the late '90s), the only open-source MWBM implementation I could find was an ancient FORTRAN library. It was O(N^3) and handled 1,000 students (selecting core academic program courses) in a few seconds. I spent a lot of time coding a fancy O(N^2 log N) version that turned out to be about 3x slower for N = 1000; it only started "winning" at about 5,000.
These days there are probably better options.
I would modify the Knapsack problem (Knapsack problem, Wikipedia) to work with K groups (knapsacks) instead of just one. You can assign "value" to the students' preferences, and the number of spots would be the maximum "weight" of each knapsack. With this, you can backtrack to find the optimal solution of the problem.
I am not sure how efficient you need the solution to be, but I think this will work.
"The most mathematically perfect" is very opinion-based; simplicity (almost) always wins here.
Here is the pseudocode (a Java translation follows below):
students <-- sorted by attendance
for i = 0 to n in students:
    studentAssigned = false
    groups <-- sorted by student i's preference
    for j = 0 to m in groups:
        if group j has space: add student i to group j; studentAssigned = true; break
    if studentAssigned == false:
        add i to unallocated
for i = 0 to k in unallocated:
    allocate i to a random group that is not filled
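A rough Java translation of the pseudocode above; the Student and Group types are assumed for illustration, and the fallback step assumes total capacity is at least the number of students:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class Group {
    final int capacity;
    final List<Student> members = new ArrayList<>();
    Group(int capacity) { this.capacity = capacity; }
    boolean hasSpace() { return members.size() < capacity; }
}

class Student {
    double attendance;
    List<Group> preferences; // the student's ranked top-5 choices
}

class GreedyAssigner {
    static void assign(List<Student> students, List<Group> allGroups, Random rng) {
        // Higher-priority students (better attendance) pick first.
        students.sort(Comparator.comparingDouble((Student s) -> s.attendance).reversed());
        List<Student> unallocated = new ArrayList<>();
        for (Student s : students) {
            boolean assigned = false;
            for (Group g : s.preferences) { // walk the choices in preference order
                if (g.hasSpace()) {
                    g.members.add(s);
                    assigned = true;
                    break;
                }
            }
            if (!assigned) unallocated.add(s);
        }
        // Fall back to a random group with space for anyone left over.
        for (Student s : unallocated) {
            List<Group> open = new ArrayList<>();
            for (Group g : allGroups) if (g.hasSpace()) open.add(g);
            open.get(rng.nextInt(open.size())).members.add(s);
        }
    }
}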
For each group: create an ordered set and add all the students (you must design the heuristic that orders the students inside the set; it could be attendance level multiplied by 1 if the group is within their choices, and 0 otherwise). Then fill the group with the first n students.
But there are some details that you didn't explain. For example, what happens if there are students who couldn't enter any of their 5 choices because those groups were filled by students with higher priority?

Finding smallest delta available for permutations of a set of numbers

Presume we have a set of 12 objects, let's say {1,2,3,4,5,6,7,8,9,10,11,12}. We must break this set into 4 smaller ones, each composed of three objects, such that the difference between the largest and the smallest of the four sums is minimized. We must find this difference. In our example: {1,7,12}, {3,8,9}, {4,5,10}, {2,6,11}. These four sets satisfy the problem, since their sums are 20, 20, 19, and 19, giving a delta of 1, our answer.
How can one solve this problem for any 12 arbitrary values?
I've tried enumerating all partitions of the set into 4 sets of 3 and finding the one with the optimal score. However, time is of the essence, so I was wondering how one would approach this problem efficiently in Java.
I don't have the exact code on me right now, but it was essentially 9 nested for loops: the first three nested loops pick one set, the next three the next set, the last three another set, and the three leftovers form the final set. I used a 2D array so that the values were in score[i][0], and score[i][1] acted as an indicator telling me whether the value in score[i][0] had already been placed into a set.
This, of course, gets tedious and inefficient.
You can simplify the problem by first computing the value that each group's sum must approach:
For instance, in your simple case (1, 2, ..., 12), the total sum of all terms is 78, so each group must have a sum very close to 78/4 = 19.5.
So, let's try a very simple algorithm :
- compute TOTAL_SUM = SUM(terms)
- compute TARGET_SUM = TOTAL_SUM / number(groups)
- set DELTA = 0
- loop {
-     try to split the terms into groups where TARGET_SUM - DELTA <= SUM <= TARGET_SUM + DELTA
-     if a solution is found, exit
-     DELTA = DELTA + 1
- }
OK, I did not help you much with the "try to split..." step, but it should look like your own solution, except that you now have additional constraints that can speed up the search. A sketch follows:
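Here is a minimal sketch of that loop, with the "try to split" step filled in by a backtracking search (the names and the 4x3 group shape are assumptions; note that the band of width 2*DELTA only bounds, rather than equals, the final max-min difference):

import java.util.Arrays;

public class MinDeltaSplit {
    static final int GROUPS = 4, GROUP_SIZE = 3;

    public static void main(String[] args) {
        int[] terms = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        int total = Arrays.stream(terms).sum();
        double target = (double) total / GROUPS; // 78 / 4 = 19.5 here

        // Widen the allowed band until a valid split exists.
        for (int delta = 0; ; delta++) {
            if (trySplit(terms, new int[GROUPS], new int[GROUPS], 0, target, delta)) {
                System.out.println("Found a split with DELTA = " + delta);
                return;
            }
        }
    }

    // Backtracking: place terms[index] into each group in turn, pruning any
    // group whose sum cannot land inside [target - delta, target + delta].
    static boolean trySplit(int[] terms, int[] sums, int[] counts,
                            int index, double target, int delta) {
        if (index == terms.length) {
            return true; // every term placed, every full group inside the band
        }
        for (int g = 0; g < GROUPS; g++) {
            if (counts[g] == GROUP_SIZE) continue;
            sums[g] += terms[index];
            counts[g]++;
            boolean ok = sums[g] <= target + delta // never overshoot the band
                    && (counts[g] < GROUP_SIZE || sums[g] >= target - delta);
            if (ok && trySplit(terms, sums, counts, index + 1, target, delta)) {
                return true;
            }
            sums[g] -= terms[index]; // undo and try the next group
            counts[g]--;
        }
        return false;
    }
}

Hope this helps.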

What would be considered a standard deviation boundary for Java Random?

I'm using Java 6's Random (java.util.Random, Linux 64) to randomly decide between serving one version of a page or a second one (normal A/B testing). Technically, I initialize the class once with the default empty constructor, and it's injected into a bean (Spring) as a property.
Most of the time the counts for the two versions are within 8% (+-) of each other, but from time to time I see deviations of up to 20 percent. For example, I now have two copies that split 680 / 570. Is that considered normal?
Is there a better/faster alternative to Java's Random?
Thanks
A deviation of 20% does seem rather large, but you would need to talk to a trained statistician to find out if it is statistically anomalous.
UPDATE - and the answer is that it is not necessarily anomalous. The statistics predict that you would get an outlier like this roughly 0.3% of the time.
It is certainly plausible for a result like this to be caused by the random number generator. The Random class uses a simple "linear congruential" algorithm, and this class of algorithms is strongly auto-correlated. Depending on how you use the random numbers, this could lead to anomalies at the application level.
If this is the cause of your problem, then you could try replacing it with a crypto-strength random number generator. See the javadocs for SecureRandom. SecureRandom is more expensive than Random, but it is unlikely that this will make any difference in your use case.
On the other hand, if these outliers are actually happening at roughly the rate predicted by the theory, changing the random number generator shouldn't make any difference.
If these outliers are really troublesome, then you need to take a different approach. Instead of generating N independent random choices, generate a list of false/true values with exactly the required ratio and then shuffle it, e.g. using Collections.shuffle, as sketched below.
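A minimal sketch of that exact-ratio approach (the 50/50 split and the counts are illustrative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ExactSplit {
    public static void main(String[] args) {
        // Build a list with exactly the required ratio, then shuffle it.
        // Serving choices sequentially from the list guarantees the exact split.
        int perVersion = 625; // e.g. 1250 requests split 50/50
        List<Boolean> choices = new ArrayList<>();
        for (int i = 0; i < perVersion; i++) {
            choices.add(true);  // version A
            choices.add(false); // version B
        }
        Collections.shuffle(choices);
        // In the web app you would consume one entry per request:
        boolean serveVersionA = choices.get(0);
        System.out.println("First request gets version A? " + serveVersionA);
    }
}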
I believe this is fairly normal, as the class is meant to generate random sequences. If you want repeating patterns after a certain interval, you may want to pass a specific seed value to the constructor and reset the generator with the same seed after each interval, e.g. after every 100/500/n calls to Random.next*, reset the seed to the old value using the Random.setSeed(long seed) method.
java.util.Random.nextBoolean() approximates a standard binomial distribution, which has standard deviation sqrt(n*p*(1-p)), with p = 0.5.
So if you do 900 iterations, the standard deviation is sqrt(900*.5*.5) = 15, so most of the time each count would be in the range 435 - 465.
However, the generator is pseudo-random and has a limited cycle of numbers it goes through before starting over, so with enough iterations the actual deviation will be smaller than the theoretical one. Java uses the formula seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1). You could write a different formula with smaller numbers to purposely obtain a smaller deviation, which would make it a worse random number generator but better fitted to your purpose.
You could, for example, create a list with 5 trues and 5 falses in it and use Collections.shuffle to randomize it. Then iterate over it sequentially; after 10 iterations, re-shuffle the list and start from the beginning. That way you'll never deviate by more than 5.
See http://en.wikipedia.org/wiki/Linear_congruential_generator for the mathematics.

How to be sure that random numbers are unique and not duplicated?

I have some simple code which generates random numbers:
SecureRandom random = new SecureRandom();
...
public int getRandomNumber(int maxValue) {
    return random.nextInt(maxValue);
}
The method above is called about 10 times (not in a loop). I want to ensure that all the numbers are unique (assuming that maxValue > 1000).
Can I be sure that I will get unique numbers every time I call it? If not, how can I fix it?
EDIT: I may have said it vaguely. I wanted to avoid manually checking whether I really got unique numbers, so I was wondering if there is a better solution.
There are different ways of achieving this, and which is most appropriate depends on how many numbers you need to pick out of how many candidates.
If you are selecting a small number of random numbers from a large range of potential numbers, then you're probably best just storing previously chosen numbers in a set and "manually" checking for duplicates, as sketched below. Most of the time you won't actually get a duplicate, so the test has practically zero cost in practical terms. It might sound inelegant, but it's not actually as bad as it sounds.
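A minimal sketch of that store-and-retry approach, reusing the question's SecureRandom field (it assumes far fewer calls than maxValue, otherwise the retry loop degrades):

import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

public class UniqueRandom {
    private final SecureRandom random = new SecureRandom();
    private final Set<Integer> seen = new HashSet<>();

    // Retry until we draw a value we have not handed out before.
    // With ~10 draws from a range of 1000+, retries are rare.
    public int getRandomNumber(int maxValue) {
        int candidate;
        do {
            candidate = random.nextInt(maxValue);
        } while (!seen.add(candidate)); // add() returns false for duplicates
        return candidate;
    }
}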
Some underlying random number generation algorithms don't produce duplicates at their "raw" level. For example, a XorShift generator can effectively produce all of the numbers within a certain range, shuffled without duplicates. So you basically choose a random starting point in the sequence and then just generate the next n numbers, knowing there won't be duplicates. But you can't arbitrarily choose "max" in this case: it has to be the natural maximum of the generator in question.
If the range of possible numbers is small-ish, but the count of numbers you need to pick is within a couple of orders of magnitude of that range, then you can treat this as a random selection problem. For example, to choose 100,000 distinct numbers from the range 1 to 10,000,000, I can do this (a code sketch follows below):
Let m be the number of random numbers I've chosen so far (initially 0)
For i = 1 to 10,000,000:
    Generate a random (floating point) number, r, in the range 0-1
    If r < (100,000 - m) / (10,000,000 - i + 1), then add i to the list and increment m
Shuffle the list, then pick numbers sequentially from it as required
But obviously, there's only much point in choosing this latter option if you need to pick a reasonably large proportion of the overall range. For choosing 10 numbers in the range 1 to a billion, you would be generating a billion random numbers, whereas by just checking for duplicates as you go you'd be very unlikely to actually hit a duplicate and would end up generating only 10 random numbers.
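A compact Java version of that selection-sampling loop (the method name and the trailing shuffle are per the steps above; this is a sketch, not a drop-in library call):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SelectionSampling {
    // Pick k distinct numbers from [0, n) in a single pass, then shuffle them.
    public static List<Integer> sample(int k, int n, Random random) {
        List<Integer> picked = new ArrayList<>(k);
        int m = 0; // how many numbers we have chosen so far
        for (int i = 0; i < n && m < k; i++) {
            // Remaining slots over remaining candidates is i's chance of selection.
            if (random.nextDouble() < (double) (k - m) / (n - i)) {
                picked.add(i);
                m++;
            }
        }
        Collections.shuffle(picked, random); // random order, not ascending
        return picked;
    }
}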
A random sequence does not mean that all values are unique: the sequence 1,1,1,1 is exactly as likely as the sequence 712,4,22,424.
In other words, if you want a guaranteed sequence of unique numbers, generate all 10 of them up front, check that they satisfy your uniqueness condition, and store them; then pick from that list in your 10 places instead of generating a fresh random number each time.
Every time you call Random#nextInt(int) you will get
a pseudorandom, uniformly distributed int value between 0 (inclusive)
and the specified value (exclusive).
If you want x unique numbers, keep getting new numbers until you have that many, then select your "random" number from that list. However, since you are filtering the numbers generated, they won't truly be random anymore.
For such a small range of possible values, a trivial implementation would be to put your 1000 integers in a list and have a loop which, at each iteration, generates a random index between 0 and list.size(), picks the number stored at that index, and removes it from the list, as sketched below.
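A minimal sketch of that idea:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class DrawWithoutReplacement {
    private final List<Integer> pool = new ArrayList<>();
    private final Random random = new Random();

    public DrawWithoutReplacement(int maxValue) {
        for (int i = 0; i < maxValue; i++) {
            pool.add(i); // every candidate value appears exactly once
        }
    }

    // Removes and returns a random remaining value,
    // so no value can ever be returned twice.
    public int next() {
        return pool.remove(random.nextInt(pool.size()));
    }
}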
This approach is very CPU-efficient at the cost of memory: you pay for one flag per possible value, so the bookkeeping array costs on the order of maxValue entries. The whole purpose of the array is to record whether a value has been used before. The while loop keeps generating random numbers until an unused value is found; once one is found, it is marked as used and returned. Be careful with the scope of the array: if it goes out of scope, your bookkeeping is erased. I originally used this in C; translated to Java (which has no unsigned types), it looks like this:

private final boolean[] used = new boolean[1000]; // true = value already returned; requires maxValue <= 1000

public int getRandomNumber(int maxValue) {
    // Keep drawing until we hit a value that has not been used yet.
    while (true) {
        int rand = random.nextInt(maxValue);
        if (!used[rand]) {
            used[rand] = true;
            return rand;
        }
    }
}
