Doing a Monte Carlo Analysis of the Birthday Paradox using a HashSet


DISCLAIMER: I DO NOT WANT THE ANSWER TO THIS PROBLEM. I SIMPLY NEED SOME GUIDANCE.
I want to perform Monte Carlo analysis on the infamous Birthday Paradox (determining the probability that at least 2 people in a given group share the same birthday) using a HashSet.
Now when I run this, the collisionCount is WAY lower than I expected it to be. First, I was expecting the collisionCount for a group of 10 people to be roughly 11,695 (a probability of about 0.117). Then, by the time I got to 100 people, I was expecting the collisionCount to be 100,000 (a probability of essentially 1.0). But instead, for every 10 people, the collisionCount only goes up by 1 (10 people: 1 collision, 20 people: 2 collisions, 30 people: 3 collisions, etc.).
Here is the code I have written so far:
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class BirthdayParadox
{
    public static void main(String [] args)
    {
        Random rand = new Random();
        int tests = 100000;
        int collisionCount = 0;
        for(int people = 10; people <= 100; people += 10)
        {
            Set<Integer> birthdays = new HashSet<>(365);
            birthdays.add(rand.nextInt(365));
            for(int runs = 0; runs < tests; runs++)
            {
                int randomBirthday = rand.nextInt(365);
                if(birthdays.contains(randomBirthday))
                {
                    collisionCount++;
                    break;
                }
                birthdays.add(randomBirthday);
            }
            float prob = (float)collisionCount / tests;
            System.out.println("After " + tests + " tests there were " +
                    collisionCount + " occurrences of shared " +
                    " birthdays in a set of " + people + " people.");
            System.out.println("Probability : " + prob);
        }
    }
}
I guess my question is: am I doing something wrong with either of my for loops that keeps collisionCount from counting correctly?
I am new to learning Java and new to the Stack Overflow community, and I am still learning the ropes. Any help, advice, or tips are greatly appreciated.

Your problem appears to be that you are missing one of your loops.
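Notice that your runs loop is exited by the first collision. Since all the draws go into a single set of 365 possible days, a collision is guaranteed within 366 draws, so each group size adds exactly 1 to collisionCount; and because collisionCount is never reset, it just climbs 1, 2, 3, ... as you observed.
Also, you never use your people variable inside the inner loop except when outputting results.
What you need to do is run your simulation 100_000 times for each group size. The way to do this is to place logic within your runs loop that builds a fresh group of people people, checks whether it has a birthday collision, and if so increments your collision count (which should start again at 0 for each group size).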
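A minimal sketch of that structure, assuming 365 equally likely birthdays (no leap years). The helper name estimate is hypothetical. It relies on Set.add returning false when the value is already present, so a separate contains check is not needed:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class BirthdayMonteCarlo {
    // Runs `trials` independent experiments. Each one draws `people` birthdays
    // into a fresh HashSet and stops at the first repeat. Returns the fraction
    // of experiments that contained at least one shared birthday.
    static double estimate(int people, int trials, Random rand) {
        int collisionCount = 0;                        // starts at 0 for every group size
        for (int run = 0; run < trials; run++) {
            Set<Integer> birthdays = new HashSet<>();  // fresh set for every experiment
            for (int person = 0; person < people; person++) {
                // Set.add returns false if this day was already in the set
                if (!birthdays.add(rand.nextInt(365))) {
                    collisionCount++;
                    break;                             // one collision settles this experiment
                }
            }
        }
        return (double) collisionCount / trials;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        int tests = 100000;
        for (int people = 10; people <= 100; people += 10) {
            System.out.println(people + " people: " + estimate(people, tests, rand));
        }
    }
}
```

Each group size gets its own count and each trial gets its own set, which are the two things the original loops were missing.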

I think the Java solution is not the best; this is probably why you see a difference between the simulation and the mathematical values. As I understand the problem, for a group of 10 people (in this case) you have to determine whether any of them share the same birthday. To do that, fill an array of 10 with random numbers from 0 to 364 (one per day of the year) and check whether any two are the same. You have to repeat that many times (100,000 in your case).
I think you have to invert the order of the for loops. I mean:
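int groupSize = 10;
int successes = 0;
for(int runs = 0; runs < tests; runs++)
{
    int[] birthdays = new int[groupSize]; // initialize an array of 10
    for(int person = 0; person < groupSize; person++)
    {
        birthdays[person] = rand.nextInt(365); // random birthday, 0..364
    }
    // check if there is a collision
    boolean collision = false;
    for(int a = 0; a < groupSize && !collision; a++)
        for(int b = a + 1; b < groupSize && !collision; b++)
            collision = birthdays[a] == birthdays[b];
    if(collision)
        successes++; // if there is one, increase your success variable by 1
}
double probability = (double) successes / tests; // calculate the probability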
I have tried to help with a rough sketch.
I hope this helps a little.
Regards
Arturo.
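To compare a simulation with the mathematical values, the exact probability can also be computed directly, assuming 365 equally likely days: P(shared) = 1 - (365/365)(364/365)...((365 - n + 1)/365). This gives about 0.117 for 10 people and about 0.507 for 23. The class and method names below are hypothetical:

```java
class BirthdayExact {
    // Probability that at least two of n people share a birthday,
    // assuming 365 equally likely days: 1 - P(all birthdays distinct).
    static double exact(int n) {
        double allDistinct = 1.0;
        for (int k = 0; k < n; k++) {
            allDistinct *= (365.0 - k) / 365.0; // person k avoids the k days already taken
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        for (int people = 10; people <= 100; people += 10) {
            System.out.printf("%3d people: %.6f%n", people, exact(people));
        }
    }
}
```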

Related

I'm working on Euler 12; the code I have seems to work properly but is too slow, very very slow. How can I modify it to run faster?

Like I said, I am working on Project Euler problem 12 (https://projecteuler.net/problem=12). I believe this program will give the correct answer, but it is too slow: I tried to wait it out, but even after 9 minutes it still couldn't finish. How can I modify it to run faster?
package highlydivisibletriangularnumber_ep12;

public class HighlyDivisibleTriangularNumber_EP12 {
    public static void findTriangular(int triangularNum) {
        // cast to long first so the multiplication cannot overflow int
        triangularValue = (long) triangularNum * (triangularNum + 1) / 2;
    }
    static long triangularValue = 0L;
    public static void main(String[] args) {
        long n = 1L;
        int counter = 0;
        int i = 1;
        while (true) {
            findTriangular(i);
            while (n <= triangularValue) {
                if (triangularValue % n == 0) {
                    counter++;
                }
                n++;
            }
            if (counter > 500) {
                break;
            } else {
                counter = 0;
            }
            n = 1;
            i++;
        }
        System.out.println(triangularValue);
    }
}

Just two simple tricks:
When x % n == 0, then also x % m == 0 with m = x / n. This way you only need to consider n <= Math.ceil(Math.sqrt(x)): with each divisor smaller than the square root, you get another one for free. Beware of the case of equality (n == m when x is a perfect square), which must be counted only once. The speed gain is huge.
As your x is half the product of i and i+1, you can generate all its divisors as products of the divisors of i and i+1 (after halving whichever of them is even). What makes it more complicated is the fact that, in general, the same product can be created using different factors. Can it happen here? Do you need to generate products, or can you just count them? Again, the speed gain is huge.
You could use prime factorization, but I'm sure these tricks alone are sufficient.
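A sketch of the first trick, as one possible reading of it: countDivisors only tries n while n * n <= x and counts each divisor pair at once, with a perfect square's root counted a single time. The class and method names are hypothetical:

```java
class TriangleDivisors {
    // Counts divisors of x by testing only n with n * n <= x:
    // each divisor n below sqrt(x) is paired with x / n above it.
    static int countDivisors(long x) {
        int count = 0;
        for (long n = 1; n * n <= x; n++) {
            if (x % n == 0) {
                count += (n * n == x) ? 1 : 2; // a perfect square's root is counted once
            }
        }
        return count;
    }

    // First triangular number i * (i + 1) / 2 with more than `limit` divisors.
    static long firstTriangularWithMoreThan(int limit) {
        for (long i = 1; ; i++) {
            long t = i * (i + 1) / 2;
            if (countDivisors(t) > limit) {
                return t;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTriangularWithMoreThan(500));
    }
}
```

This alone replaces a loop up to x with one up to about sqrt(x) for each triangular number.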
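One possible answer to the questions in the second trick, sketched under the number-theory facts it relies on: i and i + 1 share no common factor, halving the even one keeps that true, and the divisor-count function d is multiplicative for coprime arguments. So d(i(i+1)/2) is just d(a) * d(b), and no product is counted twice. The names are hypothetical:

```java
class TriangleDivisorsCoprime {
    // Counts divisors of x by pairing n with x / n for n * n <= x.
    static int countDivisors(long x) {
        int count = 0;
        for (long n = 1; n * n <= x; n++) {
            if (x % n == 0) {
                count += (n * n == x) ? 1 : 2;
            }
        }
        return count;
    }

    // Divisors of T(i) = i * (i + 1) / 2. Halve whichever factor is even;
    // the two remaining factors are coprime, so their divisor counts multiply.
    static int divisorsOfTriangular(long i) {
        long a = (i % 2 == 0) ? i / 2 : i;
        long b = (i % 2 == 0) ? i + 1 : (i + 1) / 2;
        return countDivisors(a) * countDivisors(b);
    }

    static long firstTriangularWithMoreThan(int limit) {
        for (long i = 1; ; i++) {
            if (divisorsOfTriangular(i) > limit) {
                return i * (i + 1) / 2;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTriangularWithMoreThan(500));
    }
}
```

A further step in the same spirit: the factor derived from i + 1 is the same number in T(i) and T(i + 1), so its divisor count could be remembered from one iteration to the next instead of recomputed.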

It appears to me that your algorithm is too brute-force, and because of this it will consume an enormous amount of CPU time regardless of how you might rearrange it.
What is needed is an algorithm that implements a formula that calculates at least part of the solution, instead of brute-forcing the whole thing.
If you get stuck, you can use your favorite search engine to find a number of solutions, with varying degrees of efficiency.

Random.nextInt returns contiguous same-value integers (frequently)

I see results like the below when running scala.util.Random().nextInt(3) 81 times (Java developers, please see edit for how this relates):
200010202002112102222211012021020111220022001021101222222022210222220100000100010
Notice the large contiguous blocks:
000, 22222, 111, 222222, 222, 22222, 00000, 000.
Intuitively, the sequence doesn't seem naturally / "real-world coin flip" random.
For example, to achieve 6x contiguous 2s there's only a 0.4% chance (AFAIK) and for 5x contiguous values there's a 1.2% chance ... so it seems unlikely I should keep seeing patterns like this in the output.
Would this happen in the real world with a 3-sided coin? Or is this an expected deviation from "true random" when using Java's Random.nextInt(exclusiveMax) method?
EDIT:
I've actually been using scala.util.Random.nextInt(int), which creates a new global java.util.Random via new java.util.Random().

This isn't a joke: I'd try it in the real world (probably easiest with an ordinary six-sided die: 1 or 2 = 0, 3 or 4 = 1, 5 or 6 = 2). I suspect you'll see the same result.
You only have three values, so the chance of any given roll being a 2 is always 1 in 3. You're quite right that, having got a 2, your odds of getting a 2 five more times in a row are about 0.41%, which is indeed unlikely in 81 rolls. But after getting a 2, your odds of getting four more are (as you say) 1.23%: much more likely, and not surprising in 81 rolls.
Running the program below myself, I routinely get runs of three in an 81-roll batch and frequently get runs of four, but only rarely runs of five, and quite rarely runs of six. All of which largely matches my expectation.
Measuring the randomness of a PRNG is quite a complicated topic. The simplistic measure is to run it millions of times and check that you get each value about 33.33% of the time. But of course, that could be a million 0s followed by a million 1s followed by a million 2s, which would be a suspicious random result. :-) You could try several of the approaches discussed in that Wikipedia article if you want to test your setup. Or subscribe to sources of true randomness like https://www.random.org/. Or use a USB hardware random number generator (though I'd subject one of those to quite a lot of due diligence).
My program:
import java.util.*;

public class E {
    public static void main(String[] args) {
        Random r = new Random();
        Map<Integer, Integer> runs = new TreeMap<>(); // run length -> how often it occurred
        int last = -1;
        int run = 0;
        for (int i = 0; i < 81; ++i) {
            int v = r.nextInt(3);
            if (v != last) {
                if (i != 0) {
                    runs.merge(run, 1, Integer::sum); // record the run that just ended
                    System.out.println(" (" + run + ")");
                }
                last = v;
                run = 0;
            }
            ++run;
            System.out.print(v);
        }
        // the loop never closes the final run, so record it here
        runs.merge(run, 1, Integer::sum);
        System.out.println(" (" + run + ")");
        System.out.println("****");
        for (Map.Entry<Integer, Integer> e : runs.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
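A sketch of the simplistic frequency check described above, using Pearson's chi-square statistic as one common way to judge whether the counts are close enough to 1/3 each. The class name and sample size are arbitrary choices. As the answer points out, a check like this only looks at frequencies, not order, so a million 0s followed by a million 1s and a million 2s would pass it:

```java
import java.util.Random;

class UniformityCheck {
    // Pearson chi-square statistic against a uniform expectation:
    // sum over values of (observed - expected)^2 / expected.
    static double chiSquare(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        double expected = (double) total / counts.length;
        double stat = 0.0;
        for (long c : counts) {
            double diff = c - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        Random r = new Random();
        int samples = 3_000_000;
        long[] counts = new long[3];
        for (int i = 0; i < samples; i++) {
            counts[r.nextInt(3)]++;
        }
        for (int v = 0; v < 3; v++) {
            System.out.printf("%d: %.5f%n", v, (double) counts[v] / samples);
        }
        // 3 values means 2 degrees of freedom; 5.99 is the 5% critical value,
        // so results much larger than that would be suspicious.
        System.out.println("chi-square = " + chiSquare(counts));
    }
}
```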

How to make my room sorter more random?

So I'm working on a program which is supposed to randomly put people in 6 rooms (the final output is the list of rooms with who is in each room). I figured out how to do all that.
//this is the main sorting sequence:
for (int srtM = 0; srtM < Guys.length; srtM++) {
    done = false;
    People newMove = Guys[srtM]; //Guys is an array of People
    while (!done) {
        newMove.rndRoom(); //sets random number from 4 to 6
        if (newMove.getRoom() == 4 && !room4.isFull()) {
            room4.add(newMove); //adds person into the room4 object rList
            done = true;
        } else if (newMove.getRoom() == 5 && !room5.isFull()) {
            room5.add(newMove);
            done = true;
        } else if (newMove.getRoom() == 6 && !room6.isFull()) {
            room6.add(newMove);
            done = true;
        }
    }
}
The problem now is that, for reasons I don't completely understand (something to do with the way I wrote it), the result is hardly random. The same people are put into the same rooms almost every time I run the program. For example, this program almost always puts me into room 6 together with one other friend (interestingly, we're both at the end of the Guys array). So how can I make it "truly" random, or at least a lot more random than it is now?
Thanks in advance!
Forgot to mention that rndRoom() does indeed use the standard Random class (for 4 to 6) in the background:
public int rndRoom() {
    if (this.gender == 'M') {
        this.room = (rnd.nextInt((6 - 4) + 1)) + 4;
    }
    if (this.gender == 'F') {
        this.room = (rnd.nextInt(((3 - 1) + 1))) + 1;
    }
    return this.room;
}

If you want it to be more random, try doing something with the Random class, like this:
Random random = new Random();
for (int i = 0; i < 6; i++)
{
    int roomChoice = random.nextInt(6) + 1; // a random room from 1 to 6
}
Of course, this is not exactly the code you will want to use; it is just an example of how to use the Random class, so change it to fit your needs.
Also, the reason I used random.nextInt(6) + 1 is that random.nextInt(6) gives you a random number from 0 to 5, so if you want a number from 1 to 6 you have to add 1.
On another note, getting "truly" random numbers is not as easy as it seems. When you generate a "random" number with java.util.Random, it uses something called pseudo-random number generation: a deterministic formula (for Random, a linear congruential formula on a 48-bit seed) produces a sequence of numbers that only looks random. When large samples are taken, each possible value occurs with roughly equal frequency, even though the values are not evenly spread out in the sequence.
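One way to see that pseudo-random numbers are deterministic: two java.util.Random instances created with the same seed produce exactly the same sequence. The class name, seed, and bound below are arbitrary illustrative choices:

```java
import java.util.Arrays;
import java.util.Random;

class SeededSequence {
    // Draws `count` values in 0..bound-1 from a generator started with `seed`.
    static int[] draw(long seed, int count, int bound) {
        Random r = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = r.nextInt(bound);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same seed, same formula: both lines print the same ten numbers.
        System.out.println(Arrays.toString(draw(42L, 10, 6)));
        System.out.println(Arrays.toString(draw(42L, 10, 6)));
    }
}
```

Creating Random with the no-argument constructor picks a seed that differs from run to run, which is why ordinary programs do not repeat themselves.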

There might be something wrong with code you didn't post.
I've built a working example with what your classes might be, and it is distributing pretty randomly:
http://pastebin.com/u8sZRxi6

OK, so I figured out why the results don't seem very random. The room sorter works based on an alphabetical list of 18 guys, and there are only 3 guy rooms (rooms 4, 5 and 6). So each guy has a 1 in 3 chance to be put in, say, room 6. But each person could only possibly be in 2 of the 6 spots in each room (depending on where they are in the list).
The first two people, for example, could each only be in either the first or second spot of each room. By "spot" I mean their place in the room list which is printed at the end. I, on the other hand, am second to last on the list, so at that point I could only be in either the last or second-to-last spot of each room.
Sorry if it's confusing, but I figured out this is the reason the generated room lists don't appear very random: only the same few people could be put in each room spot every time. The lists are random, though; it's just the order in which people appear in each list that is not random.
So in order to make the lists look more random, I had to make people's positions in the room random too. I solved this by adding a shuffle method which mixes up the Person arrays:
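public static void shuffle(Person[] arr) {
    Random rgen = new Random();
    // Fisher-Yates: swap each slot with a random slot at or before it,
    // so every ordering of the array is equally likely
    for (int i = arr.length - 1; i > 0; i--) {
        int randPos = rgen.nextInt(i + 1);
        Person tmp = arr[i];
        arr[i] = arr[randPos];
        arr[randPos] = tmp;
    }
}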
TL;DR the generated room lists were random - but since the order of the people that got put into the rooms wasn't random the results didn't look very random. In order to solve this I shuffled the Person arrays.
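A common way to write such a shuffle is to swap each slot with any random slot (rgen.nextInt(arr.length)), but that version does not make every ordering equally likely: with n elements it has n^n equally likely paths, which cannot divide evenly among the n! orderings (27 paths vs 6 orderings for n = 3). The Fisher-Yates variant, which swaps slot i only with a slot at or before it, has exactly n! paths, one per ordering. The sketch below, with hypothetical names, enumerates every path for a small array and counts the resulting orderings:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class ShuffleBias {
    // Naive shuffle: for each i, swap with any position 0..n-1 (n^n paths).
    static Map<String, Integer> naiveOutcomes(int n) {
        Map<String, Integer> counts = new TreeMap<>();
        naive(identity(n), 0, counts);
        return counts;
    }

    private static void naive(int[] arr, int i, Map<String, Integer> counts) {
        if (i == arr.length) {
            counts.merge(Arrays.toString(arr), 1, Integer::sum);
            return;
        }
        for (int j = 0; j < arr.length; j++) {
            int[] next = arr.clone();
            swap(next, i, j);
            naive(next, i + 1, counts);
        }
    }

    // Fisher-Yates: for i from n-1 down to 1, swap with 0..i (n! paths).
    static Map<String, Integer> fisherYatesOutcomes(int n) {
        Map<String, Integer> counts = new TreeMap<>();
        fisherYates(identity(n), n - 1, counts);
        return counts;
    }

    private static void fisherYates(int[] arr, int i, Map<String, Integer> counts) {
        if (i <= 0) {
            counts.merge(Arrays.toString(arr), 1, Integer::sum);
            return;
        }
        for (int j = 0; j <= i; j++) {
            int[] next = arr.clone();
            swap(next, i, j);
            fisherYates(next, i - 1, counts);
        }
    }

    private static int[] identity(int n) {
        int[] a = new int[n];
        for (int k = 0; k < n; k++) {
            a[k] = k;
        }
        return a;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        System.out.println("naive:        " + naiveOutcomes(3));
        System.out.println("Fisher-Yates: " + fisherYatesOutcomes(3));
    }
}
```

In practice, Collections.shuffle(Arrays.asList(arr)) shuffles an object array in place without this bias, since Arrays.asList returns a list backed by the array.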

How to create statistics from output of Java code?

Summary: I need a function based on the output. The problem is connecting Eclipse, or Java code, with other software.
I'm studying physics. I need code that works the following way:
first, it picks a random number n;
then it outputs a "winner" number (based on some rules; I think the code itself is irrelevant for now) 20 times. It should be more than 20, but first I need something to record the values.
So I have n and 20 other numbers, each between 1 and n (including 1 and n). After running the code once, I want to see how the 20 values are distributed (for example, are they clustered around one particular number or in one region, and is there a pattern? This depends on the rules, of course).
The problem is, I'm only familiar with basic Java (I used Eclipse), and I have no clue how to record, for example, 2000 numbers instead of 20. For a given n the code should print 2000 or more numbers, which should be summarized as a function whose domain is 1, 2, ..., n and whose range is 0, 1, ..., 2000 (since it might happen that all 2000 numbers are the same). I thought of Excel, but how could I connect Java code with it? A visual representation is not necessary, although it would make my work easier (I hope, at least).
The code:
import java.util.Random;

public class korbeadosjo {
    public static void main(String Args[]) {
        Random rand = new Random();
        int n = (rand.nextInt(300) + 2);            // number of people in the circle, 2..301
        System.out.println("n= " + n);
        int narrayban = n - 1;                      // last valid array index
        int jatekmester = n / 2;                    // "game master": the starting person (1-based)
        int jatekmesterarrayban = jatekmester - 1;  // game master as an array index
        System.out.println("n/2: " + jatekmester);
        for (int i = 0; i < 400; i++) {             // one game per iteration
            int hanyembernelvoltmar = 1;            // how many people have been reached so far
            int voltmar[] = new int[n];             // voltmar[k] == 1 once person k has been reached
            voltmar[jatekmesterarrayban] = 1;
            int holvan = jatekmester;               // current position (1-based)
            int holvanarrayban = holvan - 1;        // current position as an array index
            fori: for (;;) {
                int jobbravagybalra = rand.nextInt(2); // 0 = left, 1 = right
                switch (jobbravagybalra) {
                    case 0: // left, wrapping around the circle
                        if (holvanarrayban == 0) {
                            holvanarrayban = narrayban;
                        } else {
                            --holvanarrayban;
                        }
                        if (voltmar[holvanarrayban] == 0) {
                            voltmar[holvanarrayban] = 1;
                            ++hanyembernelvoltmar;
                        }
                        break;
                    case 1: // right, wrapping around the circle
                        if (holvanarrayban == narrayban) {
                            holvanarrayban = 0;
                        } else {
                            ++holvanarrayban;
                        }
                        if (voltmar[holvanarrayban] == 0) {
                            voltmar[holvanarrayban] = 1;
                            ++hanyembernelvoltmar;
                        }
                        break;
                }
                if (hanyembernelvoltmar == n) {     // everyone reached: print the winner
                    System.out.println(holvanarrayban + 1);
                    break fori;
                }
            }
        }
    }
}

basic Java (I used eclipse)
Unrelated.
I could only find two prompts in your question:
How to create statistics from output of Java code?
You probably don't want just the raw output. Use those numbers inside your Java program to compute what you want, and output that.
How would you store 2000 values? An array, a list, a queue...? Whatever you choose, iterate over that data structure and generate the statistics you need.
I thought of Excel, but how could I connect a Java code with it?
There is this site.
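A sketch of the "store the values and compute the statistics in Java" idea: an int array indexed by person counts how often each one wins, and the counts are written as a two-column CSV file, a format Excel can open directly. playOneGame is a compact, hypothetical restatement of the question's rules (start at person n/2, step left or right at random around the circle, the last person reached for the first time wins), assuming n >= 2; the file name winners.csv is also an arbitrary choice:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

class WinnerStats {
    // One game with the question's rules; returns the winner (1-based). Needs n >= 2.
    static int playOneGame(int n, Random rand) {
        boolean[] visited = new boolean[n];
        int pos = n / 2 - 1;                 // the game master, as an array index
        visited[pos] = true;
        int visitedCount = 1;
        while (visitedCount < n) {
            pos = rand.nextBoolean() ? (pos + 1) % n : (pos - 1 + n) % n; // wrap around
            if (!visited[pos]) {
                visited[pos] = true;
                visitedCount++;
            }
        }
        return pos + 1;                      // last person reached for the first time
    }

    // counts[w] = how many of the games person w won (index 0 unused)
    static int[] tally(int n, int games, Random rand) {
        int[] counts = new int[n + 1];
        for (int g = 0; g < games; g++) {
            counts[playOneGame(n, rand)]++;
        }
        return counts;
    }

    // Two-column CSV (person,wins) for Excel or any plotting tool.
    static String toCsv(int[] counts) {
        StringBuilder sb = new StringBuilder("person,wins\n");
        for (int person = 1; person < counts.length; person++) {
            sb.append(person).append(',').append(counts[person]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Random rand = new Random();
        int n = rand.nextInt(300) + 2;
        int[] counts = tally(n, 2000, rand);
        Path out = Paths.get("winners.csv");
        Files.write(out, toCsv(counts).getBytes(StandardCharsets.UTF_8));
        System.out.println("n = " + n + ", wrote " + out.toAbsolutePath());
    }
}
```

Keeping counts rather than all 2000 raw values gives exactly the function described in the question: domain 1..n, values 0..2000.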

Java and Increasing the Efficiency of Genetic Algorithms

I was wondering if I could get some advice on increasing the overall efficiency of a program that implements a genetic algorithm. Yes, this is an assignment question, but I have already completed the assignment on my own and am simply looking for a way to make it perform better.
Problem Description
My program currently reads a given chain made of two types of constituents, H or P (for example: hphpphhphpphphhpphph). For each H and P it generates a random move (Up, Down, Left, Right) and adds the move to an ArrayList contained in the "Chromosome" object. At the start, the program generates 19 moves for each of 10,000 chromosomes.
SecureRandom sec = new SecureRandom();
byte[] sbuf = sec.generateSeed(8);
ByteBuffer bb = ByteBuffer.wrap(sbuf);
Random numberGen = new Random(bb.getLong());
int numberMoves = chromosoneData.length();
moveList = new ArrayList(numberMoves);
for (int a = 0; a < numberMoves; a++) {
    int randomMove = numberGen.nextInt(4);
    char typeChro = chromosoneData.charAt(a);
    if (randomMove == 0) {
        moveList.add(Move.Down);
    } else if (randomMove == 1) {
        moveList.add(Move.Up);
    } else if (randomMove == 2) {
        moveList.add(Move.Left);
    } else if (randomMove == 3) {
        moveList.add(Move.Right);
    }
}
After this comes the selection of chromosomes from the population to cross over. My crossover function selects the first chromosome at random from the fittest 20% of the population and the other at random from outside the top 20%. The chosen chromosomes are then crossed and a mutation function is called. I believe the area in which I am taking the biggest hit is calculating the fitness of each Chromosome. Currently my fitness function creates a 2D array to act as a grid, places the moves in order from the move list generated by the function shown above, and then loops through the array to do the fitness calculation (i.e. if an H is found at location [2,1], is the cell at [1,1], [3,1], [2,0] or [2,2] also an H? If so, it just increments the count of bonds found).
After the calculation is complete, the least fit chromosome is removed from my population, the new one is added, and then the ArrayList of chromosomes is sorted. Rinse and repeat until the target solution is found.
If you want to see more of my code to prove I actually did the work before asking for help, just let me know (I don't want to post too much, so other students can't just copy-paste my stuff).
As suggested in the comments, I have run the profiler on my application (I had never used one before; I'm only a first-year CS student), and my initial guess about where I am having issues was somewhat incorrect. From what the profiler is telling me, the big hotspots are:
Comparing the new chromosome to the others in the population to determine its position. I am doing this by implementing Comparable:
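public int compareTo(Chromosome other) {
    // ">" rather than ">=": with ">=", equal fitness returned 1 and the
    // "return 0" branch was unreachable, which breaks the compareTo contract
    if (this.fitness > other.fitness)
        return 1;
    if (this.fitness == other.fitness)
        return 0;
    else
        return -1;
}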
The other problem area the profiler reports is my actual evolution function, which consumes about 40% of the CPU time. A code sample from that method is below:
double topPercentile = highestValue;
topPercentile = topPercentile * .20;
topPercentile = Math.ceil(topPercentile);
randomOne = numberGen.nextInt((int) topPercentile);
//Lower bound for random two so it comes from outside of the top 20%
int randomTwo = numberGen.nextInt(highestValue - (int) topPercentile);
randomTwo = randomTwo + (int) topPercentile;
//System.out.println("Selecting First: " + randomOne + " Selecting Second: " + randomTwo);
Chromosome firstChrom = (Chromosome) populationList.get(randomOne);
Chromosome secondChrom = (Chromosome) populationList.get(randomTwo);
//System.out.println("Selected 2 Chromosones Crossing Over");
Chromosome resultantChromosome = firstChrom.crossOver(secondChrom);
populationList.add(resultantChromosome);
Collections.sort(populationList);
populationList.remove(highestValue);
Chromosome bestResult = (Chromosome) populationList.get(0);
The other main performance hit is the initial population seeding, which is performed by the first code sample in the post.

I believe the area in which I am taking the biggest hit is calculating the fitness of each Chromosome
If you are not sure then I assume you have not run a profiler on the program yet.
If you want to improve the performance, profiling is the first thing you should do.

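Instead of repeatedly sorting your population, use a collection that keeps its contents sorted (e.g. a TreeSet). Note that a TreeSet treats two elements whose compareTo returns 0 as duplicates and keeps only one of them, so chromosomes with equal fitness need a tie-breaker.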
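A sketch of the TreeSet idea with a hypothetical minimal Chromosome that carries only what the ordering needs. It assumes, as the question's sort order suggests, that a smaller fitness value sorts first and counts as better. Because a TreeSet silently drops an element that compares equal to one already present, the comparator breaks ties with a unique id:

```java
import java.util.Comparator;
import java.util.TreeSet;

class SortedPopulation {
    // Hypothetical minimal chromosome: only the fields the ordering uses.
    static final class Chromosome {
        private static long nextId = 0;
        final int fitness;         // assumption: smaller value = fitter
        final long id = nextId++;  // unique tie-breaker so equal fitness is not a "duplicate"
        Chromosome(int fitness) {
            this.fitness = fitness;
        }
    }

    static final Comparator<Chromosome> BY_FITNESS =
            Comparator.comparingInt((Chromosome c) -> c.fitness)
                      .thenComparingLong(c -> c.id);

    final TreeSet<Chromosome> population = new TreeSet<>(BY_FITNESS);

    // Adds a child and drops the least fit member: O(log n) instead of a full sort.
    void replaceWorst(Chromosome child) {
        population.add(child);
        population.pollLast();     // largest fitness value = least fit in this ordering
    }

    Chromosome best() {
        return population.first();
    }
}
```

One trade-off: a TreeSet has no get(index), so picking a random member of the top 20% by index, as the evolution code does, would need another approach, for example keeping the ArrayList sorted by inserting each child at the position reported by Collections.binarySearch.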

If your fitness measure is consistent across generations (i.e. not dependent on other members of the population), then I hope at least that you are storing it in the Chromosome object, so you only calculate it once for each member of the population. With that in place, you'd only be calculating fitness for the newly generated chromosome in each iteration. Without more information on how fitness is calculated, it's difficult to offer any optimisations in that area.
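A sketch of computing fitness once and caching it in the chromosome. The fitness here is an assumption based on the question's description: the number of pairs of H residues that sit on neighbouring grid cells without being neighbours along the chain, with the chain starting at (0,0) and one move per step (chain length minus one moves). Self-overlapping paths are not penalised. The class, field, and method names are hypothetical, and the cache is only valid if a chromosome's moves never change after construction:

```java
import java.util.List;

class CachedFitness {
    enum Move { Up, Down, Left, Right }

    static final class Chromosome {
        final String chain;            // e.g. "hphpphh..."
        final List<Move> moves;        // assumption: chain.length() - 1 moves
        private Integer cachedFitness; // null until first computed
        int evaluations = 0;           // how many times the real computation ran

        Chromosome(String chain, List<Move> moves) {
            this.chain = chain.toLowerCase();
            this.moves = moves;
        }

        int fitness() {
            if (cachedFitness == null) {
                cachedFitness = countHhContacts();
                evaluations++;
            }
            return cachedFitness;
        }

        // Counts H-H pairs on adjacent grid cells that are not chain neighbours.
        private int countHhContacts() {
            int n = chain.length();
            int[] xs = new int[n];
            int[] ys = new int[n];
            for (int i = 1; i < n; i++) {
                xs[i] = xs[i - 1];
                ys[i] = ys[i - 1];
                switch (moves.get(i - 1)) {
                    case Up:    ys[i]++; break;
                    case Down:  ys[i]--; break;
                    case Left:  xs[i]--; break;
                    case Right: xs[i]++; break;
                }
            }
            int contacts = 0;
            for (int i = 0; i < n; i++) {
                for (int j = i + 2; j < n; j++) { // skip direct chain neighbours
                    if (chain.charAt(i) == 'h' && chain.charAt(j) == 'h'
                            && Math.abs(xs[i] - xs[j]) + Math.abs(ys[i] - ys[j]) == 1) {
                        contacts++;
                    }
                }
            }
            return contacts;
        }
    }
}
```

If mutation edits a chromosome's moves in place rather than creating a new chromosome, the cached value would have to be reset to null at that point.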

Your random number generator seed doesn't need to be cryptographically strong.
Random numberGen = new Random();

A minor speedup when seeding your population is to remove all the testing and branching:
static Move[] moves = {Move.Down, Move.Up, Move.Left, Move.Right};
...
moveList.add(moves[randomMove]);
