I am implementing a test data generator in Java that generates random values for Java primitive types.
The range of possible parameter values is not limited. For example, if I want to generate a random integer or float, I consider all possible values (the full range from MIN_INT to MAX_INT). To do so, I am using calls like:
Random().nextInt()
Random().nextLong()
Random().nextFloat()*Float.MAX_VALUE
Random().nextDouble()*Double.MAX_VALUE
And so on...
However, when I do this, I notice that the generated values always have a large magnitude (close to the maximum or minimum value of the type). After 100000 iterations, for example, the generator never produced a value in the range [-1000, 1000]. The same goes for floats, longs, etc.
Can you explain how the random generator works in Java? Why are the generated values always large when we consider all possible values of the Java type?
Thanks in advance.
Your perception of "high" and "low" is wrong.
The probability of a single value (assuming uniform distribution) to be in [-1000,1000] is 2001/(MAX_INT-MIN_INT), which is around 0.00000046.
This probability is extremely small, and thus the expected number of "small" values will also be small.
In fact, in a uniform distribution over [MIN_INT,MAX_INT], approximately half of the elements will be positive and half negative.
Similarly, only a quarter of them will be between 0 and MAX_INT/2 (which is much higher than 1000, as you know).
If you want more "low" values, restrict yourself to a smaller range of elements, or use a non-uniform distribution that is expected to generate more values close to 0 (Gaussian, for example).
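As a rough sketch of the arithmetic above, the following computes the expected number of values in [-1000, 1000] among n uniformly drawn ints, and shows one possible Gaussian-based alternative that clusters values near 0. The class and method names are illustrative and not part of the original post.

```java
import java.util.Random;

class SmallValueOdds {
    // Probability that one uniformly drawn int lands in [-bound, bound].
    static double probabilityWithin(int bound) {
        double inRange = 2.0 * bound + 1;   // e.g. 2001 values for bound = 1000
        double all = 4294967296.0;          // 2^32 possible int values
        return inRange / all;
    }

    // Expected number of such "small" values among n independent draws.
    static double expectedCount(int n, int bound) {
        return n * probabilityWithin(bound);
    }

    // Non-uniform alternative: a Gaussian centred on 0 with the given standard deviation.
    static int gaussianInt(Random rnd, double stdDev) {
        return (int) Math.round(rnd.nextGaussian() * stdDev);
    }

    public static void main(String[] args) {
        // Far below 1, so seeing no small value in 100000 draws is the expected outcome.
        System.out.println(expectedCount(100_000, 1000));
        Random rnd = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(gaussianInt(rnd, 1000.0));
        }
    }
}
```

With a standard deviation of 1000, most Gaussian draws stay within a few thousand of 0, at the cost of no longer covering the whole int range evenly.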
Have a look at this code snippet:
int count1 = 0, count2=0;
for (int i = 0; i < 10000; i++) {
float x = genFloat(null);
if (x < 1E38 && x > 0) count1++;
if (x > Float.MAX_VALUE - 1E38) count2++;
}
System.out.println(count1);
System.out.println(count2);
It generates 10000 random floats, and checks how many are in [0,1E38] and how many are in [MAX-1E38,MAX].
Note that for floats, the theoretical probability of landing in each of these ranges is about 1E38/(2*MAX), roughly 14.7%.
And as you can see, "close to 0" and "close to MAX" ranges of the same width receive a similar number of generated values.
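For reference, here is a self-contained sketch of the same check. The snippet above does not show genFloat, so the version below is a hypothetical stand-in that assumes a uniform draw over [-MAX_VALUE, MAX_VALUE); the class name and helper methods are likewise illustrative.

```java
import java.util.Random;

class FloatRangeCheck {
    // Hypothetical stand-in for genFloat: uniform over [-MAX_VALUE, MAX_VALUE).
    static float genFloat(Random rnd) {
        return (rnd.nextFloat() * 2f - 1f) * Float.MAX_VALUE;
    }

    // Theoretical share of values falling into any window of the given width.
    static double theoreticalShare(double width) {
        return width / (2.0 * Float.MAX_VALUE);
    }

    // Returns {count in (0, 1E38), count in (MAX - 1E38, MAX]} for n generated floats.
    static int[] countWindows(Random rnd, int n) {
        int nearZero = 0, nearMax = 0;
        for (int i = 0; i < n; i++) {
            float x = genFloat(rnd);
            if (x > 0 && x < 1E38) nearZero++;
            if (x > Float.MAX_VALUE - 1E38) nearMax++;
        }
        return new int[] { nearZero, nearMax };
    }

    public static void main(String[] args) {
        System.out.println(theoreticalShare(1E38)); // about 0.147
        int[] counts = countWindows(new Random(), 10_000);
        System.out.println(counts[0] + " near 0, " + counts[1] + " near MAX");
    }
}
```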
I want to do something if there's a 60% chance (only by using Math.random()). For example, a situation where every value in an array has a 60% chance to be set to 1, otherwise 0.
My confusion is whether we should check
if (Math.random() < 0.6) { ... }
or
if (Math.random() <= 0.6) { ... }
My thought is that it should be the first because 0.0 counts as a possible returned value, but I would like to confirm which one I should use and why.
Use the < operator.
While this is a simplification, consider that computers store fractions by adding terms of the form 2^(−n). Most real numbers can't be represented exactly, and literals like 0.6 are converted to the nearest double value from a finite set.
Likewise, the random() function can't generate real values from a continuous range between zero and one. Instead, it chooses an integer from N elements in the range 0 to N − 1, then divides it by N to yield a (rational) result in the range [0, 1). If you want to satisfy a condition with a probability of P, it should be true for P ⋅ N elements from the set of N possibilities.
Since zero is counted as one of the elements, the maximum result value that should be included is ( P ⋅ N − 1 ) / N. Or, in other words, we should exclude P ⋅ N / N, i.e., P.
That exclusion of P is what leads to the use of <.
It might be easier to reason about when you consider how you'd use a method like nextInt(). There, the effect of zero on a small range is more obvious, and your expression would clearly use the < operator: current().nextInt(5) < 3. That wouldn't change if you divide the result: current().nextInt(5) / 5.0 < 0.6. The difference between nextInt(5) / 5.0 and random() is only that the latter has many more, much smaller steps between 0 and 1.
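The discrete analogy can be checked by enumerating all outcomes rather than sampling. The sketch below (class and method names are illustrative) counts how many of the n equally likely results k/n satisfy each comparison against p.

```java
class ThresholdOperator {
    // Counts how many of the outcomes 0/n, 1/n, ..., (n-1)/n satisfy value < p.
    static int countLessThan(int n, double p) {
        int hits = 0;
        for (int k = 0; k < n; k++) {
            if (k / (double) n < p) hits++;
        }
        return hits;
    }

    // Same enumeration, but with <= instead of <.
    static int countLessOrEqual(int n, double p) {
        int hits = 0;
        for (int k = 0; k < n; k++) {
            if (k / (double) n <= p) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countLessThan(5, 0.6));    // 3 of 5 outcomes: 60%
        System.out.println(countLessOrEqual(5, 0.6)); // 4 of 5 outcomes: 80%
    }
}
```

With only five steps the off-by-one is large; with the many steps of random() the same off-by-one is still there, just far too small to observe.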
I apologize for misleading people with my original answer, and thank the user who straightened me out.
Use <. If you ever want to set your odds to 0%, then you might change your comparison value to 0. Since Math.random() can return 0, using <= would not result in a 0% chance. As others have noted, the chance of Math.random() generating the exact number you are testing against is extremely low, so for all values other than 0 you will never notice a difference.
My question is about Java, but it could apply to any programming language.
Given this declaration:
Random rnd = new Random();
we want to get a random number in the range 0 to x.
I want to know if there is any mathematical difference between the following:
rnd.nextInt() % x;
and
rnd.nextInt(x)
The main question is: is one of these solutions more random than the other? Is one solution more appropriate or "correct" than the other? If they are equal, I would be happy to see the mathematical proof.
Welcome to "mathematical insight" with "MS Paint".
So, from a statistical standpoint, it would depend on the distribution of the numbers being generated. First of all, we'll treat the probability of any one number coming up as an independent event (i.e. disregarding the seed, which RNG is used, etc.). Following that, a modulus simply takes a range of numbers (e.g. a from N, where 0 <= a < N) and subdivides them based on the divisor (the x in a % x). While the numbers are technically from a discrete population (integers), the range of integers for a probability mass function would be so large that it'd end up looking like a continuous graph anyhow. So let's consider a graph of the probability distribution function for a range of numbers:
If your random number generator doesn't generate with a uniform distribution across the range of numbers (i.e. any number is as likely to come up as any other), then modulo would (potentially) be breaking up the results of a non-uniform distribution. When you consider the individual integers in those ranges as discrete (and individual) outcomes, the probability of any number i (0 <= i < x) being the result is the sum of the individual probabilities of the numbers that map to it (p(i_1) + p(i_2) + ... + p(i_(N/x))). To think of it another way, if we overlaid the subdivisions of the ranges, it's plain to see that in non-symmetric distributions, it's much more likely that a modulo would not result in equally likely outcomes:
Remember, the likelihood of an outcome i in the graph above is obtained by summing the likelihoods of the individual numbers (i_1, ..., i_(N/x)) in the range N that could result in i. For further clarity, if your range N doesn't divide evenly by the modular divisor x, there will always be N % x results that have 1 additional integer that could produce them. This means that most modulus divisors that aren't a power of 2 (and similarly, ranges that are not a multiple of their divisor) could be skewed towards their lower results, regardless of having a uniform distribution:
So to summarize the point, Random#nextInt(int bound) takes all of these things (and more!) into consideration, and will consistently produce an outcome with uniform probability across the range of bound. Random#nextInt() % bound is only a halfway step that works in some specific scenarios. To your teacher's point, I would argue it's more likely you'll see some specific subset of numbers when using the modulus approach, not less.
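To put numbers on the skew for the real 2^32-value range of nextInt(), here is a small sketch. It assumes the values are folded onto 0..x-1 (as Math.floorMod would do) and simply counts how many results get one extra preimage; the class and method names are illustrative.

```java
class ModuloBias {
    // Number of distinct values nextInt() can return.
    static final long RANGE = 1L << 32;

    // How many results in 0..x-1 receive one extra preimage when 2^32 values are folded by x.
    static long favouredResults(int x) {
        return RANGE % x;
    }

    // Number of preimages for each of the remaining, less favoured results.
    static long basePreimages(int x) {
        return RANGE / x;
    }

    public static void main(String[] args) {
        // x = 16 divides 2^32 evenly: no result is favoured.
        System.out.println(favouredResults(16));
        // x = 10: six results have one more preimage than the other four (a tiny skew).
        System.out.println(favouredResults(10) + " favoured, base " + basePreimages(10));
        // x = 1.5 billion: many results have 3 preimages while the rest have 2 (a large skew).
        System.out.println(favouredResults(1_500_000_000) + " favoured, base "
                + basePreimages(1_500_000_000));
    }
}
```

The skew grows with x: for small bounds it is hard to measure, while for bounds near the size of the range some results are half again as likely as others.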
new Random(x) just creates the Random object with the given seed; it does not itself yield a random value.
I presume you are asking what the difference is between nextInt() % x and nextInt(x).
The difference is as follows.
nextInt(x)
nextInt(x) yields a random number n where 0 ≤ n < x, evenly distributed.
nextInt() % x
nextInt() % x yields a random number in the full integer range¹, and then applies modulo x. The full integer range includes negative numbers, so the result could also be a negative number. In other words, the range is −x < n < x.
Furthermore, in the vast majority of cases the distribution is not even. nextInt() has 2^32 possibilities, but, for simplicity's sake, let's assume it has 2^4 = 16 possibilities, and we choose x to be less than 16. Let's assume that x is 10.
The possibilities are 0, 1, 2, …, 13, 14, 15. After applying modulo 10, the results are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5. That means that some numbers are more likely to occur than others. That also means that the chance of some numbers occurring twice has increased.
As we see, nextInt() % x has two problems:
Range is not as required.
Uneven distribution.
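Both problems can be reproduced on a small scale. The sketch below uses the simplified 16-value generator from the example (values 0 to 15) rather than the real nextInt(); the class and method names are illustrative.

```java
import java.util.Arrays;

class ModuloProblems {
    // Problem 1: Java's % keeps the sign of the dividend, so a negative input gives a negative result.
    static int remainder(int value, int x) {
        return value % x;
    }

    // Problem 2: fold the 16 values 0..15 by x and count how often each result occurs.
    static int[] foldCounts(int x) {
        int[] counts = new int[x];
        for (int v = 0; v < 16; v++) {
            counts[v % x]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(remainder(-7, 5));                // -2
        System.out.println(Arrays.toString(foldCounts(10))); // [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]
    }
}
```

Results 0 to 5 occur twice as often as 6 to 9 in this toy setting, which is the uneven distribution described above.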
So you should definitely use nextInt(int bound) here. If the requirement is to get only unique numbers, you must exclude the numbers already drawn from the number generator. See also Generating Unique Random Numbers in Java.
1 According to the Javadoc.
I have a collection of double values, each representing a probability (i.e. [0-1] range).
I want to add uniformly distributed noise to these values. After the modification, the values must still represent probabilities ([0-1] range).
Additionally, I want to be able to specify a desired range for the perturbation independently for each value: for some probabilities I would like to modify them by a small percentage and for others I would like to change them by a large percentage.
I'm unsure about how to proceed: I can only imagine swapping probabilities between different items.
I have considered using java.util.Random, but I don't see any methods I can use.
Can anyone illustrate a possible algorithm to achieve this?
If I understand you correctly, you are trying to add a random, uniformly distributed noise to a set of values (which are probabilities, and therefore must remain between 0 and 1).
Suppose your original value is:
double myProbability;
and you want to add uniform noise with a maximum range of:
double desiredDelta;
You can, at most, vary it such that it doesn't become > 1 or < 0:
double maxDelta = Math.min(myProbability, 1-myProbability);
At this point, you need to define what the actual range of your noise will be:
double actualDelta = Math.min(desiredDelta, maxDelta);
Now you need to either add or remove a random value between 0 and actualDelta:
boolean noiseIsAdded = Math.random() > 0.5; //50% prob add, 50% prob subtract
double noise = Math.random() * actualDelta; //scales down the 0-1 range to 0-actualDelta
double newProbability;
if (noiseIsAdded) newProbability = myProbability + noise;
else newProbability = myProbability - noise;
And you should have it.
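Put together, the steps above might look like the following sketch. The class name, the method name perturb, and the use of a Random parameter (with nextBoolean() in place of Math.random() > 0.5) are choices made for this example.

```java
import java.util.Random;

class ProbabilityNoise {
    // Returns p shifted by uniform noise of at most desiredDelta, clamped so the result stays in [0, 1].
    static double perturb(double p, double desiredDelta, Random rnd) {
        // Largest shift that cannot push p below 0 or above 1.
        double maxDelta = Math.min(p, 1 - p);
        // Actual noise range: the requested one, unless it would leave [0, 1].
        double actualDelta = Math.min(desiredDelta, maxDelta);
        // Magnitude in [0, actualDelta), sign chosen with equal probability.
        double noise = rnd.nextDouble() * actualDelta;
        return rnd.nextBoolean() ? p + noise : p - noise;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(perturb(0.30, 0.05, rnd)); // somewhere in (0.25, 0.35)
        System.out.println(perturb(0.98, 0.10, rnd)); // range shrinks to (0.96, 1.00)
    }
}
```

Note the design consequence: values close to 0 or 1 get a narrower noise range than requested, since the range is kept symmetric around the original value.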
Specifically, if used in the form of:
Random.nextFloat() * N;
can I expect a highly randomized distribution of values from 0 to N?
Would it be better to do something like this?
Random.nextInt(N) * Random.nextFloat();
A single random number from a good generator--and java.util.Random is a good one--will be evenly distributed across the range... it will have a mean and median value of 0.5*N. 1/4 of the numbers will be less than 0.25*N and 1/4 of the numbers will be larger than 0.75*N, etc.
If you then multiply this by another random number generator (whose mean value is 0.5), you will end up with a random number with a mean value of 0.25*N and a median value of 0.187*N... So half your numbers are less than 0.187*N! 1/4 of the numbers will be under .0677*N! And only 1/4 of the numbers will be over 0.382*N. (Numbers obtained experimentally by looking at 1,000,000 random numbers generated as the product of two other random numbers, and analyzing them.)
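A small experiment along these lines could look like the sketch below. It takes N = 1 and multiplies two uniform doubles, which ignores the fact that nextInt(N) is discrete; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Random;

class ProductOfUniforms {
    // Draws n products of two independent uniform [0, 1) values and returns them sorted.
    static double[] sortedProducts(Random rnd, int n) {
        double[] products = new double[n];
        for (int i = 0; i < n; i++) {
            products[i] = rnd.nextDouble() * rnd.nextDouble();
        }
        Arrays.sort(products);
        return products;
    }

    // Empirical quantile q (0 <= q <= 1) of an already sorted sample.
    static double quantile(double[] sorted, double q) {
        return sorted[(int) (q * (sorted.length - 1))];
    }

    public static void main(String[] args) {
        double[] p = sortedProducts(new Random(), 1_000_000);
        System.out.println("25%: " + quantile(p, 0.25));
        System.out.println("50%: " + quantile(p, 0.50));
        System.out.println("75%: " + quantile(p, 0.75));
    }
}
```

The printed quartiles should land near the 0.0677, 0.187 and 0.382 figures quoted above, rather than at 0.25, 0.5 and 0.75 as they would for a single uniform draw.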
This is probably not what you want.
First of all, Random in Java doesn't have a rand() method. See the docs. I think you meant the Random.next() method.
As for your question, the documentation says that nextFloat() is implemented like this:
public float nextFloat() {
return next(24) / ((float)(1 << 24));
}
So you don't need to use anything else.
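One consequence of that implementation, assuming it matches the documented next(24) / 2^24 form, is that every result is a multiple of 2^-24 in [0, 1). The sketch below checks this property; the class and method names are illustrative.

```java
import java.util.Random;

class NextFloatGranularity {
    // True if f lies in [0, 1) and is an exact multiple of 2^-24.
    static boolean isMultipleOfTwoToMinus24(float f) {
        float scaled = f * (1 << 24); // exact, since it multiplies by a power of two
        return f >= 0f && f < 1f && scaled == (float) Math.floor(scaled);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        boolean all = true;
        for (int i = 0; i < 10_000; i++) {
            all &= isMultipleOfTwoToMinus24(rnd.nextFloat());
        }
        System.out.println(all);
    }
}
```

So nextFloat() * N can take at most 2^24 distinct values, evenly spaced across [0, N).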
Random#nextFloat() will give you an evenly distributed number between 0 and 1.
If you take an even distribution and multiply it by N, you scale the distribution up evenly. So you get a random number between 0 and N evenly distributed.
If you instead multiply it by a random number between 0 and N, you'll get an uneven distribution. If multiplying by N gives you an even distribution between 0 and N, then multiplying by a number between 0 and N must give you a result less than or equal to what you'd get by multiplying by N. So your numbers are smaller on average.
Random.nextInt(upperBound) never seems to generate more than 16 even numbers in a row for some specific upperBound values:
Random random = new Random();
int c = 0;
int max = 17;
int upperBound = 18;
while (c <= max) {
int nextInt = random.nextInt(upperBound);
boolean even = nextInt % 2 == 0;
if (even) {
c++;
} else {
c = 0;
}
}
In this example the code loops forever, whereas with an upperBound of, for example, 16, it terminates quickly.
What can be the reason of this behavior? There are some notes in the method's javadoc, but I failed to understand them.
UPD1: The code seems to terminate with odd upper bounds, but may get stuck with even ones
UPD2:
I modified the code to capture the statistics of c as suggested in the comments:
Random random = new Random();
int c = 0;
long trials = 1L << 26; // 2^26 trials; an int shift such as 1 << 58 wraps to the same value
int max = 20;
int[] stat = new int[max + 1];
while (trials > 0) {
while (c <= max && trials > 0) {
int nextInt = random.nextInt(18);
boolean even = nextInt % 2 == 0;
if (even) {
c++;
} else {
stat[c] = stat[c] + 1;
c = 0;
}
trials--;
}
}
System.out.println(Arrays.toString(stat));
Now it tries to reach 20 evens in a row, to get better statistics, and the upperBound is still 18.
The results turned out to be more than surprising:
[16776448, 8386560, 4195328, 2104576, 1044736,
518144, 264704, 132096, 68864, 29952, 15104,
12032, 1792, 3072, 256, 512, 0, 256, 0, 0]
At first it decreases by a factor of 2, as expected, but note the last line! Here it goes crazy and the captured statistics look completely weird.
Here is a bar plot in log scale:
How c gets the value 17 256 times is yet another mystery
http://docs.oracle.com/javase/6/docs/api/java/util/Random.html:
An instance of this class is used to generate a stream of
pseudorandom numbers. The class uses a 48-bit seed, which is modified
using a linear congruential formula. (See Donald Knuth, The Art of
Computer Programming, Volume 3, Section 3.2.1.)
If two instances of Random are created with the same seed, and the
same sequence of method calls is made for each, they will generate and
return identical sequences of numbers. [...]
It is a pseudo-random number generator. This means that you are not actually rolling a die but rather using a formula to calculate the next "random" value based on the current random value. To create the illusion of randomness, a seed is used. The seed is the first value used with the formula to generate the random value.
Apparently Java's Random implementation (the "formula") does not generate more than 16 even numbers in a row.
This behaviour is the reason why the seed is usually initialized with the time. Depending on when you start your program, you will get different results.
The benefits of this approach are that you can generate repeatable results. If you have a game generating "random" maps, you can remember the seed to regenerate the same map if you want to play it again, for instance.
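The repeatability is easy to demonstrate. In the sketch below (names are illustrative), two generators created with the same seed produce the same values, which is what makes the "remember the seed to regenerate the map" idea work.

```java
import java.util.Arrays;
import java.util.Random;

class SeedReplay {
    // Returns the first `count` values of nextInt(bound) for a generator created with `seed`.
    static int[] firstValues(long seed, int count, int bound) {
        Random rnd = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = rnd.nextInt(bound);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] first = firstValues(42L, 10, 100);
        int[] replay = firstValues(42L, 10, 100);
        System.out.println(Arrays.equals(first, replay)); // true: same seed, same sequence
    }
}
```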
For true random numbers, some operating systems provide special devices that generate "randomness" from external events like mouse movements or network traffic. However, I do not know how to tap into those with Java.
From the Javadoc for SecureRandom:
Many SecureRandom implementations are in the form of a pseudo-random
number generator (PRNG), which means they use a deterministic
algorithm to produce a pseudo-random sequence from a true random seed.
Other implementations may produce true random numbers, and yet others
may use a combination of both techniques.
Note that SecureRandom does NOT guarantee true random numbers either.
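For completeness, a minimal sketch of drawing a bounded value from java.security.SecureRandom follows. Which underlying algorithm and entropy source are used depends on the platform and the configured providers, so treat this as an illustration of the API only; the class and method names are illustrative.

```java
import java.security.SecureRandom;

class SecureDraw {
    // One shared instance; the default constructor picks a platform-dependent implementation.
    private static final SecureRandom SECURE = new SecureRandom();

    // Returns a value in [0, bound), using the nextInt(int) method inherited from java.util.Random.
    static int draw(int bound) {
        return SECURE.nextInt(bound);
    }

    public static void main(String[] args) {
        System.out.println(draw(18));
    }
}
```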
Why changing the seed does not help
Let's assume random numbers only have the range 0-7.
Now we use the following formula to generate the next "random" number:
next = (current + 3) % 8
the sequence becomes 0 3 6 1 4 7 2 5.
If you now use 3 as the seed, all you do is change the starting point.
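The toy formula above can be run directly. This sketch (class and method names are illustrative) prints the sequence for seeds 0 and 3, showing that the second is the first one rotated.

```java
import java.util.Arrays;

class ToyGenerator {
    // Returns `length` values, starting with the seed and applying next = (current + 3) % 8.
    static int[] sequence(int seed, int length) {
        int[] out = new int[length];
        int current = seed;
        for (int i = 0; i < length; i++) {
            out[i] = current;
            current = (current + 3) % 8;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sequence(0, 8))); // [0, 3, 6, 1, 4, 7, 2, 5]
        System.out.println(Arrays.toString(sequence(3, 8))); // [3, 6, 1, 4, 7, 2, 5, 0]
    }
}
```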
In this simple implementation that only uses the previous value, every value may occur only once before the sequence wraps around and starts again. Otherwise there would be an unreachable part.
E.g. imagine the sequence 0 3 6 1 3 4 7 2 5. The numbers 0, 4, 7, 2 and 5 would never be generated more than once (depending on the seed they might never be generated at all), since once the sequence reaches 3 it loops: 3, 6, 1, 3, 6, 1, ...
Simplified pseudo-random number generators can be thought of as a permutation of all numbers in the range, with the seed as the starting point. For more advanced generators, you would have to replace the permutation with a list in which the same numbers might occur multiple times.
More complex generators can have an internal state, allowing the same number to occur several times in the sequence, since the state lets the generator know where to continue.
The implementation of Random uses a simple linear congruential formula. Such formulae have a natural periodicity and all sorts of non-random patterns in the sequence they generate.
What you are seeing is an artefact of one of these patterns ... nothing deliberate. It is not an example of bias. Rather it is an example of auto-correlation.
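One way to see such a pattern is to look at a single low bit of the generator's output. The sketch below re-implements the linear congruential step with the constants given in the Javadoc for Random.next, and checks that the lowest bit of the 31-bit output (bit 17 of the 48-bit state) repeats every 2^18 steps. That this bit decides the parity of nextInt(18) is an assumption about the implementation of nextInt(int), not something the Javadoc promises; the class and method names are illustrative.

```java
class LowBitPeriod {
    // Constants from the Javadoc of java.util.Random.next(int).
    static final long MULTIPLIER = 0x5DEECE66DL;
    static final long INCREMENT = 0xBL;
    static final long MASK = (1L << 48) - 1;

    // One linear congruential step on the 48-bit state.
    static long step(long state) {
        return (state * MULTIPLIER + INCREMENT) & MASK;
    }

    // Lowest bit of the 31-bit output next(31), i.e. bit 17 of the state.
    static int lowOutputBit(long state) {
        return (int) ((state >>> 17) & 1L);
    }

    // Checks bit[i] == bit[i + period] for the first `checks` positions of the stream.
    static boolean repeatsWithPeriod(long seed, int period, int checks) {
        int[] bits = new int[period + checks];
        long state = seed & MASK;
        for (int i = 0; i < bits.length; i++) {
            state = step(state);
            bits[i] = lowOutputBit(state);
        }
        for (int i = 0; i < checks; i++) {
            if (bits[i] != bits[i + period]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(repeatsWithPeriod(12345L, 1 << 18, 1000)); // true
    }
}
```

A bit stream that repeats every 262144 steps contains only a fixed, finite set of run lengths, so some long runs of even values may simply never appear, whatever the seed.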
If you need better (more "random") numbers, then you need to use SecureRandom rather than Random.
And the answer to "why was it implemented that way" is ... performance. A call to Random.nextInt can be completed in tens or hundreds of clock cycles. A call to SecureRandom is likely to be at least 2 orders of magnitude slower, possibly more.
For portability, Java specifies that implementations must use the inferior LCG method for java.util.Random. This method is completely unacceptable for any serious use of random numbers like complex simulations or Monte Carlo methods. Use an add-on library with a better PRNG algorithm, like Marsaglia's MWC or KISS. Mersenne Twister and Lagged Fibonacci Generators are often OK as well.
I'm sure there are Java libraries for these algorithms. I have a C library with Java bindings if that will work for you: ojrandlib.