What is the randomness of Java.nextFloat()


Specifically, if used in the form of:
Random.nextFloat() * N;
can I expect a highly randomized distribution of values from 0 to N?
Would it be better to do something like this?
Random.nextInt(N) * Random.nextFloat();

A single random number from a good generator--and java.util.Random is a good one--will be evenly distributed across the range... it will have a mean and median value of 0.5*N. 1/4 of the numbers will be less than 0.25*N and 1/4 of the numbers will be larger than 0.75*N, etc.
If you then multiply this by another random number (whose mean value is 0.5), you will end up with a value whose mean is 0.25*N and whose median is 0.187*N. So half your numbers are less than 0.187*N! 1/4 of the numbers will be under 0.0677*N, and only 1/4 of the numbers will be over 0.382*N. (Numbers obtained experimentally by generating 1,000,000 products of two random numbers and analyzing them.)
This is probably not what you want.
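The figures above can be reproduced with a short simulation. This is only a minimal sketch: the class name ProductDistributionDemo, the fixed seed, and the sample count are illustrative choices, not part of the original answer. It compares a single uniform draw with the product of two uniform draws, both for N = 1.

```java
import java.util.Arrays;
import java.util.Random;

final class ProductDistributionDemo {

    /** Returns {mean, lower quartile, median, upper quartile} of the samples. */
    static double[] summarize(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        double sum = 0.0;
        for (double s : sorted) {
            sum += s;
        }
        int n = sorted.length;
        return new double[] {sum / n, sorted[n / 4], sorted[n / 2], sorted[3 * n / 4]};
    }

    /** Draws `count` values, each a single uniform float in [0, 1). */
    static double[] single(Random rnd, int count) {
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            out[i] = rnd.nextFloat();
        }
        return out;
    }

    /** Draws `count` values, each the product of two independent uniform floats. */
    static double[] product(Random rnd, int count) {
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            out[i] = (double) rnd.nextFloat() * rnd.nextFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L); // fixed seed only to make runs repeatable
        System.out.println("single:  " + Arrays.toString(summarize(single(rnd, 1_000_000))));
        System.out.println("product: " + Arrays.toString(summarize(product(rnd, 1_000_000))));
    }
}
```

The "single" row should stay close to 0.5, 0.25, 0.5, 0.75, while the "product" row should land near the skewed values quoted above.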

First of all, Random in Java has no rand() method; see the docs. I think you were thinking of the Random.next() method.
As for your question, the documentation says that nextFloat() is implemented like this:
public float nextFloat() {
    return next(24) / ((float)(1 << 24));
}
So you don't need to use anything else.
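One consequence of that implementation is that every value returned by nextFloat() is a multiple of 2^-24 in [0, 1). The sketch below checks this and scales a few values by N; the class name NextFloatGranularity and the choice N = 10 are assumptions made for illustration only.

```java
import java.util.Random;

final class NextFloatGranularity {

    /** True if f lies in [0, 1) and is an exact multiple of 2^-24. */
    static boolean isMultipleOfTwoToMinus24(float f) {
        float scaled = f * (1 << 24); // scaling by a power of two is exact
        return f >= 0.0f && f < 1.0f && scaled == Math.floor(scaled);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        float n = 10.0f; // illustrative upper bound
        for (int i = 0; i < 5; i++) {
            float u = rnd.nextFloat();
            System.out.println(u + " -> " + (u * n)
                    + " (multiple of 2^-24: " + isMultipleOfTwoToMinus24(u) + ")");
        }
    }
}
```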

Random#nextFloat() will give you an evenly distributed number between 0 and 1.
If you take an even distribution and multiply it by N, you scale the distribution up evenly. So you get a random number between 0 and N evenly distributed.
If you instead multiply it by a second random number between 0 and N, you get an uneven distribution. Multiplying by N itself gives an even distribution between 0 and N, and any random factor between 0 and N is at most N, so every product is less than or equal to what you would have got by multiplying by N. Your numbers are therefore smaller on average.

Related

Smart algorithm to randomize a Double in range but with odds

I use the following function to generate a random double in a specific range :
nextDouble(1.50, 7.00)
However, I've been trying to come up with an algorithm to make the randomization have higher probability to generate a double that is close to the 1.50 than it is to 7.00. Yet I don't even know where it starts. Anything come to mind ?
Java is also welcome.

You should start by discovering what probability distribution you need. Based on your requirements, and assuming that random number generations are independent, perhaps Poisson distribution is what you are looking for:
a call center receives an average of 180 calls per hour, 24 hours a day. The calls are independent; receiving one does not change the probability of when the next one will arrive. The number of calls received during any minute has a Poisson probability distribution with mean 3: the most likely numbers are 2 and 3 but 1 and 4 are also likely and there is a small probability of it being as low as zero and a very small probability it could be 10.
The usual probability distributions are already implemented in libraries e.g. org.apache.commons.math3.distribution.PoissonDistribution in Apache Commons Math3.

I suggest not thinking about this problem in terms of generating a random number with irregular probability. Instead, think about generating a random number normally in some range, and then mapping this range onto another one in a non-linear way.
Let's split our algorithm into 3 steps:
1. Generate a random number in the [0, 1) range linearly (so using a standard random generator).
2. Map it into another [0, 1) range in a non-linear way.
3. Map the resulting [0, 1) into [1.5, 7) linearly.
Steps 1 and 3 are easy; the core of our algorithm is step 2. We need a way to map [0, 1) into another [0, 1), but non-linearly, so that e.g. 0.7 does not have to produce 0.7. Classic math helps here: we just need to look at visual representations of algebraic functions.
In your case you expect that while the input number increases from 0 to 1, the result first grows very slowly (to stay near 1.5 for longer), but then speeds up. This is exactly what e.g. the function y = x ^ 2 looks like. Your resulting code could be something like:
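import kotlin.math.pow
import kotlin.random.Random

fun generateDouble(): Double {
    val step1 = Random.nextDouble()   // step 1: uniform in [0, 1)
    val step2 = step1.pow(2.0)        // step 2: non-linear map, still in [0, 1)
    val step3 = step2 * 5.5 + 1.5     // step 3: linear map into [1.5, 7)
    return step3
}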
or just:
fun generateDouble() = Random.nextDouble().pow(2.0) * 5.5 + 1.5
By changing the exponent to bigger numbers, the curve will be more aggressive, so it will favor 1.5 more. By making the exponent closer to 1 (e.g. 1.4), the result will be closer to linear, but it will still favor 1.5. Making the exponent smaller than 1 will start to favor 7.
You can also look at other algebraic functions with this shape, e.g. y = 2 ^ x - 1.
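Since the question also welcomed Java, here is a rough Java equivalent of the same three steps. It is a sketch under assumptions: the class name BiasedDouble and the exponent parameter are introduced here for illustration and are not part of the original answer.

```java
import java.util.Random;

final class BiasedDouble {

    /**
     * Maps a uniform value in [0, 1) through u^exponent and scales it to [min, max).
     * exponent > 1 favors min, exponent < 1 favors max, exponent == 1 stays uniform.
     */
    static double next(Random rnd, double min, double max, double exponent) {
        double uniform = rnd.nextDouble();            // step 1: uniform in [0, 1)
        double curved = Math.pow(uniform, exponent);  // step 2: non-linear map
        return min + curved * (max - min);            // step 3: linear map to [min, max)
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 10; i++) {
            System.out.println(next(rnd, 1.5, 7.0, 2.0));
        }
    }
}
```

With an exponent of 2, roughly 71% of the results should fall in the lower half of the range (below 4.25), because u^2 < 0.5 whenever u < sqrt(0.5).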

What you could do is 'correct' the random value with a factor that pulls it in the direction of 1.5. You would create some sort of bias factor. Like this:
@Test
void DoubleTest() {
    double origin = 1.50;
    final double fairRandom = new Random().nextDouble(origin, 7);
    System.out.println(fairRandom);
    double biasFactor = 0.9;
    final double biasedDiff = (fairRandom - origin) * biasFactor;
    double biasedRandom = origin + biasedDiff;
    System.out.println(biasedRandom);
}
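The lower you set the bias factor (must be > 0 and <= 1), the closer the results stay to 1.50. Note that this is a linear rescaling: the result is still evenly distributed, just over the narrower range from 1.50 to 1.50 + 5.5 * biasFactor, so values near 7.00 can no longer occur.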

You can take a straightforward approach. Since you want a higher probability of getting a value closer to 1.5 than to 7.00, you can set that probability explicitly. The midpoint of the range is (1.5+7)/2 = 4.25.
So let's say I want a 70% probability that the random value falls in the half closer to 1.5 and a 30% probability that it falls in the half closer to 7.
double finalResult;
double mid = (1.5+7)/2;
double p = nextDouble(0,100);
if (p <= 70) finalResult = nextDouble(1.5,mid);
else finalResult = nextDouble(mid,7);
Here, the final result has a 70% chance of being closer to 1.5 than to 7.
Since you did not specify the 70% probability, you can even make it random.
You just have to generate nextDouble(50,100), which gives you a value of at least 50% and less than 100%, and then use it as the probability in the calculation above.

I missed that I am using the same solution strategy as in the reply by Nafiul Alam Fuji. But since I have already formulated my answer, I post it anyway.
One way is to split the range into two subranges, say nextDouble(1.50, 4.25) and nextDouble(4.25, 7.0). You select one of the subranges by generating a random number between 0.0 and 1.0 using nextDouble() and comparing it to a threshold K. If the random number is less than K, you do nextDouble(1.50, 4.25). Otherwise nextDouble(4.25, 7.0).
Now if K=0.5, it is like doing nextDouble(1.50, 7). But by increasing K, you will do nextDouble(1.50, 4.25) more often and favor it over nextDouble(4.25, 7.0). It is like flipping an unfair coin where K determines the extent of the cheating.
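The unfair-coin idea can be sketched in Java as follows. This is a minimal illustration, assuming the class name UnfairCoinRange and a parameter k playing the role of the threshold K; neither name comes from the answers above.

```java
import java.util.Random;

final class UnfairCoinRange {

    /**
     * Returns a value in [min, max). With probability k the value is drawn
     * uniformly from the lower half [min, mid), otherwise from [mid, max).
     */
    static double next(Random rnd, double min, double max, double k) {
        double mid = (min + max) / 2.0;
        if (rnd.nextDouble() < k) {
            return min + rnd.nextDouble() * (mid - min); // lower subrange
        }
        return mid + rnd.nextDouble() * (max - mid);     // upper subrange
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 10; i++) {
            System.out.println(next(rnd, 1.50, 7.00, 0.7));
        }
    }
}
```

With k = 0.5 this behaves like a plain uniform draw over the whole range; raising k shifts more of the results into the lower half.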

math question about random (x) and random() % x - Java [duplicate]

This question already has answers here:
Why do people say there is modulo bias when using a random number generator?
(10 answers)
Closed 2 years ago.
My question is about Java, but it could apply to any programming language.
Given this declaration:
Random rnd = new Random();
we want to get a random number in the range 0 to x.
I want to know if there is any mathematical difference between the following:
rnd.nextInt() % x;
and
rnd.nextInt(x)
The main question is: is one of these solutions more random than the other? Is one solution more appropriate or "correct" than the other? If they are equal, I would be happy to see the mathematical proof.

Welcome to "mathematical insight" with "MS Paint".
So, from a statistical standpoint, it depends on the distribution of the numbers being generated. First of all, we'll treat the probability of any one number coming up as an independent event (i.e. disregarding the seed, which RNG is used, etc.). Following that, a modulus simply takes a range of numbers (e.g. a from N, where 0<=a<N) and subdivides it based on the divisor (the x in a % x). While the numbers are technically from a discrete population (integers), the range of integers for a probability mass function would be so large that it'd end up looking like a continuous graph anyhow. So let's consider a graph of the probability distribution function for a range of numbers:
If your random number generator doesn't generate with a uniform distribution across the range of numbers (i.e. any number is as likely to come up as any other), then modulo would (potentially) be breaking up the results of a non-uniform distribution. When you consider the individual integers in those ranges as discrete (and individual) outcomes, the probability of any number i (0 <= i < x) being the result is the sum of the individual probabilities of the values that reduce to it (i_1 + i_2 + ... + i_(N/x)). To think of it another way, if we overlaid the subdivisions of the ranges, it's plain to see that in non-symmetric distributions, it's much more likely that a modulo would not result in equally likely outcomes:
Remember, the likelihood of an outcome i in the graph above is obtained by summing the likelihoods of the individual numbers (i_1, ..., i_(N/x)) in the range N that could result in i. For further clarity, if your range N doesn't divide evenly by the modular divisor x, there will always be N % x results that each have 1 additional integer that could produce them. This means that most modulus divisors that aren't a power of 2 (and similarly, ranges that are not a multiple of their divisor) could be skewed towards their lower results, even with a uniform distribution:
So to summarize the point, Random#nextInt(int bound) takes all of these things (and more!) into consideration, and will consistently produce an outcome with uniform probability across the range of bound. Random#nextInt() % bound is only a halfway step that works in some specific scenarios. To your teacher's point, I would argue it's more likely you'll see some specific subset of numbers when using the modulus approach, not less.

new Random(x) just creates the Random object with the given seed; it does not itself yield a random value.
I presume you are asking what the difference is between nextInt() % x and nextInt(x).
The difference is as follows.
nextInt(x)
nextInt(x) yields a random number n where 0 ≤ n < x, evenly distributed.
nextInt() % x
nextInt() % x yields a random number in the full integer range [1], and then applies modulo x. The full integer range includes negative numbers, so the result could also be a negative number. In other words, the range is −x < n < x.
Furthermore, in the vast majority of cases the distribution is not even. nextInt() has 2^32 possibilities, but, for simplicity's sake, let's assume it has 2^4 = 16 possibilities, and we choose x to be smaller than 16. Let's assume that x is 10.
All possibilities are 0, 1, 2, …, 13, 14, 15. After applying modulo 10, the results are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5. That means that some numbers are more likely to occur than others. It also means that the chance of some numbers occurring twice has increased.
As we see, nextInt() % x has two problems:
Range is not as required.
Uneven distribution.
So you should definitely use nextInt(int bound) here. If the requirement is to get only unique numbers, you must exclude the numbers already drawn from the number generator. See also Generating Unique Random Numbers in Java.
[1] According to the Javadoc.
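Both problems can be seen in a few lines of Java. The sketch below uses the same toy generator with 16 possible outputs and x = 10; the class name ModuloBiasDemo and the helper remainderCounts are illustrative names, not library APIs.

```java
import java.util.Arrays;

final class ModuloBiasDemo {

    /** Counts how often each remainder occurs when every value in [0, range) is reduced modulo x. */
    static int[] remainderCounts(int range, int x) {
        int[] counts = new int[x];
        for (int v = 0; v < range; v++) {
            counts[v % x]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Uneven distribution: remainders 0..5 are hit twice, 6..9 only once.
        System.out.println(Arrays.toString(remainderCounts(16, 10)));
        // prints [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]

        // Wrong range: % keeps the sign of the left operand in Java.
        System.out.println(-7 % 10);               // prints -7
        System.out.println(Math.floorMod(-7, 10)); // prints 3
    }
}
```

Math.floorMod repairs only the sign problem; the uneven counts remain, which is why nextInt(int bound) is the better choice.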

java random number generator and their distribution

I am relying on Java's Random class, specifically on the nextInt method to generate N random numbers. I do not know what N will be ahead of time, it is decided on the fly.
One of the requirements of my todo list is to have the random numbers be representative of the distribution.
For example, if N=100, in the range from 1-100 there should be 10 (approximately) numbers between 1-10, 20 numbers between 1-20 etc.
But N can potentially grow on the fly from 100 to 100,000 and as such the distribution of the generated randoms should adjust on the fly to represent 100,000 generated numbers between 1-100
I'm not sure if this is possible, hope it makes sense what I am trying to achieve.

You appear to be describing a uniform distribution.
Looking at the Javadoc of Random.nextInt(int):
Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive)
So, just use Random.nextInt, passing N as the parameter, and add 1 to the result to get it 1 to N instead of 0 to (N-1).
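As a rough illustration of the "add 1" advice, the sketch below draws values from 1 to 100 and counts how many land in each block of ten. The class name UniformOneToN and the helper decileCounts are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Random;

final class UniformOneToN {

    /** Returns a uniformly distributed value in 1..n (both inclusive). */
    static int next(Random rnd, int n) {
        return rnd.nextInt(n) + 1;
    }

    /** Draws `samples` values from 1..100 and counts them per block 1-10, 11-20, ..., 91-100. */
    static int[] decileCounts(Random rnd, int samples) {
        int[] counts = new int[10];
        for (int i = 0; i < samples; i++) {
            int value = next(rnd, 100);
            counts[(value - 1) / 10]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(Arrays.toString(decileCounts(rnd, 100)));
        System.out.println(Arrays.toString(decileCounts(rnd, 100_000)));
    }
}
```

Each block should receive roughly a tenth of the draws, and the relative spread shrinks as the number of draws grows, without any adjustment on your side.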

Why the random function in java is always generating high values?

I am implementing a test data generator in Java that generates random values for Java primitive types.
The range of possible parameter values is not limited. For example, if I want to generate a random integer or float, I consider all possible values (MAX_INT-MIN_INT). To do so, I am using things like:
Random().nextInt()
Random().nextLong()
Random().nextFloat()*Float.MAX_VALUE
Random().nextDouble()*Double.MAX_VALUE
And so on...
However, doing it like this, I notice that the generated values are always large in magnitude (close to the maximum or minimum value of the parameter type). After 100000 iterations, for example, the generator didn't produce a single value in the range [-1000, 1000]. The same goes for floats, longs, etc.
Can you explain how the random generator works in Java? Why are the generated values always large when we consider all possible values of the Java type?
Thanks in advance.

Your perception of "high" and "low" is wrong.
The probability of a single value (assuming uniform distribution) being in [-1000,1000] is 2001/(MAX_INT-MIN_INT), which is around 0.00000047.
This probability is extremely small, so the expected number of "small" values is small as well.
In fact, in a uniform distribution over [MIN_INT,MAX_INT], approximately half of the values will be positive and half negative.
Similarly, only a quarter of them will be between 0 and MAX_INT/2 (which is much higher than 1000, as you know).
If you want more "low" values, restrict yourself to a smaller range, or use a non-uniform distribution that is expected to generate more values close to 0 (Gaussian, for example).
Have a look at this code snippet:
int count1 = 0, count2 = 0;
for (int i = 0; i < 10000; i++) {
    float x = genFloat(null);
    if (x < 1E38 && x > 0) count1++;
    if (x > Float.MAX_VALUE - 1E38) count2++;
}
System.out.println(count1);
System.out.println(count2);
It generates 10000 random floats, and checks how many are in [0,1E38] and how many are in [MAX-1E38,MAX].
Note that when talking about floats, the theoretical probability of each range is ~1E38/(2*MAX) ~= 14.7%.
And as you can see, the "close to 0" and "close to MAX" ranges of the same width receive a similar number of generated values.
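The estimate for the int case at the top of this answer can be reproduced with plain arithmetic. The following sketch (class name SmallValueOdds is an illustrative assumption) computes the chance that a uniformly drawn int falls in [-1000, 1000] and the number of such hits to expect in 100000 draws.

```java
final class SmallValueOdds {

    /** Probability that a uniformly distributed int lies in [-bound, bound]. */
    static double probabilityWithin(int bound) {
        double favorable = 2.0 * bound + 1.0; // e.g. 2001 values for bound = 1000
        // Number of distinct int values: MAX - MIN + 1 = 2^32, computed in double to avoid overflow.
        double total = (double) Integer.MAX_VALUE - (double) Integer.MIN_VALUE + 1.0;
        return favorable / total;
    }

    public static void main(String[] args) {
        double p = probabilityWithin(1000);
        System.out.println("probability per draw: " + p);
        System.out.println("expected hits in 100000 draws: " + p * 100_000);
    }
}
```

The expected number of hits is well below one, so seeing no value in [-1000, 1000] after 100000 iterations is exactly what a uniform generator should produce.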

How to be sure that random numbers are unique and not duplicated?

I have some simple code which generates random numbers:
SecureRandom random = new SecureRandom();
...
public int getRandomNumber(int maxValue) {
    return random.nextInt(maxValue);
}
The method above is called about 10 times (not in a loop). I want to ensure that all the numbers are unique (assuming that maxValue > 1000).
Can I be sure that I will get unique numbers every time I call it? If not, how can I fix it?
EDIT: I may have said it vaguely. I wanted to avoid manual checks if I really got unique numbers so I was wondering if there is a better solution.

There are different ways of achieving this and which is more appropriate will depend on how many numbers you need to pick from how many.
If you are selecting a small number of random numbers from a large range of potential numbers, then you're probably best just storing previously chosen numbers in a set and "manually" checking for duplicates. Most of the time, you won't actually get a duplicate and the test will cost practically nothing. It might sound inelegant, but it's not actually as bad as it sounds.
Some underlying random number generation algorithms don't produce duplicates at their "raw" level. So for example, an algorithm called a XORShift generator can effectively produce all of the numbers within a certain range, shuffled without duplicates. So you basically choose a random starting point in the sequence then just generate the next n numbers and you know there won't be duplicates. But you can't arbitrarily choose "max" in this case: it has to be the natural maximum of the generator in question.
If the range of possible numbers is small-ish but the number of numbers you need to pick is within a couple of orders of magnitude of that range, then you could treat this as a random selection problem. For example, to choose 100,000 numbers within the range 10,000,000 without duplicates, I can do this:
Let m be the number of random numbers I've chosen so far (initially 0)
For i = 1 to 10,000,000
    Generate a random (floating point) number, r, in the range 0-1
    If (r < (100,000-m)/(10,000,000-i+1)), then add i to the list and increment m
Shuffle the list, then pick numbers sequentially from the list as required
But obviously, the latter option is only worthwhile if you need to pick a reasonably large proportion of the overall range of numbers. For choosing 10 numbers in the range 1 to a billion, you would be generating a billion random numbers, whereas by just checking for duplicates as you go, you'd be very unlikely to actually get a duplicate and would end up generating only 10 random numbers.
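The selection procedure described above could look roughly like this in Java. It is a sketch, not a tuned implementation: the class name SelectionSampling and the method pick are illustrative, and the acceptance probability is written as (still needed) / (candidates not yet examined, including the current one), so that the last candidate never causes a division by zero.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SelectionSampling {

    /** Picks exactly k distinct integers from 1..n and returns them in random order. */
    static List<Integer> pick(Random rnd, int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        List<Integer> chosen = new ArrayList<>(k);
        for (int i = 1; i <= n && chosen.size() < k; i++) {
            int stillNeeded = k - chosen.size();
            int stillAvailable = n - i + 1; // candidates i..n have not been examined yet
            // Accept i with probability stillNeeded / stillAvailable.
            if (rnd.nextDouble() < (double) stillNeeded / stillAvailable) {
                chosen.add(i);
            }
        }
        Collections.shuffle(chosen, rnd); // the scan produces ascending order, so shuffle
        return chosen;
    }

    public static void main(String[] args) {
        System.out.println(pick(new Random(), 10_000_000, 10));
    }
}
```

When the number still needed equals the number still available, the ratio is exactly 1 and every remaining candidate is accepted, which is what guarantees exactly k results.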

A random sequence does not mean that all values are unique. The sequence 1,1,1,1 is exactly as likely as the sequence 712,4,22,424.
In other words, if you want to be guaranteed a sequence of unique numbers, generate 10 of them at once, check for the uniqueness condition of your choice and store them, then pick a number from that list instead of generating a random number in your 10 places.

Every time you call Random#nextInt(int) you will get
a pseudorandom, uniformly distributed int value between 0 (inclusive)
and the specified value (exclusive).
If you want x unique numbers, keep getting new numbers until you have that many, then select your "random" number from that list. However, since you are filtering the numbers generated, they won't truly be random anymore.
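A minimal sketch of "keep getting new numbers until you have that many", assuming a Set is acceptable for the duplicate check. The class name UniqueBySet and the method draw are hypothetical; SecureRandom can be passed in because it extends Random.

```java
import java.security.SecureRandom;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

final class UniqueBySet {

    /** Draws `count` distinct values from [0, maxValue), in the order they were generated. */
    static Set<Integer> draw(Random rnd, int count, int maxValue) {
        if (count < 0 || count > maxValue) {
            throw new IllegalArgumentException("cannot draw " + count + " distinct values below " + maxValue);
        }
        Set<Integer> seen = new LinkedHashSet<>();
        while (seen.size() < count) {
            seen.add(rnd.nextInt(maxValue)); // add() silently ignores duplicates
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(draw(new SecureRandom(), 10, 1001));
    }
}
```

For 10 values out of more than 1000, the loop will almost always finish after 10 or 11 iterations.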

For such a small number of possible values, a trivial implementation would be to put your 1000 integers in a list, and have a loop which, at each iteration, generates a random index between 0 (inclusive) and list.size() (exclusive), picks the number stored at this index, and removes it from the list.
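That loop might be written as below. This is only a sketch under stated assumptions: the pool holds the integers 0 to poolSize - 1, and the names DrawFromList and draw are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class DrawFromList {

    /** Draws `count` distinct values from 0..poolSize-1 by removing random entries from a list. */
    static List<Integer> draw(Random rnd, int poolSize, int count) {
        if (count < 0 || count > poolSize) {
            throw new IllegalArgumentException("need 0 <= count <= poolSize");
        }
        List<Integer> pool = new ArrayList<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(i);
        }
        List<Integer> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int index = rnd.nextInt(pool.size()); // index in [0, pool.size())
            result.add(pool.remove(index));       // remove(int) returns the removed element
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(draw(new Random(), 1000, 10));
    }
}
```

Removing from the middle of an ArrayList shifts the remaining elements, which is fine for a pool of 1000 values but becomes costly for very large pools.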

This code is very efficient with the CPU at the cost of memory: the array needs one entry per potential value, so its size grows with maxValue. With 16-bit integers as entries, 1000 values cost 2000 bytes, and a 16-bit unsigned integer only works for values up to 65535; a wider type can be used at the cost of more memory.
The whole purpose of the array is to record whether a value has been used before (1 = yes, anything else = no).
The loop keeps generating random numbers until an unused value is found, marks it as used, and then returns it.
Be careful with the scope of the array a: if it goes out of scope, its contents are lost.
I have used this approach in C and it works; below it is written in Java syntax.
private final int[] a = new int[1000];   // a[v] == 1 means v was already returned

public int getRandomNumber(int maxValue) {
    // Requires maxValue <= a.length; loops forever once every value has been used.
    while (true) {
        int rand = random.nextInt(maxValue);
        if (a[rand] != 1) {
            a[rand] = 1;
            return rand;
        }
    }
}
