How to simulate 60% probability in Java with Math.random()

I want to do something if there's a 60% chance (only by using Math.random()). For example, a situation where every value in an array has a 60% chance to be set to 1, otherwise 0.
My confusion is whether we should check
if (Math.random() < 0.6) { ... }
or
if (Math.random() <= 0.6) { ... }
My thought is that it should be the first because 0.0 counts as a possible returned value, but I would like to confirm which one I should use and why.

Use the < operator.
While this is a simplification, consider that computers store fractions by adding terms of the form 2^(−n). Most real numbers can't be represented exactly, and literals like 0.6 are converted to the nearest double value from a finite set.
Likewise, the random() function can't generate real values from a continuous range between zero and one. Instead, it chooses an integer from N elements in the range 0 to N − 1, then divides it by N to yield a (rational) result in the range [0, 1). If you want to satisfy a condition with a probability of P, it should be true for P ⋅ N elements from the set of N possibilities.
Since zero is counted as one of the elements, the maximum result value that should be included is ( P ⋅ N − 1 ) / N. Or, in other words, we should exclude  P ⋅ N / N, i.e., P.
That exclusion of P is what leads to the use of <.
It might be easier to reason about when you consider how you'd use a method like nextInt(). There, the effect of zero on a small range is more obvious, and your expression would clearly use the < operator: current().nextInt(5) < 3 (where current() is ThreadLocalRandom.current()). That wouldn't change if you divide the result: current().nextInt(5) / 5.0 < 0.6. The difference between nextInt(5) / 5.0 and random() is only that the latter has many more, much smaller steps between 0 and 1.
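The counting argument can be checked by brute force on a small range. The sketch below is illustrative only (the class and method names are not from the answer): it enumerates every outcome of a hypothetical nextInt(n) and counts how many pass each comparison against the threshold P ⋅ n.

```java
// Illustrative sketch: enumerate all n equally likely outcomes 0..n-1
// and count how many pass each comparison against the threshold.
class StepCount {
    // Number of outcomes k with k < threshold.
    static int countLess(int n, int threshold) {
        int hits = 0;
        for (int k = 0; k < n; k++) {
            if (k < threshold) hits++;
        }
        return hits;
    }

    // Number of outcomes k with k <= threshold.
    static int countLessOrEqual(int n, int threshold) {
        int hits = 0;
        for (int k = 0; k < n; k++) {
            if (k <= threshold) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = 5, threshold = 3; // P = 3/5 = 0.6
        System.out.println("<  : " + countLess(n, threshold) + " of " + n);
        System.out.println("<= : " + countLessOrEqual(n, threshold) + " of " + n);
    }
}
```

With n = 5 the strict comparison accepts 3 outcomes (60%) and the non-strict one accepts 4 (80%). For Math.random() the same off-by-one exists, but it is spread over a vastly larger N.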
I apologize for misleading people with my original answer, and thank the user who straightened me out.

Use <. If you ever want to set your odds to 0%, then you might change your comparison value to 0. Since Math.random() can return 0, using <= would not result in a 0% chance. As others have noted, the chance of Math.random() generating the exact number you are testing against is extremely low, so for all values other than 0 you will never notice a difference.
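The array example from the question could be sketched as follows. The DoubleSupplier parameter is an assumption added here so that the source of random numbers can be swapped out; passing Math::random gives the behaviour the question asks for.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

class ChanceFill {
    // Sets each element to 1 with probability p, otherwise 0.
    // 'source' must return values in [0, 1), as Math.random() does.
    static void fill(int[] values, double p, DoubleSupplier source) {
        for (int i = 0; i < values.length; i++) {
            values[i] = source.getAsDouble() < p ? 1 : 0;
        }
    }

    public static void main(String[] args) {
        int[] values = new int[10];
        fill(values, 0.6, Math::random);
        System.out.println(Arrays.toString(values));
    }
}
```

Because the comparison is strict, calling fill with p = 0 can never produce a 1, even when the source returns exactly 0.0.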

Related

Smart algorithm to randomize a Double in range but with odds

I use the following function to generate a random double in a specific range :
nextDouble(1.50, 7.00)
However, I've been trying to come up with an algorithm that makes the randomization more likely to generate a double close to 1.50 than one close to 7.00. Yet I don't even know where to start. Does anything come to mind?
Java is also welcome.
You should start by discovering what probability distribution you need. Based on your requirements, and assuming that random number generations are independent, perhaps Poisson distribution is what you are looking for:
a call center receives an average of 180 calls per hour, 24 hours a day. The calls are independent; receiving one does not change the probability of when the next one will arrive. The number of calls received during any minute has a Poisson probability distribution with mean 3: the most likely numbers are 2 and 3 but 1 and 4 are also likely and there is a small probability of it being as low as zero and a very small probability it could be 10.
The usual probability distributions are already implemented in libraries e.g. org.apache.commons.math3.distribution.PoissonDistribution in Apache Commons Math3.
I suggest not thinking about this problem in terms of generating a random number with irregular probability. Instead, think about generating a random number uniformly in some range, and then mapping that range onto another one in a non-linear way.
Let's split our algorithm into 3 steps:
1. Generate a random number in [0, 1) range linearly (so using a standard random generator).
2. Map it into another [0, 1) range in non-linear way.
3. Map the resulting [0, 1) into [1.5, 7) linearly.
Steps 1 and 3 are easy; the core of our algorithm is step 2. We need a way to map [0, 1) into another [0, 1), but non-linearly, so that e.g. 0.7 does not have to produce 0.7. Classic math helps here: we just need to look at the graphs of algebraic functions.
In your case you expect that while the input number increases from 0 to 1, the result first grows very slowly (to stay near 1.5 for a longer time), and then speeds up. This is exactly what a function such as y = x ^ 2 looks like. Your resulting code could be something like:
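import kotlin.math.pow
import kotlin.random.Random

fun generateDouble(): Double {
    val step1 = Random.nextDouble()   // uniform in [0, 1)
    val step2 = step1.pow(2.0)        // non-linear map, still in [0, 1)
    val step3 = step2 * 5.5 + 1.5     // linear map onto [1.5, 7)
    return step3
}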
or just:
fun generateDouble() = Random.nextDouble().pow(2.0) * 5.5 + 1.5
Increasing the exponent makes the curve more aggressive, so it favors 1.5 more. Moving the exponent closer to 1 (e.g. 1.4) makes the result closer to linear, although it still favors 1.5. An exponent smaller than 1 starts to favor 7.
You can also look at other algebraic functions with this shape, e.g. y = 2 ^ x - 1.
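Since the question also welcomes Java, the same three steps could be sketched in Java as below. The class and method names are hypothetical, and the exponent is exposed as a parameter so the effect described above can be tried out.

```java
import java.util.concurrent.ThreadLocalRandom;

class PowerBias {
    // Maps a uniform u in [0, 1) onto [lo, hi), favouring lo when exponent > 1.
    static double map(double u, double exponent, double lo, double hi) {
        double curved = Math.pow(u, exponent); // step 2: non-linear [0, 1) -> [0, 1)
        return lo + curved * (hi - lo);        // step 3: linear map onto [lo, hi)
    }

    static double generate() {
        double u = ThreadLocalRandom.current().nextDouble(); // step 1: uniform in [0, 1)
        return map(u, 2.0, 1.5, 7.0);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(generate());
        }
    }
}
```

Because the mapping is monotonic, the median output is map(0.5, 2.0, 1.5, 7.0) = 2.875, well below the midpoint 4.25 of the range.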
What you could do is to 'correct' the random with a factor in the direction of 1.5. You would create some sort of bias factor. Like this:
@Test
void doubleTest() {
    // Random.nextDouble(origin, bound) requires Java 17 or later
    double origin = 1.50;
    final double fairRandom = new Random().nextDouble(origin, 7);
    System.out.println(fairRandom);
    double biasFactor = 0.9;
    final double biasedDiff = (fairRandom - origin) * biasFactor;
    double biasedRandom = origin + biasedDiff;
    System.out.println(biasedRandom);
}
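The lower you set the bias factor (it must be > 0 and <= 1), the stronger the pull towards 1.50. Note that this scales the result into [1.50, 1.50 + 5.5 ⋅ biasFactor), so values near 7.00 become unreachable rather than merely less likely.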
You can take a straightforward approach. As you said, you want a higher probability of getting a value closer to 1.5 than to 7.00, and you can even set that probability explicitly. Here, the midpoint of the range is (1.5+7)/2 = 4.25.
So let's say I want a 70% probability that the random value will be closer to 1.5 and a 30% probability closer to 7.
double finalResult;
double mid = (1.5+7)/2;
double p = nextDouble(0,100);
if(p<=70) finalResult = nextDouble(1.5,mid);
else finalResult = nextDouble(mid,7);
Here, the final result has 70% chance of being closer to 1.5 than 7.
As you did not specify the 70% probability, you can even make it random: generate nextDouble(50,100), which gives a value of at least 50% and less than 100%, and use that value in place of 70 in the comparison above.
I missed that I am using the same solution strategy as in the reply by Nafiul Alam Fuji. But since I have already formulated my answer, I post it anyway.
One way is to split the range into two subranges, say nextDouble(1.50, 4.25) and nextDouble(4.25, 7.0). You select one of the subranges by generating a random number between 0.0 and 1.0 using nextDouble() and comparing it to a threshold K. If the random number is less than K, you do nextDouble(1.50, 4.25). Otherwise nextDouble(4.25, 7.0).
Now if K=0.5, it is like doing nextDouble(1.50, 7). But by increasing K, you will do nextDouble(1.50, 4.25) more often and favor it over nextDouble(4.25, 7.0). It is like flipping an unfair coin where K determines the extent of the cheating.
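The unfair-coin idea could be sketched as follows. The names are hypothetical, and the two uniform draws are passed in as parameters purely so that the selection logic can be checked with fixed inputs.

```java
import java.util.concurrent.ThreadLocalRandom;

class UnfairCoinRange {
    // Picks the lower half [lo, mid) when coin < k, otherwise the upper half [mid, hi).
    // 'coin' and 'position' are both expected to be uniform in [0, 1).
    static double pick(double coin, double position, double k, double lo, double hi) {
        double mid = (lo + hi) / 2;
        return coin < k
                ? lo + position * (mid - lo)
                : mid + position * (hi - mid);
    }

    static double generate(double k) {
        ThreadLocalRandom rng = ThreadLocalRandom.current();
        return pick(rng.nextDouble(), rng.nextDouble(), k, 1.50, 7.00);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(generate(0.7)); // K = 0.7 favours the lower half
        }
    }
}
```

With K = 0.7, roughly 70% of the results are expected to fall below 4.25, while every value in [1.50, 7.00) remains reachable.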

Generating random integers uniformly in log space

I want to generate random integers which are uniformly distributed in log space. That is, the log of the values will be uniformly distributed.
A normal uniformly distributed unsigned int will have 75% of its magnitudes above 1 billion, and something like 99.98% above 1 million, so small values are underrepresented. A uniform value from log space would have the same number of values in the range 4-8, as 256-512, for example.
Ignoring negative values for now, one way I can think of is something like:
Random r = new Random();
return (int)Math.pow(2, r.nextDouble() * 31);
That should generate a 31-bit log-uniformly distributed value. It's not going to be fast though, with a pow() operation in there, and introducing floating point values to generate integers is a bit of a smell. Furthermore, a lot of the range of double is lost by Random.nextDouble(), and it is not clear to me if this code can even generate all 2^31-1 positive integer values.
Better solutions welcome.
There are two similar solutions below which both involve filling the integer with random bits, then shifting a random number of bits to the right. Something like:
int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);
This has two types of bias:
Step-wise bias
This produces sort of a stepwise log distributed value, not a smooth one. In particular, the right shift by a random value in [0,31] means there are 31 equally probable "sizes" of integers, and every value in that range is equally probable. Since there are 2^N values in range N, the values in one range are twice as probable as the ones in the next - so you get log behavior between the ranges, but the ranges themselves are flat.
I don't know of an easy way to get rid of this bias.
Top bit bias
A second form of bias occurs because the MSB is not always 1: even a shift amount of 10 doesn't necessarily produce a 31-10=21 bit value, so there is an additional distortion. In effect, the ranges overlap. The value 1 is not just present (with p(1)=.5) for a shift amount of 30, but also for shifts of 29 (p(1)=0.25), 28 (p(1)=.125), and so on. That effect cancels out for smaller values (i.e., if you look at shift amounts of 30 and 29 only, 1 seems like it is 3x more likely than 2, rather than the predicted value of 2x, but once you look at more values it converges). It doesn't cancel out for large values, however, which is why you see the 20:32207 bucket be smaller than the others in @sprinter's answer.
I think this form of bias can pretty easily be removed simply by forcing the top bit to one, so something like:
(r.nextInt(0x40000000) | 0x40000000) >> r.nextInt(31)
This has a couple of other tweaks: it uses a max of 2^30 for the rand, which is faster (there is a special case for powers of 2 in the nextInt(int) code) and loses nothing, since the second-from-MSB bit does not need to be random anyway (we force it to 1). This also eliminates a microscopic additional source of bias, which is that Integer.MAX_VALUE could never be generated, so one value was missing from the full representation.
It shifts by [0,31) bits so you never get zero. If you want zeros too, change that to shift by [0,32) bits and you'll get zeros equal in frequency to ones (technically not log-distributed anymore, but useful in many cases). Another approach is to subtract one from the final value to get zeros (at the cost of never getting Integer.MAX_VALUE).
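As a sketch of the corrected expression, the bit manipulation can be separated from the random draws so that its edge cases are easy to inspect. The class and method names below are hypothetical.

```java
import java.util.Random;

class ShiftLogUniform {
    // Forces bit 30 to 1, then shifts right by 'shift' (expected in [0, 31)),
    // so the result has exactly 31 - shift significant bits.
    static int combine(int low30Bits, int shift) {
        return (low30Bits | 0x40000000) >> shift;
    }

    static int next(Random r) {
        return combine(r.nextInt(0x40000000), r.nextInt(31));
    }

    public static void main(String[] args) {
        Random r = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(next(r));
        }
    }
}
```

The smallest possible result is combine(0, 30) = 1 and the largest is combine(0x3FFFFFFF, 0) = Integer.MAX_VALUE, so zero is never produced and the full positive int range is covered.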
Incorrect answer provided for information only. This does not satisfy OP's requirements for the reasons given in the question.
int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);
My informal test of that seems to indicate there is the expected skew. I generated 1M numbers this way and had the following distribution of the log (ignoring zeros)
0:46819
1:47045
2:40663
3:44001
4:45306
5:43802
6:46447
7:43355
8:47366
9:42747
10:46387
11:43899
12:45179
13:45496
14:44431
15:46751
16:43055
17:47127
18:41243
19:41837
20:32207
21:11965
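The answer does not say how the buckets were computed. The labels 0 to 21 are consistent with taking the floor of the natural logarithm (ln(2^31) is about 21.49), so the sketch below assumes that bucketing; it is a guess at the method, not the author's code.

```java
import java.util.Random;

class LogHistogram {
    // Bucket index: floor of the natural log of a positive value (assumed bucketing).
    static int bucket(int value) {
        return (int) Math.log(value);
    }

    static int[] histogram(Random rand, int samples) {
        int[] counts = new int[22]; // ln(Integer.MAX_VALUE) is about 21.49
        for (int i = 0; i < samples; i++) {
            int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);
            if (number > 0) { // zeros are ignored, as in the answer
                counts[bucket(number)]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = histogram(new Random(), 1_000_000);
        for (int i = 0; i < counts.length; i++) {
            System.out.println(i + ":" + counts[i]);
        }
    }
}
```

Under this assumption the last bucket covers only the partial interval from e^21 up to 2^31, which would explain why it is so much smaller than the others.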

Do an action with some probability in java

In Java, I am trying to do an action with a probability p. p is a float variable in my code. I came up with this way of doing it:
if( new Random().nextFloat() < p)
do action
I wanted to confirm if this is the correct way of doing it.
There is a TL;DR at the end.
From javadocs for nextFloat() (emphasis by me):
public float nextFloat()
Returns the next pseudorandom, uniformly distributed float value
between 0.0 and 1.0 from this random number generator's sequence.
If you understand what uniform distribution is, knowing this about nextFloat() is going to be enough for you. Yet I am going to explain a little about uniform distribution.
In uniform distribution, U(a,b) each number in the interval [a,b], and also all sub-intervals of the same length within [a,b] are equally probable, i.e. they have equal probability.
In the figure, on the left is the PDF, and on the right the CDF for uniform distribution.
For uniform distribution, the probability of getting a number less than or equal to n, P(x <= n) from the distribution is equal to the number itself (look at the right graph, which is cumulative distribution function for uniform distribution). That is, P(x <= 0.5) = 0.5, P(x <= 0.9) = 0.9. You can learn more about uniform distribution from any good statistics book, or some googling.
Fitting to your situation:
Now, probability of getting a number less than or equal to p generated using nextFloat() is equal to p, as nextFloat() returns uniformly distributed number. So, to make an action happen with a probability equal to p all you have to do is:
if (condition that is true with a probability p) {
do action
}
From what is discussed about nextFloat() and uniform distribution, it turns out to be:
if(randObj.nextFloat() <= p) {
do action
}
Conclusion:
What you did is almost the right way to do what you intended. Just adding the equal sign after < is all that's needed, and it doesn't hurt much to leave out the equal sign either!
P.S.: You don't need to create a new Random object each time in your conditional, you can create one, say randObj before your loop, and then invoke its nextFloat() method whenever you want to generate a random number, as I have done in my code.
Comment by pjs:
Take a look at the comment on the question by pjs, which is very important and well said. I quote:
Do not create a new Random object each time, that's not how PRNGs are
meant to be used! A single Random object provides a sequence of values
with good distributional properties. Multiple Random objects created
in rapid succession are 1) computationally expensive, and 2) may have
highly correlated initial states, thus producing highly correlated
outcomes. Random actually works best when you create a single instance
per program and keep drawing from it, unless you really really know
what you're doing and have specific reasons for using correlation
induction strategies.
TL;DR
What you did is almost the right way to do it. Just adding the equal sign after < (to make it <=) is all that's needed, and it doesn't hurt much to leave out the equal sign either!
Yes. That is correct (from a pure probability perspective). new Random().nextFloat() will generate a number between 0.0 (inclusive) and 1.0 (exclusive). So as long as your probability is a float in the range 0.0 to 1.0, this is the correct way of doing it.
The Javadoc for Random.nextFloat() gives the exact specification.
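Combining the question's comparison with the quoted advice to reuse one generator could look like the sketch below. The Chance class is hypothetical; it only wraps a single Random instance that is created once and shared by every draw.

```java
import java.util.Random;

class Chance {
    private final Random rng;

    Chance(Random rng) {
        this.rng = rng; // one generator, reused for every draw
    }

    // True with probability p for p in [0, 1]; never true for p = 0.
    boolean happens(float p) {
        return rng.nextFloat() < p;
    }

    public static void main(String[] args) {
        Chance chance = new Chance(new Random());
        int hits = 0;
        for (int i = 0; i < 10_000; i++) {
            if (chance.happens(0.3f)) hits++;
        }
        System.out.println(hits + " of 10000 actions performed");
    }
}
```

Passing the Random in through the constructor also makes it possible to supply a seeded instance when reproducible runs are needed.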

Why the random function in java is always generating high values?

I am implementing a test data generator in Java that generates random values for Java primitive types.
The range of possible parameters values is not limited. For example, if I want to generate a random integer or float I will consider all possible values (MAX_INT-MIN_INT). To do so, I am using stuff like :
Random().nextInt()
Random().nextLong()
Random().nextFloat()*Float.MAX_VALUE
Random().nextDouble()*Double.MAX_VALUE
And so on...
However, doing it this way, I notice that the generated values are always large in magnitude (close to the max or min value of the parameter type). After 100000 iterations, for example, the random operator didn't generate a single value in the range [-1000, 1000]. The same goes for floats, longs, etc.
Can you give me an explanation of how the random operator works in Java? Why are the generated values always high when we consider all possible values of the Java type?
Thanks in advance.
Your perception of "high" and "low" is wrong.
The probability of a single value (assuming uniform distribution) to be in [-1000,1000] is 2001/(MAX_INT-MIN_INT), which is around 0.00000046.
This probability is extremely small, and thus also the expected number of "small" variables will be small.
In fact, in a uniform distribution over [MIN_INT,MAX_INT], approximately half of the elements will be positive and half negative.
Similarly, only a quarter of them will be between 0 and MAX_INT/2 (which is much higher than 1000, as you know).
If you want more "low" values, restrict yourself to a smaller range of elements, or use a non-uniform distribution that is expected to generate more values close to 0 (Gaussian, for example).
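The arithmetic above, and the Gaussian suggestion, could be sketched as follows. The names are hypothetical, the standard deviation of 1000 is an arbitrary choice for illustration, and the total is taken as the 2^32 distinct int values, which differs negligibly from MAX_INT-MIN_INT.

```java
import java.util.Random;

class SmallValueOdds {
    // Probability that a uniformly drawn int lies in [-bound, bound].
    static double probabilityWithin(int bound) {
        double favourable = 2.0 * bound + 1; // e.g. 2001 values for bound = 1000
        double total = Math.pow(2, 32);      // number of distinct int values
        return favourable / total;
    }

    // Gaussian alternative: values cluster around 0 with the given spread.
    static long gaussianValue(Random rng, double standardDeviation) {
        return Math.round(rng.nextGaussian() * standardDeviation);
    }

    public static void main(String[] args) {
        double p = probabilityWithin(1000);
        System.out.println("P(|x| <= 1000)          = " + p);
        System.out.println("expected hits in 100000 = " + p * 100_000);

        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(gaussianValue(rng, 1000.0));
        }
    }
}
```

The expected number of hits in 100000 uniform draws comes out at about 0.047, so seeing none at all is the normal outcome. With the Gaussian variant, roughly two thirds of the values fall within one standard deviation of 0.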
Have a look at this code snippet:
int count1 = 0, count2 = 0;
for (int i = 0; i < 10000; i++) {
    float x = genFloat(null);
    if (x < 1E38 && x > 0) count1++;
    if (x > Float.MAX_VALUE - 1E38) count2++;
}
System.out.println(count1);
System.out.println(count2);
It generates 10000 random floats, and checks how many are in [0,1E38] and how many are in [MAX-1E38,MAX].
Note that for floats, the theoretical probability of landing in each of these ranges is ~1E38/(2*MAX) ~= 14.7%.
And as you can see, the "close to 0" and "close to MAX" ranges, which have the same width, receive a similar number of generated values.

Normalized Iteration Count does not work. What am I doing wrong?

As you can see from the title, I'm busy writing a little program for visualizing fractals in Java. Anybody who deals with fractals will come to the point where he/she searches for a way to get rid of these stupid "bands" that appear when you just colour a pixel by the number of iterations it took to escape.
So I searched for a more advanced colouring algorithm, finding the "normalized iteration count". The formula I'm using is:
float loc = (float) (1 - Math.log(Math.log(c.abs())) / Math.log(2));
Everybody on the Internet is so happy about this algorithm, everybody uses it, everybody gets great results. Except me. I thought this algorithm should provide a float between 0 and 1. But that doesn't happen. I did some calculations and came to the conclusion that this algorithm only works for c.abs() >= Math.E && c.abs() <= Math.exp(2) (that is Math.E * Math.E).
In numbers this means, my input into this equation has to be between about 2.718 and 7.389.
But a complex number c is considered to tend towards infinity when its magnitude gets greater than 2. For any input smaller than Math.E, I get a value greater than one. And for any number greater than Math.exp(2), it gets negative. That is the case if a complex number escapes really fast.
So please tell me: what am I doing wrong. I'm desperate.
Thanks.
EDIT:
I was wrong: the code I posted is correct, I just
1. used it the wrong way and so it didn't provide the right output.
2. had to set the bailout value of the mandelbrot/julia algorithm to 10, otherwise I would've got stupid bands again.
Problem solved!
As you've already discovered, you need to increase the bailout radius before smoothing will look right.
Two is the minimum length that a coordinate can have such that when you square it and add the initial value, it cannot result in a smaller length. If the previous length was 2.0, and you squared it, you'd have a length of 4.0 (pointing in whichever direction), and the most that any value of c could reduce that by is 2.0 (by pointing in precisely the opposite direction). If c were larger than that then it would start to escape right away.
Now, to estimate the fractional part of the number of iterations we look at the final |z|. If z had simply been squared and c not added to it, then it would have a length between 2.0 and 4.0 (the new value must be larger than 2.0 to bail out, and the old value must have been less than 2.0 to have not bailed out earlier).
Without c, taking |z|'s proportional position between 2 and 4 gives us a fractional part of the number of iterations. If |z| is close to 4 then the previous length must have been close to 2, so it was already close to bailing out in the previous iteration and the smoothed result should be close to the previous iteration count to represent that. If it's close to 2, then the previous iteration was further from bailing out, and so the smoothed result should be closer to the new iteration count.
Unfortunately c messes that up. The larger c is, the larger the potential error is in that simple relationship. Even if the old length was nearly at 2.0, it might have landed such that c's influence made it look like it must have been smaller.
Increasing the bailout mitigates the effect of adding c. If the bailout is 64 then the resulting length will be between 64 and 4096, and c's maximum offset of 2 has a proportionally much smaller impact on the result.
You have left out the iteration value. Try this:
float loc = (float) (<iteration_value> + 1 - Math.log(Math.log(c.abs())) / Math.log(2));
The iteration_value is the number of iterations which yielded c in the formula.
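Putting the iteration value and a larger bailout radius together, a minimal Mandelbrot sketch might look like this. It assumes the standard iteration z -> z^2 + c starting from z = 0; the value the question calls c.abs() corresponds to the magnitude of the iterated z here, and all names are hypothetical.

```java
class SmoothEscape {
    // Returns a fractional escape count for the Mandelbrot iteration z -> z^2 + c,
    // or maxIterations if |z| never exceeds the bailout radius.
    static double smoothCount(double cRe, double cIm, int maxIterations, double bailout) {
        double zRe = 0, zIm = 0;
        double magnitude = 0;
        int n = 0;
        while (n < maxIterations && magnitude <= bailout) {
            double nextRe = zRe * zRe - zIm * zIm + cRe;
            zIm = 2 * zRe * zIm + cIm;
            zRe = nextRe;
            magnitude = Math.hypot(zRe, zIm);
            n++;
        }
        if (magnitude <= bailout) {
            return maxIterations; // did not escape within the iteration limit
        }
        // Integer count plus the normalized correction from the question's formula.
        return n + 1 - Math.log(Math.log(magnitude)) / Math.log(2);
    }

    public static void main(String[] args) {
        System.out.println(smoothCount(1.0, 0.0, 100, 10.0)); // escapes after a few steps
        System.out.println(smoothCount(0.0, 0.0, 100, 10.0)); // never escapes
    }
}
```

The correction term on its own is not confined to [0, 1) for a bailout of 10. What matters for removing the bands is that the sum varies continuously from pixel to pixel, so it is usually mapped onto a colour gradient rather than used directly.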
