Do an action with some probability in Java

In Java, I am trying to do an action with a probability p. p is a float variable in my code. I came up with this way of doing it:
if( new Random().nextFloat() < p)
do action
I wanted to confirm if this is the correct way of doing it.

There is a TL;DR at the end.
From javadocs for nextFloat() (emphasis by me):
public float nextFloat()
Returns the next pseudorandom, uniformly distributed float value
between 0.0 and 1.0 from this random number generator's sequence.
If you understand what uniform distribution is, knowing this about nextFloat() is going to be enough for you. Yet I am going to explain a little about uniform distribution.
In a uniform distribution U(a,b), every number in the interval [a,b] is equally likely, and so are all sub-intervals of the same length within [a,b].
In the figure, on the left is the PDF, and on the right the CDF for uniform distribution.
For the uniform distribution on [0, 1], the probability of getting a number less than or equal to n, P(x <= n), is equal to n itself for any n between 0 and 1 (look at the right graph, which is the cumulative distribution function for the uniform distribution). That is, P(x <= 0.5) = 0.5, P(x <= 0.9) = 0.9. You can learn more about the uniform distribution from any good statistics book, or some googling.
Fitting to your situation:
Now, the probability that nextFloat() generates a number less than or equal to p is equal to p, since nextFloat() returns a uniformly distributed number. So, to make an action happen with probability p, all you have to do is:
if (condition that is true with a probability p) {
do action
}
From what is discussed about nextFloat() and uniform distribution, it turns out to be:
if(randObj.nextFloat() <= p) {
do action
}
Conclusion:
What you did is the right way to do what you intended. Whether you write < or <= makes practically no difference. Strictly speaking, nextFloat() can return 0.0 but never 1.0, so the strict < is the exact choice: it never fires for p = 0 and always fires for p = 1.
P.S.: You don't need to create a new Random object each time in your conditional. Create one, say randObj, before your loop, and then invoke its nextFloat() method whenever you need a random number, as I have done in my code.
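As a rough sketch of the points above, the following Java program reuses a single Random instance and measures how often the action would fire. The class name ChanceDemo, the helper names and the fixed seed are illustrative assumptions, not part of the original question.

```java
import java.util.Random;

class ChanceDemo {
    // One generator shared by the whole program; the fixed seed only makes runs repeatable.
    private static final Random RNG = new Random(42L);

    /** Returns true with probability p, assuming 0.0 <= p <= 1.0. */
    static boolean chance(float p) {
        // nextFloat() is uniform on [0.0, 1.0), so this is never true for p = 0
        // and always true for p = 1.
        return RNG.nextFloat() < p;
    }

    /** Fraction of calls to chance(p) that came out true over the given number of trials. */
    static double observedFrequency(float p, int trials) {
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            if (chance(p)) {
                hits++; // "do action" would go here
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        // Should print a value close to 0.3.
        System.out.println(observedFrequency(0.3f, 1_000_000));
    }
}
```

With enough trials the observed frequency should settle near p, which is a quick way to check the condition empirically.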
Comment by pjs:
Take a look at the comment on the question by pjs, which is very important and well said. I quote:
Do not create a new Random object each time, that's not how PRNGs are
meant to be used! A single Random object provides a sequence of values
with good distributional properties. Multiple Random objects created
in rapid succession are 1) computationally expensive, and 2) may have
highly correlated initial states, thus producing highly correlated
outcomes. Random actually works best when you create a single instance
per program and keep drawing from it, unless you really really know
what you're doing and have specific reasons for using correlation
induction strategies.
TL;DR
What you did is the right way to do it. Using < or <= makes practically no difference; since nextFloat() returns values from 0.0 (inclusive) to 1.0 (exclusive), the strict < is the exact choice.

Yes, that is correct (from a pure probability perspective). new Random().nextFloat() generates a number between 0.0 (inclusive) and 1.0 (exclusive). So as long as your probability p is a float in the range 0.0 to 1.0, this is the correct way of doing it.
You can read more of the exact nextFloat() documentation here.

Related

Smart algorithm to randomize a Double in range but with odds

I use the following function to generate a random double in a specific range :
nextDouble(1.50, 7.00)
However, I've been trying to come up with an algorithm that makes the randomization more likely to generate a double close to 1.50 than one close to 7.00. I don't even know where to start. Does anything come to mind?
Java is also welcome.
You should start by discovering what probability distribution you need. Based on your requirements, and assuming that random number generations are independent, perhaps Poisson distribution is what you are looking for:
a call center receives an average of 180 calls per hour, 24 hours a day. The calls are independent; receiving one does not change the probability of when the next one will arrive. The number of calls received during any minute has a Poisson probability distribution with mean 3: the most likely numbers are 2 and 3 but 1 and 4 are also likely and there is a small probability of it being as low as zero and a very small probability it could be 10.
The usual probability distributions are already implemented in libraries e.g. org.apache.commons.math3.distribution.PoissonDistribution in Apache Commons Math3.
I suggest not thinking about this problem in terms of generating a random number with irregular probability. Instead, think about generating a random number in the usual (uniform) way in some range, and then mapping this range onto another one in a non-linear way.
Let's split our algorithm into 3 steps:
1. Generate a random number in the [0, 1) range linearly (so using a standard random generator).
2. Map it into another [0, 1) range in a non-linear way.
3. Map the resulting [0, 1) into [1.5, 7) linearly.
Steps 1. and 3. are easy, the core of our algorithm is 2. We need a way to map [0, 1) into another [0, 1), but non-linearly, so e.g. 0.7 does not have to produce 0.7. Classic math helps here, we just need to look at visual representations of algebraic functions.
In your case you expect that while the input number increases from 0 to 1, the result first grows very slowly (to stay near 1.5 for a longer time), but then speeds up. This is exactly what the function y = x ^ 2 looks like. Your resulting code could be something like:
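import kotlin.math.pow
import kotlin.random.Random

fun generateDouble(): Double {
    val step1 = Random.nextDouble()   // step 1: uniform value in [0, 1)
    val step2 = step1.pow(2.0)        // step 2: non-linear map, bends values toward 0
    val step3 = step2 * 5.5 + 1.5     // step 3: linear map onto [1.5, 7)
    return step3
}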
or just:
fun generateDouble() = Random.nextDouble().pow(2.0) * 5.5 + 1.5
By changing the exponent to a bigger number, the curve becomes more aggressive, so it favors 1.5 more. By making the exponent closer to 1 (e.g. 1.4), the result gets closer to linear but still favors 1.5. Making the exponent smaller than 1 starts to favor 7.
You can also look at other algebraic functions with this shape, e.g. y = 2 ^ x - 1.
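Since the question also welcomes Java, here is a minimal Java sketch of the same three-step idea. The class name SkewedRange and its method names are made up for illustration, and the exponent is left as a parameter so the strength of the bias can be tried out.

```java
import java.util.Random;

class SkewedRange {
    /**
     * Maps a uniform value u from [0, 1) into [min, max).
     * An exponent above 1 favors min, an exponent below 1 favors max.
     */
    static double skew(double u, double min, double max, double exponent) {
        double bent = Math.pow(u, exponent); // step 2: non-linear map of [0, 1) onto itself
        return min + bent * (max - min);     // step 3: linear map onto [min, max)
    }

    /** Fraction of samples that land in the lower half of [min, max). */
    static double lowerHalfShare(Random rng, double min, double max, double exponent, int samples) {
        double mid = (min + max) / 2.0;
        int low = 0;
        for (int i = 0; i < samples; i++) {
            double value = skew(rng.nextDouble(), min, max, exponent); // step 1: uniform draw
            if (value < mid) {
                low++;
            }
        }
        return (double) low / samples;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // With exponent 2, roughly 71% of the values (sqrt(0.5)) should fall below 4.25.
        System.out.println(lowerHalfShare(rng, 1.5, 7.0, 2.0, 1_000_000));
    }
}
```

The share below the midpoint follows from u^2 < 0.5 being equivalent to u < sqrt(0.5), which gives a quick sanity check for any exponent you pick.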
What you could do is to 'correct' the random with a factor in the direction of 1.5. You would create some sort of bias factor. Like this:
@Test
void doubleTest() {
    double origin = 1.50;
    // Uniform draw in [1.50, 7); nextDouble(origin, bound) requires Java 17 or later.
    final double fairRandom = new Random().nextDouble(origin, 7);
    System.out.println(fairRandom);
    double biasFactor = 0.9;
    // Shrink the distance from the origin by the bias factor.
    final double biasedDiff = (fairRandom - origin) * biasFactor;
    double biasedRandom = origin + biasedDiff;
    System.out.println(biasedRandom);
}
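The lower you set the bias factor (it must be > 0 and <= 1), the stronger your bias towards 1.50. Note that this scales every value toward the origin, so the result never exceeds 1.50 + (7 - 1.50) * biasFactor.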
You can take a straightforward approach. As you said you want a higher probability of getting the value closer to 1.5 than 7.00, you can even set the probability. So, here their average is (1.5+7)/2 = 4.25.
So let's say I want a 70% probability that the random value will be closer to 1.5 and a 30% probability closer to 7.
double finalResult;
double mid = (1.5+7)/2;
double p = nextDouble(0,100);
if(p<=70) finalResult = nextDouble(1.5,mid);
else finalResult = nextDouble(mid,7);
Here, the final result has 70% chance of being closer to 1.5 than 7.
As you did not specify the 70% probability, you can even make it random: generate nextDouble(50,100), which gives a value of at least 50% and less than 100%, and use it as the probability in the calculation above.
I missed that I am using the same solution strategy as in the reply by Nafiul Alam Fuji. But since I have already formulated my answer, I post it anyway.
One way is to split the range into two subranges, say nextDouble(1.50, 4.25) and nextDouble(4.25, 7.0). You select one of the subranges by generating a random number between 0.0 and 1.0 using nextDouble() and comparing it to a threshold K. If the random number is less than K, you do nextDouble(1.50, 4.25). Otherwise nextDouble(4.25, 7.0).
Now if K=0.5, it is like doing nextDouble(1.50, 7). But by increasing K, you will do nextDouble(1.50, 4.25) more often and favor it over nextDouble(4.25, 7.0). It is like flipping an unfair coin where K determines the extent of the cheating.
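The two-subrange idea from the last two answers could look roughly like this in Java. The class name TwoBucketSampler and its method names are hypothetical, and the subranges are sampled by scaling nextDouble() by hand so that the sketch does not depend on a ranged nextDouble overload.

```java
import java.util.Random;

class TwoBucketSampler {
    /** Draws from [min, max); the lower half is chosen with probability k (0 <= k <= 1). */
    static double sample(Random rng, double min, double max, double k) {
        double mid = (min + max) / 2.0;
        if (rng.nextDouble() < k) {
            return min + rng.nextDouble() * (mid - min); // uniform in the lower half
        }
        return mid + rng.nextDouble() * (max - mid);     // uniform in the upper half
    }

    /** Fraction of samples from [1.5, 7) that land below the midpoint 4.25. */
    static double lowerHalfShare(Random rng, double k, int samples) {
        int low = 0;
        for (int i = 0; i < samples; i++) {
            if (sample(rng, 1.5, 7.0, k) < 4.25) {
                low++;
            }
        }
        return (double) low / samples;
    }

    public static void main(String[] args) {
        // With k = 0.7, about 70% of the values should be below 4.25.
        System.out.println(lowerHalfShare(new Random(), 0.7, 1_000_000));
    }
}
```

Note that the resulting density is a step function: flat within each half, with a jump at the midpoint. If a smooth falloff is wanted, a non-linear mapping is the better fit.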

How can I generate this special type of probability?

I'm creating a Java application that needs to generate random numbers with non-uniform probabilities. These floats (or doubles, it doesn't change much) must range from 0 to 100, where 0 and 100 have the lowest probability of coming out and 50 (the middle) has the highest. In practice, the further a number is from the center, the rarer it should be, until it becomes almost impossible. For example, the number 99.9 might come out 1 time in 5 billion, but that is just an example; the rarity of the numbers should be determined by the function. Basically, the closer you get to 100 or 0, the more the rarity tends to infinity.
However, I would like it to be a function with a min parameter and a max parameter to make it more versatile.
(Sorry if the question is not very clear, but I'm not a native speaker and I'm still learning English...)
Perhaps you could use Random's nextGaussian() method which generates a random number based on the default mean of 0.0 and standard deviation of 1.0. In your case, I believe the mean would be 50, and you could calculate the standard deviation so that it fits your requirements. You could use this link: Java normal distribution in order to help answer your question.
Docs for Random.nextGaussian().
I would also suggest looking into normal distributions because I believe they match what you are asking for.
Hope that helped!
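A minimal sketch of this suggestion, with min and max as parameters, might look like the following. The class name BellCurveRange is made up, and two choices here are assumptions rather than anything stated in the answer: the standard deviation is set to one sixth of the range (so min and max sit three standard deviations from the mean), and the rare values that fall outside the range are simply redrawn.

```java
import java.util.Random;

class BellCurveRange {
    /** Draws a bell-shaped random value in [min, max], centered on the midpoint. */
    static double sample(Random rng, double min, double max) {
        double mean = (min + max) / 2.0;
        double stdDev = (max - min) / 6.0; // assumption: the range spans +/- 3 standard deviations
        while (true) {
            double x = mean + stdDev * rng.nextGaussian();
            if (x >= min && x <= max) {
                return x; // values outside the range are rejected and redrawn
            }
        }
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(sample(rng, 0.0, 100.0));
        }
    }
}
```

A smaller divisor than 6 makes the extremes less rare, and a larger one makes them rarer, so the divisor is the knob to tune against a target such as "99.9 comes out once in N draws".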

Why does Math.tan() return different values for different levels of precision?

I'm in the middle of testing a program that reads mathematical input, and calculates the answer according to order of operations. I've come across a problem. When calculating the tangent of Math.PI / 2, it returns the value 1.633123935319537E16.
However, somewhere along the way in my program, that value can get shortened to 1.5707964, instead of 1.5707963267948966. When I call Math.tan(1.5707964), it returns a value of -1.3660249798894601E7.
I'm not asking for help in figuring out the shortening, but rather, I want to understand the divergent answers, and any other things I should watch out for when calculating trigonometric functions.
I want to understand the divergent answers
tan(π/2) is undefined
tan(π/2 - tiny amount) is very large in magnitude and positive
tan(π/2 + tiny amount) is very large in magnitude and negative
The numbers that you are passing in are not exactly π/2:
1.5707963267948966192313216... is a slightly more precise value of π/2 (calculated here; more decimal places aren't necessary to illustrate the point).
1.5707963267948966 is just smaller than π/2.
1.5707964 is just larger than π/2.
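The sign flip can be reproduced with a few lines of Java. This is only a small demonstration using the two inputs from the question; the class and method names are made up.

```java
class TanNearHalfPi {
    /** Math.PI / 2 as a double lies slightly below the true value of pi/2. */
    static double tanJustBelow() {
        return Math.tan(Math.PI / 2);
    }

    /** The shortened value 1.5707964 lies slightly above the true value of pi/2. */
    static double tanJustAbove() {
        return Math.tan(1.5707964);
    }

    public static void main(String[] args) {
        System.out.println(Math.PI / 2 < 1.5707964); // true: the two inputs straddle pi/2
        System.out.println(tanJustBelow());          // huge positive value
        System.out.println(tanJustAbove());          // large negative value
    }
}
```

The general thing to watch out for is any input close to an odd multiple of π/2: there, a tiny rounding change in the argument can flip the sign and change the magnitude by many orders.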
To illustrate, here is a graph from Math Is Fun:

apache.commons.math distributions, get probability greater than 1

I use the Apache Commons Math library for MixtureMultivariateNormalDistribution. It appears that the density() function sometimes returns values that are much greater than 1. What is going on?
My mixture coefficients sum to 1, the covariance matrices of each normal distribution are fine (the MixtureMultivariateNormalDistribution constructor doesn't throw any exception when I create the object), and the means are fine too. The dimension is 39, and I have 3 normal distributions in the mixture. Is it a bug? Has anyone run into this problem?
Thanks
The cumulative distribution function is indeed something between 0 and 1. The DENSITY function, on the other hand, only has to be greater than or equal to zero; in particular, it is allowed to be greater than 1, so long as its integral over the whole space (the limiting value of the cdf) is equal to 1.
For example, consider a single Gaussian bump centered at the origin. Its density function is exp(-1/2*x^2/sigma^2)/sigma/sqrt(2*pi). Its greatest value, at the origin, is 1/sigma/sqrt(2*pi), so for sigma < 1/sqrt(2*pi) (approximately 0.399), the peak is greater than 1.
If you are working with mixture components that have sufficiently small variance, the density can be greater than 1 in neighborhoods of their means.
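To make the one-dimensional example concrete, here is a small plain-Java sketch (no Apache Commons Math involved) that evaluates the Gaussian density from the formula above and checks numerically that it still integrates to about 1. The class name, the method names and the choice sigma = 0.1 are illustrative assumptions.

```java
class GaussianPeakDensity {
    /** Density of a zero-mean Gaussian with standard deviation sigma, evaluated at x. */
    static double density(double x, double sigma) {
        return Math.exp(-0.5 * x * x / (sigma * sigma)) / (sigma * Math.sqrt(2 * Math.PI));
    }

    /** Midpoint-rule integral of the density over [-10 sigma, 10 sigma]. */
    static double integral(double sigma, int steps) {
        double lo = -10.0 * sigma;
        double width = 20.0 * sigma / steps;
        double sum = 0.0;
        for (int i = 0; i < steps; i++) {
            double mid = lo + (i + 0.5) * width;
            sum += density(mid, sigma) * width;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(density(0.0, 0.1));     // peak of about 3.99, well above 1
        System.out.println(integral(0.1, 100_000)); // still very close to 1
    }
}
```

In 39 dimensions the same effect is much stronger, because the peak scales with the inverse square root of the covariance determinant, so even moderately small variances can produce very large density values.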
"Returns the probability density function (PDF) of this distribution evaluated at the specified point x. In general, the PDF is the derivative of the cumulative distribution function. If the derivative does not exist at x, then an appropriate replacement should be returned, e.g. Double.POSITIVE_INFINITY, Double.NaN, or the limit inferior or limit superior of the difference quotient."
Does that help? I presume you're getting the limit superior of the difference quotient. Unfortunately I don't know what that means.

Random level function in skip list

I am looking at a skip list implementation in Java, and I am wondering about the purpose of the following method:
public static int randomLevel() {
int lvl = (int)(Math.log(1.-Math.random())/Math.log(1.-P));
return Math.min(lvl, MAX_LEVEL);
}
And what is the difference between the above method and
Random.nextInt(6);
Can anyone explain that? Thanks.
Random.nextInt should provide a random variable whose probability distribution is (approximately) a discrete uniform distribution over the interval [0, 6).
You can learn more about this here.
http://puu.sh/XMwn
Note that internally Random uses a linear congruential generator where m = 2^48, a = 25214903917, and c = 11.
randomLevel instead (approximately) uses a geometric distribution where p = 0.5. You can learn more about the distribution here.
http://puu.sh/XMwT
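Essentially, randomLevel returns 0 with probability 0.5, 1 with 0.25, 2 with 0.125, and so on, down to 6 with 0.5^7, i.e. 0.0078125 (ignoring the Math.min cap, which adds the probability of all higher levels to MAX_LEVEL). That is far from the uniform 1/6 (about 0.17) that Random.nextInt(6) gives each of the values 0 through 5.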
Now the importance of this is that a skip list is an inherently probabilistic data structure. By utilizing multiple sparse levels of linked lists, it can achieve an average search time of O(log n), similar to a balanced binary search tree, but with less complexity and less space. Using a uniform distribution here would not be appropriate, because higher levels must be less densely populated than lower ones (note: below, the levels grow downward), which is what makes the fast searches possible.
Just like the link says...
"This gives us a 50% chance of the random_level() function returning 0, a 25% chance of returning 1, a 12.5% chance of returning 2 and so on..." The distribution is therefore not even. However, Random.nextInt() is. There is an equal likelihood that any number between 0 and 5 will be selected.
I haven't looked at the full implementation, but what probably happens is that randomLevel() is used to select a number, say n. Then, the element that needs to be added to the skip list will have pointers 0, 1,...,n. You can think of each level as a separate list.
Why use a distribution like this? Well an even distribution will require too much memory for the benefit that it will have. By reducing the chance using a geometric distribution, the "sweet" spot is attained. Now the advantage of obtaining a value quickly, with a smaller memory footprint is realised.
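To see the difference in practice, the method from the question can be sampled many times and tallied. In this sketch P = 0.5 and MAX_LEVEL = 6 are assumptions (the question does not show their values), the class name is made up, and a Random parameter replaces Math.random() so that runs can be seeded.

```java
import java.util.Random;

class RandomLevelDemo {
    static final double P = 0.5;    // assumption: not shown in the question
    static final int MAX_LEVEL = 6; // assumption: not shown in the question

    /** Same formula as in the question, drawing from the given generator. */
    static int randomLevel(Random rng) {
        int lvl = (int) (Math.log(1.0 - rng.nextDouble()) / Math.log(1.0 - P));
        return Math.min(lvl, MAX_LEVEL);
    }

    /** Counts how often each level from 0 to MAX_LEVEL is produced. */
    static int[] histogram(Random rng, int samples) {
        int[] counts = new int[MAX_LEVEL + 1];
        for (int i = 0; i < samples; i++) {
            counts[randomLevel(rng)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int samples = 1_000_000;
        int[] counts = histogram(new Random(), samples);
        for (int level = 0; level <= MAX_LEVEL; level++) {
            System.out.printf("level %d: %.4f%n", level, (double) counts[level] / samples);
        }
    }
}
```

Each level should come up about half as often as the one below it, except that the Math.min cap makes the top level absorb everything that would otherwise have gone higher. Replacing randomLevel with rng.nextInt(6) in the tally would instead give roughly equal shares for 0 through 5.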
