I'm experimenting with AI for an extremely simple simulation game.
The game will contain people (instances of objects with random properties) who have a set amount of money to spend.
I'd like the distribution of "wealth" to be statistically valid.
How can I generate a random number (money) that follows a distribution with a given mean and standard deviation (e.g. mean: 50, standard deviation: 10), such that values closer to the mean are more likely to be generated?
I think you're focusing on the wrong end of the problem. The first thing you need to do is identify the distribution you want to use to model wealth. A normal distribution with a mean of 50 and standard deviation of 10 nominally meets your needs, but so does a uniform distribution in the range [32.67949, 67.32051]. There are lots of statistical distributions that can have the same mean and standard deviation but which have completely different shapes, and it is the shape that will determine the validity of your distribution.
Income and wealth turn out to have very skewed distributions: they are bounded below by zero, while a few people have such large amounts compared to the rest of us that they drag the mean upward by quite noticeable amounts. Consequently, you don't want a naive distribution choice such as uniform or Gaussian, or anything else that is symmetric or can dip into negative territory. Using an exponential would be far more realistic, but it still may not be sufficiently extreme to capture the actual wealth distribution we see in the real world.
Once you've picked a distribution, there are many software libraries or sources of info that will help you generate values from that distribution.
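For example, if you settle on an exponential distribution, inverse-transform sampling needs nothing beyond java.util.Random. A minimal sketch (the class and method names are mine; the mean of 50 is taken from the question):

```java
import java.util.Random;

public class WealthSampler {
    // Inverse-transform sampling: if u ~ U(0,1), then
    // -mean * ln(1 - u) is exponentially distributed with the given mean.
    static double sampleExponential(Random rng, double mean) {
        return -mean * Math.log(1.0 - rng.nextDouble());
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.printf("%.2f%n", sampleExponential(rng, 50.0));
        }
    }
}
```

Note that the exponential is fully determined by its mean (its standard deviation equals its mean), so unlike the Gaussian case there is no separate standard-deviation knob to turn.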
Generating random numbers is a vast topic. But since you said it's a simple simulation, here's a simple approach to get going:
Generate several (say n) random numbers uniformly distributed on (0, 1). The built-in function Math.random can supply those numbers.
Add up those numbers. The sum has a distribution which is approximately normal, with mean = n/2 and standard deviation = sqrt(n)/sqrt(12). So if you subtract n/2, and then divide by sqrt(n)/sqrt(12), you'll have something which is approximately normal with mean 0 and standard deviation 1. Obviously if you pick n = 12 all you have to do is subtract 6 from the sum and you're done.
Now to get any other mean and standard deviation, just multiply by the standard deviation you want, and add the mean you want.
There are many other ways to go about it, but this is perhaps the simplest. I assume that's OK given your description of the problem.
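The steps above can be sketched as follows (class and method names are my own; n = 12 is chosen so the subtract-6 shortcut applies):

```java
import java.util.Random;

public class ApproxGaussian {
    // Sum of 12 U(0,1) draws has mean 6 and standard deviation 1,
    // so (sum - 6) is approximately N(0, 1); scale and shift it
    // to get the desired mean and standard deviation.
    static double nextApproxGaussian(Random rng, double mean, double sd) {
        double sum = 0.0;
        for (int i = 0; i < 12; i++) {
            sum += rng.nextDouble();
        }
        return mean + sd * (sum - 6.0);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.printf("%.2f%n", nextApproxGaussian(rng, 50.0, 10.0));
        }
    }
}
```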
Related
I'm creating a Java application that needs to randomly generate numbers with given probabilities. These float numbers (doubles would work just as well) must range from 0 to 100, where 0 and 100 have the lowest probability of coming up while 50 (the middle) has the highest. In practice, the further a number is from the center, the rarer it should be, until it becomes almost impossible. For example, the number 99.9 might come up once in 5 billion draws, but that is just an example; the actual rarity of the numbers should be determined by the function. Basically, the closer you get to 0 or 100, the more the rarity should tend to infinity.
However, I would like it to be a function with a min parameter and a max parameter to make it more versatile.
(Sorry if the question is not very clear, but I'm not a native speaker and I'm still learning English.)
Perhaps you could use Random's nextGaussian() method, which generates a random number with a mean of 0.0 and a standard deviation of 1.0. In your case, I believe the mean would be 50, and you could calculate the standard deviation so that it fits your requirements. This link may help answer your question: Java normal distribution.
Docs for Random.nextGaussian().
I would also suggest reading up on normal distributions, because I believe they match what you are asking for.
Hope that helped!
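A minimal sketch along these lines, assuming you map [min, max] to about +/- 3 standard deviations around the midpoint and redraw the rare values that land outside (the class name and the 3-sigma choice are my own assumptions, not from the question):

```java
import java.util.Random;

public class BoundedGaussian {
    // Rejection sampling: treat [min, max] as roughly +/- 3 standard
    // deviations around the midpoint, and redraw values outside it.
    static double nextBounded(Random rng, double min, double max) {
        double mean = (min + max) / 2.0;
        double sd = (max - min) / 6.0; // edges land ~3 sigma away (assumed choice)
        double x;
        do {
            x = mean + sd * rng.nextGaussian();
        } while (x < min || x > max);
        return x;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.printf("%.2f%n", nextBounded(rng, 0.0, 100.0));
        }
    }
}
```

Redrawing rather than clamping keeps the edges rare, instead of piling probability mass onto exactly min and max.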
Goal
I would like to sample from a bi-variate uniform distribution with a specified correlation coefficient in Java.
Question
What method can I use to implement such a multivariate uniform distribution?
or
Is there an existing package that would implement such thing so that I don't have to reinvent the wheel?
What I've got so far
The mvtnorm package in R allows sampling from a multivariate normal distribution with specified correlation coefficients. I thought that understanding their method might help me, either by doing something similar with uniform distributions, or by repeating their work and using copulas to transform the multivariate normal into a multivariate uniform (as I did in R there).
The source code is written in Fortran, and I don't speak Fortran! The code is based on this paper by Genz and Bretz, but it is too math-heavy for me.
I have an idea. Typically, you generate a U(0,1) value by generating, say, 32 random bits and dividing by 2^32, getting back one float. For two U(0,1) values you generate two 32-bit integers, divide, and get two floats back. So far so good. Such a bi-variate generator would be uncorrelated, and very simple to check, etc.
Suppose you build your bi-variate generator in the following way. Internally, you get two random 32-bit integers, and then produce two U(0,1) values with shared parts. Say you take 24 bits from the first integer and 24 bits from the second, but the upper (or lower, or middle, or ...) 8 bits are the same for both (taken from the first integer and copied into the second).
Clearly, those two U(0,1) values would be correlated. We could write them as
U0 = a0 + b
U1 = a1 + b
where b is the shared part; I omit some scaling coefficients for simplicity. Each one is U(0,1) with mean 1/2 and variance 1/12. Now you have to compute the Pearson correlation as
r = (E[U0 * U1] - 1/4) / (sqrt(1/12))^2 = (E[U0 * U1] - 1/4) / (1/12)
Using the expansion above, it should be easy after some algebra to compute r and compare it with the one you want. You may vary the size of the correlated part b, as well as its position (high bits, low bits, somewhere in the middle), to fit the desired r.
Realistically speaking, there should be infinitely many ways to get the same r with different sampling code and different bi-variate distributions. You might want to add more constraints in the future.
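A sketch of the shared-bits idea (names and parameters are my own). Working through the variance algebra for the case where the top k bits of each value are shared suggests r ≈ 1 - 4^(-k), so sharing just the top bit should give r ≈ 0.75, which the empirical estimate below can check:

```java
import java.util.Random;

public class CorrelatedUniforms {
    // Build two U(0,1) values that share their top k bits; the shared
    // high bits contribute most of the variance, inducing correlation.
    static double[] nextPair(Random rng, int k, int totalBits) {
        long shared = rng.nextInt(1 << k);            // k bits common to both
        long a0 = rng.nextInt(1 << (totalBits - k));  // independent low bits
        long a1 = rng.nextInt(1 << (totalBits - k));
        double scale = (double) (1L << totalBits);
        double u0 = ((shared << (totalBits - k)) | a0) / scale;
        double u1 = ((shared << (totalBits - k)) | a1) / scale;
        return new double[] { u0, u1 };
    }

    public static void main(String[] args) {
        // Estimate the Pearson correlation empirically.
        Random rng = new Random();
        int n = 200_000, k = 1, bits = 24;
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double[] p = nextPair(rng, k, bits);
            sx += p[0]; sy += p[1]; sxy += p[0] * p[1];
            sxx += p[0] * p[0]; syy += p[1] * p[1];
        }
        double cov = sxy / n - (sx / n) * (sy / n);
        double r = cov / Math.sqrt((sxx / n - (sx / n) * (sx / n))
                                 * (syy / n - (sy / n) * (sy / n)));
        System.out.printf("estimated r = %.3f (theory suggests ~0.75 for k = 1)%n", r);
    }
}
```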
I need to implement the calculation of some special polynomials in Java (the language is not really important). These are calculated as a weighted sum of a number of base polynomials with fixed coefficients.
Each base polynomial has 2 to 10 coefficients and there are typically 10 base polynomials considered, giving a total of, say 20-50 coefficients.
Basically the calculation is no big deal, but I am worried about typos. I only have a printed document as a template, so I would like to implement unit tests for the calculations. The issue is: how do I get reliable test data? I do have another piece of software that is supposed to calculate these functions, but the process is complicated and error-prone: I would have to scale the input values, go through a number of menu selections in the software to produce the output, and then paste it into my testing code.
I guess that there is no way around using the external software to generate some testing data, but maybe you have some recommendations for making this type of testing procedure safer or minimize the required number of test cases.
I am also worried about providing suitable input values: Depending on the value of the independent variable, certain terms will only have a tiny contribution to the output, while for other values they might dominate.
The types of errors I expect (and need to avoid) are:
Typos in coefficients
Coefficients applied to the wrong power (e.g. a_7*x^6 instead of a_7*x^7; just for demonstration, I am not calculating this way but am using Horner's scheme)
Off-by-one errors (e.g. a missing zero-order or highest-order term)
Since you have a polynomial of degree 10, testing at 11 distinct points should give certainty.
However, already a test at one well-randomized point, x=1.23004 to give an idea (away from small fractions like 2/3, 4/5), will with high probability show a difference if there is an error, because it is unlikely that the difference between the wrong and the true polynomial has a root at exactly this place.
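A minimal sketch of Horner's scheme plus the single-point cross-check described above (the coefficients are made up for illustration; the reference value is computed naively):

```java
public class PolynomialCheck {
    // Horner's scheme: coeffs[0] is the constant term, coeffs[n] the
    // leading coefficient of x^n.
    static double horner(double[] coeffs, double x) {
        double result = 0.0;
        for (int i = coeffs.length - 1; i >= 0; i--) {
            result = result * x + coeffs[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // p(x) = 1 + 2x + 3x^2 (made-up coefficients for illustration)
        double[] p = { 1.0, 2.0, 3.0 };
        // One well-randomized test point catches most transcription errors.
        double x = 1.23004;
        double reference = 1.0 + 2.0 * x + 3.0 * x * x; // naive cross-check
        System.out.println(Math.abs(horner(p, x) - reference) < 1e-12); // -> true
    }
}
```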
I have a time series of recorded frequencies, from which I would like to calculate per-second means. However, the sample rate is not constant, which means that a simple arithmetic mean is wrong. What I would actually like to compute is the integral of the step function (described by the time series) within each one-second interval.
Consider for example this time series:
08:11:23.400 -> 49.9 Hz
08:11:24.200 -> 50.1 Hz
08:11:24.600 -> 50.15 Hz
08:11:24.800 -> 50.05 Hz
08:11:25.100 -> 49.95 Hz
The arithmetic mean over the second 08:11:24.000 - 08:11:25.000 would be (50.1 + 50.15 + 50.05)/3 = 50.1. But this is not the mean frequency measured in that second. It is instead:
(200*49.9 + 400*50.1 + 200*50.15 + 200*50.05)/1000 = 50.06, because the measured frequencies were in effect for different amounts of time.
This is the calculation of a weighted mean (with the hold times as weights), or equivalently the integral of the step function divided by the length of the interval.
First of all: Is there a name for this specific calculation? It seems a rather standard computation on time series to me. Not knowing a name for this makes it hard to google for it.
Second: Which Java library supports such a calculation? I would like to avoid implementing this myself. I refuse to believe that no good standard Java library offers this. I was looking into the Apache Commons Math library, but without any luck (then again, maybe I'm just missing the correct term to look for).
I am not sure the formula 200*49.9 + 400*50.1 + ... is correct. It implies that the frequency 49.9 is in effect from 08:11:23.400 to 08:11:24.200, as if the sensor measured the future frequency. I would rather think that it measures the mean past frequency. Then, is the frequency really a step function? Or is a saw-tooth function closer to reality? Or even a smooth function, reconstructed with splines?
As a result, I would recommend computing the integral yourself, and being ready to change the calculation formula. As for bugs: you can just as easily make errors when choosing a function from a library and setting its parameters.
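Along those lines, a minimal sketch of the time-weighted mean for a step function (class and method names are mine; each sample is assumed to hold from its timestamp until the next one, per the question's convention):

```java
public class TimeWeightedMean {
    // Time-weighted mean of a step function over [start, end], given
    // sample times (millis) and values; each value holds from its own
    // timestamp until the next one.
    static double weightedMean(long[] times, double[] values, long start, long end) {
        double integral = 0.0;
        for (int i = 0; i < times.length; i++) {
            long from = Math.max(times[i], start);
            long to = (i + 1 < times.length) ? Math.min(times[i + 1], end) : end;
            if (to > from) {
                integral += (to - from) * values[i];
            }
        }
        return integral / (end - start);
    }

    public static void main(String[] args) {
        // Samples from the question, in millis since 08:11:23.000
        long[] t = { 400, 1200, 1600, 1800, 2100 };
        double[] v = { 49.9, 50.1, 50.15, 50.05, 49.95 };
        // Mean over the second 08:11:24.000 - 08:11:25.000
        System.out.printf("%.2f%n", weightedMean(t, v, 1000, 2000)); // -> 50.06
    }
}
```

On the question's data this reproduces the hand-computed 50.06 for that second.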
There are not many libraries in Java which can do this properly. But the basic thing you are looking for is digital signal processing.
Here's a similar question: Signal processing library in Java?
I'm using Java 6's Random class (java.util.Random, Linux 64-bit) to randomly decide between serving one version of a page and a second one (normal A/B testing). Technically, I initialize the class once with the default empty constructor, and it's injected into a bean (Spring) as a property.
Most of the time the counts for the two versions are within 8% (+/-) of each other, but from time to time I see deviations of up to 20 percent, e.g.:
I now have two copies with a split of 680 / 570. Is that considered normal?
Is there a better/faster version to use than java random ?
Thanks
A deviation of 20% does seem rather large, but you would need to talk to a trained statistician to find out if it is statistically anomalous.
UPDATE - and the answer is that it is not necessarily anomalous. The statistics predict that you would get an outlier like this roughly 0.3% of the time.
It is certainly plausible for a result like this to be caused by the random number generator. The Random class uses a simple "linear congruential" algorithm, and this class of algorithms is strongly auto-correlated. Depending on how you use the random numbers, this could lead to anomalies at the application level.
If this is the cause of your problem, then you could try replacing it with a crypto-strength random number generator. See the javadocs for SecureRandom. SecureRandom is more expensive than Random, but it is unlikely that this will make any difference in your use-case.
On the other hand, if these outliers are actually happening at roughly the rate predicted by the theory, changing the random number generator shouldn't make any difference.
If these outliers are really troublesome, then you need to take a different approach. Instead of generating N random choices, generate a list of false / true with exactly the required ratio, and then shuffle the list; e.g. using Collections.shuffle.
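A sketch of that shuffled-list approach (the class name and block size are my own):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BalancedChooser {
    // Pre-build a block of choices with an exact 50/50 split, shuffle it,
    // and deal choices out sequentially; reshuffle when the block runs out.
    private final List<Boolean> block = new ArrayList<>();
    private int next = 0;

    BalancedChooser(int blockSize) {
        for (int i = 0; i < blockSize; i++) {
            block.add(i < blockSize / 2);
        }
        Collections.shuffle(block);
    }

    boolean nextChoice() {
        if (next == block.size()) {
            Collections.shuffle(block);
            next = 0;
        }
        return block.get(next++);
    }

    public static void main(String[] args) {
        BalancedChooser chooser = new BalancedChooser(100);
        int a = 0, b = 0;
        for (int i = 0; i < 1000; i++) {
            if (chooser.nextChoice()) a++; else b++;
        }
        // Every full block of 100 contributes exactly 50/50.
        System.out.println(a + " / " + b); // -> 500 / 500
    }
}
```

Each individual choice still looks random, but any prefix of the stream can deviate from a perfect split by at most half the block size.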
I believe this is fairly normal, as the class is meant to generate random sequences. If you want repeating patterns after a certain interval, you may want to use a specific seed value in the constructor and reset the generator with the same seed after that interval.
E.g., after every 100/500/n calls to Random.next..., reset the seed to its old value using the Random.setSeed(long seed) method.
java.util.Random.nextBoolean() is an approach to a standard binomial distribution, which has a standard deviation of sqrt(n*p*(1-p)), with p = 0.5.
So if you do 900 iterations, the standard deviation is sqrt(900*0.5*0.5) = 15, so about two-thirds of the time the count would land within one standard deviation of the mean, in the range 435 - 465.
However, it is pseudo-random, and has a limited cycle of numbers it will go through before starting over. So if you have enough iterations, the actual deviation will be much smaller than the theoretical one. Java uses the formula seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1). You could write a different formula with smaller numbers to purposely obtain a smaller deviation, which would make it a worse random number generator, but better fitted for your purpose.
You could, for example, create a list containing 5 trues and 5 falses, and use Collections.shuffle to randomize it. Then you iterate over it sequentially. After 10 iterations you re-shuffle the list and start from the beginning. That way you'll never deviate by more than 5.
See http://en.wikipedia.org/wiki/Linear_congruential_generator for the mathematics.