How to generate random values across many requests - java

I want to generate some random numbers for many Servlet requests.
The problem is that if I use a new Random object in each servlet, the overall distribution will be incorrect.
E.g. with 10000+ requests, I expect the random values to be evenly distributed within the range.

So why not use a global Random instance?
Or you can use ThreadLocalRandom, which is faster. It is effectively global because you cannot create an instance of it yourself; you obtain one by calling ThreadLocalRandom.current(). In Java 7 this returns a per-thread instance. In Java 8 it is further optimized and always returns the same singleton, with the generator state kept per thread.
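A minimal sketch of the ThreadLocalRandom approach, assuming each request only needs an int in a closed range. The class and method names (RequestRandom, nextInRange) are made up for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

final class RequestRandom {
    /**
     * Returns a uniformly distributed int in [min, max], both inclusive.
     * Assumes max < Integer.MAX_VALUE so that max + 1 does not overflow.
     */
    static int nextInRange(int min, int max) {
        // current() should be called on the thread that uses the result;
        // do not cache the returned generator in a field shared across threads.
        return ThreadLocalRandom.current().nextInt(min, max + 1);
    }
}
```

A servlet could call RequestRandom.nextInRange(0, 100) inside doGet without creating any generator of its own.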

It is a little bit complicated to get a really random sequence using Random. Random is LCG-based with a period of 2^48, and you need to be very careful with the seed. There is a way to generate a single sequence by using the DataStore to keep your current value, but performance will not be very good because you need to update the value every time you generate a new random number. This means you will be able to reach 10-20 requests/sec without memcache and probably around 100 requests/sec with memcache. Sharding will not be very helpful because you need to keep the seed value atomic.
The algorithm looks like this:
1. Generate the first Integer (using long needs additional processing; it is much simpler to generate an int).
2. Set seed = random << 16
3. Save the seed to the DataStore (do not forget about the transaction).
4. On every subsequent request (the whole operation should be in a single transaction):
4.1. Read the seed from the DataStore.
4.2. Create a new Random with your seed.
4.3. Generate a new int.
4.4. Set seed = random << 16
4.5. Save the seed to the DataStore.
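The steps above can be sketched as follows. This is only an illustration of the described flow: SeedStore is a hypothetical in-memory stand-in for the DataStore entity, and the synchronized method stands in for the transaction. A real implementation would use the DataStore's own transactional read and write.

```java
import java.util.Random;

/** Hypothetical in-memory stand-in for a DataStore entity holding the seed. */
final class SeedStore {
    private long seed;

    SeedStore(long initialSeed) {
        this.seed = initialSeed;
    }

    long read() {
        return seed;
    }

    void write(long newSeed) {
        seed = newSeed;
    }
}

final class ChainedRandom {
    private final SeedStore store;

    /** Steps 1-3: take the first int, shift it left by 16 bits, persist it. */
    ChainedRandom(int firstValue) {
        this.store = new SeedStore((long) firstValue << 16);
    }

    /** Step 4: read the seed, generate one int, derive and save the next seed. */
    synchronized int next() {
        long seed = store.read();                  // 4.1
        int value = new Random(seed).nextInt();    // 4.2 and 4.3
        store.write((long) value << 16);           // 4.4 and 4.5
        return value;
    }
}
```

Because every call goes through the single stored seed, all requests draw from one sequence, at the cost of one read and one write per number.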

Related

Unique random - srand in java ? (like in c++)

Currently I'm using a
Random rand = new Random();
myNumber = rand.nextInt((100-0)+1)+0;
and it's pseudo-random, because I always get the same sequence of numbers.
I remember in C++ you could just use ctime and srand to make it unique. How do you do it in Java?
If you need unpredictable random numbers in Java, use SecureRandom. This class provides a cryptographically strong random number generator (RNG), so it is suitable for tasks such as creating certificates, but it is slow compared to the other solutions.
If you need a predictable random sequence that you can reset, use a Random and call .setSeed(number) on it; the values are predictable from that point on.
If you want a pseudo-random sequence that differs every time you start the program, use a normal Random. By default, a Random instance is seeded with a value derived from the current time.
For the best randomness with any of these solutions, it is important to RE-USE the random instance. If this isn't done, much of the output will be similar over the same timespan, and in the case of a thrown-away SecureRandom, a lot of time will be spent creating each new instance.
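As a rough sketch of the three options, assuming values in the range 0..100 as in the question. The class and method names are illustrative only.

```java
import java.security.SecureRandom;
import java.util.Random;

final class RandomChoices {
    // Created once and reused; the no-argument constructor picks a seed that
    // is very likely to differ from one program run to the next.
    private static final Random SHARED = new Random();

    // Slower, but suitable when the values must be unpredictable.
    private static final SecureRandom SECURE = new SecureRandom();

    /** Pseudo-random value in 0..100 from the shared generator. */
    static int nextPercent() {
        return SHARED.nextInt(101);
    }

    /** Cryptographically strong value in 0..100. */
    static int nextSecurePercent() {
        return SECURE.nextInt(101);
    }

    /** Repeatable sequence: the same seed always yields the same values. */
    static int[] repeatable(long seed, int count) {
        Random seeded = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = seeded.nextInt(101);
        }
        return values;
    }
}
```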

What's the importance of using Random.setSeed?

When writing a Java program, we can call setSeed on the Random class. Why would we use this method?
Can't we just use Random without using setSeed? What is the main purpose of using setSeed?
One use of this is that it enables you to reproduce the results of your program in future.
As an example, I wanted to compute a random variable for each row in a database. I wanted the program to be reproducible, but I wanted randomness between rows. To do this, I set the random number seed to the primary key of each row. That way, when I ran the program again, I got the same results, but between rows, the random variable was pseudo random.
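A minimal sketch of the per-row idea, assuming the primary key is a long and one value in [0, 1) is needed per row. RowRandom and valueForRow are hypothetical names.

```java
import java.util.Random;

final class RowRandom {
    /** Derives a reproducible value in [0, 1) from a row's primary key. */
    static double valueForRow(long primaryKey) {
        // A fresh generator per row, seeded with the key, so the result
        // depends only on the key and not on the order rows are processed in.
        return new Random(primaryKey).nextDouble();
    }
}
```

One caveat with this design: with java.util.Random, the first values produced from neighbouring seeds can be close to each other, so it may be worth mixing the key (for example with a hash) before using it as a seed.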
The seed is used to initialize the random number generator: it sets the starting point for the series of random numbers. A given seed always produces the same random number sequence.
This might be of help.
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator DRBG, is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state, which includes a truly random seed.
I can see two reasons for doing this:
You can create a reproducible random stream. For a given seed, the same results will be returned from consecutive calls to (the same) nextX methods.
If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers
You feel, for some reason, that your seed is of a higher quality than the default source (which I'm guessing is derived from the current time on your PC).
A specific seed will always give the same sequence of "pseudo-random" numbers. There are only 2^48 different sequences in Random, because setSeed uses only 48 bits of the seed parameter. Besides setSeed, one may also use a constructor with a seed (e.g. new Random(seed)).
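The 48-bit limit can be checked directly: two seeds that differ only in bits above the low 48 should behave identically. The helper below is a small illustrative sketch, not part of any API.

```java
import java.util.Random;

final class SeedBits {
    /** True if both seeds produce the same first `count` ints. */
    static boolean sameSequence(long seedA, long seedB, int count) {
        Random a = new Random(seedA);
        Random b = new Random(seedB);
        for (int i = 0; i < count; i++) {
            if (a.nextInt() != b.nextInt()) {
                return false;
            }
        }
        return true;
    }
}
```

For example, SeedBits.sameSequence(5L, 5L | (1L << 48), 100) compares seed 5 with a seed that only adds bit 48, which the generator discards.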
When setSeed(seed) or new Random(seed) are not used, the Random() constructor sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor.
Java reference for the above information: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html
In the ordinary case, don't use a seed. Just use the empty constructor Random() and don't call setSeed. This way you'll likely get different pseudo-random numbers each time the class is constructed and invoked.
For data dependent debugging, where you want to repeat the same pseudo-random numbers, use a specific seed. In this case, use Random(seed) or setSeed(seed).
For non-security critical uses, there's no need to worry whether the specific seed/sequence might be recognized and subsequent numbers predicted, because of the large range of seeds. However, "Instances of java.util.Random are not cryptographically secure. Consider instead using SecureRandom to get a cryptographically secure pseudo-random number generator for use by security-sensitive applications."
Several others have mentioned reproducibility. Reproducibility is at the heart of debugging: you need to be able to reproduce the circumstances in which the bug occurred.
Another important use of reproducibility is that you can play some statistical games to reduce the variability of some estimates. See Wikipedia's Variance Reduction article for more details, but the intuition is as follows. Suppose you're considering two different layouts for a bank or a grocery store. You can't build them both and see which works better, so you use simulation. You know from queueing theory that the size of lines and delays customers experience are partly due to the layout, but also partly due to the variation in arrival times, demand loads, etc, so you use randomness in your two models. If you run the models completely independently, and find that the lines are bigger in layout 1 than in layout 2, it might be because of the layout or it might be because layout 1 just happened to get more customers or a more demanding mix of transactions due to the luck of the draw. However, if both systems use the exact same set of customers arriving at the same times and having the same transaction demands, it's a "fairer" comparison. The differences you observe are more likely to be because of the layout. You can accomplish this by reproducing the randomness in both systems - use the same seeds, and synchronize so that the same random numbers are used for the same purpose in both systems.
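A toy sketch of this idea, assuming a single-server queue with exponentially distributed inter-arrival and service times, where a "layout" is modelled only as a scale factor on service time. All of these modelling choices are assumptions made for illustration.

```java
import java.util.Random;

final class LayoutSimulation {
    /**
     * Mean waiting time over `customers` arrivals in a single-server queue.
     * The same seed gives the same customers (arrival gaps and base service
     * demands), so two layouts can be compared on identical inputs.
     */
    static double meanWait(long seed, double serviceScale, int customers) {
        Random rng = new Random(seed);
        double wait = 0.0;
        double total = 0.0;
        for (int i = 0; i < customers; i++) {
            // Draw in a fixed order so both layouts consume the same numbers
            // for the same purpose.
            double interArrival = -Math.log(1.0 - rng.nextDouble());
            double service = serviceScale * -Math.log(1.0 - rng.nextDouble());
            total += wait;
            // Wait of the next customer: leftover work minus the arrival gap.
            wait = Math.max(0.0, wait + service - interArrival);
        }
        return total / customers;
    }
}
```

Calling meanWait(7L, 0.6, 10000) and meanWait(7L, 0.8, 10000) compares two layouts against the same stream of customers, so any difference comes from the service scale rather than from the luck of the draw.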

What would be considered a standard deviation boundary for Java Random?

I'm using Java 6 Random (java.util.Random, Linux 64-bit) to randomly decide between serving one version of a page or a second one (normal A/B testing). Technically, I initialize the class once with the default empty constructor and it's injected into a bean (Spring) as a property.
Most of the time the copies of the pages are within 8% (+/-) of each other, but from time to time I see deviations of up to 20 percent, e.g.:
I now have two copies that split 680 / 570. Is that considered normal?
Is there a better/faster alternative to Java's Random?
Thanks
A deviation of 20% does seem rather large, but you would need to talk to a trained statistician to find out if it is statistically anomalous.
UPDATE - and the answer is that it is not necessarily anomalous. The statistics predict that you would get an outlier like this roughly 0.3% of the time.
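To put a number on how unusual a split is, one can express it in standard deviations of a fair 50/50 choice, using the binomial standard deviation sqrt(n*p*(1-p)) with p = 0.5. The sketch below assumes every request is an independent coin flip. SplitCheck is an illustrative name.

```java
final class SplitCheck {
    /** Number of standard deviations between countA and an even split. */
    static double zScore(long countA, long countB) {
        long n = countA + countB;
        double expected = n * 0.5;
        double stdDev = Math.sqrt(n * 0.5 * 0.5);
        return (countA - expected) / stdDev;
    }
}
```

For the 680 / 570 split, n = 1250, the expected count is 625, the standard deviation is about 17.7, and the result is about 3.1 standard deviations from an even split.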
It is certainly plausible for a result like this to be caused by the random number generator. The Random class uses a simple "linear congruential" algorithm, and this class of algorithms is strongly auto-correlated. Depending on how you use the random numbers, this could lead to anomalies at the application level.
If this is the cause of your problem, then you could try replacing it with a crypto-strength random number generator. See the javadocs for SecureRandom. SecureRandom is more expensive than Random, but it is unlikely that this will make any difference in your use-case.
On the other hand, if these outliers are actually happening at roughly the rate predicted by the theory, changing the random number generator shouldn't make any difference.
If these outliers are really troublesome, then you need to take a different approach. Instead of generating N random choices, generate a list of false / true with exactly the required ratio, and then shuffle the list; e.g. using Collections.shuffle.
I believe this is fairly normal as it is meant to generate random sequences. If you want repeated patterns after certain interval, I think you may want to use a specific seed value in the constructor and reset the random with same seed after certain interval.
e.g. after every 100/500/n calls to Random.next.., reset the seed with old value using Random.setSeed(long seed) method.
Repeated calls to java.util.Random.nextBoolean() approximate a binomial distribution, which has a standard deviation of sqrt(n*p*(1-p)), with p=0.5.
So if you do 900 iterations, the standard deviation is sqrt(900*.5*.5) = 15, so roughly two thirds of the time each outcome's count would fall in the range 435 - 465.
However, it is pseudo-random, and has a limited cycle of numbers it will go through before starting over. So if you have enough iterations, the actual deviation will be much smaller than the theoretical one. Java uses the formula seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1). You could write a different formula with smaller numbers to purposely obtain a smaller deviation, which would make it a worse random number generator, but better fitted for your purpose.
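The formula can be tried out by hand. The sketch below re-implements the update step and compares it with java.util.Random, using the seed scrambling and the "top 32 bits" output rule described in the Random Javadoc. ManualLcg is an illustrative class, not a replacement for Random.

```java
import java.util.Random;

final class ManualLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long INCREMENT = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private long state;

    ManualLcg(long seed) {
        // java.util.Random scrambles the supplied seed before first use.
        this.state = (seed ^ MULTIPLIER) & MASK;
    }

    /** Advances the 48-bit state and returns its top 32 bits. */
    int nextInt() {
        state = (state * MULTIPLIER + INCREMENT) & MASK;
        return (int) (state >>> 16);
    }

    /** True if the first `count` outputs agree with java.util.Random. */
    static boolean matchesJdk(long seed, int count) {
        ManualLcg manual = new ManualLcg(seed);
        Random jdk = new Random(seed);
        for (int i = 0; i < count; i++) {
            if (manual.nextInt() != jdk.nextInt()) {
                return false;
            }
        }
        return true;
    }
}
```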
You could for example create a list of 5 trues and 5 falses in it, and use Collections.shuffle to randomize the list. Then you iterate over them sequentially. After 10 iterations you re-shuffle the list and start from the beginning. That way you'll never deviate more than 5.
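A minimal sketch of the shuffled-block approach, assuming a block of 5 trues and 5 falses as in the example above. BalancedSplitter is a hypothetical name, and synchronized is used on the assumption that several request threads share one instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class BalancedSplitter {
    private final List<Boolean> block = new ArrayList<>();
    private final Random rng;
    private int position;

    BalancedSplitter(Random rng, int perSide) {
        this.rng = rng;
        for (int i = 0; i < perSide; i++) {
            block.add(Boolean.TRUE);
            block.add(Boolean.FALSE);
        }
        // Start "exhausted" so that the first call shuffles the block.
        this.position = block.size();
    }

    /** Returns the next choice, re-shuffling whenever a block is used up. */
    synchronized boolean next() {
        if (position == block.size()) {
            Collections.shuffle(block, rng);
            position = 0;
        }
        return block.get(position++);
    }
}
```

With new BalancedSplitter(new Random(), 5), every run of 10 consecutive blocks-aligned calls contains exactly 5 of each choice, so the two page versions never drift more than 5 apart.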
See http://en.wikipedia.org/wiki/Linear_congruential_generator for the mathematics.

Are random numbers really unpredictable in Java?

There are several ways to generate random numbers in Java.
One of them is this:
Random rand=new Random();
int randomInteger=rand.nextInt();
Now my question is this: can we predict the next random number?
Edited after 4 answers:
My real problem is this:
I'm working on a Snake game (Nibbles in Linux) and I'm programming the snake to move. Now I want to know if it's possible to predict the next place where the apple will appear.
Is it possible?
You can not only predict it, but know it absolutely, if you know the seed that new Random() used. Pseudo-random generators are deterministic: if you know the seed, you know the sequence. At one point new Random() was a shortcut for new Random(System.currentTimeMillis()) (that's what it did when I last looked at the source; the docs don't actually say it has to use that), in which case knowing exactly what System.currentTimeMillis() returned at construction time would be enough. Update: Looking at the Java 6 source [I don't have Java 7 source handy], the default seed is a combination of a seed number that gets incremented on use, plus System.nanoTime. So you'd need to know both of those, which raises the bar.
If you don't know the seed used by new Random(), then it's very difficult indeed to predict what the next value will be. That's the point of pseudo-random generators. I won't say it's impossible. Just really, really hard to do with any degree of confidence.
Update after question edit: It's possible, but very, very hard, and in terms of doing so in a way that would allow a player to improve their score in the game, I'd say you can ignore it.
The "random" numbers generated by the Random class are generated algorithmically, and as such are really pseudo-random numbers. So yes, in theory, you can predict the next number. Knowing a single number that Random has produced isn't enough information to predict the next number; you would also need to know the seed (the internal state) that the Random object is using, and you would need to follow its pseudo-random number generation algorithm. Note, however, that each nextInt() result exposes 32 of the 48 state bits, so a couple of consecutive outputs are enough to recover the remaining bits by brute force.
If you would like a repeatable set of "random" numbers, you can specify your own seed when creating an instance of Random, e.g.
Random rand = new Random(1234); // Replace 1234 with any value you'd like
Every time you instantiate Random with the same seed, you'll get the same series of numbers. So, for example, you could write a small command-line program that instantiates Random with some seed and prints a list of the numbers it returns, and then instantiate Random with the same seed in your code. Then you would know which numbers your code will receive and in what order. That's very handy for debugging.
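For the Snake game, a repeatable apple sequence could look like the sketch below. The grid size and the AppleSpawner class are assumptions for illustration, not part of the question's code.

```java
import java.util.Random;

final class AppleSpawner {
    private final Random rand;
    private final int width;
    private final int height;

    AppleSpawner(long seed, int width, int height) {
        this.rand = new Random(seed);
        this.width = width;
        this.height = height;
    }

    /** Next apple position as {x, y}, with 0 <= x < width and 0 <= y < height. */
    int[] next() {
        return new int[] { rand.nextInt(width), rand.nextInt(height) };
    }
}
```

Two spawners built with the same seed and grid size return the same positions in the same order, which makes a particular game replayable while debugging.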
There can be NO truly random numbers on a deterministic device like a computer. But:
If you want a cryptographically secure random number, use SecureRandom: http://docs.oracle.com/javase/6/docs/api/java/security/SecureRandom.html
Random uses a deterministic algorithm:
If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers.
http://docs.oracle.com/javase/6/docs/api/java/util/Random.html#Random
Essentially, if you know the seed of the random number generator, you can predict the entire sequence with certainty. If you don't, predicting the next number is much harder, although the state of java.util.Random is only 48 bits, so it can be reconstructed from a few observed outputs.
Note that if you're relying on the numbers being unpredictable for security, you should be using java.security.SecureRandom rather than java.util.Random.
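A short sketch of using SecureRandom for a security-sensitive value, here a random token rendered as hex. The Tokens class and the choice of a hex string are illustrative assumptions.

```java
import java.security.SecureRandom;

final class Tokens {
    // One shared instance; creating a SecureRandom per call is wasteful.
    private static final SecureRandom SECURE = new SecureRandom();

    /** Returns a lowercase hex string encoding `byteCount` random bytes. */
    static String newToken(int byteCount) {
        byte[] bytes = new byte[byteCount];
        SECURE.nextBytes(bytes);
        StringBuilder hex = new StringBuilder(byteCount * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```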
As other answers have said, it is possible to predict the output of java.util.Random if you know the starting seed.
If you are working on a Linux-like system, take a look at the special files /dev/random and /dev/urandom. Reads from these files are said to return "better" random numbers; the randomness depends on keyboard activity, mouse movement and some other exotic factors.
See this Wikipedia page for details. This page also says equivalent APIs exist in Windows.

How does rand(timestamp) work when running on a web server?

While studying some security topics, I came across a claim that one can guess the sequence generated by rand(timestamp) running on a web server. The suggestion was that our first goal should be to crash the server (assuming that the server will come back up within 1 minute); we can then sync our generator with the server, so that the rand(timestamp) sequence generated by the web server would be the same as our generator's.
I am confused: if we have a function rand(timestamp), would it depend on the system timestamp or on the server "up time" stamp?
P.S.:
I'm asking a general question; it does not depend on whether it is in Java/PHP/ASP. I'm just asking how a web server/compiler works for such code.
Maybe it's a vague question, but I would like some clarification.
The default behaviour of many implementations of rand() is to use the system time as a seed if a seed value is not supplied. Even if that is not the default behaviour, it is almost guaranteed that an application will pass the system time to srand() as a seed to randomise the sequence.
So, if you know the precise system time, you can generate the same sequence that would be produced from the remote system calling rand(). Several years ago, an online casino was attacked using this random sequence prediction technique.
The solution is two-fold: derive the seed from a non-predictable hardware source (there are commercial units for this) AND use the pseudo-random number generator with the longest period available.
There have been many questions on SO on the topic of hardware generators, for instance:
What Type of Random Number Generator is Used in the Gaming Industry?
Alternative Entropy Sources
rand() returns a pseudo random number. The pseudo random number generator is typically initialized with a seed. If two instances of the pseudo random generator are initialized with the same seed, then they will produce the same sequence on successive calls to rand.
By crashing the server, you are forcing the application to initialize the pseudo random generator with the current unix timestamp since that is what it uses as seed. An attacker can easily guess the seed/timestamp in a few attempts (server may use ntp which makes it even easier).
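The guessing step can be sketched in Java, under the assumption that the server seeds java.util.Random with a Unix timestamp in seconds and that the attacker has seen its first two outputs. SeedGuess is a hypothetical helper written for this illustration.

```java
import java.util.Random;

final class SeedGuess {
    /**
     * Tries every seed in [guess - window, guess + window] and returns the one
     * whose first two outputs match the observed values, or -1 if none does.
     * (-1 is safe as a "not found" marker here because timestamps are positive.)
     */
    static long recoverSeed(int first, int second, long guess, long window) {
        for (long candidate = guess - window; candidate <= guess + window; candidate++) {
            Random r = new Random(candidate);
            if (r.nextInt() == first && r.nextInt() == second) {
                return candidate;
            }
        }
        return -1L;
    }
}
```

Once the seed is recovered, a Random created with it and advanced by the same number of calls produces the same values as the server's generator from then on. A window of one minute in seconds means only 121 candidates to try.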
That is why it is not a good idea to use the Unix timestamp as the seed. In any case, for cryptographic uses, the random number generator that comes with a crypto library is typically used. For example, OpenSSL has RAND_bytes, which provides cryptographically strong pseudo-random bytes. On many Unix systems this pseudo-random number generator is automatically seeded with bytes from /dev/urandom. See http://www.openssl.org/docs/crypto/RAND_add.html for more details.
