Random vs Shuffle

I have an ArrayList which I want to grab a random value from. To do this I thought of two simple methods:
Method 1: Uses Random to generate a random number between 0 and the size of the ArrayList and then use that number for arrayList.get(x)
Method 2: Use Collections.shuffle(arrayList) and then arrayList.get(0).
Is one method preferable to the other in terms of randomness? I know it is impossible for either to be truly random, but I want the result to be as random as possible.
EDIT: I only need one value from that ArrayList

It depends on the context.
Benefits of shuffling:
Shuffle once, then just grab values sequentially
No repeated values
Benefits of randomizing:
Great for a small amount of values
Can repeat values

To answer your direct question: neither one of these is "more random" than the other. The results of the two methods are statistically indistinguishable. After all, the first step in shuffling an array is (basically) picking a number between 0 and N-1 (where N is the length of the array) and moving that element into the first position.
That being said, there are valid reasons to pick one or the other, depending on your specific needs. Jeroen's answer summarizes those well.

I would say the random number option is the best (Method 1).
Shuffling the objects takes up extra resources, because it has to move all of the objects around in the ArrayList, whereas generating a random number gives you the same effect without needing to use CPU time to cycle through array elements!
Also, be sure to generate a number between 0 and the size MINUS ONE. :)

If you just want one random selection, use Method 1. If you want to get a sequence of random selections with no duplicates, use Method 2.
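A minimal sketch of the two approaches side by side, assuming a non-empty list. The class and method names (RandomPick, pickByIndex, shuffledCopy) are made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomPick {
    // Method 1: draw one index in [0, size). O(1), and the list is left untouched.
    static <T> T pickByIndex(List<T> list, Random rng) {
        return list.get(rng.nextInt(list.size()));
    }

    // Method 2: shuffle a copy. O(n), but the copy can then be read
    // front to back to get selections without duplicates.
    static <T> List<T> shuffledCopy(List<T> list, Random rng) {
        List<T> copy = new ArrayList<>(list);
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        Random rng = new Random();
        System.out.println(pickByIndex(items, rng));
        System.out.println(shuffledCopy(items, rng));
    }
}
```

Note that nextInt(size) already excludes the upper bound, so there is no need to subtract one by hand.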

Randomness depends on two factors, the algorithm (a.k.a the "generator") and the seed.
What generators does each method use?
The second overload of Collections.shuffle() actually accepts a Random that you can seed yourself. If you choose the default overload, it uses a Random anyway, as specified in the Javadoc. You're using a Random no matter what.
Are the generators seeded differently?
Another look at Random in the Javadoc shows that it is seeded with a time-based value unless you specify a seed. Collections.shuffle() doesn't supply a seed of its own if you look at the implementation. You're using the default seed unless you specify one.
Because both use Random and both use the same default seeding, they are equally random.
Which one has a higher time complexity?
Shuffling a list is O(n) (the Javadoc for Collections.shuffle() actually specifies linear time). The time complexity of Random.nextInt() is O(1). Obviously, the latter is faster in a case where only one value is needed.

Related

Collections sort method vs iteration

I was working on a playing cards shuffle problem and found two solutions for it.
The target is to shuffle all 52 playing cards stored in an array as Card objects. The Card class has an id and a name associated with it.
Now, one way is to iterate with a for loop and, with the help of a temporary Card holder and a random number generator, swap two objects. This continues until we reach half of the cards.
Another way is to implement Comparable, overriding compareTo with a value from a random number generator, so we get a random response each time the method is called.
Which way do you think is better?

You should not do it by sorting with a comparator that returns random results, because then those random results can be inconsistent with one another (e.g., saying that a<b<c<a), and this can actually result in the distribution of orderings you get being far from uniform. See e.g. this demonstration by Mike Bostock. Also, it takes longer, not that that should matter for shuffling 52 objects.
The standard way to do it does involve a loop, but your description sounds peculiar and I suspect what you have in mind may also not produce the ideal results. (If the question is updated to make it clearer what the "iterate using for loop" approach is meant to be, I will update this.)
(There is a way to get good shuffling by sorting: pair each element up with a random number -- e.g., a random floating-point number in the range 0..1 -- and then sort using that number as key. But this is slower than Fisher-Yates and requires extra memory. In lower-level languages it generally also takes more code; in higher-level languages it can be terser; I'd guess that for Java it ends up being about equal.)
[EDITED to add:] As Louis Wasserman very wisely says in comments, when your language's standard library has a ready-made function to do a thing, you should generally use it. Unless you're doing this for, e.g., a homework assignment that requires you to find and implement an algorithm to solve the problem.
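For reference, here is a rough sketch of the Fisher-Yates loop mentioned above, written for an int array. The class name is hypothetical, and in real code Collections.shuffle is still the better choice.

```java
import java.util.Arrays;
import java.util.Random;

class FisherYates {
    // In-place Fisher-Yates shuffle: walk from the last index down,
    // swapping each slot with a uniformly chosen slot at or below it.
    static void shuffle(int[] a, Random rng) {
        for (int i = a.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // 0 <= j <= i
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] deck = new int[52];
        for (int i = 0; i < deck.length; i++) {
            deck[i] = i; // card ids 0..51
        }
        shuffle(deck, new Random());
        System.out.println(Arrays.toString(deck));
    }
}
```

The loop runs over every position, not just half of the cards, and the swap partner is drawn only from the part of the array that has not been fixed yet.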

First of all, the comparator you've described won't work. More on this here. TL;DR: comparisons must be reproducible, so if your comparator says that a is less than b, then the next time it compares b to a it should return "greater", not a random value. The same goes for Comparable.
If I were you, I'd rather use the Collections#shuffle method, which "randomly permutes the specified list using a default source of randomness. All permutations occur with approximately equal likelihood". It's always better to rely on someone else's code than to write your own, especially if it is in a standard library.

At what point is True randomness lost? True random number as a java.util.Random seed?

Let's assume I have a reliable source of truly random numbers, but it is very slow. It only gives me a few hundred numbers every couple of hours.
Since I need way more than that, I was thinking of using those few precious TRNs as seeds for java.util.Random (or scala.util.Random). I will also always pick a new one to generate the next random number.
So I guess my questions are:
Can the numbers I generate from those Random instances in Java be considered truly random since the seed is truly random?
Is there still a condition that is not met for true randomness?
If I keep on adding levels at what point will randomness be lost?
Or (as I thought when I came up with it) is truly random as long as the stream of seeds is?
I am assuming that nobody has intercepted the stream of seeds, but I do not plan to use those numbers for security purposes.

For a pseudo-random generator like java.util.Random, the next number in the sequence becomes predictable given only a few numbers from the sequence, so you will lose your "true randomness" very fast. Better to use java.security.SecureRandom: its generators are strong random generators with a VERY long sequence length, which should be pretty hard to predict.
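As a rough illustration of that suggestion, drawing from SecureRandom looks almost the same as drawing from Random. The class and method names here are hypothetical, and the sketch simply relies on the default self-seeding constructor.

```java
import java.security.SecureRandom;

class SecureDraw {
    // Uniform int in [0, bound) from the given SecureRandom.
    static int draw(SecureRandom rng, int bound) {
        return rng.nextInt(bound);
    }

    public static void main(String[] args) {
        SecureRandom rng = new SecureRandom(); // self-seeding; no seed supplied here
        for (int i = 0; i < 5; i++) {
            System.out.println(draw(rng, 100));
        }
    }
}
```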

Java's Random gives uniformly distributed pseudo-random numbers. That is not true randomness, which may well yield the same number five times in a row.
Furthermore, for every specific seed the same sequence is generated (intentionally). With 2^64 seeds this is in general irrelevant. (Note that hackers could store the first ten numbers of every sequence, thereby rapidly catching up.)
So if you use a truly random number as a seed at large intervals, you will get a uniform distribution during that interval. In effect, this is not very different from not using the true randomizer at all.
Now combining random sequences might reduce the randomness. Maybe translating the true random number to bytes, and xor-ing every new random number with another byte, might give a wilder variance.
Please do not take my word only - I cannot guarantee the mathematical correctness of the above. A math/algorithmic forum might give more info.

When you take out more bits than you have put in, they are for sure no longer truly random. The break point may even occur earlier if the random number generator is bad. This can be seen by considering the entropy of the sequences. The seed value determines the sequence completely, so there are at most as many sequences as seed values. If they are all distinct, the entropy is the same as that of the seeds (which is essentially the number of seed bits, assuming the seed is truly random).
However, if different seeds lead to the same pseudo random sequence the entropy of the sequences will be lower than that of the seeds. If we cut off the sequences after n bits, the entropy may be even lower.
But why care if you don't use it for security purposes? Are you sure the pseudo random numbers are not good enough for your application?

What's the importance of using Random.setSeed?

When writing a Java program, we use setSeed in the Random class. Why would we use this method?
Can't we just use Random without using setSeed? What is the main purpose of using setSeed?

One use of this is that it enables you to reproduce the results of your program in future.
As an example, I wanted to compute a random variable for each row in a database. I wanted the program to be reproducible, but I wanted randomness between rows. To do this, I set the random number seed to the primary key of each row. That way, when I ran the program again, I got the same results, but between rows, the random variable was pseudo random.

The seed is used to initialize the random number generator. A seed is used to set the starting point for generating a series of random numbers. The seed sets the generator to a random starting point. A unique seed returns a unique random number sequence.
This might help:
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state, which may include a truly random seed.

I can see two reasons for doing this:
You can create a reproducible random stream. For a given seed, the same results will be returned from consecutive calls to (the same) nextX methods.
If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers
You feel, for some reason, that your seed is of a higher quality than the default source (which I'm guessing is derived from the current time on your PC).
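A small sketch of the first point, with a made-up class name and an arbitrary seed of 42: two generators built from the same seed return the same draws, and setSeed rewinds an existing generator to that same starting point.

```java
import java.util.Arrays;
import java.util.Random;

class SeedDemo {
    // First `count` draws in [0, 100) from a generator created with `seed`.
    static int[] firstDraws(long seed, int count) {
        Random rng = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = rng.nextInt(100);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = firstDraws(42L, 5);
        int[] b = firstDraws(42L, 5);
        System.out.println(Arrays.equals(a, b)); // same seed, same sequence

        Random rng = new Random();  // default seeding
        rng.setSeed(42L);           // now behaves like new Random(42L)
        System.out.println(rng.nextInt(100) == a[0]);
    }
}
```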

A specific seed will always give the same sequence of "pseudo-random" numbers. So there are only 2^48 different sequences in Random because setSeed only uses 48-bits of the seed parameter! Besides setSeed, one may also use a constructor with a seed (e.g. new Random(seed)).
When setSeed(seed) or new Random(seed) are not used, the Random() constructor sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor.
Java reference for the above information: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html
In the ordinary case, don't use a seed. Just use the empty constructor Random() and don't call setSeed. This way you'll likely get different pseudo-random numbers each time the class is constructed and invoked.
For data dependent debugging, where you want to repeat the same pseudo-random numbers, use a specific seed. In this case, use Random(seed) or setSeed(seed).
For non-security critical uses, there's no need to worry whether the specific seed/sequence might be recognized and subsequent numbers predicted, because of the large range of seeds. However, "Instances of java.util.Random are not cryptographically secure. Consider instead using SecureRandom to get a cryptographically secure pseudo-random number generator for use by security-sensitive applications." source

Several others have mentioned reproducibility. Reproducibility is at the heart of debugging: you need to be able to reproduce the circumstances in which the bug occurred.
Another important use of reproducibility is that you can play some statistical games to reduce the variability of some estimates. See Wikipedia's Variance Reduction article for more details, but the intuition is as follows. Suppose you're considering two different layouts for a bank or a grocery store. You can't build them both and see which works better, so you use simulation. You know from queueing theory that the size of lines and delays customers experience are partly due to the layout, but also partly due to the variation in arrival times, demand loads, etc, so you use randomness in your two models. If you run the models completely independently, and find that the lines are bigger in layout 1 than in layout 2, it might be because of the layout or it might be because layout 1 just happened to get more customers or a more demanding mix of transactions due to the luck of the draw. However, if both systems use the exact same set of customers arriving at the same times and having the same transaction demands, it's a "fairer" comparison. The differences you observe are more likely to be because of the layout. You can accomplish this by reproducing the randomness in both systems - use the same seeds, and synchronize so that the same random numbers are used for the same purpose in both systems.
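To make the idea concrete, here is a toy sketch under heavy assumptions: a single-server queue, uniformly distributed arrival gaps and demands, and a hypothetical serviceScale parameter standing in for the "layout". It is not a real store model, only an illustration of feeding both systems the same random stream.

```java
import java.util.Random;

class QueueSim {
    // Total waiting time over all customers in a single-server FIFO queue.
    // serviceScale is a stand-in for the layout: smaller means faster service.
    static double totalWait(long seed, double serviceScale, int customers) {
        Random rng = new Random(seed);
        double wait = 0.0;  // waiting time of the current customer
        double total = 0.0;
        for (int i = 0; i < customers; i++) {
            total += wait;
            // Draw in a fixed order so every layout consumes the stream identically.
            double gap = rng.nextDouble() * 2.0; // time until the next arrival
            double demand = rng.nextDouble();    // work this customer brings
            // The next customer waits for whatever backlog is left when they arrive.
            wait = Math.max(0.0, wait + demand * serviceScale - gap);
        }
        return total;
    }

    public static void main(String[] args) {
        long seed = 12345L; // same seed => same customers for both layouts
        System.out.println("layout 1: " + totalWait(seed, 1.5, 1000));
        System.out.println("layout 2: " + totalWait(seed, 1.0, 1000));
    }
}
```

Because both runs see exactly the same arrivals and demands, any difference in the totals comes from serviceScale alone rather than from the luck of the draw.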

Generate random number using longer seed in java

So I know I can call new Random(long seed) in Java to generate random numbers from a 64-bit long seed. How would one do better (a larger seed value)? I'm assuming this would need a hand-written class, but I'm a little lost as to how to begin.

Two ways to increase the amount of seed material are:
use an RNG designed to accept a lot of seed material.
combine two different RNGs.
For the first, look at some of George Marsaglia's methods, which use arrays to hold their state. There is an example at http://programmingpraxis.com/2010/10/05/george-marsaglias-random-number-generators/ (be careful to note the correction in the comments):
#define SHR3 (jsr^=(jsr<<13), jsr^=(jsr>>17), jsr^=(jsr<<5))
The array t[256] is where most of the seed is held.
For the second, look at Pierre L'Ecuyer's work, for example, Efficient and Portable Combined Random Number Generators

Generate random integers in java

How to generate random integers but making sure that they don't ever repeat?
For now I use :
Random randomGenerator = new Random();
randomGenerator.nextInt(100);
EDIT I
I'm looking for most efficient way, or least bad
EDIT II
Range is not important

ArrayList<Integer> list = new ArrayList<Integer>(100);
for (int i = 0; i < 100; i++) {
    list.add(i);
}
Collections.shuffle(list);
Now, list contains the numbers 0 through 99, but in a random order.

If what you want is a pseudo-random non-repeating sequence of numbers then you should look at a linear feedback shift register. A maximal-length n-bit register produces every number between 1 and 2^n - 1 exactly once before its cycle restarts. You can easily limit it to N by picking the nearest larger power of 2 and discarding all results over N. It doesn't have the memory constraints that the other collection-based solutions here have.
You can find java implementations here
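As a rough sketch of the idea (not the linked implementations), here is a 16-bit Galois LFSR using the widely published tap mask 0xB400. The class and method names are hypothetical.

```java
class Lfsr16 {
    // One step of a 16-bit Galois LFSR with tap mask 0xB400
    // (polynomial x^16 + x^14 + x^13 + x^11 + 1), a maximal-length choice.
    // The state must be a nonzero 16-bit value; zero would stay zero forever.
    static int next(int state) {
        int lsb = state & 1;
        state >>>= 1;
        if (lsb != 0) {
            state ^= 0xB400;
        }
        return state;
    }

    // Number of steps until the start state comes back.
    static int period(int start) {
        int state = start;
        int steps = 0;
        do {
            state = next(state);
            steps++;
        } while (state != start);
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(period(0xACE1)); // 65535: every nonzero 16-bit value once
    }
}
```

To restrict the output to 1..N, keep calling next and skip any state greater than N.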

How to generate random integers but making sure that they don't ever repeat?
First, I'd just like to point out that the constraint that the numbers don't repeat makes them non-random by definition.
I think that what you really need is a randomly generated permutation of the numbers in some range; e.g. 0 to 99. Even then, once you have used all numbers in the range, a repeat is unavoidable.
Obviously, you can increase the size of your range so that you can get a larger number without any repeats. But when you do this you run into the problem that your generator needs to remember all previously generated numbers. For large N that takes a lot of memory.
The alternative to remembering lots of numbers is to use a pseudo-random number generator with a long cycle length, and return the entire state of the generator as the "random" number. That guarantees no repeated numbers ... until the generator cycles.
(This answer is probably way beyond what the OP is interested in ... but someone might find it useful.)

If you have a very large range of integers (>>100), then you could put the generated integers into a hash table. When generating new random numbers, keep generating until you get a number which isn't in your hash table.

Depending on the application, you could also generate a strictly increasing sequence, i.e. start with a seed and add a random number within a range to it, then re-use that result as the seed for the next number. You can set how guessable it is by adjusting the range, balancing this with how many numbers you will need (if you made incremental steps of up to e.g., 1,000, you're not going to exhaust a 64-bit unsigned integer very quickly, for example).
Of course, this is pretty bad if you're trying to create some kind of unguessable number in the cryptographic sense, however having a non-repeating sequence would probably provide a reasonably effective attack on any cypher based on it, so I'm hoping you're not employing this in any kind of security context.
That said, this solution is not prone to timing attacks, which some of the others suggested are.

Matthew Flaschen has the solution that will work for small numbers. If your range is really big, it could be better to keep track of used numbers using some sort of Set:
Set<Integer> usedNumbers = new HashSet<>();
Random randomGenerator = new Random();
int currentNumber;
while (IStillWantMoreNumbers) {
    do {
        currentNumber = randomGenerator.nextInt(100000);
    } while (usedNumbers.contains(currentNumber));
    usedNumbers.add(currentNumber); // remember it so it is never returned again
}
You'll have to be careful with this though, because as the proportion of "used" numbers increases, the time this loop takes grows sharply: when a fraction p of the range is already used, each new number needs about 1/(1 - p) attempts on average. It's really only a good idea if your range is much larger than the amount of numbers you need to generate.

Since I can't comment on the earlier answers above due to not having enough reputation (which seems backwards... shouldn't I be able to comment on others' answers, but not provide my own answers?... anyway...), I'd like to mention that there is a major flaw with relying on Collections.shuffle() which has little to do with the memory constraints of your collection:
Collections.shuffle() uses a Random object, which in Java uses a 48-bit seed. This means there are 281,474,976,710,656 possible seed values. That seems like a lot. But consider if you want to use this method to shuffle a 52-card deck. A 52-card deck has 52! (over 8*10^67) possible configurations. Since you'll always get the same shuffled results if you use the same seed, you can see that the possible configurations of a 52-card deck that Collections.shuffle() can produce are but a small fraction of all the possible configurations.
In fact, Collections.shuffle() is not a good solution for shuffling any collection over 16 elements. A 17-element collection has 17! or 355,687,428,096,000 configurations, meaning at least 74,212,451,385,344 configurations will never be the outcome of Collections.shuffle() for a 17-element list.
Depending on your needs, this can be extremely important. Poor choice of shuffle/randomization techniques can leave your software vulnerable to attack. For instance, if you used Collections.shuffle() or a similar algorithm to implement a commercial poker server, your shuffling would be biased and a savvy computer-assisted player could use that knowledge to their benefit, as it skews the odds.
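The counting argument above can be checked with a few lines of BigInteger arithmetic. The class name is made up for illustration.

```java
import java.math.BigInteger;

class ShuffleCount {
    // n! computed exactly.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger seeds = BigInteger.ONE.shiftLeft(48); // 2^48 possible seeds
        BigInteger perms = factorial(17);                // orderings of 17 elements
        System.out.println("seeds:       " + seeds);
        System.out.println("17!:         " + perms);
        System.out.println("unreachable: " + perms.subtract(seeds)); // lower bound
    }
}
```

16! is still smaller than 2^48, which is why 17 elements is the first size where the seed space cannot cover every ordering.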

If you want 256 random numbers between 0 and 255, generate one random byte, then XOR a counter with it.
int randomSeed = rng.nextInt(256); // one random byte value, 0..255 inclusive
for (int i = 0; i < 256; i++) {
    int randomResult = randomSeed ^ i; // each value 0..255 appears exactly once
    // do something with randomResult
}
Works for any power of 2.

If the Range of values is not finite, then you can create an object which uses a List to keep track of the Ranges of used Integers. Each time a new random integer is needed, one would be generated and checked against the used ranges. If the integer is unused, then it would add that integer as a new used Range, add it to an existing used Range, or merge two Ranges as appropriate.
But you probably really want Matthew Flaschen's solution.

A linear congruential generator with full-cycle parameters can be used to generate a cycle that visits every number in its range exactly once before repeating.
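A minimal sketch, assuming a modulus of 2^16 and the classic textbook constants a = 25173, c = 13849. With a power-of-two modulus, an odd increment and a multiplier of the form 4k + 1 give a full cycle. The class and method names are hypothetical.

```java
import java.util.BitSet;

class FullCycleLcg {
    static final int M = 1 << 16; // modulus, a power of two
    static final int A = 25173;   // multiplier, A % 4 == 1
    static final int C = 13849;   // increment, odd

    // Next state: (A * x + C) mod M, for x in [0, M).
    static int next(int x) {
        return (A * x + C) & (M - 1);
    }

    // Counts how many distinct values appear before the first repeat.
    static int distinctBeforeRepeat(int start) {
        BitSet seen = new BitSet(M);
        int x = start;
        int count = 0;
        while (!seen.get(x)) {
            seen.set(x);
            count++;
            x = next(x);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(distinctBeforeRepeat(0)); // 65536: every value once
    }
}
```

The output order is far from statistically strong, so this only fits cases where "no repeats" matters more than the quality of the randomness.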
