I have logic to generate a unique ID in Java as below:
private static String generateUniqueNum() {
    final int LENGTH = 20;
    final long uniqueNumber = abs(Long.valueOf(UUID.randomUUID().hashCode()))
            + System.currentTimeMillis() + abs(random.nextLong());
    String value = Long.toString(uniqueNumber).replaceAll("-", "");
    final int endIndex = value.length() <= LENGTH ? value.length() : LENGTH;
    return String.format("MN%s", value.substring(0, endIndex));
}

private static Long abs(Long number) {
    if (null == number || 0 < number) {
        return 0L;
    }
    return Math.abs(number);
}
I tested the above code with JMeter, sending 3000 simultaneous requests, and when I checked the generated values there were a lot of duplicates. I don't understand how the duplicate values are generated. Does anyone have any idea about this?
Thanks in advance.
A UUID is unique but uses 128 bits (two longs).
For a database key, for instance, it would be best to use the String representation. For a less safe long:
return uuid.getLeastSignificantBits() ^ uuid.getMostSignificantBits();
Your clashes stem from the hash code being an int (32 bits, a quarter of the UUID), and combining it with other properties will not necessarily make the number more random; rather less so.
There are a few problems with your current approach:
java.util.Random is not thread-safe. If 3000 concurrent requests on different threads hit the same Random object, you may get strange behavior. Try ThreadLocalRandom instead.
The System.currentTimeMillis() values will be close together (often identical) if the 3000 concurrent requests are simultaneous.
Don't overcomplicate the problem. If you need a unique identifier:
Use UUID v4 (UUID.randomUUID()) or, if you need stronger guarantees, UUID v1. There are enough random bits in a UUID to make collisions extremely unlikely.
If you need something more concise, maintain a shared counter, e.g. a database sequence. Simply increment it by one each time you want a new unique number.
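Both options can be sketched as follows (a minimal sketch; the class and method names are mine, and the counter variant only guarantees uniqueness within one JVM):

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

public class IdDemo {
    // Option 1: a random UUID carries 122 random bits, so collisions
    // are practically impossible even under heavy concurrency.
    static String uuidId() {
        return "MN" + UUID.randomUUID().toString().replace("-", "");
    }

    // Option 2: a shared atomic counter guarantees uniqueness within one JVM
    // (across JVMs, use a database sequence instead).
    private static final AtomicLong COUNTER = new AtomicLong();

    static String counterId() {
        return "MN" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        System.out.println(uuidId());    // "MN" + 32 hex digits, 34 characters
        System.out.println(counterId());
    }
}
```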
Q: Why is your scheme giving significant non-uniqueness?
A: There is a howler of a bug!
private static Long abs(Long number) {
    if (null == number || 0 < number) {
        return 0L;
    }
    return Math.abs(number);
}
If number is greater than zero, the 0 < number test means that this abs method will return zero as the "absolute" value.
Why does that matter? Well, your random number generation effectively does this:
long number = yourAbs(random int) + current time in millis + yourAbs(random long)
But because of your bug, there is a roughly 50% probability that the first term will be zero, and a roughly 50% probability that the last term will be zero.
So there is a roughly 25% probability that your "random" number will be just the current time in milliseconds! Now suppose that you call generateUniqueNum() twice in the same millisecond .....
Oooops!!!
Note that without this bug, your code would still have a chance of about 1 in 2⁶⁴ that any given pair of generated numbers will be equal. If you combine this with a "birthday paradox" analysis, the probability of some collision among many generated numbers becomes significant.
Assume I have a Java BitSet. I now need to make combinations of the BitSet such that only bits which are set can be flipped, i.e. I only need combinations of the bits which are set.
For example: BitSet 1010 → combinations 1010, 1000, 0010, 0000
BitSet 1100 → combinations 1100, 1000, 0100, 0000
I can think of a few solutions, e.g. I can take combinations of all 4 bits and then XOR the combinations with the original BitSet. But this would be very resource-intensive for large sparse BitSets, so I was looking for a more elegant solution.
It appears that you want to get the power set of the bit set. There is already an answer here about how to get the power set of a Set<T>. Here, I will show a modified version of the algorithm shown in that post, using BitSets:
private static Set<BitSet> powerset(BitSet set) {
    Set<BitSet> sets = new HashSet<>();
    if (set.isEmpty()) {
        sets.add(new BitSet(0));
        return sets;
    }
    int head = set.nextSetBit(0);
    BitSet rest = set.get(0, set.size());
    rest.clear(head);
    for (BitSet s : powerset(rest)) {
        BitSet newSet = s.get(0, s.size());
        newSet.set(head);
        sets.add(newSet);
        sets.add(s);
    }
    return sets;
}
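To illustrate, here is a self-contained demo of that recursive method applied to the asker's 1010 example (bits 1 and 3 set); the wrapper class name is mine:

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class PowersetDemo {
    // The recursive power-set method from the answer above.
    static Set<BitSet> powerset(BitSet set) {
        Set<BitSet> sets = new HashSet<>();
        if (set.isEmpty()) {
            sets.add(new BitSet(0));
            return sets;
        }
        int head = set.nextSetBit(0);
        BitSet rest = set.get(0, set.size());
        rest.clear(head);
        for (BitSet s : powerset(rest)) {
            BitSet newSet = s.get(0, s.size());
            newSet.set(head);  // every subset of "rest", with and without "head"
            sets.add(newSet);
            sets.add(s);
        }
        return sets;
    }

    public static void main(String[] args) {
        BitSet set = new BitSet();
        set.set(1);
        set.set(3); // the asker's "1010"
        System.out.println(powerset(set)); // {}, {1}, {3}, {1, 3} in some order
    }
}
```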
You can perform the operation in a single linear pass instead of recursion, if you realize that integer numbers are a computer's intrinsic variant of "on/off" patterns, so iterating over the appropriate integer range will ultimately produce all possible combinations. The only challenge in your case is to transfer the densely packed bits of an integer number to the target bits of a BitSet.
Here is such a solution:
static List<BitSet> powerset(BitSet set) {
    int nBits = set.cardinality();
    if (nBits > 30) throw new OutOfMemoryError(
        "Not enough memory for " + BigInteger.ONE.shiftLeft(nBits) + " BitSets");
    int max = 1 << nBits;
    int[] targetBits = set.stream().toArray();
    List<BitSet> sets = new ArrayList<>(max);
    for (int onOff = 0; onOff < max; onOff++) {
        BitSet next = new BitSet(set.size());
        for (int bitsToSet = onOff, ix = 0; bitsToSet != 0; ix++, bitsToSet >>>= 1) {
            if ((bitsToSet & 1) == 0) {
                int skip = Integer.numberOfTrailingZeros(bitsToSet);
                ix += skip;
                bitsToSet >>>= skip;
            }
            next.set(targetBits[ix]);
        }
        sets.add(next);
    }
    return sets;
}
It uses an int value for the iteration, which is already enough to represent all combinations that could ever be stored in one of Java's built-in collections. If your source BitSet had 32 one bits, the 2³² possible combinations would not only require over a hundred GB of heap, but also a collection supporting 2³² elements, i.e. a size not representable as an int.
So the code above terminates early if the number of set bits exceeds these capabilities, without even trying. You could rewrite it to use a long or even BigInteger instead, to keep it busy in such cases, until it fails with an OutOfMemoryError anyway.
For the working cases, the int solution is the most efficient variant.
Note that the code returns a List rather than a HashSet to avoid the cost of hashing. The values are already known to be unique, and hashing would only pay off if you wanted to perform lookups, i.e. call contains with another BitSet. But to test whether an existing BitSet is a combination of your input BitSet, you wouldn't even need to generate all combinations; a simple bit operation, e.g. andNot, would tell you that already. So for storing and iterating the combinations, an ArrayList is more efficient.
I'm trying to see if large numbers are prime or not, numbers whose length is 11 digits. Here is the code I am using:
private static boolean isPrime(BigInteger eval_number) {
    for (int i = 2; i < eval_number.intValue(); i++) {
        if (eval_number.intValue() % i == 0)
            return false;
    }
    return true;
}
Now the number I'm inspecting in the debugger is eval_number which equals 11235813213. However when I inspect the eval_number.intValue() in the debugger instead of the value being 11235813213 the value is -1649088675. How is this happening? Also what would be a better way in inspecting large numbers to see if they are prime?
The strange value is a result of an overflow. The number held by the BigInteger instance is greater than 2^31−1 (Integer.MAX_VALUE), thus it can't be represented by an int. For the primality check: BigInteger provides isProbablePrime(int), and there are several other more or less fast algorithms that let you check whether a number is prime with a given failure rate. If you prefer 100% certainty, you can optimize your code by reducing the upper bound for divisors to sqrt(input) and using a step size of two. Or generate a prime table, if the algorithm is used several times.
intValue() returns an int equivalent of the given BigInteger number.
Since you are passing the value 11235813213, which is much larger than Integer.MAX_VALUE (the maximum possible value for an int variable, 2147483647), the integer overflowed.
Also what would be a better way in inspecting large numbers to see if
they are prime?
You should use only BigInteger numbers for finding out large primes. Also, check this question (Determining if a BigInteger is Prime in Java) which I asked a year ago.
As others have said, the number you are checking is outside the range of int.
You could use a long, but that only delays the problem, it will still fail on numbers beyond long's range.
The solution is to use BigInteger arithmetic:
private static boolean isPrime(BigInteger eval_number) {
    for (BigInteger i = BigInteger.valueOf(2); i.compareTo(eval_number) < 0; i = i.add(BigInteger.ONE)) {
        if (eval_number.mod(i).equals(BigInteger.ZERO)) {
            return false;
        }
    }
    return true;
}
That is just a correction of the immediate problem your question is about. There are still things to improve there. Checking for primality can be made more efficient: you don't have to check any even numbers except 2, and you only need to check up to the square root of the number in question.
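Both improvements can be sketched like this (a sketch, not the asker's code; comparing i*i against n avoids needing a square-root method, and the class name is mine):

```java
import java.math.BigInteger;

public class PrimeCheck {
    private static final BigInteger TWO = BigInteger.valueOf(2);

    // Trial division: handle 2 specially, then test only odd divisors i
    // while i*i <= n (i.e. i <= sqrt(n)).
    static boolean isPrime(BigInteger n) {
        if (n.compareTo(TWO) < 0) return false;   // 0, 1, and negatives are not prime
        if (!n.testBit(0)) return n.equals(TWO);  // even numbers: only 2 is prime
        for (BigInteger i = BigInteger.valueOf(3);
             i.multiply(i).compareTo(n) <= 0;
             i = i.add(TWO)) {
            if (n.mod(i).signum() == 0) return false;
        }
        return true;
    }
}
```

For an 11-digit number this loops at most about sqrt(10^11)/2 ≈ 160 000 times instead of 10^11 times.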
You convert the BigInteger to a 32-bit integer. If it is bigger than 2^31−1, intValue() will return an incorrect value. You need to do all the operations on BigInteger instances. I assume you use BigInteger because long would be insufficient in other cases, but for the number you stated as an example, a long would have been sufficient where an int was not (a long is enough for numbers up to 2^63−1).
You have to do all operations with BigInteger, without converting it to int:
private static boolean isPrime(BigInteger eval_number) {
    for (BigInteger i = BigInteger.valueOf(2); i.compareTo(eval_number) < 0; i = i.add(BigInteger.ONE)) {
        if (eval_number.divideAndRemainder(i)[1].equals(BigInteger.ZERO)) {
            System.out.println(i);
            return false;
        }
    }
    return true;
}
If you want to check whether a BigInteger is prime or not, you can use java.math.BigInteger.isProbablePrime(int certainty). It returns true if this BigInteger is probably prime, and false if it's definitely composite. If certainty is ≤ 0, true is returned.
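For example (the number below is the asker's; it happens to be composite, since its digit sum 30 is divisible by 3, while 2⁶¹−1 is a known Mersenne prime):

```java
import java.math.BigInteger;

public class ProbablePrimeDemo {
    public static void main(String[] args) {
        BigInteger n = new BigInteger("11235813213");
        // certainty 50 → the probability of a wrong "true" is at most 2^-50
        System.out.println(n.isProbablePrime(50)); // false: definitely composite (3 divides it)

        BigInteger mersenne = BigInteger.valueOf(2305843009213693951L); // 2^61 - 1
        System.out.println(mersenne.isProbablePrime(50)); // true
    }
}
```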
I would like to generate a random identifier in Java. The identifier should have a fixed size, and the probability of generating the same identifier twice should be very low (the system has about 500 000 users). In addition, the identifier should be so long that it's infeasible to "guess" it by a brute-force attack.
My approach so far is something along the lines of this:
String alphabet = "0123456789ABCDE....and so on";
int lengthOfAlphabet = 42;
long length = 12;

public String generateIdentifier() {
    String identifier = "";
    Random random = new Random();
    for (int i = 0; i < length; i++) {
        identifier += alphabet.charAt(random.nextInt(lengthOfAlphabet));
    }
    return identifier;
}
I’m enforcing the uniqueness by a constraint in the database. If I hit an identifier that already has been created, I’ll keep generating until I find one that’s not in use.
My assumption is that I can tweak lengthOfAlphabet and length to get the properties I'm looking for:
Rare collisions
Unfeasible to brute force
The identifier should be as short as possible, as the users of the system will have to type it.
Is this a good approach? Does anyone have any thoughts on the value of “length”?
I think randomUUID is your friend. It has a fixed width. http://docs.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html#randomUUID()
If I remember my math correctly, since the UUID is 32 hex digits (0-f), the number of possible values is 16^32, which is a big number and therefore pretty hard to guess.
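A quick sketch of what the fixed width looks like in practice (the sample value in the comment is illustrative):

```java
import java.util.UUID;

public class UuidDemo {
    public static void main(String[] args) {
        String id = UUID.randomUUID().toString();
        // Always 36 characters: 32 hex digits plus 4 hyphens,
        // e.g. "f47ac10b-58cc-4372-a567-0e02b2c3d479"
        System.out.println(id + " (" + id.length() + " chars)");
    }
}
```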
I would suggest keeping it simple, and use built in methods to represent normal pseudo-random integers encoded as Strings:
Random random = new Random();

/**
 * Generates random Strings of 1 to 6 characters: 0 to zik0zj
 */
public String generateShortIdentifier() {
    int number;
    while ((number = random.nextInt()) < 0);
    return Integer.toString(number, Character.MAX_RADIX);
}

/**
 * Generates random Strings of 1 to 13 characters: 0 to 1y2p0ij32e8e7
 */
public String generateLongIdentifier() {
    long number;
    while ((number = random.nextLong()) < 0);
    return Long.toString(number, Character.MAX_RADIX);
}
Character.MAX_RADIX is 36, which would equal an alphabet of all 0 to 9 and A to Z. In short, you would be converting the random integers to a number of base 36.
If you want, you can tweak the length you want, but in just 13 characters you can encode 2^63 numbers.
EDIT: Modified it to generate only 0 to 2^63, no negative numbers, but that's up to you.
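The bounds quoted in the Javadoc comments above can be verified directly; zik0zj and 1y2p0ij32e8e7 are simply Integer.MAX_VALUE and Long.MAX_VALUE written in base 36:

```java
public class RadixDemo {
    public static void main(String[] args) {
        // The upper bounds of the two generators above:
        System.out.println(Integer.toString(Integer.MAX_VALUE, Character.MAX_RADIX)); // zik0zj
        System.out.println(Long.toString(Long.MAX_VALUE, Character.MAX_RADIX));       // 1y2p0ij32e8e7
        // The encoding round-trips, so identifiers can be decoded back if needed:
        System.out.println(Long.parseLong("1y2p0ij32e8e7", 36) == Long.MAX_VALUE);    // true
    }
}
```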
Consider the following method:
public static boolean isPrime(int n) {
return ! (new String(new char[n])).matches(".?|(..+?)\\1+");
}
I've never been a regular expression guru, so can anyone fully explain how this method actually works? Furthermore, is it efficient compared to other possible methods for determining whether an integer is prime?
First, note that this regex applies to numbers represented in a unary counting system, i.e.
1 is 1
11 is 2
111 is 3
1111 is 4
11111 is 5
111111 is 6
1111111 is 7
and so on. Really, any character can be used (hence the .s in the expression), but I'll use "1".
Second, note that this regex matches composite (non-prime) numbers; thus negation detects primality.
Explanation:
The first half of the expression,
.?
says that the strings "" (0) and "1" (1) are matches, i.e. not prime (by definition, though arguable.)
The second half, in simple English, says:
Match the shortest string whose length is at least 2, for example, "11" (2). Now, see if we can match the entire string by repeating it. Does "1111" (4) match? Does "111111" (6) match? Does "11111111" (8) match? And so on. If not, then try it again for the next shortest string, "111" (3). Etc.
You can now see how, if the original string can't be matched as a multiple of its substrings, then by definition, it's prime!
BTW, the non-greedy operator ? is what makes the "algorithm" start from the shortest and count up.
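You can watch this in action on small cases: composite lengths in unary match the repeating-group alternative, prime lengths don't:

```java
public class RegexPrimeDemo {
    public static void main(String[] args) {
        String regex = "(..+?)\\1+";
        System.out.println("1111".matches(regex));    // true:  "11" repeated twice (4 = 2 * 2)
        System.out.println("111111".matches(regex));  // true:  "11" repeated three times (6 = 2 * 3)
        System.out.println("11111".matches(regex));   // false: 5 is prime, no block tiles it
        System.out.println("1111111".matches(regex)); // false: 7 is prime
    }
}
```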
Efficiency:
It's interesting, but certainly not efficient, by various arguments, some of which I'll consolidate below:
As #TeddHopp notes, the well-known sieve-of-Eratosthenes approach would not bother to check multiples of integers such as 4, 6, and 9, having been "visited" already while checking multiples of 2 and 3. Alas, this regex approach exhaustively checks every smaller integer.
As #PetarMinchev notes, we can "short-circuit" the multiples-checking scheme once we reach the square root of the number. We can do this because any factor greater than the square root must be paired with a factor less than the square root (otherwise two factors greater than the square root would produce a product greater than the number), so if such a greater factor exists, we would already have encountered, and thus matched, its lesser partner.
As #Jesper and #Brian note with concision, from a non-algorithmic perspective, consider how a regular expression would begin by allocating memory to store the string, e.g. char[9000] for 9000. Well, that was easy, wasn't it? ;)
As #Foon notes, there exist probabilistic methods which may be more efficient for larger numbers, though they may not always be correct (turning up pseudoprimes instead). But there are also deterministic tests that are 100% accurate and far more efficient than sieve-based methods. Wolfram MathWorld has a nice summary.
The unary characteristics of primes and why this works has already been covered. So here's a test using conventional approaches and this approach:
public class Main {
    public static void main(String[] args) {
        long time = System.nanoTime();
        for (int i = 2; i < 10000; i++) {
            isPrimeOld(i);
        }
        time = System.nanoTime() - time;
        System.out.println(time + " ns (" + time / 1000000 + " ms)");

        time = System.nanoTime();
        for (int i = 2; i < 10000; i++) {
            isPrimeRegex(i);
        }
        time = System.nanoTime() - time;
        System.out.println(time + " ns (" + time / 1000000 + " ms)");
        System.out.println("Done");
    }

    public static boolean isPrimeRegex(int n) {
        return !(new String(new char[n])).matches(".?|(..+?)\\1+");
    }

    public static boolean isPrimeOld(int n) {
        if (n == 2)
            return true;
        if (n < 2)
            return false;
        if ((n & 1) == 0)
            return false;
        int limit = (int) Math.round(Math.sqrt(n));
        for (int i = 3; i <= limit; i += 2) {
            if (n % i == 0)
                return false;
        }
        return true;
    }
}
This test computes whether or not the number is prime up to 9,999, starting from 2. And here's its output on a relatively powerful server:
8537795 ns (8 ms)
30842526146 ns (30842 ms)
Done
So it is grossly inefficient once the numbers get large enough. (For primes up to 999, the regex runs in about 400 ms.) For small numbers it's fast, but it's still faster to generate the primes up to 9,999 the conventional way than it is to generate even the primes up to 99 the regex way (23 ms).
This is not a really efficient way to check if a number is prime (it checks every divisor).
An efficient way is to check divisors only up to sqrt(number). That is if you want to be certain that a number is prime; otherwise there are probabilistic primality checks which are faster, but not 100% correct.
I'm looking to randomize a BigInteger. The intent is to pick a number from 1 to 8180385048. Though, from what I noticed, the BigInteger(bitLen, Random) constructor generates a number from 0 to 2^bitLen − 1, and I'd want some unpredictable number. I tried to make a method that would do it, but I keep running into bugs and have finally given in to asking on here. :P Does anyone have any suggestions on how to do this?
Judging from the docs of Random.nextInt(int n) which obviously needs to solve the same problem, they seem to have concluded that you can't do better than "resampling if out of range", but that the penalty is expected to be negligible.
From the docs:
The algorithm is slightly tricky. It rejects values that would result in an uneven distribution (due to the fact that 2³¹ is not divisible by n). The probability of a value being rejected depends on n. The worst case is n = 2³⁰ + 1, for which the probability of a reject is 1/2, and the expected number of iterations before the loop terminates is 2.
I'd suggest you simply use the randomizing constructor you mentioned and iterate until you reach a value that is in range, for instance like this:
public static BigInteger rndBigInt(BigInteger max) {
    Random rnd = new Random();
    do {
        BigInteger i = new BigInteger(max.bitLength(), rnd);
        if (i.compareTo(max) <= 0)
            return i;
    } while (true);
}

public static void main(String... args) {
    System.out.println(rndBigInt(new BigInteger("8180385048")));
}
For your particular case (with max = 8180385048), the probability of having to reiterate, even once, is about 4.8 %, so no worries :-)
Make a loop and get random BigIntegers of the minimum bit length that covers your range until you obtain one number in range. That should preserve the distribution of random numbers.
Reiterating if out of range, as suggested in other answers, is a solution to this problem. However if you want to avoid this, another option is to use the modulus operator:
BigInteger i = new BigInteger(max.bitLength(), rnd);
i = i.mod(max); // Now 0 <= i <= max - 1
i = i.add(BigInteger.ONE); // Now 1 <= i <= max
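Wrapped up as a method, this looks like the sketch below (mirroring the other answer's rndBigInt signature; note that when max is not a power of two, the mod fold makes smaller values very slightly more likely than the resampling loop does, which is the trade-off for never iterating):

```java
import java.math.BigInteger;
import java.util.Random;

public class RndBigIntMod {
    static BigInteger rndBigInt(BigInteger max, Random rnd) {
        BigInteger i = new BigInteger(max.bitLength(), rnd); // uniform on [0, 2^bitLength - 1]
        return i.mod(max).add(BigInteger.ONE);               // folded into [1, max]
    }

    public static void main(String... args) {
        System.out.println(rndBigInt(new BigInteger("8180385048"), new Random()));
    }
}
```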