How to generate unique positive Long using UUID - java

I have a requirement to generate unique Long ids for my database primary key column.
I thought I could use UUID.randomUUID().getMostSignificantBits(), but it sometimes generates negative longs, which is a problem for me.
Is it possible to generate only positive longs from a UUID? There will be billions of entries, so each generated key must be unique.

UUID.randomUUID().getMostSignificantBits() & Long.MAX_VALUE
This works because a bitwise & with 1 passes the original bit through unchanged, while a bitwise & with 0 blocks it and yields 0. Long.MAX_VALUE in binary is
0111111111111111111111111111111111111111111111111111111111111111
that is, a 0 followed by 63 1s (64 bits in total, the size of a long in Java).
So when you bitwise & a number X with it, you get X back with the leftmost (sign) bit forced to zero. A non-negative X is unchanged; a negative X becomes X + 2^63. The result is therefore never negative, at the cost of one bit of the UUID's randomness.
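As a minimal sketch of the masking described above (the class and method names are made up for illustration), the following shows that clearing the sign bit never yields a negative long. Keep in mind that the upper half of a random UUID also contains the fixed 4-bit version number, so the result carries fewer than 63 random bits; collisions across billions of rows are unlikely but not impossible.

```java
import java.util.UUID;

final class PositiveUuidLong {
    /** Clears bit 63 (the sign bit) and keeps the other 63 bits unchanged. */
    static long clearSignBit(long value) {
        return value & Long.MAX_VALUE;
    }

    /** Hypothetical helper: a non-negative long taken from a random UUID. */
    static long next() {
        return clearSignBit(UUID.randomUUID().getMostSignificantBits());
    }
}
```

If the column must never contain duplicates, a unique constraint in the database is still the final safeguard.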

As the others have written, long does not have enough space for a unique number. But in many cases a number may be unique enough for a specific use.
For example, a timestamp with the nanosecond precision is often good enough.
To get it, shift the current milliseconds 20 bits left to make room for the nanoseconds, then overlay the low 20 bits of the nanosecond counter:
(System.currentTimeMillis() << 20) | (System.nanoTime() & 0xFFFFFL);
The System.nanoTime() & 0xFFFFFL part takes the current nanoseconds and sets the upper 44 bits to 0, leaving only the rightmost 20 bits, which is enough to hold the nanoseconds within one millisecond (up to 999999).
It is the same as (with the mask written in binary):
nanoseconds & ~1111111111111111111111111111111111111111111100000000000000000000
Side note: System.nanoTime() should not be used to represent the current time, because its origin is arbitrary rather than fixed in time and because the value wraps around when it overflows.
You can use any other bit manipulation. It is usually good to take into account the current time and something else such as the current thread id, process id, ip.

Take a look at http://commons.apache.org/sandbox/commons-id//index.html
It has a LongGenerator that can give you exactly what you need.
In addition, if you are using Hibernate you can ask it to generate IDs for you (it has several algorithms to choose from). If not, you can still look at its implementation, for example http://grepcode.com/file/repo1.maven.org/maven2/hibernate/hibernate/2.1.8/net/sf/hibernate/id/TableHiLoGenerator.java#TableHiLoGenerator

This code is inspired by #Daniel Nuriyev's answer. But, instead of using nano-time, a counter (or discriminator as I've seen it called) is used when collisions occur in the same millisecond:
private static long previousTimeMillis = System.currentTimeMillis();
private static long counter = 0L;

public static synchronized long nextID() {
    long currentTimeMillis = System.currentTimeMillis();
    counter = (currentTimeMillis == previousTimeMillis) ? (counter + 1L) & 1048575L : 0L;
    previousTimeMillis = currentTimeMillis;
    long timeComponent = (currentTimeMillis & 8796093022207L) << 20;
    return timeComponent | counter;
}
This method generates a semi-unique ID by packing a millisecond timestamp component together with a counter component. The algorithm allows for roughly a million (1048576 to be exact, counter values 0 to 1048575) unique IDs to be generated in the same millisecond before the counter wraps and collisions start to occur. Unique IDs are generated until the year 2248, at which point the time component wraps around and starts at 0 again.
The ID-generation is done as follows:
Milliseconds since epoch:
|0|000000000000000000000010110111101111100110001001111100101011111|
Bitwise AND with (8796093022207L):
|0|000000000000000000001111111111111111111111111111111111111111111|
to give you the 43 least significant bits as the time-component.
Then shift this to the left by 20 bits to give you:
|0|0010110111101111100110001001111100101011111|00000000000000000000|
Bitwise OR with 20 bits of counter (e.g. if counter is 3) to give you:
|0|0010110111101111100110001001111100101011111|00000000000000000011|
Only 43 bits (and not 44) are used for the time component, as we do not want the most significant bit (the sign bit) to be changed. As a result, only positive IDs are generated.
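The layout above can also be read in reverse. The following is a small sketch (the class and method names are hypothetical, not part of the answer's code) that packs a timestamp and counter with the same masks and then recovers both parts from the ID:

```java
final class IdParts {
    static final int COUNTER_BITS = 20;
    static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1; // 1048575
    static final long TIME_MASK = (1L << 43) - 1;              // 8796093022207

    /** Packs 43 bits of milliseconds above 20 bits of counter. */
    static long pack(long millis, long counter) {
        return ((millis & TIME_MASK) << COUNTER_BITS) | (counter & COUNTER_MASK);
    }

    /** Recovers the (masked) millisecond timestamp from an ID. */
    static long millisOf(long id) {
        return id >>> COUNTER_BITS;
    }

    /** Recovers the counter value from an ID. */
    static long counterOf(long id) {
        return id & COUNTER_MASK;
    }
}
```

Being able to extract the timestamp is convenient for debugging, but it also means the IDs reveal when each row was created.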

I just came across this solution and am still trying to understand it. It is described as a Java implementation of Twitter Snowflake: a 64-bit sequential ID generator based on the Twitter Snowflake ID generation algorithm.
https://github.com/Predictor/javasnowflake
Any suggestions are welcome.

I want to do it on the application side, because if I do it on the database side I have to fire one more query to get the id of the row, and I want to avoid that.
NO! You can use an AUTOINCREMENT primary key, and in JDBC retrieve the generated key with the INSERT.
String insertSQL = "INSERT INTO table... (name, ...)"
        + " VALUES(?, ..., ?)";
try (Connection connection = getConnection();
        PreparedStatement stmt = connection.prepareStatement(insertSQL,
                Statement.RETURN_GENERATED_KEYS)) {
    stmt.setString(1, ...);
    stmt.setInt(2, ...);
    stmt.setBigDecimal(3, ...);
    ...
    stmt.executeUpdate();
    // Read the key that the database generated for the inserted row.
    try (ResultSet keysRS = stmt.getGeneratedKeys()) {
        if (keysRS.next()) {
            long id = keysRS.getLong(1); // getLong, not getInt: the key is a long
        }
    }
}
This is more efficient, and definitely easier and safer. UUIDs are 128 bits; taking just 64 of them reduces their uniqueness, so the result is, at least subjectively, not 100% perfect. At the very least, XOR (^) both long halves together.
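The XOR suggestion can be sketched as follows (the class and method names are invented for illustration). Folding both halves lets every bit of the UUID influence the result, but the output is still only 64 bits wide, so it remains less collision-resistant than the full UUID.

```java
import java.util.UUID;

final class UuidFold {
    /** XORs the two 64-bit halves of the UUID into a single long. */
    static long fold(UUID uuid) {
        return uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
    }

    /** Same as fold, with the sign bit cleared so the value is never negative. */
    static long foldPositive(UUID uuid) {
        return fold(uuid) & Long.MAX_VALUE;
    }
}
```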

A bit late to reply but anyone reading this now, you can also implement LUHN algorithm to generate unique Id for your Primary Key. We have been using it for more than 5 years in our product and it does the job.

Related

Java program to generate a unique and random six alpha numeric code

I need to generate a reservation code of 6 alpha numeric characters, that is random and unique in java.
I tried using UUID.randomUUID().toString(). However, the id is too long and the requirement demands that it be only 6 characters.
What approaches are possible to achieve this?
Just to clarify, (Since this question is getting marked as duplicate).
The other solutions I've found are simply generating random characters, which is not enough in this case. I need to reasonably ensure that a random code is not generated again.
Consider using the hashids library to generate salted hashes of integers (your database ids or other random integers which is probably better).
http://hashids.org/java/
Hashids hashids = new Hashids("this is my salt",6);
String id = hashids.encode(1, 2, 3);
long[] numbers = hashids.decode(id);
You have 36 characters in the alphanumeric character set (digits 0-9 plus letters a-z). With 6 places you get 36^6 = 2,176,782,336 different codes, which is slightly more than 2^31.
Therefore you can use Unix time (in seconds) to create a unique ID. However, you must ensure that no two IDs are generated within the same second.
If you cannot guarantee that, you end up with a (synchronized) counter within your class. Also, if you want to survive a JVM restart, you should save the current value (e.g. to a database, file, etc. whatever options you have).
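A minimal sketch of the counter variant, assuming an in-memory counter and upper-case base-36 output (both are illustrative choices, as are the class and method names). Persisting the counter across restarts is left out.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

final class Base36Codes {
    static final long LIMIT = 2_176_782_336L; // 36^6 distinct six-character codes

    // Assumption: a real system would persist this value so it survives a restart.
    private static final AtomicLong COUNTER = new AtomicLong();

    /** Encodes n as exactly six base-36 characters, left-padded with '0'. */
    static String encode(long n) {
        if (n < 0 || n >= LIMIT) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        String digits = Long.toString(n, 36).toUpperCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(6);
        for (int i = digits.length(); i < 6; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Returns the code for the next counter value. */
    static String next() {
        return encode(COUNTER.getAndIncrement());
    }
}
```

Codes produced this way are unique but sequential, so they are easy to guess; that matters if a reservation code is meant to be hard to predict.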
Despite their name, UUIDs are not guaranteed to be unique; a 128-bit collision is simply extremely unlikely. With 6 characters (roughly 32 bits) a collision becomes very likely if you just hash something or generate a random string.
If the uniqueness constraint is necessary, then you need to:
1. Generate a random 6-character string.
2. Check whether you have generated that string before by querying your database.
3. If you have, go back to step 1.
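The steps above can be sketched as follows. This is only an illustration: the in-memory Set stands in for the database lookup, and the class name and alphabet are assumptions.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

final class UniqueCodeIssuer {
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();

    private final SecureRandom random = new SecureRandom();
    // Stand-in for the database; a real system would rely on a unique index.
    private final Set<String> issued = new HashSet<>();

    /** Generates random codes until one is found that has not been issued yet. */
    synchronized String issue() {
        while (true) {
            char[] buf = new char[6];
            for (int i = 0; i < buf.length; i++) {
                buf[i] = ALPHABET[random.nextInt(ALPHABET.length)];
            }
            String code = new String(buf);
            if (issued.add(code)) { // add() returns false for a duplicate
                return code;
            }
        }
    }
}
```

With a real database, inserting the code into a column with a unique constraint and retrying on a constraint violation avoids the race between the check and the insert.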
Another way would be to use a pseudorandom permutation (PRP) with a 32-bit block size. Block ciphers are modeled as PRPs, but not many support a 32-bit block size; two that do are Speck (by the NSA) and the Hasty Pudding Cipher.
With a PRP you could for example take an already unique value like your database primary key and encrypt it with the block cipher. If the input is not bigger than 32 bit then the output will still be unique.
Then you would run Base62 (or at least Base 41) over the output and remove the padding characters to get a 6 character output.
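To illustrate the idea only, here is a toy 32-bit Feistel network. It is not Speck or the Hasty Pudding Cipher and has not been vetted as a cipher; the round function, the round keys, and all names are assumptions made up for this sketch. What it does demonstrate is the property the answer relies on: any Feistel network is a permutation, so distinct 32-bit inputs always produce distinct outputs, which then fit into six Base62 characters.

```java
final class Feistel32 {
    // Illustrative round keys; a real deployment would keep its keys secret.
    private static final int[] KEYS = {0x3A94, 0x7C15, 0x1B87, 0x5E2D};
    private static final char[] BASE62 =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".toCharArray();

    // Mixes a 16-bit half with a round key; the result is always in 0..0xFFFF.
    private static int round(int half, int key) {
        return (((half ^ key) * 0x9E3779B1) >>> 16) & 0xFFFF;
    }

    /** Maps every 32-bit value to a distinct 32-bit value. */
    static int permute(int x) {
        int left = x >>> 16;
        int right = x & 0xFFFF;
        for (int key : KEYS) {
            int next = left ^ round(right, key);
            left = right;
            right = next;
        }
        return (left << 16) | right;
    }

    /** Undoes permute by running the rounds in reverse order. */
    static int invert(int y) {
        int left = y >>> 16;
        int right = y & 0xFFFF;
        for (int i = KEYS.length - 1; i >= 0; i--) {
            int prev = right ^ round(left, KEYS[i]);
            right = left;
            left = prev;
        }
        return (left << 16) | right;
    }

    /** Encodes the unsigned 32-bit value as exactly six Base62 characters. */
    static String toCode(int value) {
        long v = value & 0xFFFFFFFFL;
        char[] out = new char[6];
        for (int i = 5; i >= 0; i--) {
            out[i] = BASE62[(int) (v % 62)];
            v /= 62;
        }
        return new String(out);
    }
}
```

A call such as Feistel32.toCode(Feistel32.permute(databaseId)) would then give one code per primary key, where databaseId is a hypothetical 32-bit key. Anyone who knows the keys can invert the mapping, so a vetted cipher is the better choice if the codes must be unpredictable.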
If you take a substring, that value may not be unique.
For more information, see this similar question:
Generating 8-character only UUIDs
Let's say your corpus is the collection of alphanumeric characters, a-zA-Z0-9.
char[] corpus = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
We can use SecureRandom to generate a seed, which asks the OS for entropy (the source depends on the OS). The trick here is to keep a uniform distribution: each byte has 256 possible values, but we only need 62 of them, so I propose rejection sampling.
int generated = 0;
int desired = 6;
char[] result = new char[desired];
while (generated < desired) {
    byte[] ran = SecureRandom.getSeed(desired);
    for (byte b : ran) {
        if (b >= 0 && b < corpus.length) {
            result[generated] = corpus[b];
            generated += 1;
            if (generated == desired) break;
        }
    }
}
Improvements could include smarter wrapping of the generated values.
When can we expect a repeat? Let's stick with the corpus of 62 and assume that the distribution is completely random. In that case we have the birthday problem, with N = 62^6 possibilities. We want to find the n at which the chance of a repeat is around 10%.
p(r) = 1 - N! / (N^n (N - n)!)
Using the approximation given on the Wikipedia page:
n = sqrt(-ln(0.9) * 2N)
This gives about 109,000 codes for a 10% chance. For a 0.1% chance it would take about 10,000 codes.
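The approximation can be evaluated directly. In this sketch the class and method names are hypothetical, and the formula is the general form n = sqrt(-2N ln(1 - p)), of which the line above is the p = 0.1 case.

```java
final class BirthdayBound {
    /**
     * Approximate number of uniformly random draws from a space of the given
     * size after which the probability of at least one repeat reaches p.
     */
    static double draws(double space, double p) {
        return Math.sqrt(-2.0 * space * Math.log1p(-p));
    }

    public static void main(String[] args) {
        double space = Math.pow(62, 6); // six characters from a 62-character corpus
        System.out.printf("10%%: %.0f draws, 0.1%%: %.0f draws%n",
                draws(space, 0.10), draws(space, 0.001));
    }
}
```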
You can try taking a substring of your generated UUID:
String uuid = UUID.randomUUID().toString();
System.out.println("uuid = " + uuid.substring(0, 6));

java unique number less then 12 characters

I have a use case which involves generating a number that a user enters into a website to link a transaction to their account.
So I have the following code which generates a random 12 digit number:
public String getRedemptionCode(long utid, long userId) {
    long nano = System.nanoTime();
    long temp = nano + utid + 1232;
    long redemptionCode = temp + userId + 5465;
    if (redemptionCode < 0) {
        redemptionCode = Math.abs(redemptionCode);
    }
    String redemptionCodeFinal = StringUtils.rightPad(String.valueOf(redemptionCode), 12, '1');
    redemptionCodeFinal = redemptionCodeFinal.substring(0, 12);
    return redemptionCodeFinal;
}
This method takes in two params which are generated by a DB.
What I need to understand is:
Is this random? I have a test which ran this method 1 million times and it always seem to be random.
Can I cut this down to 8 characters?
No, it is neither unique nor random.
It is not "random" in the sense of highly entropic / uncorrelated with other values.
The only source of non-determinism is System.nanoTime, so all the entropy comes from a few of the least significant bits of the system clock. Simply adding numbers like 1232 and 5465 does not make the result less correlated with subsequent results.
Is this random? I have a test which ran this method 1 million times and it always seem to be random.
If this code is used in multiple threads on the same machine, or on multiple machines with synced clocks, you will see duplicates more quickly.
Since there is low entropy, you are likely to see duplicates by random chance fairly quickly. Math.se addresses the likelihood depending on how many of these you generate.
Can I cut this down to 8 characters?
Only if you don't lose entropy. Consider two ways of truncating a timestamp:
long time = ...; // Least significant digits carry the randomness.
String s = "" + time;
// Keeps the left-most digits and cuts off the right-most, most entropic ones
String bad = s.substring(0, 8);
// Keeps the right-most digits and cuts off the left-most, least entropic ones
String better = s.substring(s.length() - 8);
Since it is a straightforward calculation from an increasing counter, an attacker who can try multiple times can predict the value produced in a way that they would not be able to had you used a crypto-strong random number generator like java.util.SecureRandom.
Is this random?
You are asking, is your function based on System.nanoTime() a random number generator (RNG)?
By definition, an RNG is a generator whose output lacks any pattern.
So, are numbers returned from your function without any pattern?
No, they have an easily-observable pattern, because they depend on System.nanoTime() (system clock).
Can I cut this down to 8 characters?
Yes, you can, but it's still not random. Adding constants or padding won't help either.
Use SecureRandom instead.
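A minimal sketch of what that could look like for a numeric code of a given length (the class and method names are made up for illustration). Note that random codes are unpredictable but not automatically unique, so a uniqueness check against previously issued codes is still needed.

```java
import java.security.SecureRandom;

final class RedemptionCodes {
    private static final SecureRandom RNG = new SecureRandom();

    /** Returns a string of the requested number of random decimal digits. */
    static String next(int digits) {
        StringBuilder sb = new StringBuilder(digits);
        for (int i = 0; i < digits; i++) {
            sb.append((char) ('0' + RNG.nextInt(10)));
        }
        return sb.toString();
    }
}
```

RedemptionCodes.next(12) gives a 12-digit code and RedemptionCodes.next(8) an 8-digit one; the shorter the code, the sooner collisions and successful guesses become likely.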

Why are initial random numbers similar when using similar seeds?

I discovered something strange with the generation of random numbers using Java's Random class.
Basically, if you create multiple Random objects using close seeds (for example between 1 and 1000), the first value generated by each generator will be almost the same, but the next values look fine (I didn't investigate further).
Here are the two first generated doubles with seeds from 0 to 9 :
0 0.730967787376657 0.24053641567148587
1 0.7308781907032909 0.41008081149220166
2 0.7311469360199058 0.9014476240300544
3 0.731057369148862 0.07099203475193139
4 0.7306094602878371 0.9187140138555101
5 0.730519863614471 0.08825840967622589
6 0.7307886238322471 0.5796252073129174
7 0.7306990420600421 0.7491696031336331
8 0.7302511331990172 0.5968915822372118
9 0.7301615514268123 0.7664359929590888
And from 991 to 1000 :
991 0.7142160704801332 0.9453385235522973
992 0.7109015598097105 0.21848118381994108
993 0.7108119780375055 0.38802559454181795
994 0.7110807233541204 0.8793923921785096
995 0.7109911564830766 0.048936787999225295
996 0.7105432327208906 0.896658767102804
997 0.7104536509486856 0.0662031629235198
998 0.7107223962653005 0.5575699754613725
999 0.7106328293942568 0.7271143712820883
1000 0.7101849056320707 0.574836350385667
And here is a figure showing the first value generated with seeds from 0 to 100,000.
[Figure: first random double generated, plotted against the seed]
I searched for information about this, but I didn't find anything referring to this precise problem. I know that there are many issues with LCG algorithms, but I didn't know about this one, and I was wondering whether it is a known issue.
Also, do you know whether this problem affects only the first value (or the first few values), or whether it is more general and close seeds should be avoided?
Thanks.
You'd be best served by downloading and reading the Random source, as well as some papers on pseudo-random generators, but here are some of the relevant parts of the source. To begin with, there are three constant parameters that control the algorithm:
private final static long multiplier = 0x5DEECE66DL;
private final static long addend = 0xBL;
private final static long mask = (1L << 48) - 1;
The multiplier works out to approximately 2^34 and change, the mask 2^48 - 1, and the addend is pretty close to 0 for this analysis.
When you create a Random with a seed, the constructor calls setSeed:
synchronized public void setSeed(long seed) {
seed = (seed ^ multiplier) & mask;
this.seed.set(seed);
haveNextNextGaussian = false;
}
You're providing a seed pretty close to zero, so the initial seed value that gets set is dominated by multiplier when the two are XOR'ed together. In all your test cases with seeds close to zero, the seed that is used internally is roughly 2^34; but it's easy to see that even if you provided very large seed numbers, similar user-provided seeds will yield similar internal seeds.
The final piece is the next(int) method, which actually generates a random integer of the requested length based on the current seed, and then updates the seed:
protected int next(int bits) {
long oldseed, nextseed;
AtomicLong seed = this.seed;
do {
oldseed = seed.get();
nextseed = (oldseed * multiplier + addend) & mask;
} while (!seed.compareAndSet(oldseed, nextseed));
return (int)(nextseed >>> (48 - bits));
}
This is called a 'linear congruential' pseudo-random generator, meaning that it generates each successive seed by multiplying the current seed by a constant multiplier and then adding a constant addend (and then masking to take the lower 48 bits, in this case). The quality of the generator is determined by the choice of multiplier and addend, but the output of all such generators can be easily predicted from the current state, and it has a fixed period before it repeats itself (hence the recommendation not to use them in sensitive applications).
The reason you're seeing similar initial output from nextDouble given similar seeds is that, because the computation of the next integer only involves a multiplication and addition, the magnitude of the next integer is not much affected by differences in the lower bits. Calculation of the next double involves computing a large integer based on the seed and dividing it by another (constant) large integer, and the magnitude of the result is mostly affected by the magnitude of the integer.
Repeated calculations of the next seed will magnify the differences in the lower bits of the seed because of the repeated multiplication by the constant multiplier, and because the 48-bit mask throws out the highest bits each time, until eventually you see what looks like an even spread.
I wouldn't have called this an "issue".
And also, do you know if this problem only for the first value (or first few values), or if it is more general and using close seeds should be avoided?
Correlation patterns between successive numbers are a common problem with non-crypto PRNGs, and this is just one manifestation. The correlation (strictly, auto-correlation) is inherent in the mathematics underlying the algorithm(s). If you want to understand that, you should probably start by reading the relevant part of Knuth's Art of Computer Programming, Chapter 3.
If you need non-predictability you should use a (true) random seed for Random ... or let the system pick a "pretty random" one for you; e.g. using the no-args constructor. Or better still, use a real random number source or a crypto-quality PRNG instead of Random.
For the record:
The javadoc (Java 7) does not specify how Random() seeds itself.
The implementation of Random() on Java 7 for Linux is seeded from the nanosecond clock, XORed with a 'uniquifier' sequence. The 'uniquifier' sequence is an LCG which uses a different multiplier, and whose state is static. This is intended to avoid auto-correlation of the seeds ...
This is a fairly typical behaviour for pseudo-random seeds - they aren't required to provide completely different random sequences, they only provide a guarantee that you can get the same sequence again if you use the same seed.
The behaviour happens because of the mathematical form of the PRNG - the Java one uses a linear congruential generator, so you are just seeing the results running the seed through one round of the linear congruential generator. This isn't enough to completely mix up all the bit patterns, hence you see similar results for similar seeds.
Your best strategy is probably just to use very different seeds - one option would be to obtain these by hashing the seed values that you are currently using.
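One way to hash the seeds, sketched under the assumption that a SplitMix64-style mixing step is acceptable (this is an illustrative choice; the answer does not name a specific hash, and the class name is invented). The program prints the first double for seeds 0 to 9, once with the raw seed and once with the mixed seed.

```java
import java.util.Random;

final class SeedMixing {
    /** SplitMix64-style mixing step; it is a bijection on 64-bit values. */
    static long mix(long seed) {
        long z = seed + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 10; seed++) {
            double raw = new Random(seed).nextDouble();
            double mixed = new Random(mix(seed)).nextDouble();
            System.out.println(seed + " " + raw + " " + mixed);
        }
    }
}
```

Because the mixing step changes many bits of the seed for every small change in the input, neighbouring seeds no longer start from neighbouring internal states.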
By making the seeds themselves random (for instance, by applying some mathematical function to System.currentTimeMillis() or System.nanoTime() when generating the seed) you can get better random results. You can also look here for more information.

Unique number generation using shift operator and validate same with the bitwise & operator in java

I'm using LEFT SHIFT operator from java to generate some unique number and validate same with the bitwise & operator like below.
// Number generation
public final static long UNIQUE_NUMBER8 = (long) 1 << 8;
public final static long UNIQUE_NUMBER9 = (long) 1 << 9;
public final static long UNIQUE_NUMBER10 = (long) 1 << 10;
till
public final static long UNIQUE_NUMBER62 = (long) 1 << 62;
To validate it, I use the condition below, where request_number comes from the URL and carries one of the generated numbers:
if ( request_number >= 0 && (request_number & UNIQUE_NUMBER10) != 0){
System.out.println("Valid");
}else{
System.out.println("Invalid");
}
But with this condition I am not able to validate numbers above 1 << 62: all the bits up to 62 are already in use, anything above that produces numbers that are already taken, and hence the bitwise & check fails.
Please help me to generate unique numbers using this or similar logic and validate them with the bitwise & operator.
Thanks
You check that the number is not negative. You can only have bits 0 to 62 set and still be non-negative.
Perhaps you should be using BitSet, you don't need all the constants and you can have almost any number of bits.
However, to generate unique id you can either create UUID, or use System.currentTimeMillis() (checking for duplicates) or just AtomicLong.incrementAndGet() depending on what type of unique id you need.
These approaches avoid the need to remember previous ids by always increasing the number used to generate the id. UUID is unique across systems but is relatively cumbersome; currentTimeMillis can be unique even if the system is restarted (and has a built-in timestamp); AtomicLong is the lightest, but restarts when the system does.
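A minimal sketch of the AtomicLong option (the class name is hypothetical). The sequence is unique within one running JVM only, since it starts again from 1 after a restart.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SequenceIds {
    private static final AtomicLong NEXT = new AtomicLong();

    /** Returns 1, 2, 3, ... and is safe to call from multiple threads. */
    static long next() {
        return NEXT.incrementAndGet();
    }
}
```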
You could generate a UUID every time you have a new request, and add it to a Set to check whether it has already been created or used. That way you have an almost unlimited number of ids.
See here: http://www.javapractices.com/topic/TopicAction.do?Id=56
Using BitSet is a better option for you.

BitMask operation in java

Consider the scenario
I have values assigned like these
Amazon -1
Walmart -2
Target -4
Costco -8
Bjs -16
In DB, data is stored by masking these values based on their availability for each product.
e.g.
Mask  Product   Description
1     Laptop    Available in Amazon
17    iPhone    Available in Amazon and BJ's
24    Mattress  Available in Costco and BJ's
All the products are masked and stored in the DB like this.
How do I retrieve all the retailers from the masked value?
For example, the masked value for Mattress is 24. How would I find or list Costco and BJ's programmatically? Any algorithm or logic would be highly appreciated.
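int mattress = 24;
int num_stores = 5; // Amazon, Walmart, Target, Costco, BJ's
int mask = 1;
for (int i = 0; i < num_stores; ++i) {
    // Parentheses are required: != binds more tightly than & in Java.
    if ((mask & mattress) != 0) {
        System.out.println("Store " + i + " has mattresses!");
    }
    mask = mask << 1;
}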
The if statement lines up the bits: if the mattress value has the same bit set as the mask, then the store that mask belongs to sells mattresses. An AND of the mattress value and the mask value will only be non-zero when the store sells mattresses. On each iteration we move the mask bit one position to the left.
Note that the mask values should be positive, not negative, if need be you can multiply by negative one.
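To list retailer names rather than store indexes, the same loop can look the names up in an array. This is a sketch that assumes the bit order given in the question (Amazon = 1, Walmart = 2, Target = 4, Costco = 8, BJ's = 16); the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;

final class RetailerMask {
    // Index i corresponds to bit i, i.e. mask value 1 << i, as in the question.
    private static final String[] RETAILERS =
            {"Amazon", "Walmart", "Target", "Costco", "BJ's"};

    /** Returns the retailers whose bit is set in the given mask. */
    static List<String> decode(int mask) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < RETAILERS.length; i++) {
            if ((mask & (1 << i)) != 0) {
                result.add(RETAILERS[i]);
            }
        }
        return result;
    }
}
```

For the mattress value 24 (8 + 16), decode returns Costco and BJ's. Because every retailer owns a distinct bit, the decomposition is unique and takes one pass over the retailers.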
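Assuming you mean a SQL database, then in your retrieval SQL you can generally apply a bitwise AND, e.g. WHERE (MyField & 16) = 16 or WHERE (MyField & 24) = 24. The operator is & in MySQL, PostgreSQL and SQL Server; Oracle uses BITAND(MyField, 16) instead.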
However, note that if you're trying to optimise such retrievals, and the number of rows typically matching a query is much smaller than the total number of rows, then this probably isn't a very good way to represent this data. In that case, it would be better to have a separate "ProductStore" table that contains (ProductID, StoreID) pairs representing this information (and indexed on StoreID).
Are there at most two retailers whose inventories sum to the "masked" value in each case? If so you will still have to check all pairs to retrieve them, which will take n² time. Just use a nested loop.
If the value represents the sum of any number of retailers' inventories, then you are trying to solve the subset-sum problem, so unfortunately you cannot do it in better than 2^n time.
If you are able to augment your original data structure with information to lookup the retailers contributing to the sum, then this would be ideal. But since you are asking the question I am assuming you don't have access to the data structure while it is being built, so to generate all subsets of retailers for checking you will want to look into Knuth's algorithm [pdf] for generating all k-combinations (and run it for 1...k) given in TAOCP Vol 4a Sec 7.2.1.3.
http://www.antiifcampaign.com/
Remember this. If you can remove the "if" with another construct(map/strategy pattern), for me you can let it there, otherwise that "if" is really dangerous!! (F.Cirillo)
In this case you can use map of map with bitmask operation.
Luca.
