In a project I'm working on, I need to generate 16 character long unique IDs, consisting of 10 numbers plus 26 uppercase letters (only uppercase). They must be guaranteed to be universally unique, with zero chance of a repeat ever.
The IDs are not stored forever. An ID is thrown out of the database after a period of time and a new unique ID must be generated. The IDs can never repeat with the thrown out ones either.
So randomly generating 16 digits and checking against a list of previously generated IDs is not an option because there is no comprehensive list of previous IDs. Also, UUID will not work because the IDs must be 16 digits in length.
Right now I'm using 16-Digit Unique IDs, that are guaranteed to be universally unique every time they're generated (I'm using timestamps to generate them plus unique server ID). However, I need the IDs to be difficult to predict, and using timestamps makes them easy to predict.
So what I need to do is map the 16 digit numeric IDs that I have into the larger range of 10 digits + 26 letters without losing uniqueness. I need some sort of hashing function that maps from a smaller range to a larger range, guaranteeing a one-to-one mapping so that the unique IDs are guaranteed to stay unique after being mapped.
I have searched and so far have not found any hashing or mapping functions that are guaranteed to be collision-free, but one must exist if I'm mapping to a larger space. Any suggestions are appreciated.
Brandon Staggs wrote a good article on Implementing a Partial Serial Number Verification System. The examples are written in Delphi, but could be converted to other languages.
EDIT: This is an updated answer, as I misread the constraints on the final ID.
Here is a possible solution.
Let set:
UID16 = 16-digit unique ID
LUID = 16-symbol UID (using digits+letters)
SECRET = a secret string
HASH = some hash of SECRET+UID16
Now, you can compute:
LUID = BASE36(UID16) + SUBSTR(BASE36(HASH), 0, 5)
BASE36(UID16) will produce a 11-character string (because 16 / log10(36) ~= 10.28)
It is guaranteed to be unique because the original UID16 is fully included in the final ID. If you happen to get a hash collision with two different UID16, you'll still have two distinct LUID.
Yet, it is difficult to predict because the 5 other symbols are based on a non-predictable hash.
NB: you'll only get log2(36^5) ~= 26 bits of entropy on the hash part, which may or may not be enough depending on your security requirements. The less predictable the original UID16, the better.
One general solution to your problem is encryption. Because encryption is reversible it is always a one-to-one mapping from plaintext to cyphertext. If you encrypt the numbers [0, 1, 2, 3, ...] then you are guaranteed that the resulting cyphertexts are also unique, as long as you keep the same key, do not repeat a number or overflow the allowed size. You just need to keep track of the next number to encrypt, incrementing as needed, and check that it never overflows.
The problem then reduces to the size (in bits) of the encryption and how to present it as text. You say: "10 numbers plus 26 uppercase letters (only uppercase)." That is similar to Base32 encoding, which uses the digits 2, 3, 4, 5, 6, 7 and 26 letters. Not exactly what you require, but perhaps close enough and available off the shelf. 16 characters at 5 bits per Base32 character is 80 bits. You could use an 80 bit block cipher and convert the output to Base32. Either roll your own simple Feistel cipher or use Hasty Pudding cipher, which can be set for any block size. Do not roll your own if there is a major security requirement here. Your own Feistel cipher will give you uniqueness and obfuscation, not security. Hasty Pudding gives security as well.
If you really do need all 10 digits and 26 letters, then you are looking at a number in base 36. Work out the required bit size for 36^16 and proceed as before. Convert the cyphertext bits to a number expressed in base 36.
If you write your own cipher then it appears that you do not need the decryption function, which will save a little work.
You want to map from a space consisting of 1016 distinct values to one with 3616 values.
The ratio of the sizes of these two spaces is ~795,866,110.
Use BigDecimal and multiply each input value by the ratio to distribute the input keys equally over the output space. Then base-36 encode the resulting value.
Here's a sample of 16-digit values consisting of 11 digits "timestamp" and 5 digits server ID encoded using the above scheme.
Decimal ID Base-36 Encoding
---------------- ----------------
4156333000101044 -> EYNSC8L1QJD7MJDK
4156333000201044 -> EYNSC8LTY4Y8Y7A0
4156333000301044 -> EYNSC8MM5QJA9V6G
4156333000401044 -> EYNSC8NEDC4BLJ2W
4156333000501044 -> EYNSC8O6KXPCX6ZC
4156333000601044 -> EYNSC8OYSJAE8UVS
4156333000701044 -> EYNSC8PR04VFKIS8
4156333000801044 -> EYNSC8QJ7QGGW6OO
The first 11 digits form the "timestamp" and I calculated the result for a series incremented by 1; the last five digits are an arbitrary "server ID", in this case 01044.
Related
I have a requirement to generate random and unique key of exactly 10 characters for every record, based on some particular fields.
It should give me same key if I supply same set of information next time.
In summary I am looking for a way to convert a longer string to a 10 characters string, consistently.
Something like md5 hash but should output just 10 characters.
Thanks.
Note: As one of comments ask what I have tried. Basically I haven't found any proper solution and done lot of research on it.
The only solution I can think of is to store md5 hash and 10 characters key into a db which I can look up for next time. A custom 10 characters key can be generated which won't have any relation to md5 hash as such other than that mapping db.
What you're asking to do is not possible. The question title says, "generate random and unique key of exactly 10 characters based on longer input string".
By the Pigeonhole principle, you can't do that. Because there are more strings that are longer than 10 characters than there are strings that are exactly 10 characters long, any hashing function you come up with will generate duplicates. Multiple long strings will map to the same 10-character string.
You can't guarantee uniqueness.
Convert string to char array, chsr to int and multiply last and first char, second and second to last, until you have a 10 digit number.
I was wondering if I'm experiencing a bug, or have just run into a limitation of the Hashids algorithm.
I'm using a custom alphabet, which consists of all uppercase letters, minus "O" and "I" and digits 2 - 9.
After generating several million hashes, I noticed that duplicates started to appear. I'm confused by this, especially since Hashids claims that duplicates are not possible since the algorithm is simply a hex version of an integer. And so long as the integers remain unique (such as counting up forever), so will the hashes.
Does a custom alphabet make it more likely for duplicates to appear? Also, I was expecting the number of unique hashes for my alphabet to be: 32^7 = 34,359,738,368. Before my counter hit this number, the generated hashids grew from 7 characters long to 8.
Does anyone have any ideas as to why this is happening?
Edit: another rather strange anomaly: after generating 10647 hashes, the rest (2.9 million plus) either start with a K or an X. I'm beginning to think the custom alphabet plus the length of the salt affect how the letters get shuffled.
have same problem, just try this :
var hashids = new Hashids("BSomeoneNameN161179IBRB46", 5, "ABCDEFGHIJKLMNPQRSTUVWXYZ1234567890");
var id = hashids.encode(1234567);
var numbers = hashids.decode(id);
changing salt by deleting last 5 characters one by one, just showing same result.
making salt not more than 20 chars seems solve the problem.
I solved this issue by adding the letters and digits I,O,0,1 back into the alphabet being used. With the increase in length of the alphabet, the rotation calculated by Hashids was affected. I simply filtered out any output that included an I,O,0 or 1 using a regular expression.
I need to generate a reservation code of 6 alpha numeric characters, that is random and unique in java.
Tried using UUID.randomuuid().toString(), However the id is too long and the requirement demands that it should only be 6 characters.
What approaches are possible to achieve this?
Just to clarify, (Since this question is getting marked as duplicate).
The other solutions I've found are simply generating random characters, which is not enough in this case. I need to reasonably ensure that a random code is not generated again.
Consider using the hashids library to generate salted hashes of integers (your database ids or other random integers which is probably better).
http://hashids.org/java/
Hashids hashids = new Hashids("this is my salt",6);
String id = hashids.encode(1, 2, 3);
long[] numbers = hashids.decode(id);
You have 36 characters in the alphanumeric character set (0-9 digits + a-z letters). With 6 places you achieve 366 = 2.176.782.336 different options, that is slightly larger than 231.
Therefore you can use Unix time to create a unique ID. However, you must assure that no ID generated within the same second.
If you cannot guarantee that, you end up with a (synchronized) counter within your class. Also, if you want to survive a JVM restart, you should save the current value (e.g. to a database, file, etc. whatever options you have).
Despite its name, UUIDs are not unique. It's simply extremely unlikely to get a 128 bit collision. With 6 (less than 32 bit) it's very likely that you get a collision if you just hash stuff or generate a random string.
If the uniqueness constraint is necessary then you need to
generate a random 6 character string
Check if you generated that string before by querying your database
If you generated it before, go back to 1
Another way would be to use a pseadorandom permutation (PRP) of size 32 bit. Block ciphers are modeled as PRP functions, but there aren't many that support 32 bit block sizes. Some are Speck by the NSA and the Hasty Pudding Cipher.
With a PRP you could for example take an already unique value like your database primary key and encrypt it with the block cipher. If the input is not bigger than 32 bit then the output will still be unique.
Then you would run Base62 (or at least Base 41) over the output and remove the padding characters to get a 6 character output.
if you do a substring that value may not be unique
for more info please see following similar link
Generating 8-character only UUIDs
Lets say your corpus is the collection of alpha numberic letters. a-zA-Z0-9.
char[] corpus = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
We can use SecureRandom to generate a seed, which will ask the OS for entropy, depending on the os. The trick here is to keep a uniform distribution, each byte has 255 values, but we only need around 62 so I will propose rejection sampling.
int generated = 0;
int desired=6;
char[] result= new char[desired];
while(generated<desired){
byte[] ran = SecureRandom.getSeed(desired);
for(byte b: ran){
if(b>=0&&b<corpus.length){
result[generated] = corpus[b];
generated+=1;
if(generated==desired) break;
}
}
}
Improvements could include, smarter wrapping of generated values.
When can we expect a repeat? Lets stick with the corpus of 62 and assume that the distribution is completely random. In that case we have the birthday problem. That gives us N = 62^6 possiblities. We want to find n where the chance of a repeat around 10%.
p(r)= 1 - N!/(N^n (N-n)!)
And using the approximation given in the wikipedia page.
n = sqrt(-ln(0.9)2N)
Which gives us about 109000 numbers for 10% chance. For a 0.1% chance it woul take about 10000 numbers.
you can trying to make substring out of your generated UUID.
String uuid = UUID.randomUUID().toString();
System.out.println("uuid = " + uuid.substring(0,5);
I am looking for a function (hash function? not sure) that maps a set of (unique) numbers to another set of (unique) numbers. I had a look at perfect hashing -and the fact that it's impossible to do :( - and it seems quite close to what I am after.
In detail, I want the following:
Map any given 12 digit number to another number, 12 digit or more (I don't care about the resulting size but it must fit in a Java long).
Must be able to guarantee that the same 12 digit number maps to the same number every time.
Must be able to guarantee that a different 12 digit number always maps to a different number, i.e. there are no collisions in the result set.
Must be able to guarantee that the function is one way, which (in my mind anyway) means that you cannot compute the number you started from given the result of the function.
Something like minimal hashing function would not work in my case, as each number has to be computed on a different computer, meaning that the function itself has to guarantee these features without checking for collisions in the results and or having any centralized control of the result set.
An example of such a function would just be a simple:
take number, add 1, output number.
The only problem with that is that you can easily get first number just by subtracting one. I want it to be very difficult, or preferably impossible to get the previous number back.
Any thoughts ?
Translating the function from math to java I don't mind doing on my own. Unless you can suggest an already existing java library.
You are looking for Format-preserving encription algorithms.
It looks a cryptography problem for me.
If you want to map a String to other String and must be able to guarantee that is one way and has no collisions, you need encrypt the input String.
For example you can use DES.
If the output must be digits you can interpret the bytes of the output as hexadecimal and then convert to base 10.
What you want is a cipher. A good old fashioned symmetric cipher. Like AES.
Imagine for a second that instead of 12 digits, your numbers were 128 bits long. Let's say you set up an AES cipher with a key of your choosing, and use it to encrypt numbers. What is the outcome?
You will map any given 128-bit number to another number, of exactly 128 bits (AES's block size is 128 bits)
You can guarantee that the same 128-bit number maps to the same number every time (encryption is deterministic)
You can guarantee that two different 128-bit numbers always map to different numbers (encryption is reversible - there are no two plaintexts which encrypt to the same ciphertext, otherwise how could you decrypt them?)
You can guarantee that, for someone who doesn't have the key, the function is one way (encryption is not reversible without they key - it would be less useful if it was)
Now, 128 bits is bigger than you want, so AES is not the right cipher for you. All you need to do is choose a cipher which has a 12-digit block size.
There aren't any conventional ciphers which have a 12-digit block size. But it's actually pretty easy to construct one. You can use the Feistel construction to take a hash function and construct a block cipher. You can either build a binary cipher of a suitable size (40 bits in your case) and then use the "hasty pudding trick" to restrict its domain to 12 digits, or construct a cipher that works directly in decimal (more or less).
I wrote an answer to another question a while ago that explained this in some more detail; i even wrote some code to implement the idea, although looking at it now, i don't know how comprehensible it is. The key classes in it are TinyCipher, which implements a cipher with a block size of up to 32 bits (and could easily be extended up to 64 bits), and TrickCipher, which uses the hasty pudding trick to implement a cipher over an arbitrary-sized set (such as all 12-digit numbers).
I'm trying to implement the Chord protocol in order to quickly lookup some nodes and keys in a small network. What I can't figure out is ... Chord cosideres the nodes and keys as being placed on a cirlce. And their placement dictated by the hash values obtained by applying the SHA-1 hash function. How exactly do I operate with those values? Do I make them as a string de9f2c7f d25e1b3a fad3e85a 0bd17d9b 100db4b3 and then compare them as such, considering that "a" < "b" is true ? Or how? How do I know if a key is before or after another?
Since the keyspace is a ring, a single value can't be said to be greater than another, because if you go the other way around the ring, the opposite is true. You can say a value is within a range or not. In the Chord DHT, each server is responsible for the keys within the range of values between it and its predecessor.
I would advise against using strings for the hash values. You shouldn't use the hashCode function for distributed systems, but you need to math on the hash keys when adding new nodes. You could try converting the hashes into BigIntegers instead.
sha1 hashes are not strings but are very long hex numbers - they are often stored as strings because they would otherwise require a native 160 bit number type. They are built as 5 32 bit hex numbers and then often 'strung' together.
using sha1 strings as the numbers they represent is not hard but requires a library that can handle such large numbers (like BigInt or bcmath). these libraries work by calculating the numbers within the string one column at a time from the right to left, much like a person when using a pen and paper to add, multiply, divide, etc. they will typically have functions for doing common math as well comparisons etc, and often take strings as arguments. Also, make sure that you use a function for converting big numbers anytime you need to go from hex to dec, or else your 160 bit hex number will likely get rounded into a 64 bit dec float or similar and loose most of it's accuracy.
more/less than comparisons are used in chord to figure ranges but do so using modulo so that they 'wrap', making ranges such as [64, 2] possible. the actual formula is
find_successor(fingers[k] = n + 2^(k-1) mod(2^160))
where 'n' is the sha1 of a node and 'k' is the finger number.
remember, 'n' will be hex while 'k' and 'mod(160^2)' will typically be dec, so this is where your BigInt hex to BigInt dec will be needed.
even if your programing framework will let you create these vars as hex, 160 is specifically a dec (literally meaning one hounded and sixty bits) and besides, wrapping your brain around 'mod(160^2)' is already hard enough without visualizing it as hex. convert 'n' to dec rather than converting 'k' etc to hex , and then use a BigInt lib to do the math including comparisons.