Generate 9 digit unique value from a String - java

I have employee data, and each employee has address information. I need to generate a unique 9-digit (numeric or alphanumeric) value from the postal code (5 chars) and address line 1 (35 chars) that uniquely represents a location. It is also called a "Wrap number".
As shown in the picture below, when two employees have the same address, the Wrap Number should be the same; otherwise a new value should be assigned.
Which algorithm is best suited to generate a unique 9-digit value?
P.S. I need to program it in Java.

What you're asking is impossible. No, really, impossible.
You have a 5-digit ZIP code, which can be encoded in 17 bits. Then you have 35 characters of text. Let's say you limit it to upper and lower case letters, plus digits and special characters. Figure 96 possible characters, or approximately 6.5 bits each. So:
35 * 6.5 = 227.5 ~ 228 bits
So you have up to 245 bits of information (17 + 228), and you want to create a "unique" 9-character code. Your 9-character code only occupies 72 bits. You can't pack 245 bits of information into 72 bits without duplication. See Pigeonhole principle.
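The bit counts above can be checked with a few lines of Java. This is a minimal sketch that assumes, as the answer does, a 5-digit ZIP code and a 96-symbol alphabet; it uses the exact log2(96) ≈ 6.58 instead of 6.5, so the total comes out slightly higher than 245, which only strengthens the point.

```java
public class PigeonholeCheck {
    // Bits needed to distinguish `count` equally likely values.
    static double bits(double count) {
        return Math.log(count) / Math.log(2);
    }

    public static void main(String[] args) {
        double zipBits = bits(100_000);     // 5 decimal digits
        double addressBits = 35 * bits(96); // 35 characters from a 96-symbol alphabet
        double codeBits = 9 * 8;            // 9 characters stored as 8-bit bytes
        System.out.printf("input: %.1f bits, code: %.0f bits%n", zipBits + addressBits, codeBits);
    }
}
```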
A better solution would be to assign a sequential number to each employee. If you want to make those 9-character codes, then use a technique to obfuscate the numbers and encode them using base-36 (numbers and upper-case letters) or something similar. I explain how to do that in my blog post, How to generate unique "random-looking" keys.
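A minimal sketch of the sequential-number idea in Java. It is not the code from the blog post: the class name, the constants, the address normalization, and the affine scrambling step (multiply by a number coprime to 36, add an offset, reduce modulo 36^9) are illustrative assumptions, and the in-memory HashMap stands in for whatever persistent table you would actually use. The scrambling only makes codes look random; it is not a security measure.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class WrapNumberRegistry {
    // 36^9: the number of distinct 9-character base-36 strings.
    private static final long MODULUS = 101_559_956_668_416L;
    // Hypothetical constants. MULTIPLIER must not be divisible by 2 or 3 (the prime
    // factors of 36), so that multiplying modulo 36^9 is a bijection.
    private static final long MULTIPLIER = 7_919L;
    private static final long OFFSET = 12_345_678_901L;

    // Stand-in for a persistent table: normalized address -> issued Wrap Number.
    private final Map<String, String> codes = new HashMap<>();
    private long nextId = 0;

    public String wrapNumber(String postalCode, String addressLine1) {
        // Assumed normalization: ignore case and surrounding whitespace.
        String key = postalCode.trim().toUpperCase(Locale.ROOT) + "|"
                + addressLine1.trim().toUpperCase(Locale.ROOT);
        // Same address -> existing code; new address -> next sequential id.
        return codes.computeIfAbsent(key, k -> encode(nextId++));
    }

    private static String encode(long id) {
        // Affine scramble: distinct ids below 36^9 give distinct results.
        // id * MULTIPLIER stays well below Long.MAX_VALUE for any id < 36^9.
        long scrambled = (id * MULTIPLIER + OFFSET) % MODULUS;
        String base36 = Long.toString(scrambled, 36).toUpperCase(Locale.ROOT);
        // Left-pad with zeros to exactly 9 characters (String.repeat needs Java 11+).
        return "0".repeat(9 - base36.length()) + base36;
    }
}
```

Because each code is derived from a counter rather than from the address text, two different addresses can never share a code, which is exactly what a hash cannot guarantee.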

A simple idea is to use one of the well-known hash algorithms that are already implemented in Java.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class IdentifierGenerator {
    private static long generateIdentifier(final String adrLine, final String postCode) {
        final String resultInput = adrLine + postCode;
        // use an explicit charset so the result does not depend on the platform default
        final byte[] inputBytes = resultInput.getBytes(StandardCharsets.UTF_8);
        final byte[] outputBytes;
        try {
            // feel free to choose another hash algorithm, e.g. SHA-512
            final MessageDigest digest = MessageDigest.getInstance("SHA-256");
            outputBytes = digest.digest(inputBytes);
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to support SHA-256, so this should not happen
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        // the first 7 hex digits (28 bits) give a value from 0 to 268,435,455,
        // i.e. at most 9 decimal digits
        return Long.parseLong(convertByteArrayToHexString(outputBytes).substring(0, 7), 16);
    }

    // this method may also be useful if you decide to use the full result
    // or need the corresponding hex representation
    private static String convertByteArrayToHexString(byte[] arrayBytes) {
        final StringBuilder stringBuilder = new StringBuilder();
        for (byte arrByte : arrayBytes) {
            // adding 0x100 forces three hex digits; substring(1) keeps the last two
            stringBuilder.append(Integer.toString((arrByte & 0xff) + 0x100, 16).substring(1));
        }
        return stringBuilder.toString();
    }
}
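I suggest not using MD5 or SHA-1, since practical collision attacks against them are known. Keep in mind, though, that truncating any digest to 7 hex digits (28 bits) makes accidental collisions likely once you have tens of thousands of addresses, whichever algorithm you choose, so check each new identifier against the ones already issued.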
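Truncation matters whichever hash algorithm is used: 7 hex digits leave only 2^28 possible identifiers. This rough sketch uses the standard birthday approximation p ≈ 1 - exp(-n²/(2N)), which assumes the truncated digests behave like uniformly random values, to estimate how quickly accidental collisions become likely.

```java
public class BirthdayEstimate {
    // Approximate probability that at least two of n random values collide
    // when drawn uniformly from `space` possibilities: 1 - exp(-n^2 / (2 * space)).
    static double collisionProbability(long n, double space) {
        return 1 - Math.exp(-(double) n * n / (2 * space));
    }

    public static void main(String[] args) {
        double space = Math.pow(2, 28); // 7 hex digits of the digest = 28 bits
        for (long n : new long[] {1_000, 10_000, 20_000, 100_000}) {
            System.out.printf("%,d addresses -> %.3f%n", n, collisionProbability(n, space));
        }
    }
}
```

With around 20,000 distinct addresses the estimate is already slightly above one half.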

My idea would be this:
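String str = addressLine + postalCode;
// explicit charset (java.nio.charset.StandardCharsets) so the UUID is the same on every platform
UUID uid = UUID.nameUUIDFromBytes(str.getBytes(StandardCharsets.UTF_8));
return makeItNineDigits(uid);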
Where makeItNineDigits is some reduction of the UUID string representation to your liking. :)
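This could be uid.toString().replace("-", "").substring(0, 9); note that plain uid.toString().substring(0, 9) would include the hyphen at index 8. Or you could take the two long values from getMostSignificantBits() and getLeastSignificantBits() and create a 9-digit value from them.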
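A minimal sketch of the second option. makeItNineDigits is the placeholder name from the answer; this particular body (XOR the two 64-bit halves, reduce modulo 10^9, zero-pad) and the wrapNumber helper are assumptions of this sketch. Any 9-digit reduction of a hash can collide, so duplicates still need to be checked.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class NineDigits {
    // Hypothetical helper: fold the 128-bit name-based UUID into a 9-digit decimal string.
    static String makeItNineDigits(UUID uid) {
        long folded = uid.getMostSignificantBits() ^ uid.getLeastSignificantBits();
        // floorMod keeps the result in [0, 999_999_999] even when `folded` is negative
        long value = Math.floorMod(folded, 1_000_000_000L);
        return String.format("%09d", value); // zero-pad to exactly 9 digits
    }

    // Hypothetical wrapper around the answer's three lines.
    static String wrapNumber(String addressLine, String postalCode) {
        String str = addressLine + postalCode;
        UUID uid = UUID.nameUUIDFromBytes(str.getBytes(StandardCharsets.UTF_8));
        return makeItNineDigits(uid);
    }
}
```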

A simple option might be to just take advantage of the hashing built into Java:
String generateIdentifier(String postCode, String addressLine) {
long hash = ((postCode.hashCode() & 0xffffffffL) << 14L)
^ (addressLine.hashCode() & 0xffffffffL);
return Long.toString(hash, 36);
}
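Long.toString(hash, 36) returns fewer than 9 characters when the hash happens to be small. The combined value is below 2^46 (about 7.0 × 10^13), which is less than 36^9 (about 1.0 × 10^14), so it never needs more than 9 base-36 characters. A sketch that pads the same computation to a fixed width; the padding, upper-casing, and class name are choices of this sketch, not part of the answer. Collisions are still possible because each String.hashCode() is only 32 bits.

```java
import java.util.Locale;

public class Base36Code {
    static String generateIdentifier(String postCode, String addressLine) {
        long hash = ((postCode.hashCode() & 0xffffffffL) << 14L)
                ^ (addressLine.hashCode() & 0xffffffffL);
        // hash < 2^46 < 36^9, so the base-36 form never exceeds 9 characters
        String code = Long.toString(hash, 36).toUpperCase(Locale.ROOT);
        return "0".repeat(9 - code.length()) + code; // left-pad to exactly 9 (Java 11+)
    }
}
```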

Related

Java hashcode brute-forcing [duplicate]

Is there any way that I can take the hash code of a string in Java and recreate that string?
e.g. something like this:
String myNewstring = StringUtils.createFromHashCode("Hello World".hashCode());
if (!myNewstring.equals("Hello World"))
System.out.println("Hmm, something went wrong: " + myNewstring);
I ask because I must turn a string into an integer value and later reconstruct the string from that integer value.
This is impossible. String.hashCode() is lossy: many different String values produce the same hash code. An int has 32 bit positions, each with two possible values, so there are only 2^32 hash codes. There's no way to map even just the 32-character strings (each character having many possibilities) into 32 bits without collisions; they simply won't fit.
If you want to use arbitrary-precision arithmetic (say, BigInteger), then you can just take each character as an integer and concatenate them all together. Voilà.
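A sketch of the BigInteger approach. It concatenates the string's UTF-8 bytes rather than individual char values (an assumption of this sketch) and reads them as one big unsigned number, so the mapping is reversible but the number grows with the string, about 8 bits per ASCII character. The 0x01 marker byte and the class name are choices made here.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringNumber {
    // Prefixing a 0x01 byte keeps leading zero bytes (e.g. "\0abc") from being lost
    // and keeps the number positive.
    static BigInteger encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] withMarker = new byte[utf8.length + 1];
        withMarker[0] = 1;
        System.arraycopy(utf8, 0, withMarker, 1, utf8.length);
        return new BigInteger(1, withMarker); // signum 1: unsigned magnitude
    }

    static String decode(BigInteger n) {
        // The top byte is 0x01, so toByteArray() adds no sign byte and starts with the marker.
        byte[] bytes = n.toByteArray();
        return new String(Arrays.copyOfRange(bytes, 1, bytes.length), StandardCharsets.UTF_8);
    }
}
```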
No. Multiple Strings can have the same hash code. In theory you could enumerate the Strings that have that hash code, but there are infinitely many of them.
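A concrete collision is easy to construct by hand. String.hashCode() computes s[0]*31^(n-1) + ... + s[n-1], so "Aa" and "BB" land on the same value, and because equal-length blocks with equal hashes can be swapped, every combination of those blocks collides too. A small demonstration (class name chosen for this sketch):

```java
public class HashCollisions {
    public static void main(String[] args) {
        // "Aa": 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        // "BB": 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // prints "2112 2112"
        // k such blocks give 2^k different strings of length 2k with one hash code.
        for (String s : new String[] {"AaAa", "AaBB", "BBAa", "BBBB"}) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```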
Impossible, I'm afraid. Think about it: a hash code is an int value, i.e. 4 bytes. A string may be shorter than this, but it could also be much longer, and you cannot squeeze a longer string into 4 bytes without losing something.
Java's String.hashCode() computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic (overflow simply wraps), so every character affects the result, but anything beyond 32 bits of information is lost. If your strings are all very short (two chars fit in an int, four in a long), you could encode them as an int or a long without losing anything.
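The formula for String.hashCode() can be re-implemented in a few lines, which makes it clear that each character is folded in with a multiply-by-31 step rather than skipped. A sketch for illustration (class name chosen here); it reproduces the built-in result for ordinary strings.

```java
public class StringHash {
    // Re-implements the documented formula for String.hashCode():
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic (overflow wraps).
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule: every char contributes
        }
        return h;
    }

    public static void main(String[] args) {
        String text = "Hello World";
        System.out.println(hash(text) == text.hashCode()); // prints "true"
    }
}
```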
For example, "1019744689" and "123926772" both have a hash code of -1727003481. This shows that a hypothetical reverseHashCode(hashCode(s)) cannot always return the original s.
Let's assume the string consists only of letters, digits and punctuation, so there are about 70 possible characters.
log_70(2^32) ≈ 5.22
This means that for a given integer you can expect to find a 5- or 6-character string with that hash code. So retrieving "Hello World" is impossible, but "Hello" might work if you're lucky.
You could do something like this:
char[] chars = "String here".toCharArray();
int[] ints = new int[chars.length];
for (int i = 0; i < chars.length; i++) {
ints[i] = chars[i]; // char widens to int automatically
}
Then:
char[] restored = new char[ints.length];
for (int i = 0; i < restored.length; i++) {
restored[i] = (char) ints[i];
}
String result = new String(restored); // "final" is a reserved word, so it cannot be a variable name
I have not actually tested this yet... It is just "concept" code.

How to convert Parse ObjectId (String) to long?

Every object in Parse.com has its own ObjectId, a 10-character string that is apparently generated by this regex: [0-9a-zA-Z]{10}.
Example of ObjectId in Parse:
X12wEq4sFf
Weg243d21s
zwg34GdsWE
I would like to convert this String to Long, because it will save memory and improve searching. (10 chars using UTF-8 has 40 bytes, and 1 long has 8 bytes)
If we calculate the combinations, we can find:
String ObjectId: 62^10 = 839299365868340224 different values;
long: is 2^64 = 18446744073709551616 different values.
So we can convert these values without losing information. Is there a simple way to do it safely? Please consider any kind of character encoding (UTF-8, UTF-16, etc.).
EDIT: I can only think of a complicated way to solve it. I am asking whether there is an easy way.
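Your character set is a subset of the commonly used Base64 alphabet, so you could just use that. Java has the java.util.Base64 class, so there's no need to roll your own codec for this. Note, however, that 10 unpadded Base64 characters (60 bits) decode to only 7 bytes (56 bits), so the last 4 bits would be dropped; keeping all 60 bits in a long requires a custom 6-bit packing.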
Are you sure this is actually valuable? "because it will save memory and improve searching" seems like an untested assertion; saving a few bytes on the IDs may very well be offset by the added cost of encoding and decoding every time you want to use something.
EDIT: Also, why are you using UTF-8 strings for guaranteed-ASCII data? If you represent 10-char IDs as a byte[10], that's just 10 bytes instead of 40 (i.e. much closer to the 8 for a long). And you don't need to do any fancy conversions.
Here's a straightforward solution using 6 bits to store a single character.
public class Converter {
private static final String CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static int convertChar(char c) {
int ret = CHARS.indexOf( c );
if (ret == -1)
throw new IllegalArgumentException( "Invalid character encountered: "+c);
return ret;
}
public static long convert(String s) {
if (s.length() != 10)
throw new IllegalArgumentException( "String length must be 10, was "+s.length() );
long ret = 0;
for (int i = 0; i < s.length(); i++) {
ret = (ret << 6) + convertChar( s.charAt( i ));
}
return ret;
}
}
I'll leave the conversion from long to String for you to implement, it's basically the same in reverse.
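One way to implement the reverse conversion left as an exercise above. This is a sketch: convertBack and the class name are chosen here, and the forward method is repeated so the block compiles on its own. Each character occupies 6 bits, so the decoder peels off the low 6 bits ten times, filling the string from the last character backwards.

```java
public class ConverterRoundTrip {
    // Same alphabet as Converter: each character maps to a 6-bit index (0..61).
    private static final String CHARS =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static long convert(String s) {
        if (s.length() != 10) {
            throw new IllegalArgumentException("String length must be 10, was " + s.length());
        }
        long ret = 0;
        for (int i = 0; i < s.length(); i++) {
            int index = CHARS.indexOf(s.charAt(i));
            if (index == -1) {
                throw new IllegalArgumentException("Invalid character encountered: " + s.charAt(i));
            }
            ret = (ret << 6) + index;
        }
        return ret;
    }

    // Reverse direction: take 6 bits at a time, last character first.
    public static String convertBack(long value) {
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            int index = (int) (value & 0x3F); // low 6 bits
            if (index >= CHARS.length()) {
                throw new IllegalArgumentException("Not a value produced by convert: " + value);
            }
            out[i] = CHARS.charAt(index);
            value >>>= 6;
        }
        return new String(out);
    }
}
```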
P.S.: If you really want to save space, don't use Long; compared to the primitive long it adds nothing but overhead.
P.S. 2: Also note that you aren't really saving much with this conversion: the ASCII characters can be stored in 10 bytes, while a long takes up 8. What you save here is mostly the overhead you'd get if you stored those 10 bytes in a byte array.

Standard way to create a hash in Java

The question is about the correct way of creating a hash in Java:
Let's assume I have a positive BigInteger value that I would like to create a hash from, and that the messageDigest instance below is a valid SHA-256 instance:
public static final BigInteger B = new BigInteger("BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58", 16);
byte[] byteArrayBBigInt = B.toByteArray();
this.printArray(byteArrayBBigInt);
messageDigest.reset();
messageDigest.update(byteArrayBBigInt);
byte[] outputBBigInt = messageDigest.digest();
Now, I only assume that the code above is correct, because in my tests the hashes it produces match the ones produced by:
http://www.fileformat.info/tool/hash.htm?hex=BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58
However, I am not sure why we need the step below. Because the byte array returned by the digest() call is signed, and in this case represents a negative number, I suspect we need to convert it to a positive number, i.e. with a function like this:
public static String byteArrayToHexString(byte[] b) {
String result = "";
for (int i=0; i < b.length; i++) {
result += Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1);
}
return result;
}
thus:
String hex = byteArrayToHexString(outputBBigInt);
BigInteger unsignedBigInteger = new BigInteger(hex, 16);
When I construct a BigInteger from the new hex string and convert it back to a byte array, I see that the sign bit (the most significant, leftmost bit) is 0, which means the number is positive; in fact the whole first byte is zeros (00000000).
My question is: is there any RFC that describes why we always need to convert the hash to a "positive" unsigned byte array? Even if the number produced by the digest call is negative, it is still a valid hash, right? So why do we need this additional procedure? Basically, I am looking for a paper, standard, or RFC stating that we need to do so.
A hash consists of an octet string (called a byte array in Java). How you convert it to or from a large number (a BigInteger in Java) is completely out of the scope for cryptographic hash algorithms. So no, there is no RFC to describe it as there is (usually) no reason to treat a hash as a number. In that sense a cryptographic hash is rather different from Object.hashCode().
That you can only treat hexadecimals as unsigned is a bit of an issue, but if you really want to, you can first convert the hex back to a byte array and then call new BigInteger(result). That constructor does treat the encoding within result as signed (two's complement). Note that protocols rarely need to convert back and forth to hexadecimals; hexadecimals are mainly for human consumption, and a computer is fine with bytes.
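If you do want a non-negative number, BigInteger has a constructor that takes a signum and a magnitude, which avoids the hex round trip entirely. A sketch (not from the answer; class and method names are illustrative) using the standard "abc" SHA-256 test vector, whose first byte 0xba has its top bit set, so the two constructors disagree on the sign.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestAsNumber {
    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // %064x pads with leading zeros, which BigInteger.toString(16) would drop.
    static String toFixedWidthHex(BigInteger value) {
        return String.format("%064x", value);
    }

    public static void main(String[] args) {
        byte[] hash = sha256("abc");
        System.out.println(new BigInteger(hash).signum());    // prints -1: two's complement reading
        System.out.println(new BigInteger(1, hash).signum()); // prints 1: unsigned magnitude
        System.out.println(toFixedWidthHex(new BigInteger(1, hash)));
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    }
}
```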

Encrypt 4 digits pin with hex enumeration system, resulting 16 chars string

I'm looking for a way to encrypt a four-digit password and get a 16-character string as the result.
So far I've got a 64-character String using this:
public static String digestHex(String text) {
StringBuilder stringBuffer = new StringBuilder();
try {
MessageDigest digest = MessageDigest.getInstance("SHA-256");// SHA-256
digest.reset();
for (byte b : digest.digest(text.getBytes("UTF-8"))) {
stringBuffer.append(Integer.toHexString((int) (b & 0xff)));
}
} catch (NoSuchAlgorithmException | UnsupportedEncodingException e) {
e.printStackTrace();
}
return stringBuffer.toString();
}
With text = 1234,
the resulting String is 3ac674216f3e15c761ee1a5e255f067953623c8b388b4459e13f978d7c846f4 (using Java, by the way).
Any "encryption" scheme where you are encrypting a 4 digit number without an additional key is effectively a lookup scheme. Since there are only 10,000 unique "inputs" to the lookup scheme, it will be relatively easy to crack your encryption ... by trying all of the inputs.
In other words, the security of your encrypted PIN numbers is an illusion ... unless you do something like "seeding" the input before you encrypt it.
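One way to add the key this answer alludes to is a keyed hash (HMAC), which also yields a 16-character result once truncated. This swaps in HMAC-SHA256 from javax.crypto; it is a sketch, the class and method names are hypothetical, and how the secret key is generated and stored is left open. Without the key, trying all 10,000 PINs no longer reveals the PIN, but anyone who holds the key can still do so.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class PinTag {
    // Keyed hash of the PIN, truncated to the first 8 bytes (16 hex characters).
    static String pinTag(String pin, byte[] secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
            byte[] tag = mac.doFinal(pin.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(16);
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", tag[i])); // always two hex digits per byte
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 failed", e);
        }
    }
}
```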
The security of your scheme aside, there are easier ways to do this:
// Your original - with the horrible exception hiding removed.
public static String digestHex(String text) throws NoSuchAlgorithmException, UnsupportedEncodingException {
StringBuilder stringBuffer = new StringBuilder();
MessageDigest digest = MessageDigest.getInstance("SHA-256");// SHA-256
digest.reset();
for (byte b : digest.digest(text.getBytes("UTF-8"))) {
stringBuffer.append(Integer.toHexString((int) (b & 0xff)));
}
return stringBuffer.toString();
}
// Uses BigInteger; signum 1 treats the digest as unsigned, so the result never starts with "-".
public static String digest(String text, int base) throws NoSuchAlgorithmException, UnsupportedEncodingException {
MessageDigest digest = MessageDigest.getInstance("SHA-256");// SHA-256
digest.reset();
BigInteger b = new BigInteger(1, digest.digest(text.getBytes("UTF-8")));
return b.toString(base);
}
public void test() throws NoSuchAlgorithmException, UnsupportedEncodingException {
System.out.println("Hex:" + digestHex("1234"));
System.out.println("Hex:" + digest("1234", 16));
System.out.println("36:" + digest("1234", 36));
System.out.println("Max:" + digest("1234", Character.MAX_RADIX));
}
This lets you generate the string in a higher base, which shortens it, but sadly you still do not get down to 16 characters (a 256-bit value needs up to 50 characters even in base 36).
I would suggest you use one of the simple CRC algorithms if you are really insistent on 16 characters. Alternatively you could try base 62 or base 64; there are many implementations out there.
You are using SHA-256. This algorithm produces 32-byte digests (256 bits).
This is why you get a hex string of about 64 characters as output: Integer.toHexString((int) (b & 0xff)) converts each byte b of the digest into a hex String of up to two characters (a byte below 0x10 yields only one).
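Integer.toHexString drops the leading zero of any byte below 0x10, which is why the output shown in the question has 63 characters rather than 64. A sketch of a zero-padded version (class and method names chosen here) that always produces exactly two hex digits per byte:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HexDigest {
    static String sha256Hex(String text) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // %02x always emits two digits, unlike Integer.toHexString(b & 0xff)
                sb.append(String.format("%02x", b));
            }
            return sb.toString(); // 32 bytes -> 64 hex characters
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```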
To get a shorter result, you can either use MD5 (16-byte output, 32 characters in hex), derive a 16-character string from a longer digest, or use a completely different approach such as actual encryption (using javax.crypto.Cipher).
I'd need to know what you want to do to elaborate further. Note that using MessageDigest is actually hashing, not encryption, while the first line of your post speaks of encryption. One difference is that hash codes are not designed to be reversed but compared, unlike encryption, which is reversible. See this interesting SO post on this.

Length of Strings regarding XOR operation for byte array

I am creating an encryption algorithm and need to XOR two strings. I know how to XOR them; the problem is the length. I have two byte arrays: one for the plain text, which is of variable size, and one for the key, which is, let's say, 56 bytes. What is the correct way of XORing the two? Should I concatenate each into one binary string and XOR the two values? Or should each position of the byte array be XORed with the corresponding part of a concatenated binary value of the key? Any help is greatly appreciated.
Regards,
Milinda
To encode, just move through the plain-text bytes, repeating the key as necessary with the % (mod) operator. Be sure to use the same character set at both ends.
Conceptually we're repeating the key like this, ignoring encoding.
hello world, there are sheep
secretsecretsecretsecretsecr
Encrypt
String plainText = "hello world, there are sheep";
Charset charSet = Charset.forName("UTF-8");
byte[] plainBytes = plainText.getBytes(charSet);
String key = "secret";
byte[] keyBytes = key.getBytes(charSet);
byte[] cipherBytes = new byte[plainBytes.length];
for (int i = 0; i < plainBytes.length; i++) {
cipherBytes[i] = (byte) (plainBytes[i] ^ keyBytes[i
% keyBytes.length]);
}
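// XORed bytes are usually not valid UTF-8, so show them as Base64 text (java.util.Base64)
String cipherText = Base64.getEncoder().encodeToString(cipherBytes);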
System.out.println(cipherText);
To decrypt just reverse the process.
// decode
for (int i = 0; i < cipherBytes.length; i++) {
plainBytes[i] = (byte) (cipherBytes[i] ^ keyBytes[i
% keyBytes.length]);
}
plainText = new String(plainBytes, charSet); // <= make sure same charset both ends
System.out.println(plainText);
(As noted in comments, you shouldn't use this for anything real. Proper cryptography is incredibly hard to do properly from scratch - don't do it yourself, use existing implementations.)
There's no such concept as "XOR" when it comes to strings, really. XOR specifies the result given two bits, and text isn't made up of bits - it's made up of characters.
Now you could just take the Unicode representation of each character (an integer) and XOR those integers together - but the result may well be a sequence of integers which is not a valid Unicode representation of any valid string.
It's not clear that you're even thinking in the right way to start with - you talk about having strings, but also having 56 bytes. You may have an encoded representation of a string (e.g. the result of converting a string to UTF-8) but that's not the same thing.
If you've got two byte arrays, you can easily XOR those together, and perhaps cycle back to the start of one of them if it's shorter than the other, so that the result is always the same length as the longer array. However, even if both inputs are (say) UTF-8 encoded text, the result often won't be valid UTF-8 encoded text. If you must have the result in text form, I'd suggest using Base64 at that point; there's a public domain base64 encoder which has a simple API, and since Java 8 the JDK also includes java.util.Base64.
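A sketch of the byte-array approach described above, including the cycling of the shorter array and the Base64 step for display. The class and method names are illustrative, and this remains a toy cipher, not secure encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class XorBytes {
    // XOR two arrays; the shorter one is repeated so the result has the longer length.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] longer = a.length >= b.length ? a : b;
        byte[] shorter = longer == a ? b : a;
        if (shorter.length == 0) {
            throw new IllegalArgumentException("Cannot XOR with an empty array");
        }
        byte[] out = new byte[longer.length];
        for (int i = 0; i < longer.length; i++) {
            out[i] = (byte) (longer[i] ^ shorter[i % shorter.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "hello world, there are sheep".getBytes(StandardCharsets.UTF_8);
        byte[] key = "secret".getBytes(StandardCharsets.UTF_8);
        byte[] cipher = xor(plain, key);
        // Arbitrary bytes are not valid UTF-8 in general, so show them as Base64 text.
        System.out.println(Base64.getEncoder().encodeToString(cipher));
        // XORing with the same key again restores the original bytes.
        System.out.println(new String(xor(cipher, key), StandardCharsets.UTF_8));
    }
}
```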
