Shortening java UUID while preserving the uniqueness

I'm trying to make the Java UUID shorter while preserving the same uniqueness as the UUID has.
I wrote the following code:
public static void main(String[] args) {
UUID uid=UUID.randomUUID();
String shortId=to62System(uid.getMostSignificantBits())+
to62System(uid.getLeastSignificantBits());
System.out.println(shortId);
}
static char[] DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();
static int RADIX = DIGITS.length;
public static String to62System(long value) {
if (value == 0) {
return "0";
} else {
char[] buf = new char[11];
int charPos = 10;
long i = value;
while (i != 0) {
buf[charPos--] = DIGITS[Math.abs((int) (i % RADIX))];
i /= RADIX;
}
return new String(buf, charPos + 1, (10 - charPos));
}
}
Am I doing it right, or did I overlook something important?

I use org.apache.commons.codec.binary.Base64 to convert a UUID into a url-safe unique string that is 22 characters in length and has the same uniqueness as UUID.
I posted my code on Storing UUID as base64 String
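A minimal sketch of that idea, assuming Java 8+ and swapping in the JDK's own java.util.Base64 for the Commons Codec class mentioned above. The class and method names are illustrative and are not taken from the linked post:

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

class UuidBase64 {
    // Packs the 128 bits of the UUID into 16 bytes and encodes them as
    // URL-safe Base64 without padding, which always yields 22 characters.
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Reverses encode(): decodes the 16 bytes and rebuilds the UUID.
    static UUID decode(String shortId) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(shortId));
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        String shortId = encode(id);
        System.out.println(shortId + " -> " + decode(shortId));
    }
}
```

Because all 128 bits are kept and the mapping is reversible, the short form is exactly as unique as the original UUID.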

Take a look at the FriendlyId library. It allows you to encode a UUID as a Base62 string (Url62) and back. Uniqueness is preserved and the encoded string is shorter.
https://github.com/Devskiller/friendly-id

I believe that even once you get it down to 22 characters by changing the base, you can safely truncate a few characters and still be reasonably certain you won't get any collisions, given the astronomically large numbers involved. LOL, loved the first guy's response, thinking you were just grabbing some characters from a standard UUID and calling it a day, haha.

Related

Why use bit shifting instead of a for loop?

I created the following code to find parity of a binary number (i.e output 1 if the number of 1's in the binary word is odd, output 0 if the number of 1's is even).
public class CalculateParity {
String binaryword;
int totalones = 0;
public CalculateParity(String binaryword) {
this.binaryword = binaryword;
getTotal();
}
public int getTotal() {
for(int i=0; i<binaryword.length(); i++) {
if (binaryword.charAt(i) == '1'){
totalones += 1;
}
}
return totalones;
}
public int calcParity() {
if (totalones % 2 == 1) {
return 1;
}
else {
return 0;
}
}
public static void main(String[] args) {
CalculateParity bin = new CalculateParity("1011101");
System.out.println(bin.calcParity());
}
}
However, all of the solutions I find online almost always deal with using bit shift operators, XORs, unsigned shift operations, etc., like this solution I found in a data structure book:
public static short parity(long x){
short result = 0;
while (x != 0) {
result ^= (x & 1);
x >>>= 1;
}
return result;
}
Why is this the case? What makes bitwise operators more of a valid/standard solution than the solution I came up with, which is simply iterating through a binary word of type String? Is a bitwise solution more efficient? I appreciate any help!

The code that you have quoted uses a loop as well (i.e., while):
public static short parity(long x){
short result = 0;
while (x != 0) {
result ^= (x & 1);
x >>>= 1;
}
return result;
}
You need to acknowledge that you are using a string that you know beforehand will be composed of only digits, conveniently in a binary representation. Naturally, given those constraints, one does not need bitwise operations; instead, one just parses char-by-char and does the desired computations.
On the other hand, if you receive as a parameter a long, as the method that you have quoted, then it comes in handy to use bitwise operations to go through each bit (at a time) in a number and perform the desired computation.
One could also convert the long into a string and apply the same logic code-wise that you have applied, but first one would have to convert that long into binary. That approach would add unnecessary steps and more code, and it would perform worse. The same probably applies in reverse if you have a String with your constraints. Nevertheless, a String is not a number, even if it is composed only of digits, which makes using a type that represents a number (e.g., long) an even more desirable approach.
Another thing that you are missing is that you did some of the heavy lifting by converting already a number to binary, and encoded into a String new CalculateParity("1011101");. So you kind of jump a step there. Now try to use your approach, but this time using "93" and find the parity.
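To make the "93" exercise concrete, here is a small sketch, assuming the input is a decimal string that fits in a long. It parses the number first and then reuses the same shift-and-XOR loop as the quoted book solution; the class name is illustrative:

```java
class ParityFromDecimal {
    // XOR the lowest bit into the result, then shift right until no
    // set bits remain. The result is 1 for an odd number of 1 bits.
    static int parity(long x) {
        int result = 0;
        while (x != 0) {
            result ^= (int) (x & 1);
            x >>>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        long value = Long.parseLong("93");   // 93 is 1011101 in binary
        System.out.println(parity(value));   // prints 1 (five set bits)
    }
}
```

With the String-based approach, the decimal-to-binary conversion would have to happen before the counting loop could even start.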

If you want to know whether a String has even parity, I think the method below is better.
If you convert a String that is longer than 64 characters to a long, an error will occur.
Both of the methods you mention have O(n) performance, so there will not be a big difference, but the shift method is more precise and uses slightly fewer CPU cycles.
private static boolean isEven(String s){
char[] chars = s.toCharArray();
int i = 0;
for(char c : chars){
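i ^= (c & 1); // XOR only the bit value: '0' is 0x30 and '1' is 0x31, so XORing raw chars fails for odd-length input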
}
return i == 0;
}

You use a string based method for a string input. Good choice.
The code you quote uses an integer-based method for an integer input. An equally good choice.

How to automatically create a string with 1s in Java

I am wondering if there is a shorter way in Java to create a String with a number of 1s. I would like to create a string like 111, or 1111 or 11111 without using loops or recursive calls.
For example, in Perl, something like '0b' . ('1' x $numberOf1s) would return 0b11 (if $numberOf1s is 2) and 0b111 (if $numberOf1s is 3).
Thanks

StringUtils, provided by the Apache Commons Lang jar, has many static methods that can be used. For example, StringUtils has a method repeat(String str, int repeat). Example:
String str = StringUtils.repeat("1",5);
See the doc here StringUtils's repeat method
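If an external library is not an option, a similar one-liner is available in the JDK itself, assuming Java 11 or later, where String gained a repeat(int) method. A minimal sketch (the class name is illustrative):

```java
class RepeatOnes {
    // String.repeat(int) exists since Java 11; it throws
    // IllegalArgumentException when the count is negative.
    static String ones(int n) {
        return "1".repeat(n);
    }

    public static void main(String[] args) {
        System.out.println(ones(5));   // prints 11111
    }
}
```

Like the other approaches in this thread, it still loops internally; the loop is just hidden inside the library.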

If you have a maximum number of 1s that you would want to generate, you can do this:
private static final String ALL_ONES = "11111111111111111111111111"; // max # of 1s
public String getNOnes(int n) {
// perhaps should do some error checking here
return ALL_ONES.substring(0, n);
}
If you have no maximum in mind, you could use @f1sh's answer:
public String getNOnes(int n) {
char [] ones = new char[n];
Arrays.fill(ones, '1');
return new String(ones);
}
But the entire problem seems to have ridiculous requirements.

In short: no.
You can use something like Arrays.fill(char[] arr, char value) to fill up a whole char array and then make a String out of it, but internally it uses a for loop anyways.
Also: what requirement would disallow a for loop?

You can try with something like
new String(new char[5]).replace('\0','1')
but replace iterates over all characters in char[] which are by default set to '\0'.

(Any power of two) - 1 converted to binary is a string of all 1s.
For Example
4-1 = 3 = binary 11
8-1 = 7 = binary 111
16-1= 15 = binary 1111 and so on.
I used this fact to write the following code...
BigInteger will produce any size of string but will be a bit slow. If you need a string of size below 64 you can use long in the same logic.
private static String stringOf1s(int size)
{
BigInteger powerOfTwo = BigInteger.TWO.pow(size);
return powerOfTwo.subtract(BigInteger.ONE).toString(2);
}
private static String stringOfOnes(int size)
{
long powerOfTwo = 1L << size; // exact for sizes 1..63, unlike (long) Math.pow(2, size), which saturates at 63
return Long.toBinaryString(powerOfTwo-1);
}

How to generate a real unique char only string in java

Is there a way to generate a unique surrogate string like UUID.randomUUID() but containing characters only (means no numbers)? The string is stored in multiple databases on different hosts and has to be system wide unique (even if I generate two keys at the same time - i.e. by threads).

Apache Commons Lang 3 has a class named RandomStringUtils that can be used to obtain random alphabetic strings:
int stringSize = 8; // choose whatever size suits you best
String randomString = RandomStringUtils.randomAlphabetic(stringSize);
If you need this to be unique even when more than one thread is running then you will need a way to synchronize them (maybe using synchronized blocks and a while loop) and checking in that database that no previous string equal to the generated one exists.
EDIT - A rough example
Set<String> previousKeys = new HashSet<String>();
public String generateKey(int stringSize) {
String randomString = "";
synchronized(previousKeys) {
do {
randomString = RandomStringUtils.randomAlphabetic(stringSize);
} while (randomString.length() == 0 || !previousKeys.add(randomString));
}
return randomString;
}

Just wrote this - random string of upper case characters:
package dan;
import java.util.Random;
public class RandText {
/**
* @param args
*/
public static void main(String[] args) {
String s = getRandomText(100);
System.out.println(s);
}
public static String getRandomText(int len) {
StringBuilder b = new StringBuilder();
Random r = new Random();
for (int i = 0; i<len;i++) {
char c = (char)(65+r.nextInt(26)); // 26 letters: 'A' (65) through 'Z' (90); nextInt(25) could never produce 'Z'
b.append(c);
}
return b.toString();
}
}

If you are looking to generate random characters, use a random int generator and have it generate a random number from 65 to 122. Keep in mind that 91-96 are not A-Z or a-z in ASCII, though, and that you will need to convert this int to a char.
This can be helpful: http://www.asciitable.com/
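A rough sketch of that suggestion, assuming an insecure generator is acceptable (ThreadLocalRandom is my choice here, not something the answer prescribes). Codes 91-96 are rejected and redrawn, so every letter stays equally likely. Random letters alone do not guarantee uniqueness:

```java
import java.util.concurrent.ThreadLocalRandom;

class RandomLetters {
    // Draws codes in 65..122 and skips 91..96, which are the
    // punctuation characters between 'Z' and 'a' in ASCII.
    static String randomLetters(int len) {
        StringBuilder sb = new StringBuilder(len);
        while (sb.length() < len) {
            int code = ThreadLocalRandom.current().nextInt(65, 123); // upper bound is exclusive
            if (code >= 91 && code <= 96) {
                continue;
            }
            sb.append((char) code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomLetters(16));
    }
}
```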

String id = UUID.randomUUID().toString().replaceAll("-", "");
id = id.replaceAll("0","zero"); // Strings are immutable, so the returned value must be kept
// [...]
id = id.replaceAll("9","nine");
I am feeling bad about this approach.

Grab your UUID, strip out the "-"s. Convert char '1' -> 'Z', '2' -> 'Y' etc.
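A small sketch of that mapping. The answer only specifies '1' -> 'Z' and '2' -> 'Y'; continuing down to '9' -> 'R' and sending '0' to 'Q' is my assumption. Because UUID.toString() produces lowercase hex, the uppercase replacements cannot clash with the existing letters a-f, so the mapping stays reversible:

```java
import java.util.UUID;

class LettersOnlyUuid {
    static String lettersOnly(UUID uuid) {
        String hex = uuid.toString().replace("-", "");
        StringBuilder sb = new StringBuilder(hex.length());
        for (char c : hex.toCharArray()) {
            if (c >= '1' && c <= '9') {
                sb.append((char) ('Z' - (c - '1')));   // '1' -> 'Z', '2' -> 'Y', ..., '9' -> 'R'
            } else if (c == '0') {
                sb.append('Q');                        // assumption: the answer does not cover '0'
            } else {
                sb.append(c);                          // 'a'..'f' stay as they are
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly(UUID.randomUUID()));
    }
}
```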

A particular type of hash on a String concatenation

I need a specialised hash function h(X,Y) in Java with the following properties.
1. X and Y are strings.
2. h(X,Y) = h(Y,X).
3. X and Y are arbitrary length strings and there is no length limit on the result of h(X,Y) either.
4. h(X,Y) and h(Y,X) should not collide with h(A,B) = h(B,A) if X is not equal to A and Y is not equal to B.
5. h() does not need to be a secure hash function unless it is necessary to meet the aforementioned requirements.
6. Fairly high-performant but this is an open-ended criterion.
In my mind, I see requirements 2 and 4 slightly contradictory but perhaps I am worrying too much.
At the moment, what I am doing in Java is the following:
public static BigInteger hashStringConcatenation(String str1, String str2) {
BigInteger bA = BigInteger.ZERO;
BigInteger bB = BigInteger.ZERO;
for(int i=0; i<str1.length(); i++) {
bA = bA.add(BigInteger.valueOf(127L).pow(i+1).multiply(BigInteger.valueOf(str1.codePointAt(i))));
}
for(int i=0; i<str2.length(); i++) {
bB = bB.add(BigInteger.valueOf(127L).pow(i+1).multiply(BigInteger.valueOf(str2.codePointAt(i))));
}
return bA.multiply(bB);
}
I think this is hideous but that's why I am looking for nicer solutions. Thanks.
Forgot to mention that on a 2.53GHz dual core Macbook Pro with 8GB RAM and Java 1.6 on OS X 10.7, the hash function takes about 270 micro-seconds for two 8 (ASCII) character Strings. I suspect this would be higher with the increase in the String size, or if Unicode characters are used.

Why not just add their hashCodes together?

Today I've decided to add my solution for this hash function problem. It has not been tested thoroughly and I did not measure its performance, so please give me feedback in the comments. My solution is below:
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
public abstract class HashUtil {
//determines that we want hash, that has size of 32 integers ( or 32*32 bits )
private static final int hash_size = 32;
//some constants that can be changed in sake of avoiding collisions
private static final BigInteger INITIAL_HASH = BigInteger.valueOf(7);
private static final BigInteger HASH_MULTIPLIER = BigInteger.valueOf(31);
private static final BigInteger HASH_DIVIDER = BigInteger.valueOf(2).pow(32*hash_size);
public static BigInteger computeHash(String arg){
BigInteger hash = new BigInteger(INITIAL_HASH.toByteArray());
for (int i=0;i<arg.length()/hash_size+1;i++){
int[] tmp = new int[hash_size];
//copy the next chunk of up to hash_size code points into tmp
for(int j=0;j<Math.min(arg.length()-hash_size*i,hash_size);j++){
tmp[j]=arg.codePointAt(i*hash_size+j);
}
hash = hash.multiply(HASH_MULTIPLIER).add(new BigInteger(convert(tmp)).abs()).mod(HASH_DIVIDER);
}
//to reduce result space to something meaningful
return hash;
}
public static BigInteger computeHash(String arg1,String arg2){
//here I don't forget to reduce the result space
return computeHash(arg1).add(computeHash(arg2)).mod(HASH_DIVIDER);
}
private static byte[] convert(int[] arg){
ByteBuffer byteBuffer = ByteBuffer.allocate(arg.length*4);
IntBuffer intBuffer = byteBuffer.asIntBuffer();
intBuffer.put(arg);
return byteBuffer.array();
}
public static void main(String[] args){
String firstString="dslkjfaklsjdkfajsldfjaldsjflaksjdfklajsdlfjaslfj",secondString="unejrng43hti9uhg9rhe3gh9rugh3u94htfeiuwho894rhgfu";
System.out.println(computeHash(firstString,secondString).equals(computeHash(secondString,firstString)));
}
}
I suppose that my solution should not produce any collisions for a single string with length less than 32 (to be more precise, for a single string with length less than the hash_size value). Also, I think it is not very easy to find collisions. To tune the hash collision probability for your particular task, you can try other prime numbers instead of 7 and 31 in the INITIAL_HASH and HASH_MULTIPLIER constants. What do you think about it? Is it good enough for you?
P.S. I think that it will be much better if you'll try much bigger prime numbers.

4) h(X,Y) and h(Y,X) should not collide with h(A,B) = h(B,A) if X is not equal to A and Y is not equal to B.
I think that this requirement rules out any hash function that produces numbers that are smaller (on average) than the original Strings.
Any requirement of no collisions runs into the roadblock of the Pigeonhole Principle.

How strict are you being with requirement 4? If the answer is 'not completely strict' then you could just concatenate the two strings putting the smaller one first (this would result in a collision for h('A', 'B') and h('AB', ''))
If there are any characters which you are sure would never appear in the string values then you could use a single instance as a separator, which would fix the collision above.
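A sketch of the "smaller string first" idea. Instead of a separator character, which requires knowing a character that never occurs in the data, this variant prefixes the length of the first string; that substitution is mine, not the answer's. The result is as long as the inputs, which the question explicitly allows:

```java
class UnorderedPairKey {
    // Builds one key for the unordered pair {x, y}: the lexicographically
    // smaller string goes first, prefixed by its length so that the
    // boundary between the two strings stays unambiguous.
    static String key(String x, String y) {
        boolean xFirst = x.compareTo(y) <= 0;
        String lo = xFirst ? x : y;
        String hi = xFirst ? y : x;
        return lo.length() + ":" + lo + hi;
    }

    public static void main(String[] args) {
        System.out.println(key("A", "B"));   // prints 1:AB
        System.out.println(key("B", "A"));   // prints 1:AB
        System.out.println(key("AB", ""));   // prints 0:AB
    }
}
```

The key is symmetric in its arguments, and the pair can be recovered from it, so two different unordered pairs never share a key. It is not a fixed-size hash, which is consistent with the pigeonhole argument made elsewhere in this thread.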

From the 4th point it follows that h(x,"") should never collide with h(y,"") unless x.equals(y) is true. So there can be no size limit on what h(x,y) produces, because it should produce a unique result for each unique x. But there is an infinite number of unique strings. This is not a proper hash function, I think.

Building on String#hashCode, this is not a perfect hash function, so it does not fulfill condition 4.
public static long hashStringConcatenation(String str1, String str2) {
int h1 = str1.hashCode();
int h2 = str2.hashCode();
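// pack the smaller hash into the high 32 bits and the larger one into the low 32 bits
if ( h1 < h2 )
{
return ((long)h1)<<32 | (h2 & 0xFFFFFFFFL);
}
else
{
return ((long)h2)<<32 | (h1 & 0xFFFFFFFFL);
}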
}

Okay, @gkuzmin's comment made me think about why I am using the powers of 127. So, here's a slightly simpler version of the code. The changes are as follows:
I am no longer doing the powers of 127 but actually concatenating the codePointAt numbers as strings, converting the result into BigInteger for each input string and then adding the two BigIntegers.
To compact the answer, I am doing a mod 2^1024 on the final answer.
Speed is not any better (perhaps a little worse!) but then I think the way I am measuring the speed is not right because it probably also measures the time taken for the function call.
Here's the modified code. Does this fulfill all conditions, albeit 4 for such unfortunate cases where repetitions may occur over the 2^1024 result space?
public static BigInteger hashStringConcatenation(String str1, String str2) {
if(str1==null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
return null;
}
BigInteger bA, bB;
String codeA = "", codeB = "";
for(int i=0; i<str1.length(); i++) {
codeA += str1.codePointAt(i);
}
for(int i=0; i<str2.length(); i++) {
codeB += str2.codePointAt(i);
}
bA = new BigInteger(codeA);
bB = new BigInteger(codeB);
return bA.add(bB).mod(BigInteger.valueOf(2).pow(1024));
}

I've decided to add another answer because @Anirban Basu has proposed another solution. I do not know how to link to his post, so if somebody knows how to do it, please correct me.
Anirban's solution looks like this:
public static BigInteger hashStringConcatenation(String str1, String str2) {
if(str1==null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
return null;
}
BigInteger bA, bB;
String codeA = "", codeB = "";
for(int i=0; i<str1.length(); i++) {
codeA += str1.codePointAt(i);
}
for(int i=0; i<str2.length(); i++) {
codeB += str2.codePointAt(i);
}
bA = new BigInteger(codeA);
bB = new BigInteger(codeB);
return bA.add(bB).mod(BigInteger.valueOf(2).pow(1024));
}
Your new solution now looks like a hash function, but it still has some problems. I suggest that you should think about this:
Maybe it would be better to throw a NullPointerException or IllegalArgumentException when null is passed as an argument? Are you sure that you do not want to compute a hash for empty strings?
To concatenate a large number of strings, it is better to use StringBuffer (or StringBuilder) instead of the + operator. Using this class will have a large positive impact on your code's performance.
Your hash function is not very secure: it is really easy to compute strings that will produce a collision.
You can try this code, which demonstrates a collision in your hash function.
public static void main(String[] args){
String firstString=new StringBuffer().append((char)11).append((char)111).toString();
String secondString=new StringBuffer().append((char)111).append((char)11).toString();
BigInteger hash1 = hashStringConcatenation(firstString,"arbitrary_string");
BigInteger hash2 = hashStringConcatenation(secondString,"arbitrary_string");
System.out.println("Is hash equal: "+hash1.equals(hash2));
System.out.println("Conflicted values: {"+firstString+"},{"+secondString+"}");
}
So, it is really easy to break your hash function. Moreover, it is good that it has a 2^1024 result space, but a lot of real-life collisions for your implementation lie in very close and simple strings.
P.S. I think that you should read about hash algorithms that have already been developed and about hash functions that failed in real life (like the Java String class hash function, which in the past computed the hash from at most 16 characters of the string), and examine your solutions against your requirements and real-life data. At the very least, you can try to find a hash collision manually; if you succeed, your solution most likely has problems.

Here's my changed code according to @gkuzmin's suggestion:
public static BigInteger hashStringConcatenation(String str1, String str2) {
BigInteger bA = BigInteger.ZERO, bB = BigInteger.ZERO;
StringBuffer codeA = new StringBuffer(), codeB = new StringBuffer();
for(int i=0; i<str1.length(); i++) {
codeA.append(str1.codePointAt(i));
}
for(int i=0; i<str2.length(); i++) {
codeB.append(str2.codePointAt(i));
}
bA = new BigInteger(codeA.toString());
bB = new BigInteger(codeB.toString());
return bA.multiply(bB).mod(BigInteger.valueOf(2).pow(1024));
}
Note that in the result, I now multiply bA with bB instead of adding.
Also, added @gkuzmin's suggested test function:
public static void breakTest2() {
String firstString=new StringBuffer().append((char)11).append((char)111).toString();
String secondString=new StringBuffer().append((char)111).append((char)11).toString();
BigInteger hash1 = hashStringConcatenation(firstString,"arbitrary_string");
BigInteger hash2 = hashStringConcatenation(secondString,"arbitrary_string");
System.out.println("Is hash equal: "+hash1.equals(hash2));
System.out.println("Conflicted values: {"+firstString+"},{"+secondString+"}");
}
and another test with strings having only numeric values:
public static void breakTest1() {
Hashtable<String,String> seenTable = new Hashtable<String,String>();
for (int i=0; i<100; i++) {
for(int j=i+1; j<100; j++) {
String hash = hashStringConcatenation(""+i, ""+j).toString();
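if(seenTable.containsKey(hash)) { // the hash is the key; Hashtable.contains() searches the values
System.out.println("Duplication for " + seenTable.get(hash) + " with " + i + "-" + j);
}
else {
seenTable.put(hash, i+"-"+j);
}
}
}
}
The code runs. Of course, it is not an exhaustive check. Note that the lookup has to use containsKey(): Hashtable.contains() searches the stored values, so with it breakTest1() can never report anything. With containsKey() it does find duplicates, for example "1" with "88" and "8" with "11" (49 * 5656 = 56 * 4949 = 277144). @gkuzmin's function displays the following: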
Is hash equal: true
Conflicted values: { o},{o }
Why do the two strings produce the same hash? Because they are effectively working with strings '11111arbitrary_string' in both cases. This is a problem.

How about the slightly modified function now?
public static BigInteger hashStringConcatenation(String str1, String str2) {
BigInteger bA = BigInteger.ZERO, bB = BigInteger.ZERO;
StringBuffer codeA = new StringBuffer(), codeB = new StringBuffer();
for(int i=0; i<str1.length(); i++) {
codeA.append(str1.codePointAt(i)).append("0");
}
for(int i=0; i<str2.length(); i++) {
codeB.append(str2.codePointAt(i)).append("0");
}
bA = new BigInteger(codeA.toString());
bB = new BigInteger(codeB.toString());
return bA.multiply(bB).mod(BigInteger.valueOf(2).pow(1024));
}
Here, we add a separator character "0" between each character codes, so the combination for characters 11 111 and 111 11 will no longer confuse the function because the concatenation will produce 110111 and 111011. However, it still will not break requirement 2 of the original question.
So does this now solve the problem albeit within the limits of the 2^1024 range?
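One quick way to probe that question is to compare the intermediate digit strings directly. The sketch below reproduces only the encoding step of the function above (code point followed by "0") under an illustrative class name, and feeds it two inputs chosen by hand: the single letter "e" (code point 101) and two U+0001 characters.

```java
class SeparatorCollisionDemo {
    // Same per-string encoding as in hashStringConcatenation above:
    // each code point in decimal, followed by the separator "0".
    static String encode(String s) {
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            code.append(s.codePointAt(i)).append("0");
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("e"));              // prints 1010
        System.out.println(encode("\u0001\u0001"));   // prints 1010
    }
}
```

Both inputs encode to 1010, so the "0" separator is still ambiguous: the digit 0 can also appear inside a decimal code point. A separator outside the digit alphabet, or fixed-width codes, would be needed to avoid this particular class of collisions.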

How to generate a random alpha-numeric string

I've been looking for a simple Java algorithm to generate a pseudo-random alpha-numeric string. In my situation it would be used as a unique session/key identifier that would "likely" be unique over 500K+ generation (my needs don't really require anything much more sophisticated).
Ideally, I would be able to specify a length depending on my uniqueness needs. For example, a generated string of length 12 might look something like "AEYGF7K0DM1X".

Algorithm
To generate a random string, concatenate characters drawn randomly from the set of acceptable symbols until the string reaches the desired length.
Implementation
Here's some fairly simple and very flexible code for generating random identifiers. Read the information that follows for important application notes.
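import java.security.SecureRandom;
import java.util.Locale;
import java.util.Objects;
import java.util.Random;
public class RandomString {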
/**
* Generate a random string.
*/
public String nextString() {
for (int idx = 0; idx < buf.length; ++idx)
buf[idx] = symbols[random.nextInt(symbols.length)];
return new String(buf);
}
public static final String upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static final String lower = upper.toLowerCase(Locale.ROOT);
public static final String digits = "0123456789";
public static final String alphanum = upper + lower + digits;
private final Random random;
private final char[] symbols;
private final char[] buf;
public RandomString(int length, Random random, String symbols) {
if (length < 1) throw new IllegalArgumentException();
if (symbols.length() < 2) throw new IllegalArgumentException();
this.random = Objects.requireNonNull(random);
this.symbols = symbols.toCharArray();
this.buf = new char[length];
}
/**
* Create an alphanumeric string generator.
*/
public RandomString(int length, Random random) {
this(length, random, alphanum);
}
/**
* Create alphanumeric strings from a secure generator.
*/
public RandomString(int length) {
this(length, new SecureRandom());
}
/**
* Create session identifiers.
*/
public RandomString() {
this(21);
}
}
Usage examples
Create an insecure generator for 8-character identifiers:
RandomString gen = new RandomString(8, ThreadLocalRandom.current());
Create a secure generator for session identifiers:
RandomString session = new RandomString();
Create a generator with easy-to-read codes for printing. The strings are longer than full alphanumeric strings to compensate for using fewer symbols:
String easy = RandomString.digits + "ACEFGHJKLMNPQRUVWXYabcdefhijkprstuvwx";
RandomString tickets = new RandomString(23, new SecureRandom(), easy);
Use as session identifiers
Generating session identifiers that are likely to be unique is not good enough, or you could just use a simple counter. Attackers hijack sessions when predictable identifiers are used.
There is tension between length and security. Shorter identifiers are easier to guess, because there are fewer possibilities. But longer identifiers consume more storage and bandwidth. A larger set of symbols helps, but might cause encoding problems if identifiers are included in URLs or re-entered by hand.
The underlying source of randomness, or entropy, for session identifiers should come from a random number generator designed for cryptography. However, initializing these generators can sometimes be computationally expensive or slow, so effort should be made to re-use them when possible.
Use as object identifiers
Not every application requires security. Random assignment can be an efficient way for multiple entities to generate identifiers in a shared space without any coordination or partitioning. Coordination can be slow, especially in a clustered or distributed environment, and splitting up a space causes problems when entities end up with shares that are too small or too big.
Identifiers generated without taking measures to make them unpredictable should be protected by other means if an attacker might be able to view and manipulate them, as happens in most web applications. There should be a separate authorization system that protects objects whose identifier can be guessed by an attacker without access permission.
Care must also be taken to use identifiers that are long enough to make collisions unlikely given the anticipated total number of identifiers. This is referred to as "the birthday paradox." The probability of a collision, p, is approximately n^2/(2q^x), where n is the number of identifiers actually generated, q is the number of distinct symbols in the alphabet, and x is the length of the identifiers. This should be a very small number, like 2^-50 or less.
Working this out shows that the chance of collision among 500k 15-character identifiers is about 2^-52, which is probably less likely than undetected errors from cosmic rays, etc.
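The arithmetic can be checked with a few lines of code. This sketch evaluates the approximation p = n^2/(2q^x) in log space, so that q^x does not overflow, and reports the exponent of 2; the class and method names are illustrative:

```java
class CollisionEstimate {
    // Returns log2(p) for p = n^2 / (2 * q^x):
    // log2(p) = 2*log2(n) - 1 - x*log2(q)
    static double log2Collision(double n, int q, int x) {
        double ln2 = Math.log(2);
        return 2 * Math.log(n) / ln2 - 1 - x * Math.log(q) / ln2;
    }

    public static void main(String[] args) {
        // 500k identifiers, 62 symbols, 15 characters: roughly 2^-52
        System.out.printf("p is about 2^%.1f%n", log2Collision(500_000, 62, 15));
    }
}
```

Plugging in other lengths shows how quickly the exponent moves: each extra alphanumeric character lowers it by about 6.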
Comparison with UUIDs
According to their specification, UUIDs are not designed to be unpredictable, and should not be used as session identifiers.
UUIDs in their standard format take a lot of space: 36 characters for only 122 bits of entropy. (Not all bits of a "random" UUID are selected randomly.) A randomly chosen alphanumeric string packs more entropy in just 21 characters.
UUIDs are not flexible; they have a standardized structure and layout. This is their chief virtue as well as their main weakness. When collaborating with an outside party, the standardization offered by UUIDs may be helpful. For purely internal use, they can be inefficient.

Java supplies a way of doing this directly. If you don't want the dashes, they are easy to strip out. Just use uuid.replace("-", "")
import java.util.UUID;
public class randomStringGenerator {
public static void main(String[] args) {
System.out.println(generateString());
}
public static String generateString() {
String uuid = UUID.randomUUID().toString();
return "uuid = " + uuid;
}
}
Output
uuid = 2d7428a6-b58c-4008-8575-f05549f16316

static final String AB = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
static SecureRandom rnd = new SecureRandom();
String randomString(int len){
StringBuilder sb = new StringBuilder(len);
for(int i = 0; i < len; i++)
sb.append(AB.charAt(rnd.nextInt(AB.length())));
return sb.toString();
}

If you're happy to use Apache classes, you could use org.apache.commons.text.RandomStringGenerator (Apache Commons Text).
Example:
RandomStringGenerator randomStringGenerator =
new RandomStringGenerator.Builder()
.withinRange('0', 'z')
.filteredBy(CharacterPredicates.LETTERS, CharacterPredicates.DIGITS)
.build();
randomStringGenerator.generate(12); // toUpperCase() if you want
Since Apache Commons Lang 3.6, RandomStringUtils is deprecated.

You can use an Apache Commons library for this, RandomStringUtils:
RandomStringUtils.randomAlphanumeric(20).toUpperCase();

In one line:
Long.toHexString(Double.doubleToLongBits(Math.random()));
Source: Java - generating a random string

This is easily achievable without any external libraries.
1. Cryptographic Pseudo Random Data Generation (PRNG)
First you need a cryptographic PRNG. Java has SecureRandom for that, which typically uses the best entropy source on the machine (e.g. /dev/random).
SecureRandom rnd = new SecureRandom();
byte[] token = new byte[byteLength];
rnd.nextBytes(token);
Note: SecureRandom is the slowest, but most secure way in Java of generating random bytes. I do however recommend not considering performance here since it usually has no real impact on your application unless you have to generate millions of tokens per second.
2. Required Space of Possible Values
Next you have to decide "how unique" your token needs to be. The whole and only point of considering entropy is to make sure that the system can resist brute force attacks: the space of possible values must be so large that any attacker could only try a negligible proportion of the values in non-ludicrous time.
Unique identifiers such as random UUIDs have 122 bits of entropy (i.e., 2^122 = 5.3x10^36 possible values). On the chance of collision: "for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated". We will choose 128 bits since it fits exactly into 16 bytes and is seen as highly sufficient for being unique for basically every use case but the most extreme, so you don't have to think about duplicates. Here is a simple comparison table of entropy including simple analysis of the birthday problem.
For simple requirements, 8 or 12 byte length might suffice, but with 16 bytes you are on the "safe side".
And that's basically it. The last thing is to think about encoding so it can be represented as a printable text (read, a String).
3. Binary to Text Encoding
Typical encodings include:
Base64: every character encodes 6 bits, creating a 33% overhead. Fortunately there are standard implementations in Java 8+ and Android. With older Java you can use any of the numerous third-party libraries. If you want your tokens to be URL safe, use the URL-safe version of RFC 4648 (which is supported by most implementations). Example encoding 16 bytes with padding: XfJhfv3C0P6ag7y9VQxSbw==
Base32: every character encodes 5 bits, creating a 60% overhead. This will use A-Z and 2-7, making it reasonably space efficient while being case-insensitive alpha-numeric. There isn't any standard implementation in the JDK. Example encoding 16 bytes without padding: WUPIL5DQTZGMF4D3NX5L7LNFOY
Base16 (hexadecimal): every character encodes 4 bits, requiring two characters per byte (i.e., 16 bytes create a string of length 32). Therefore hexadecimal is less space efficient than Base32, but it is safe to use in most cases (URL) since it only uses 0-9 and A to F. Example encoding 16 bytes: 4fa3dd0f57cb3bf331441ed285b27735. See a Stack Overflow discussion about converting to hexadecimal here.
Additional encodings like Base85 and the exotic Base122 exist with better/worse space efficiency. You can create your own encoding (which basically most answers in this thread do), but I would advise against it, if you don't have very specific requirements. See more encoding schemes in the Wikipedia article.
4. Summary and Example
Use SecureRandom
Use at least 16 bytes (2^128) of possible values
Encode according to your requirements (usually hex or base32 if you need it to be alpha-numeric)
Don't
... use your home brew encoding: better maintainable and readable for others if they see what standard encoding you use instead of weird for loops creating characters at a time.
... use UUID: it has no guarantees on randomness; you are wasting 6 bits of entropy and have a verbose string representation
Example: Hexadecimal Token Generator
public static String generateRandomHexToken(int byteLength) {
SecureRandom secureRandom = new SecureRandom();
byte[] token = new byte[byteLength];
secureRandom.nextBytes(token);
return String.format("%0" + (byteLength * 2) + "x", new BigInteger(1, token)); // Hexadecimal encoding, zero-padded so leading zero bytes are not dropped
}
//generateRandomHexToken(16) -> 2189df7475e96aa3982dbeab266497cd
Example: Base64 Token Generator (URL Safe)
public static String generateRandomBase64Token(int byteLength) {
SecureRandom secureRandom = new SecureRandom();
byte[] token = new byte[byteLength];
secureRandom.nextBytes(token);
return Base64.getUrlEncoder().withoutPadding().encodeToString(token); //base64 encoding
}
//generateRandomBase64Token(16) -> EEcCCAYuUcQk7IuzdaPzrg
Example: Java CLI Tool
If you want a ready-to-use CLI tool you may use dice:
Example: Related issue - Protect Your Current Ids
If you already have an id you can use (e.g., a synthetic long in your entity), but don't want to publish the internal value, you can use this library to encrypt it and obfuscate it: https://github.com/patrickfav/id-mask
IdMask<Long> idMask = IdMasks.forLongIds(Config.builder(key).build());
String maskedId = idMask.mask(id);
// Example: NPSBolhMyabUBdTyanrbqT8
long originalId = idMask.unmask(maskedId);

Using Dollar should be as simple as:
// "0123456789" + "ABCDE...Z"
String validCharacters = $('0', '9').join() + $('A', 'Z').join();
String randomString(int length) {
return $(validCharacters).shuffle().slice(length).toString();
}
@Test
public void buildFiveRandomStrings() {
for (int i : $(5)) {
System.out.println(randomString(12));
}
}
It outputs something like this:
DKL1SBH9UJWC
JH7P0IT21EA5
5DTI72EO6SFU
HQUMJTEBNF7Y
1HCR6SKYWGT7

Here it is in Java:
import static java.lang.Math.round;
import static java.lang.Math.random;
import static java.lang.Math.pow;
import static java.lang.Math.abs;
import static java.lang.Math.min;
import static org.apache.commons.lang.StringUtils.leftPad;
public class RandomAlphaNum {
public static String gen(int length) {
StringBuffer sb = new StringBuffer();
for (int i = length; i > 0; i -= 12) {
int n = min(12, abs(i));
sb.append(leftPad(Long.toString(round(random() * pow(36, n)), 36), n, '0'));
}
return sb.toString();
}
}
Here's a sample run:
scala> RandomAlphaNum.gen(42)
res3: java.lang.String = uja6snx21bswf9t89s00bxssu8g6qlu16ffzqaxxoy

A short and easy solution, but it uses only lowercase and numerics:
Random r = new java.util.Random ();
String s = Long.toString (r.nextLong () & Long.MAX_VALUE, 36);
The result is about 12 base-36 digits long (13 at most), and a single long cannot give you more. Of course, you can append multiple instances.

Surprisingly, no one here has suggested it, but:
import java.util.UUID;
UUID.randomUUID().toString();
Easy.
The benefit of this is UUIDs are nice, long, and guaranteed to be almost impossible to collide.
Wikipedia has a good explanation of it:
" ...only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%."
Four bits encode the version and two encode the variant, so you get 122 random bits. If you want to, you can truncate from the end to reduce the size of the UUID. It's not recommended, but you still have loads of randomness, easily enough for your 500k records.
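To put a number on "loads of randomness", the usual birthday approximation p ≈ 1 - exp(-n(n-1) / (2 · 2^bits)) can be evaluated directly. This is a rough sketch: BirthdayBound and its parameter names are made up for illustration, and it assumes all counted bits are uniformly random (after truncating a UUID, any fixed version or variant bits in the part you keep do not count).

```java
final class BirthdayBound {
    /**
     * Approximate probability that at least two of n random IDs collide
     * when each ID carries the given number of uniformly random bits.
     */
    static double collisionProbability(long n, int randomBits) {
        double space = Math.pow(2.0, randomBits);   // number of possible IDs
        double pairs = (double) n * (n - 1) / 2.0;  // number of ID pairs
        // 1 - exp(-x), computed via expm1 so tiny probabilities do not round to zero
        return -Math.expm1(-pairs / space);
    }
}
```

For 500,000 records, collisionProbability(500_000, 64) comes out at roughly 7e-9, so even half of the UUID's bits would leave a very small collision risk at that scale.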

An alternative in Java 8 is:
static final Random random = new Random(); // Or SecureRandom
static final int startChar = (int) '!';
static final int endChar = (int) '~';
static String randomString(final int maxLength) {
final int length = random.nextInt(maxLength + 1);
return random.ints(length, startChar, endChar + 1)
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
}

public static String generateSessionKey(int length) {
String alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
int n = alphabet.length();
StringBuilder result = new StringBuilder(length);
Random r = new Random(); // java.util.Random is predictable; use SecureRandom if the key must be unguessable
for (int i = 0; i < length; i++) {
result.append(alphabet.charAt(r.nextInt(n))); // Pick one character uniformly from the alphabet
}
return result.toString();
}

import java.util.Random;
public class passGen{
// Version 1.0
private static final String dCase = "abcdefghijklmnopqrstuvwxyz";
private static final String uCase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static final String sChar = "!@#$%^&*";
private static final String intChar = "0123456789";
private static Random r = new Random();
private static StringBuilder pass = new StringBuilder();
public static void main (String[] args) {
System.out.println ("Generating pass...");
while (pass.length () != 16){
int rPick = r.nextInt(4);
if (rPick == 0){
int spot = r.nextInt(26);
pass.append(dCase.charAt(spot));
} else if (rPick == 1) {
int spot = r.nextInt(26);
pass.append(uCase.charAt(spot));
} else if (rPick == 2) {
int spot = r.nextInt(8);
pass.append(sChar.charAt(spot));
} else {
int spot = r.nextInt(10);
pass.append(intChar.charAt(spot));
}
}
System.out.println ("Generated Pass: " + pass.toString());
}
}
This appends one random character at a time until the password is 16 characters long. It is very simple, and it works well.

Using UUIDs is insecure, because parts of the UUID aren't random at all. The procedure of erickson is very neat, but it does not create strings of the same length. The following snippet should be sufficient:
/*
* The random generator used by this class to create random keys.
* In a holder class to defer initialization until needed.
*/
private static class RandomHolder {
static final Random random = new SecureRandom();
public static String randomKey(int length) {
return String.format("%"+length+"s", new BigInteger(length*5/*base 32,2^5*/, random)
.toString(32)).replace('\u0020', '0');
}
}
Why choose length*5? Take the simple case of a random string of length 1, i.e., one random character. To cover all digits 0-9 and letters a-z, we would need a random number between 0 and 35, one value per character.
BigInteger provides a constructor that generates a random number uniformly distributed over the range 0 to (2^numBits - 1). Unfortunately, 35 cannot be written as 2^numBits - 1.
So we have two options: either 2^5 - 1 = 31 or 2^6 - 1 = 63. If we chose 2^6, we would get a lot of "unnecessary" / "longer" numbers. Therefore 2^5 is the better option, even though we lose four characters (w-z). To generate a string of a given length, we simply use a random number in the range 0 to 2^(length*numBits) - 1. One last problem remains: the random number may be small, so its string form may be shorter than requested, and we have to pad it to the required length by prepending zeros.
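The reasoning above can be checked with a small stand-alone variant. This is only a sketch: the class name Base32KeyDemo is made up, and the padding is done with a loop instead of String.format, but the idea (length*5 random bits, radix 32, left-pad with zeros) is the same.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

final class Base32KeyDemo {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns exactly `length` characters from 0-9 and a-v (assumes length >= 1). */
    static String randomKey(int length) {
        // length * 5 random bits map to at most `length` radix-32 digits
        String digits = new BigInteger(length * 5, RANDOM).toString(32);
        StringBuilder sb = new StringBuilder(length);
        // A small random number has fewer digits, so prepend zeros
        for (int i = digits.length(); i < length; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }
}
```

Every key from randomKey(20) has length 20 and matches the pattern [0-9a-v]{20}; the letters w-z never appear, which is the trade-off described above.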

I found this solution that generates a random hex encoded string. The provided unit test seems to hold up to my primary use case. Although, it is slightly more complex than some of the other answers provided.
/**
* Generate a random hex encoded string token of the specified length
*
* @param length number of random bytes; the token contains two hex characters per byte
* @return random hex string
*/
public static synchronized String generateUniqueToken(Integer length){
byte random[] = new byte[length];
Random randomGenerator = new Random();
StringBuffer buffer = new StringBuffer();
randomGenerator.nextBytes(random);
for (int j = 0; j < random.length; j++) {
byte b1 = (byte) ((random[j] & 0xf0) >> 4);
byte b2 = (byte) (random[j] & 0x0f);
if (b1 < 10)
buffer.append((char) ('0' + b1));
else
buffer.append((char) ('A' + (b1 - 10)));
if (b2 < 10)
buffer.append((char) ('0' + b2));
else
buffer.append((char) ('A' + (b2 - 10)));
}
return (buffer.toString());
}
@Test
public void testGenerateUniqueToken(){
Set set = new HashSet();
String token = null;
int size = 16;
/* Seems like we should be able to generate 500K tokens
* without a duplicate
*/
for (int i=0; i<500000; i++){
token = Utility.generateUniqueToken(size);
if (token.length() != size * 2){
fail("Incorrect length");
} else if (set.contains(token)) {
fail("Duplicate token generated");
} else{
set.add(token);
}
}
}

Change the characters string to suit your requirements.
Strings are immutable, so StringBuilder.append is more efficient here than string concatenation.
public static String getRandomString(int length) {
final String characters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!@#$%^&*()_+";
StringBuilder result = new StringBuilder();
while(length > 0) {
Random rand = new Random();
result.append(characters.charAt(rand.nextInt(characters.length())));
length--;
}
return result.toString();
}

import java.util.Date;
import java.util.Random;
public class RandomGenerator {
private static Random random = new Random((new Date()).getTime());
public static String generateRandomString(int length) {
char[] values = {'a','b','c','d','e','f','g','h','i','j',
'k','l','m','n','o','p','q','r','s','t',
'u','v','w','x','y','z','0','1','2','3',
'4','5','6','7','8','9'};
String out = "";
for (int i=0;i<length;i++) {
int idx=random.nextInt(values.length);
out += values[idx];
}
return out;
}
}

I don't really like any of these answers regarding a "simple" solution :S
I would go for a simple ;), pure Java, one liner (entropy is based on random string length and the given character set):
public String randomString(int length, String characterSet) {
return IntStream.range(0, length).map(i -> new SecureRandom().nextInt(characterSet.length())).mapToObj(randomInt -> characterSet.substring(randomInt, randomInt + 1)).collect(Collectors.joining());
}
@Test
public void buildFiveRandomStrings() {
for (int q = 0; q < 5; q++) {
System.out.println(randomString(10, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")); // The character set can basically be anything
}
}
Or (a bit more readable old way)
public String randomString(int length, String characterSet) {
StringBuilder sb = new StringBuilder(); // Consider using StringBuffer if needed
for (int i = 0; i < length; i++) {
int randomInt = new SecureRandom().nextInt(characterSet.length());
sb.append(characterSet.substring(randomInt, randomInt + 1));
}
return sb.toString();
}
@Test
public void buildFiveRandomStrings() {
for (int q = 0; q < 5; q++) {
System.out.println(randomString(10, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")); // The character set can basically be anything
}
}
But on the other hand you could also go with UUID which has a pretty good entropy:
UUID.randomUUID().toString().replace("-", "")
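Both versions above create a new SecureRandom for every character. A variant that reuses one generator could look like the following sketch; the class name RandomStrings is made up, and Random.ints(long, int, int) is the Java 8 stream method that SecureRandom inherits.

```java
import java.security.SecureRandom;
import java.util.stream.Collectors;

final class RandomStrings {
    // One shared generator instead of a new instance per character
    private static final SecureRandom RANDOM = new SecureRandom();

    static String randomString(int length, String characterSet) {
        // `length` random indices in [0, characterSet.length()), mapped to characters
        return RANDOM.ints(length, 0, characterSet.length())
                .mapToObj(i -> String.valueOf(characterSet.charAt(i)))
                .collect(Collectors.joining());
    }
}
```

It is called the same way as before, e.g. RandomStrings.randomString(10, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789").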

I'm using a library from Apache Commons to generate an alphanumeric string:
import org.apache.commons.lang3.RandomStringUtils;
int keyLength = 20;
String key = RandomStringUtils.randomAlphanumeric(keyLength);
It's fast and simple!

You mention "simple", but just in case anyone else is looking for something that meets more stringent security requirements, you might want to take a look at jpwgen. jpwgen is modeled after pwgen in Unix, and is very configurable.

import java.util.*;
import javax.swing.*;
public class alphanumeric {
public static void main(String args[]) {
String nval, lenval;
int n, len;
nval = JOptionPane.showInputDialog("Enter number of codes you require: ");
n = Integer.parseInt(nval);
lenval = JOptionPane.showInputDialog("Enter code length you require: ");
len = Integer.parseInt(lenval);
find(n, len);
}
public static void find(int n, int length) {
String str1 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
StringBuilder sb = new StringBuilder(length);
Random r = new Random();
System.out.println("\n\t Unique codes are \n\n");
for(int i=0; i<n; i++) {
for(int j=0; j<length; j++) {
sb.append(str1.charAt(r.nextInt(str1.length())));
}
System.out.println(" " + sb.toString());
sb.delete(0, length);
}
}
}

Here is the one-liner by abacus-common:
String.valueOf(CharStream.random('0', 'z').filter(c -> N.isLetterOrDigit(c)).limit(12).toArray())
Random doesn't mean it must be unique. To get unique strings, use:
N.uuid() // E.g.: "e812e749-cf4c-4959-8ee1-57829a69a80f". length is 36.
N.guid() // E.g.: "0678ce04e18945559ba82ddeccaabfcd". length is 32 without '-'

You can use the following code if your password must contain numbers, letters, and special characters:
private static final String NUMBERS = "0123456789";
private static final String UPPER_ALPHABETS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static final String LOWER_ALPHABETS = "abcdefghijklmnopqrstuvwxyz";
private static final String SPECIALCHARACTERS = "@#$%&*";
private static final int MINLENGTHOFPASSWORD = 8;
public static String getRandomPassword() {
StringBuilder password = new StringBuilder();
int j = 0;
for (int i = 0; i < MINLENGTHOFPASSWORD; i++) {
password.append(getRandomPasswordCharacters(j));
j++;
if (j == 4) { // Wrap after the fourth category so that case 3 (lowercase) is reached
j = 0;
}
}
return password.toString();
}
private static String getRandomPasswordCharacters(int pos) {
Random randomNum = new Random();
StringBuilder randomChar = new StringBuilder();
switch (pos) {
case 0:
// nextInt(n) already excludes n, so subtracting 1 would skip the last character
randomChar.append(NUMBERS.charAt(randomNum.nextInt(NUMBERS.length())));
break;
case 1:
randomChar.append(UPPER_ALPHABETS.charAt(randomNum.nextInt(UPPER_ALPHABETS.length())));
break;
case 2:
randomChar.append(SPECIALCHARACTERS.charAt(randomNum.nextInt(SPECIALCHARACTERS.length())));
break;
case 3:
randomChar.append(LOWER_ALPHABETS.charAt(randomNum.nextInt(LOWER_ALPHABETS.length())));
break;
}
return randomChar.toString();
}

You can use the UUID class with its getLeastSignificantBits() method to get 64 bits of data (62 of them random in a version 4 UUID, since two bits encode the variant), and then convert it to a radix-36 number (i.e., a string consisting of 0-9 and a-z):
Long.toString(Math.abs(UUID.randomUUID().getLeastSignificantBits()), 36);
This yields a string up to 13 characters long. We use Math.abs() to make sure there isn't a minus sign sneaking in.
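One corner case is worth knowing: Math.abs(Long.MIN_VALUE) is still negative, so a minus sign can in principle still appear. A sketch that sidesteps the sign altogether uses Long.toUnsignedString (available since Java 8); the class name UuidBase36 is made up for illustration.

```java
import java.util.UUID;

final class UuidBase36 {
    /** Radix-36 form of the UUID's lower 64 bits, read as an unsigned number. */
    static String shortId() {
        long bits = UUID.randomUUID().getLeastSignificantBits();
        // Unsigned conversion never produces a minus sign; at most 13 characters
        return Long.toUnsignedString(bits, 36);
    }
}
```

As with the one-liner above, the result contains only 0-9 and a-z and carries 62 random bits.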

Here is a Scala solution:
(for (i <- 0 until rnd.nextInt(64)) yield {
('0' + rnd.nextInt(64)).asInstanceOf[Char]
}) mkString("")

Using an Apache Commons library, it can be done in one line:
import org.apache.commons.lang.RandomStringUtils;
RandomStringUtils.randomAlphanumeric(64);
Documentation

public static String randomSeriesForThreeCharacter() {
Random r = new Random();
String value = "";
char random_Char ;
for(int i=0; i<10; i++)
{
random_Char = (char) (48 + r.nextInt(74));
value = value + random_Char;
}
return value;
}

I think this is the smallest solution here, or nearly one of the smallest:
public String generateRandomString(int length) {
String randomString = "";
final char[] chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".toCharArray();
final Random random = new Random();
for (int i = 0; i < length; i++) {
randomString = randomString + chars[random.nextInt(chars.length)];
}
return randomString;
}
The code works just fine. If you use this method, I recommend using more than 10 characters. In my test, a collision occurred with 5-character strings after 30362 iterations, which took 9 seconds.
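The observed collision is in line with the birthday bound: with N possible strings, the chance of a duplicate reaches about 50% after roughly sqrt(2 · ln 2 · N) draws. A small sketch of that estimate follows; the class and method names are made up, and it assumes 62 distinct, uniformly chosen characters.

```java
final class CollisionEstimate {
    /** Approximate number of random strings after which a duplicate is about 50% likely. */
    static double drawsForHalfChance(int alphabetSize, int length) {
        double space = Math.pow(alphabetSize, length); // number of possible strings
        return Math.sqrt(2.0 * Math.log(2.0) * space);
    }
}
```

For 62 characters and length 5 this gives roughly 35,600 draws, the same order of magnitude as the 30362 iterations reported above. At length 10 it rises to about a billion.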

public class Utils {
private final Random RANDOM = new SecureRandom();
private final String ALPHABET = "0123456789QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm";
private String generateRandomString(int length) {
StringBuffer buffer = new StringBuffer(length);
for (int i = 0; i < length; i++) {
buffer.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
}
return new String(buffer);
}
}
