I would like to implement a logic based on a provided string I have to generate two random numbers between 1 to 10.
I have a string like Johnsen using it I have to generate two numbers like 1 and 3 and next time with the same string it should give the same numbers for the same string.
Need help to develop this algorithm or logic.
Java has random number generator support via the java.util.Random class. This class 'works' by having a seed value, then giving you some random data based on this seed value, which then updates the seed value to something new.
This pragmatically means:
2 instances of j.u.Random with the same seed value will produce the same sequence of values if you invoke the same sequence of calls on it to give you random data.
But, seed values are of type long - 64 bits worth of data.
Thus, to do what you want, you need to write an algorithm that turns any String into a long.
Given that long, you simply make an instance of j.u.Random with that long as seed, using the new Random(seedValue) constructor.
So that just leaves: How do I turn a string into a long?
Easy way
The simplest answer is to invoke hashCode() on them. But, note, hashcodes only have 32 bits of info (they are int, not long), so this doesn't cover all possible seed values. This is unlikely to matter unless you're doing this for crypto purposes. If you ARE, then you need to stop what you are doing and do a lot more research, because it is extremely easy to mess up and have working code that seems to test fine, but which is easy to hack. You don't want that. For starters, you'd want SecureRandom instead, but that's just the tip of the iceberg.
Harder way
Hashing algorithms exist that turn arbitrary data into fixed size hash representations. The hashCode algorithm of string [A] only makes 32-bits worth of hash, and [B] is not cryptographically secure: If you task me to make a string that hashes to a provided value I can trivially do so; a cryptographically secure hash has the property that I can't just cook you up a string that hashes to a desired value.
You can search the web for hashing strings or byte arrays (you can turn a string into one with str.getBytes(StandardCharsets.UTF_8)).
You can 'collapse' a byte array containing a hash into a long also quite easily - just take any 8 bytes in that hash and use them to construct a long. "Turn 8 bytes into a long" also has tons of tutorials if you search the web for it.
I assume the easy way is good enough for this exercise, however.
Thus:
String key = ...;
Random rnd = new Random(key.hashCode());
int number1 = rnd.nextInt(10) + 1;
int number2 = rnd.nextInt(10) + 1;
System.out.println("First number: " + number1);
System.out.println("Second number: " + number2);
You could get the hashcode of the string, then use that to seed a random number generator. Use the RNG to get numbers in the range 1 - 10.
Related
I'm trying to generate n random numbers that depend on an input string. It would be a function generateNumbers(String input) that generates the same set of numbers for the same input string but entirely different numbers for a slightly different input string.
My question is: Is there an easy way to do this?
I agree with nihlon, if what you want is a function f() returning an int such that f(string1) != f(string2) for any string1, string2 in some set of strings S, then you're looking for a perfect hash.
Obviously, if S is the set of all possible strings, there are way more than 2^32, or even 2^64, so no such f() can exist returning an int or even long. Hence, the question is: how is S characterized?
Also, are you sure you need unique numbers for different strings? In most problem domains regular hashing is adequate...
As Roberto says, a hash is one way to do this, with a small possibility of two different strings hashing to the same value. That probability depends on the maximum size of string you allow and the bit-size of the resulting hash number.
You could also use an encryption, but then you would have to limit the string size to one or two blocks of a block cipher. Two blocks of AES is 32 characters, and will produce a 256 bit number.
Pick the smallest string size you can live with, and the largest hash size/block size you can work with. A non-cryptographic hash like the fnv hash will be faster than a cryptographic hash like SHA-256, but obviously less secure. You do not say how important security is to you.
I need to generate a reservation code of 6 alpha numeric characters, that is random and unique in java.
Tried using UUID.randomuuid().toString(), However the id is too long and the requirement demands that it should only be 6 characters.
What approaches are possible to achieve this?
Just to clarify, (Since this question is getting marked as duplicate).
The other solutions I've found are simply generating random characters, which is not enough in this case. I need to reasonably ensure that a random code is not generated again.
Consider using the hashids library to generate salted hashes of integers (your database ids or other random integers which is probably better).
http://hashids.org/java/
Hashids hashids = new Hashids("this is my salt",6);
String id = hashids.encode(1, 2, 3);
long[] numbers = hashids.decode(id);
You have 36 characters in the alphanumeric character set (0-9 digits + a-z letters). With 6 places you achieve 366 = 2.176.782.336 different options, that is slightly larger than 231.
Therefore you can use Unix time to create a unique ID. However, you must assure that no ID generated within the same second.
If you cannot guarantee that, you end up with a (synchronized) counter within your class. Also, if you want to survive a JVM restart, you should save the current value (e.g. to a database, file, etc. whatever options you have).
Despite its name, UUIDs are not unique. It's simply extremely unlikely to get a 128 bit collision. With 6 (less than 32 bit) it's very likely that you get a collision if you just hash stuff or generate a random string.
If the uniqueness constraint is necessary then you need to
generate a random 6 character string
Check if you generated that string before by querying your database
If you generated it before, go back to 1
Another way would be to use a pseadorandom permutation (PRP) of size 32 bit. Block ciphers are modeled as PRP functions, but there aren't many that support 32 bit block sizes. Some are Speck by the NSA and the Hasty Pudding Cipher.
With a PRP you could for example take an already unique value like your database primary key and encrypt it with the block cipher. If the input is not bigger than 32 bit then the output will still be unique.
Then you would run Base62 (or at least Base 41) over the output and remove the padding characters to get a 6 character output.
if you do a substring that value may not be unique
for more info please see following similar link
Generating 8-character only UUIDs
Lets say your corpus is the collection of alpha numberic letters. a-zA-Z0-9.
char[] corpus = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
We can use SecureRandom to generate a seed, which will ask the OS for entropy, depending on the os. The trick here is to keep a uniform distribution, each byte has 255 values, but we only need around 62 so I will propose rejection sampling.
int generated = 0;
int desired=6;
char[] result= new char[desired];
while(generated<desired){
byte[] ran = SecureRandom.getSeed(desired);
for(byte b: ran){
if(b>=0&&b<corpus.length){
result[generated] = corpus[b];
generated+=1;
if(generated==desired) break;
}
}
}
Improvements could include, smarter wrapping of generated values.
When can we expect a repeat? Lets stick with the corpus of 62 and assume that the distribution is completely random. In that case we have the birthday problem. That gives us N = 62^6 possiblities. We want to find n where the chance of a repeat around 10%.
p(r)= 1 - N!/(N^n (N-n)!)
And using the approximation given in the wikipedia page.
n = sqrt(-ln(0.9)2N)
Which gives us about 109000 numbers for 10% chance. For a 0.1% chance it woul take about 10000 numbers.
you can trying to make substring out of your generated UUID.
String uuid = UUID.randomUUID().toString();
System.out.println("uuid = " + uuid.substring(0,5);
I'm trying to implement the Chord protocol in order to quickly lookup some nodes and keys in a small network. What I can't figure out is ... Chord cosideres the nodes and keys as being placed on a cirlce. And their placement dictated by the hash values obtained by applying the SHA-1 hash function. How exactly do I operate with those values? Do I make them as a string de9f2c7f d25e1b3a fad3e85a 0bd17d9b 100db4b3 and then compare them as such, considering that "a" < "b" is true ? Or how? How do I know if a key is before or after another?
Since the keyspace is a ring, a single value can't be said to be greater than another, because if you go the other way around the ring, the opposite is true. You can say a value is within a range or not. In the Chord DHT, each server is responsible for the keys within the range of values between it and its predecessor.
I would advise against using strings for the hash values. You shouldn't use the hashCode function for distributed systems, but you need to math on the hash keys when adding new nodes. You could try converting the hashes into BigIntegers instead.
sha1 hashes are not strings but are very long hex numbers - they are often stored as strings because they would otherwise require a native 160 bit number type. They are built as 5 32 bit hex numbers and then often 'strung' together.
using sha1 strings as the numbers they represent is not hard but requires a library that can handle such large numbers (like BigInt or bcmath). these libraries work by calculating the numbers within the string one column at a time from the right to left, much like a person when using a pen and paper to add, multiply, divide, etc. they will typically have functions for doing common math as well comparisons etc, and often take strings as arguments. Also, make sure that you use a function for converting big numbers anytime you need to go from hex to dec, or else your 160 bit hex number will likely get rounded into a 64 bit dec float or similar and loose most of it's accuracy.
more/less than comparisons are used in chord to figure ranges but do so using modulo so that they 'wrap', making ranges such as [64, 2] possible. the actual formula is
find_successor(fingers[k] = n + 2^(k-1) mod(2^160))
where 'n' is the sha1 of a node and 'k' is the finger number.
remember, 'n' will be hex while 'k' and 'mod(160^2)' will typically be dec, so this is where your BigInt hex to BigInt dec will be needed.
even if your programing framework will let you create these vars as hex, 160 is specifically a dec (literally meaning one hounded and sixty bits) and besides, wrapping your brain around 'mod(160^2)' is already hard enough without visualizing it as hex. convert 'n' to dec rather than converting 'k' etc to hex , and then use a BigInt lib to do the math including comparisons.
So I know I can convert a string to a hashcode simply by doing .hashCode(), but is there a way to convert (or use some other function if there is one out there) that will instead of returning an integer return a double between 0 and 1? I was thinking of just dividing the number by the maximum possible integer but wasn't sure if there was a better way.
*Edit (more information about why i'm trying to do this): i'm doing a mathematical operation, and i'm trying to group different objects to perform the same mathematical operation in their group but have a different parameter into the function. each member has a list of characteristics that "group" them... so i was thinking to put the characteristics into a string and then hashcode it and find their group value from that
You couldn't just divide by Integer.MAX_VALUE, as that wouldn't deal with negative numbers. You could use:
private static double INTEGER_RANGE = 1L << 32;
...
// First need to put it in the range [0, INTEGER_RANGE)
double doubleHash = ((long) text.hashCode() - Integer.MIN_VALUE) / INTEGER_RANGE;
That should be okay, as far as I'm aware... but I'm not going to make any claims about the distribution. There may well be a fairly simple way of using the 32 bits to make a unique double (per unique hash code) in the right range, but if you don't care too much about that, this will be simpler.
Dividing it should be ok, but you might loose some "precision" due to rounding problems, etc, that doubles might have.
In general a hash is used to identify something trying to assure it'll be unique, loosing precision might have problems in that.
You could write your own String.hashCodeDouble() returning the desired number, perhaps using a common hash algorithm (let's say, MD5) and adapting it to your required response range.
Example: do the MD5 of the String to get a hash, then simply put a 0. in front of it...
Remember that the .hashCode() is used in lots of functions in Java, you can't simply overwrite it.
This smells bad but might do what you want:
Integer iHash = "123".hashCode();
String sHash = "0."+iHash;
Double dHash = Double.valueOf(sHash);
I have come across situations in an interview where I needed to use a hash function for integer numbers or for strings. In such situations which ones should we choose ? I've been wrong in these situations because I end up choosing the ones which have generate lot of collisions but then hash functions tend to be mathematical that you cannot recollect them in an interview. Are there any general recommendations so atleast the interviewer is satisfied with your approach for integer numbers or string inputs? Which functions would be adequate for both inputs in an "interview situation"
Here is a simple recipe from Effective java page 33:
Store some constant nonzero value, say, 17, in an int variable called result.
For each significant field f in your object (each field taken into account by the
equals method, that is), do the following:
Compute an int hash code c for the field:
If the field is a boolean, compute (f ? 1 : 0).
If the field is a byte, char, short, or int, compute (int) f.
If the field is a long, compute (int) (f ^ (f >>> 32)).
If the field is a float, compute Float.floatToIntBits(f).
If the field is a double, compute Double.doubleToLongBits(f), and
then hash the resulting long as in step 2.1.iii.
If the field is an object reference and this class’s equals method
compares the field by recursively invoking equals, recursively
invoke hashCode on the field. If a more complex comparison is
required, compute a “canonical representation” for this field and
invoke hashCode on the canonical representation. If the value of the
field is null, return 0 (or some other constant, but 0 is traditional).
48 CHAPTER 3 METHODS COMMON TO ALL OBJECTS
If the field is an array, treat it as if each element were a separate field.
That is, compute a hash code for each significant element by applying
these rules recursively, and combine these values per step 2.b. If every
element in an array field is significant, you can use one of the
Arrays.hashCode methods added in release 1.5.
Combine the hash code c computed in step 2.1 into result as follows:
result = 31 * result + c;
Return result.
When you are finished writing the hashCode method, ask yourself whether
equal instances have equal hash codes. Write unit tests to verify your intuition!
If equal instances have unequal hash codes, figure out why and fix the problem.
You should ask the interviewer what the hash function is for - the answer to this question will determine what kind of hash function is appropriate.
If it's for use in hashed data structures like hashmaps, you want it to be a simple as possible (fast to execute) and avoid collisions (most common values map to different hash values). A good example is an integer hashing to the same integer - this is the standard hashCode() implementation in java.lang.Integer
If it's for security purposes, you will want to use a cryptographic hash function. These are primarily designed so that it is hard to reverse the hash function or find collisions.
If you want fast pseudo-random-ish hash values (e.g. for a simulation) then you can usually modify a pseudo-random number generator to create these. My personal favourite is:
public static final int hash(int a) {
a ^= (a << 13);
a ^= (a >>> 17);
a ^= (a << 5);
return a;
}
If you are computing a hash for some form of composite structure (e.g. a string with multiple characters, or an array, or an object with multiple fields), then there are various techniques you can use to create a combined hash function. I'd suggest something that XORs the rotated hash values of the constituent parts, e.g.:
public static <T> int hashCode(T[] data) {
int result=0;
for(int i=0; i<data.length; i++) {
result^=data[i].hashCode();
result=Integer.rotateRight(result, 1);
}
return result;
}
Note the above is not cryptographically secure, but will do for most other purposes. You will obviously get collisions but that's unavoidable when hashing a large structure to a integer :-)
For integers, I usually go with k % p where p = size of the hash table and is a prime number and for strings I choose hashcode from String class. Is this sufficient enough for an interview with a major tech company? – phoenix 2 days ago
Maybe not. It's not uncommon to need to provide a hash function to a hash table whose implementation is unknown to you. Further, if you hash in a way that depends on the implementation using a prime number of buckets, then your performance may degrade if the implementation changes due to a new library, compiler, OS port etc..
Personally, I think the important thing at interview is a clear understanding of the ideal characteristics of a general-purpose hash algorithm, which is basically that for any two input keys with values varying by as little as one bit, each and every bit in the output has about 50/50 chance of flipping. I found that quite counter-intuitive because a lot of the hashing functions I first saw used bit-shifts and XOR and a flipped input bit usually flipped one output bit (usually in another bit position, so 1-input-bit-affects-many-output-bits was a little revelation moment when I read it in one of Knuth's books. With this knowledge you're at least capable of testing and assessing specific implementations regardless of how they're implemented.
One approach I'll mention because it achieves this ideal and is easy to remember, though the memory usage may make it slower than mathematical approaches (could be faster too depending on hardware), is to simply use each byte in the input to look up a table of random ints. For example, given a 24-bit RGB value and int table[3][256], table[0][r] ^ table[1][g] ^ table[2][b] is a great sizeof int hash value - indeed "perfect" if inputs are randomly scattered through the int values (rather than say incrementing - see below). This approach isn't ideal for long or arbitrary-length keys, though you can start revisiting tables and bit-shift the values etc..
All that said, you can sometimes do better than this randomising approach for specific cases where you are aware of the patterns in the input keys and/or the number of buckets involved (for example, you may know the input keys are contiguous from 1 to 100 and there are 128 buckets, so you can pass the keys through without any collisions). If, however, the input ceases to meet your expectations, you can get horrible collision problems, while a "randomising" approach should never get much worse than load (size() / buckets) implies. Another interesting insight is that when you want a quick-and-mediocre hash, you don't necessarily have to incorporate all the input data when generating the hash: e.g. last time I looked at Visual C++'s string hashing code it picked ten letters evenly spaced along the text to use as inputs....