Hashing to Negative Values - java

Pretty much the title: I'm hashing a bunch of names (10000-ish) and some are outputting as negative. (table size is 20011).
The hash function in question is:
public static long hash2 ( String key ){
int hashVal = 0;
for( int i = 0; i < key.length(); i++ )
hashVal = (37 * hashVal) + key.charAt(i);
return hashVal % 20011;
}
I dug around and I think I have to do something to do with "wrap around." But I don't know how to go about that.

This is a clear case of integer overflow. hashVal is an int, and multiplying it by 37 for every character makes it grow roughly like 37^n for a key of length n. It therefore exceeds Integer.MAX_VALUE (2^31 - 1) for ordinary names of only about six letters, and wraps around to a negative value.
In number theory,
(A+B)%M = (A%M + B%M) % M;
(A*B)%M = (A%M * B%M) % M;
You should apply the modulo operation inside the for loop, which keeps hashVal small at every step. Taking the modulo once at the end or at every iteration of the loop gives the same answer as long as no overflow happens.
So change the function accordingly:
public static long hash2 ( String key ){
int hashVal = 0;
for( int i = 0; i < key.length(); i++ )
{
hashVal = (37 * hashVal) + key.charAt(i);
hashVal%=20011;
}
return hashVal;
}

hashVal is an int. It is most likely that your hash function is causing an integer overflow.
You can resolve this by using Math.abs() to ensure that hashVal is non-negative, with a special case for Integer.MIN_VALUE, whose absolute value does not fit in an int, e.g.
hashVal = hashVal == Integer.MIN_VALUE ? 0 : Math.abs(hashVal);
return hashVal % 20011;
The mod % ensures that the final index is within the bounds of the table (i.e. if it's >= 20011, it uses the remainder of the division to, as you say, 'wrap around').
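As a further option, here is a minimal sketch that keeps the original loop and lets the overflow happen, but maps the result into the table with Math.floorMod, which returns a non-negative remainder whenever the divisor is positive. The class name NameHashSketch and the tableSize parameter are hypothetical names chosen for this illustration.

```java
class NameHashSketch {
    // Hypothetical helper: same polynomial hash as in the question,
    // but the final index is always in the range [0, tableSize).
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            // May overflow and turn negative for longer keys; that is fine here.
            hashVal = 37 * hashVal + key.charAt(i);
        }
        // floorMod takes the sign of the divisor, so the result is never negative.
        return Math.floorMod(hashVal, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash("Bartholomew Fitzgerald-Montgomery", 20011));
    }
}
```

Unlike Math.abs, this needs no special case for Integer.MIN_VALUE.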

Related

Array form of integer

I was trying to convert an array of digits to the integer 999999999999 (twelve 9s). When I limit the array to fewer than ten 9s it gives the expected result, but when I give it an array of more than ten 9s it gives an unexpected result. Please explain; it would be really helpful for me.
int[] arr={9,9,9,9,9,9,9,9,9,9,9,9};
int p=arr.length-1;
int m;
int num=0;
for (int i = 0; i <= p; i++) {
m=(int) Math.pow(10, p-i);
num += arr[i]*m; // it is executing like: 900+90+9=999
}
This happens because you're exceeding Integer.MAX_VALUE.
You can read about it here.
Instead of an int you can use a long to store larger values,
and if that is not enough for you, you can use BigInteger:
// requires: import java.math.BigInteger;
BigInteger num = BigInteger.ZERO;
for (int i = 0; i <= p; i++) {
// exact power of ten; casting Math.pow(10, p-i) to int would saturate at Integer.MAX_VALUE
BigInteger m = BigInteger.TEN.pow(p - i);
BigInteger next = BigInteger.valueOf(arr[i]).multiply(m);
num = num.add(next);
}
A couple of things.
You don't need to use Math.pow.
For up to 18 digits, you can use a long to do the computation.
I added some extra digits to demonstrate:
int[] arr={9,9,9,9,9,9,9,9,9,9,9,9,1,1,2,3,4};
long sum = 0; // or BigInteger sum = BigInteger.ZERO;
for (int val : arr) {
sum = sum * 10 + val; // or sum = sum.multiply(BigInteger.TEN).add(BigInteger.valueOf(val));
}
System.out.println(sum);
prints
99999999999911234
Here is the sequence for 1,2,3,4 so you can see what is happening.
- sum = 0
- sum = sum(0) * 10 + 1 (sum is now 1)
- sum = sum(1) * 10 + 2 (sum is now 12)
- sum = sum(12)* 10 + 3 (sum is now 123)
- sum = sum(123)*10 + 4 (sum is now 1234)
It is because an int is stored in 4 bytes, so it can only range from -2,147,483,648 to 2,147,483,647.
Consider using the long type.
Try using long (or any other type which can represent larger numbers) instead of int.
I suggest this because the int overflows: see https://en.wikipedia.org/wiki/Integer_overflow
Because it overflows the integer boundary. The maximum value an int can hold in Java is 2147483647. When you try to store a value greater than this, the result will be an unexpected value. To solve this issue, you can use a long instead of an int.
you can read about it here and here
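If silent wrap-around is the real worry, one possible sketch is to build the number in a long with Math.multiplyExact and Math.addExact, which throw an ArithmeticException instead of overflowing. The class name DigitsToNumberSketch and the method toLong are hypothetical names for this illustration.

```java
class DigitsToNumberSketch {
    // Hypothetical helper: folds decimal digits into a long, most significant first.
    // Throws ArithmeticException if the value does not fit in a long.
    static long toLong(int[] digits) {
        long value = 0;
        for (int d : digits) {
            value = Math.addExact(Math.multiplyExact(value, 10L), d);
        }
        return value;
    }

    public static void main(String[] args) {
        int[] arr = {9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9};
        System.out.println(toLong(arr));
    }
}
```

With this, an input that is too long fails loudly, which is the signal to switch to BigInteger.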

Rabin-Karp not working for large primes (gives wrong output)

So I was solving this problem (Rabin Karp's algorithm) and wrote this solution:
private static void searchPattern(String text, String pattern) {
int txt_len = text.length(), pat_len = pattern.length();
int hash_pat = 0, hash_txt = 0; // hash values for pattern and text's substrings
final int mod = 100005; // prime number to calculate modulo... larger modulo denominator reduces collisions in hash
final int d = 256; // to include all the ascii character codes
int coeff = 1; // stores the multiplier (or coeffecient) for the first index of the sliding window
/*
* HASHING PATTERN:
* say text = "abcd", then
* hashed text = 256^3 *'a' + 256^2 *'b' + 256^1 *'c' + 256^0 *'d'
*/
// The value of coeff would be "(d^(pat_len - 1)) % mod"
for (int i = 0; i < pat_len - 1; i++)
coeff = (coeff * d) % mod;
// calculate hash of the first window and the pattern itself
for (int i = 0; i < pat_len; i++) {
hash_pat = (d * hash_pat + pattern.charAt(i)) % mod;
hash_txt = (d * hash_txt + text.charAt(i)) % mod;
}
for (int i = 0; i < txt_len - pat_len; i++) {
if (hash_txt == hash_pat) {
// our chances of collisions are quite less (1/mod) so we dont need to recheck the substring
System.out.println("Pattern found at index " + i);
}
hash_txt = (d * (hash_txt - text.charAt(i) * coeff) + text.charAt(i + pat_len)) % mod; // calculating next window (i+1 th index)
// We might get negative value of t, converting it to positive
if (hash_txt < 0)
hash_txt = hash_txt + mod;
}
if (hash_txt == hash_pat) // checking for the last window
System.out.println("Pattern found at index " + (txt_len - pat_len));
}
Now this code is simply not working if the mod = 1000000007, whereas as soon as we take some other prime number (large enough, like 1e5+7), the code magically starts working !
The line at which the code's logic failed is:
hash_txt = (d * (hash_txt - text.charAt(i) * coeff) + text.charAt(i + pat_len)) % mod;
Can someone please tell me why is this happening ??? Maybe this is a stupid doubt but I just do not understand.
In Java, an int is a 32-bit integer. If a calculation with such numbers mathematically yields a result that needs more binary digits, the extra digits are silently discarded. This is called overflow.
To avoid this, the Rabin-Karp algorithm reduces results modulo some prime in each step, thereby keeping the number small enough that the next step will not overflow. For this to work, the prime chosen must be suitably small that
d * (hash + max(char) * coeff) + max(char) < max(int)
Since
0 ≤ hash < p,
1 ≤ coeff < p,
max(char) = 2^16
max(int) = 2^31
any prime smaller than 2^7 = 128 will do. For larger primes, it depends on what their coeff ends up being, but even if we select one with the smallest possible coeff = 1, the prime must not exceed 2^23, which is much smaller than the prime you used.
In practice, one therefore uses Rabin-Karp with an integer datatype that is significantly bigger than the character type, such as a long (64 bits). Then, any prime < 2^39 will do.
Even then, it is worth noting that your reasoning
our chances of collisions are quite less (1/mod) so we dont need to recheck the substring
is flawed, because the probability is determined not by chance, but by the strings being checked. Unless you know the probability distribution of your inputs, you can't know what the probability of failure is. That's why Rabin-Karp rechecks the string to make sure.
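Putting both points together, a minimal sketch might do the hash arithmetic in long and recheck the substring on every hash match. The class name RabinKarpSketch and the method search are hypothetical; d = 256 and mod = 1000000007 are taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

class RabinKarpSketch {
    // Hypothetical helper: returns every index at which pattern occurs in text.
    static List<Integer> search(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0 || m > n) return hits;
        final long mod = 1_000_000_007L; // well below 2^39, so long arithmetic cannot overflow
        final long d = 256;
        long coeff = 1; // d^(m-1) % mod
        for (int i = 0; i < m - 1; i++) coeff = (coeff * d) % mod;
        long hashPat = 0, hashTxt = 0;
        for (int i = 0; i < m; i++) {
            hashPat = (d * hashPat + pattern.charAt(i)) % mod;
            hashTxt = (d * hashTxt + text.charAt(i)) % mod;
        }
        for (int i = 0; i + m <= n; i++) {
            // Equal hashes only suggest a match, so compare the characters as well.
            if (hashTxt == hashPat && text.regionMatches(i, pattern, 0, m)) hits.add(i);
            if (i + m < n) {
                // Remove the leading character; adding mod keeps the difference non-negative.
                long lead = (text.charAt(i) * coeff) % mod;
                hashTxt = (d * (hashTxt - lead + mod) + text.charAt(i + m)) % mod;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("abracadabra", "abra"));
    }
}
```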

How to sort digits of an integer using binary number technique?

Yesterday I went for an interview and they asked me to create a method which takes an integer value and displays the number with its digits in descending order. I used string manipulation and solved it but they asked me to do it using binary number technique. I still don't know how to approach this problem.
"Binary number technique"? It's a bullshit question, one where the correct answer is to walk out from the interview because it's a bullshit company.
Anyway, the best answer I can think of is
public static int solveBullshitTaskInASmartWay(int n) {
// get characters and sort them
char[] chars = Integer.toString(n).toCharArray();
Arrays.sort(chars);
// comparators don't work in Java for primitives,
// so you either have to flip the array yourself
// or make an array of Integer or Character
// so that Arrays.sort(T[] a, Comparator<? super T> c)
// can be applied
for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
char t = chars[i]; chars[i] = chars[j]; chars[j] = t;
}
// reconstruct the number
return Integer.parseInt(new String(chars));
}
There is no numeric way to sort a number's digits, if you're expecting a nifty mathematical answer you will be waiting for a while.
EDIT: I need to add this - "digit" is solely a property of display of numbers. It is not a property of a number. Mathematically, the number 0b1000 is the same as 0x8 or 0o10, or 008.00000, or 8e0 (or even trinary 22, if anyone used trinary; alas, no conventional notation for that in programming). It is only the string representations of numbers that have digits. Solving this problem without use of characters or strings is not only pretty hard, it is stupid.
EDIT2: It is probably obvious, but I should make it clear that I have no beef with the OP; it is the interviewer that I am laying the blame on entirely.
There is a simple (but not efficient) way of doing it without conversion to a string. You can perform a bubble sort on the digits by extracting them from the number using modulo and division, comparing adjacent digits, and swapping them if needed. At most 9*8 comparisons are needed.
Here is code in C++
int sortDigits(int number)
{
for(int j = 0; j < 9; ++j) // enough bubble passes for numbers of up to 9 digits; the top digit of a 10-digit int is never compared
{
int mul = 1;
for(int i = 0; i < 8; ++i) // compares the digits at 10^i and 10^(i+1), so mul * 10 never exceeds 10^8
{
if (mul * 10 > number) break; //by doing that we ensure there will be no zeroes added to number
int digitRight = number / mul % 10;
int digitLeft = number / (mul * 10) % 10;
if(digitRight > digitLeft) //swapping digits
{
/*
number -= digitLeft * mul * 10;
number += digitLeft * mul;
number -= digitRight * mul;
number += digitRight * mul * 10;
*/
number -= digitLeft * mul * 9;
number += digitRight * mul * 9;
}
mul *= 10;
}
}
return number;
}
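Another purely arithmetic approach, sketched here in Java under my own assumptions, is to count how often each digit occurs and then rebuild the number from 9 down to 0. The class name DigitSortSketch is hypothetical; the result is a long because the rearranged digits of a large int (for example 1999999999) may no longer fit in an int, and the sign of a negative input is simply dropped.

```java
class DigitSortSketch {
    // Hypothetical helper: returns the digits of n arranged in descending order.
    static long sortDigitsDescending(int n) {
        long value = Math.abs((long) n); // widen first so Integer.MIN_VALUE is handled
        int[] counts = new int[10];
        do {
            counts[(int) (value % 10)]++; // tally the lowest digit
            value /= 10;
        } while (value > 0);
        long result = 0;
        for (int digit = 9; digit >= 0; digit--) {
            for (int k = 0; k < counts[digit]; k++) {
                result = result * 10 + digit; // append digits from largest to smallest
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortDigitsDescending(30219));
    }
}
```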

Optimizing and finding the computation of this inequality

Let's assume I have a set of integers, out of which I want to find the maximum number of integers that satisfy a particular inequality. For the sake of explanation, take
r1, r2, r3, ... rn where each ri is a positive integer. I want to find the maximum z, ranging from 1 to n, for which ri <= 0.5 * (r1 + r2 + r3 + ... + rn) for all i from 3 to z. How should I approach such problems? I have tried the naive method of finding all subsets of sizes from 1 to n and iterating through each subset to check whether every element satisfies the condition. Is there any other approach?
I feel kind of bad, especially for the false edit I have done to the question... Anyway:
First, the most naive, direct and straightforward way to solve this: start from the 1st number in the set, calculate the partial sum at each step, and compare that partial sum with double each element from the 3rd element up to the current one. If the comparison holds for every element, mark the current index as the largest z so far.
The following macro largestzfinder, dependent on the function largestzfinderfunc, does that:
// indices are zero based
#define largestzfinder(_x_) largestzfinderfunc((_x_), sizeof(_x_) / sizeof(*_x_))
unsigned int largestzfinderfunc( unsigned int set[], size_t size ) {
unsigned int largestz = 0;
unsigned int partialsumsofar = 0;
int disqualified;
for ( int i = 0; i < size; i++ ){
partialsumsofar += set[i];
disqualified = 0;
for ( int j = 2; j <= i; j++ ) { // for all j from 2 to i (inclusive)
if ( 2 * set[j] > partialsumsofar ) {
disqualified = 1;
break;
}
}
if ( !disqualified ) // if comparison held for all j
largestz = i;
}
return largestz;
}
With this method, we gradually reach our largestz by starting from the smallest z and finding the next bigger z until we reach the largest. We could simplify that process by starting from the end, in which case we wouldn't have to go through all the other zs, which we don't need anyway.
To do that, we pre-calculate the whole sum, then remove elements one by one from the end, make the same comparisons against the shrinking partial sums, and return an answer as soon as we find a candidate that is not disqualified. The following code does that:
// indices are zero based
#define largestzfinder(_x_) largestzfinderfunc((_x_), sizeof(_x_) / sizeof(*_x_))
unsigned int largestzfinderfunc( unsigned int set[], size_t size ) {
unsigned int largestz = 0;
unsigned int partialsumsofar = 0;
int disqualified;
for ( int i = 0; i < size; i++ )
partialsumsofar += set[i];
for ( int i = size - 1; i >= 0; i-- ){
disqualified = 0;
for ( int j = 2; j <= i; j++ ) { // for all j from 2 to i (inclusive)
if ( 2 * set[j] > partialsumsofar ) {
disqualified = 1;
break;
}
}
if ( !disqualified ) { // if comparison held for all j
largestz = i;
break;
}
partialsumsofar -= set[i]; // updates/reduces partialsumsofar
}
return largestz;
}
Now, you see, we check whether the condition holds for every single element from the 3rd to the last, one by one, while we could just check it for the largest among them! If largestamongthem <= partialsumsofar / 2, then all of them are less than or equal to half the partial sum so far.
How would you determine the largestsofar? Well, things get complicated, especially when you do the process starting from the end. If we were doing it from the start, then we could just start off with largestsofar = 0; compare each subsequent element with it, and update our largestsofar.
One way to do it is to make a new array of integers that has the same size as the set array, which will hold the largestsofar for each point. The following code uses that method:
// indices are zero based
#define largestzfinder(_x_) largestzfinderfunc((_x_), sizeof(_x_) / sizeof(*_x_))
unsigned int largestzfinderfunc( unsigned int set[], size_t size ) {
unsigned int largestz = 0;
unsigned int partialsumsofar = 0;
unsigned int * largestsofar = calloc( size, sizeof * largestsofar );
for ( int i = 0; i < size; i++ )
partialsumsofar += set[i];
largestsofar[0] = 0;
largestsofar[1] = 0;
largestsofar[2] = set[2];
for ( int i = 3; i < size; i++ )
largestsofar[i] = (set[i] > largestsofar[i-1]) ? set[i] : largestsofar[i-1];
for ( int i = size - 1; i >= 0; i-- ){
if ( 2 * largestsofar[i] <= partialsumsofar ) {
largestz = i;
break;
}
partialsumsofar -= set[i];
}
free( largestsofar );
return largestz;
}
Well, if you aren't really happy with allocating memory, keeping a list of largest number so far in it, I'm with you. The following recursive function does the thing, without keeping a list. It also looks far shorter. The downside is that it is less reader-friendly. Here:
// indices are zero based
#define largestzfinder(_x_) largestzfinderfunc((_x_), sizeof(_x_) / sizeof(*_x_), (_x_)[0] + (_x_)[1], 1, 0)
unsigned int largestzfinderfunc( unsigned int set[], size_t size, unsigned int partialsumsofar, unsigned int i, unsigned int largestsofar ){
unsigned int j;
if ( i < size - 1 && ( j = largestzfinderfunc( set, size, partialsumsofar + set[i + 1], i + 1, ( set[i + 1] > largestsofar ) ? set[i + 1] : largestsofar ) ) ){
return j;
}
else if ( 2 * largestsofar <= partialsumsofar ) {
return i;
}
else
return 0;
}
Let me explain what it does briefly: By a call to the macro, it first passes r_0 + r_1 as partialsumsofar, 1 as the current z to be checked, and 0 as the largest number up till the r_1 (this is because r_0 and r_1 are not to be considered during comparison).
It doesn't check right away, however; provided that this was not the last element (i < size - 1), it will call for the function one level beyond, with partialsumsofar updated, current z to be checked incremented, and largest number to be updated as well. This will be done until the last element is reached.
After that, the recursion tree will collapse. If the last one fails to satisfy the condition, it will return a 0, causing the parent recursion branch to check whether it satisfies the condition itself.
As soon as one parent satisfies the condition, it will return its current z, causing each parent to return the same z, since they all store the returned value under j and return it if j is not zero.
In essence, it actually does allocate memory for each largestsofar and partialsumsofar and all, but it frees all those one by one as the recursion tree/line/whatever collapses.
This wasn't so brief, but whatever, I hope I've done it right this time. I want to get over this question now...
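For comparison, the "from the start" idea mentioned above can be sketched in a single forward pass with no extra array and no recursion: keep the running sum and the running maximum over the elements from the 3rd onward, and remember the last index at which the condition held. This Java sketch uses zero-based indices like the C code; the class name LargestZSketch is hypothetical, and it is meant to return the same value as the first C version.

```java
class LargestZSketch {
    // Hypothetical helper: largest zero-based index i such that every element
    // r[2..i] is at most half of r[0] + ... + r[i].
    static int largestZ(int[] r) {
        long prefixSum = 0; // r[0] + ... + r[i]
        long largest = 0;   // maximum of r[2..i]; stays 0 while i < 2
        int best = 0;
        for (int i = 0; i < r.length; i++) {
            prefixSum += r[i];
            if (i >= 2 && r[i] > largest) largest = r[i];
            // Checking only the largest element covers all smaller ones.
            if (2 * largest <= prefixSum) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largestZ(new int[] {1, 1, 10, 1, 1, 6}));
    }
}
```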

Universal Hashfunctions for string

I am trying to implement two different universal hash functions for strings.
But I have the problem that sometimes the hash value is 0.
Because of this I can't use the hash function, since I want to implement double hashing and have to use hash_func1(string s) + i * hash_func2(string s) to go through the hash table.
But if one hash function is 0 nothing changes and I get an endless loop.
This is for collision detection in a hash table.
I need two different universal hash functions for doing that.
I have tried different hash functions but can't find anything that works.
Can anyone help me with this problem?
These are some of the functions I have tried.
int h = 0 , r1 = 31415 , r2 = 27183;
for (int i =0; i < key.length (); i ++) {
h = ( r1 * h + key.charAt ( i )) % capacity ;
r1 = r1 * r2 % (capacity -1);
}
return h ;
Or this one
int seed = 131;
long hash = 0;
for(int i = 0; i < key.length(); i++)
{
hash = (hash * seed) + key.charAt(i);
}
return (int) (hash % capacity);
The Wikipedia article on double hashing suggests modifying the second hash function so that it can never be zero; the easiest way to do that is to simply add 1:
int h1 = hash_func1(s);
int h2 = (hash_func2(s) % (capacity - 1)) + 1;
// loop over (h1 + i * h2) % capacity
EDIT: Oops, I guess you also need to bound it by capacity - 1, otherwise with h2 == capacity, you would still run into an endless loop...
Or, even better, have hash_func2() already return a value less than capacity - 1, then adding 1 is sufficient.
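To make the probing concrete, here is a small sketch of the resulting probe sequence. The class name DoubleHashingSketch and the method probeSequence are hypothetical, and h1 and h2raw stand for the raw results of the two hash functions. It assumes capacity is a prime of at least 2, so that every step from 1 to capacity - 1 visits each slot exactly once.

```java
class DoubleHashingSketch {
    // Hypothetical helper: lists the slots visited for one key, in probe order.
    static int[] probeSequence(int h1, int h2raw, int capacity) {
        int start = Math.floorMod(h1, capacity);
        // Step is forced into 1..capacity-1, so it is never 0 and never a multiple of capacity.
        int step = Math.floorMod(h2raw, capacity - 1) + 1;
        int[] slots = new int[capacity];
        for (int i = 0; i < capacity; i++) {
            slots[i] = (int) ((start + (long) i * step) % capacity); // long avoids overflow of i * step
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(probeSequence(3, 8, 7)));
    }
}
```

Using Math.floorMod rather than % also covers the case where one of the raw hash values is negative.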
