Anagram Hash Function - java

I know something like this has been asked before, but the answer got sort of sidetracked.
I want to develop a hash function that takes a word and returns an index into an array.
So, for example, if you input "god":
sort the letters of the word: d g o
perform some function on them to get an index: d g o -> some number
insert "dog" at index some_number in array[].
I can't seem to write a function that doesn't get messed up somehow.
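public static int hashCode(String word){
    char[] x = word.toCharArray();
    Arrays.sort(x);                  // java.util.Arrays; anagrams now share one letter order
    int hash = 0;
    for(int i = 0; i < x.length; i++)
    {
        // (x[i] - 96) maps 'a'..'z' to 1..26, weighted by position in the sorted word
        hash += (x[i]-96)*(x[i]-96)*(x[i]-96)*(i+1)*(i+1)+i;
    }
    hash %= size; // size is the array length (a field); get a value that's inside the bounds of the array
    if(hash < 0)
        hash = hash + size;
    return (hash);
}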
This is my current algorithm, but there are two problems:
the array size has to be huge so that there aren't a ton of collisions;
there are still a few collisions; for example, "chair" lands at the same index as "smarminess" and "parr".
What do you guys think? I really appreciate your help.

Your hash function looks totally arbitrary. Why are you using that?
There are a few common, well-known, and relatively good hash functions; see a description here:
http://www.azillionmonkeys.com/qed/hash.html
See also https://stackoverflow.com/questions/263400#263416

There is a lot of research on hash functions and collision resolution. Here's a place to start: Hash Table

I gather, from your title and from the Arrays.sort(x) call, that you're looking for a hash function that deliberately collides when two strings are anagrams of each other. Is this correct? If so, you should state that requirement in the question itself.
The article that Vinko suggested is good. I also recommend Integer Hash Function for other algorithms that you might try.
Good luck!

If you really want to develop a "hash" that deliberately collides for all anagrams (in other words, one that's amenable to finding anagrams in a hash table), then why not split the string into an array of characters, filter out any characters you want to ignore (non-letters), sort the result, concatenate it, and then hash that string?
Thus "dog" and "god" both get munged into "dgo" and that's your key for all anagrams of "dog."
In modern versions of Python, all that verbiage can be summarized in the following one-line function:
def anagrash(s):
    return ''.join(sorted([x for x in s.lower() if x.isalpha()]))
... which you might use as:
anagrams = dict()
for each in phrases:
    ahash = anagrash(each)
    if ahash not in anagrams:
        anagrams[ahash] = list()
    anagrams[ahash].append(each)
... to build a dictionary of possible anagrams from a list of phrases.
Then, to filter out all of the phrases for which no anagram was found (iterating over a copy of the items, because a dict can't be resized while you iterate over it):
for key, val in list(anagrams.items()):
    if len(val) < 2:
        del anagrams[key]
So, there's your homework assignment: fewer than a dozen lines of Python. Porting that to whatever language your instructor is teaching and wrapping it in logic to read in the phrases and write out results is all left as an exercise to the student.
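For readers following the Java question, here is a rough port of that Python sketch. It is illustrative only: the class name, `anagramIndex`, and `tableSize` are hypothetical, it assumes plain ASCII words, and it swaps the hand-rolled arithmetic for `String.hashCode()` plus `Math.floorMod` to turn the sorted key into an array index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Java port of anagrash() plus the grouping and filtering steps.
class AnagramGroups {

    // Lowercase, keep ASCII letters only, sort: "God" and "dog" both become "dgo".
    static String anagrash(String s) {
        char[] letters = s.toLowerCase().replaceAll("[^a-z]", "").toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // The original question's goal: an array index shared by all anagrams.
    // floorMod keeps the index non-negative even when hashCode() is negative.
    static int anagramIndex(String word, int tableSize) {
        return Math.floorMod(anagrash(word).hashCode(), tableSize);
    }

    // Group phrases by key, then drop groups with fewer than two members.
    static Map<String, List<String>> findAnagrams(List<String> phrases) {
        Map<String, List<String>> anagrams = new HashMap<>();
        for (String each : phrases) {
            anagrams.computeIfAbsent(anagrash(each), k -> new ArrayList<>()).add(each);
        }
        // removeIf on the values view avoids a ConcurrentModificationException.
        anagrams.values().removeIf(group -> group.size() < 2);
        return anagrams;
    }

    public static void main(String[] args) {
        System.out.println(findAnagrams(Arrays.asList("dog", "God", "cat", "act", "bird")));
        System.out.println(anagramIndex("dog", 1009) == anagramIndex("god", 1009)); // true
    }
}
```

Anagrams always share an index, but unrelated words can still land in the same slot, so the array still needs a collision strategy such as chaining.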

Related

Best practice for using an array as a memoization key in Java

I am doing some algorithm problems in Java, and from time to time a problem needs memoization to speed it up. Often the key is an array. What I usually use is
HashMap<ArrayList<Integer>, Integer> mem;
The main reason to use ArrayList<Integer> instead of int[] is that a primitive array's hashCode() and equals() are based on the reference (object identity), whereas ArrayList<Integer> computes them from the actual contents, which is the desired behavior.
However, it is not very efficient, and the code can be pretty lengthy as well. So I am wondering if there is a best practice for this kind of memoization in Java. Thanks.
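A small demonstration of the identity-versus-contents difference described above (the class name is illustrative): the same int[] contents miss in a HashMap<int[], …>, while equal List<Integer> keys hit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArrayKeyDemo {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(a.equals(b));                               // false: identity comparison
        System.out.println(Arrays.equals(a, b));                       // true: element by element
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b));  // true

        Map<int[], Integer> byArray = new HashMap<>();
        byArray.put(a, 42);
        System.out.println(byArray.get(b));                            // null: b is a different object

        Map<List<Integer>, Integer> byList = new HashMap<>();
        byList.put(Arrays.asList(1, 2, 3), 42);
        System.out.println(byList.get(Arrays.asList(1, 2, 3)));        // 42
    }
}
```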
UPDATE: As many have pointed out in the comments, it is a very bad idea to use mutable objects as the key of a HashMap, and I totally agree.
To clarify the question a bit more: when I use this type of memoization, I will NOT change the ArrayList<Integer> once it is inserted into the map. Normally the array represents some state, and I need to cache the corresponding value for that state in case it is visited again.
So please do not focus on how bad it is to use a mutable object as a HashMap key. Do suggest a better way to do this kind of memoization, please.
UPDATE2: In the end I chose the Arrays.toString() approach, since I am doing algorithm problems on TopCoder/Codeforces and it is quick and dirty to code.
However, I do think HashMap is the more reasonable and readable way to do this.
You can create a new class, Key, hold the array of numbers as a field, and implement your own hashCode() (with a matching equals()) based on the contents of the array.
It will improve the readability as well:
HashMap<Key, Integer> mem;
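A minimal sketch of such a Key class, assuming the state is an int[]; only the name Key comes from the answer. It copies the array on construction, so later changes to the caller's array cannot corrupt the map (which also sidesteps the mutability concern), and it delegates to Arrays.equals/Arrays.hashCode so equal contents give equal keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Immutable memoization key that compares int[] contents instead of references.
final class Key {
    private final int[] values;

    Key(int... values) {
        // Defensive copy: the caller can reuse or modify its array safely.
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Key && Arrays.equals(values, ((Key) o).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);   // consistent with equals()
    }

    public static void main(String[] args) {
        Map<Key, Integer> mem = new HashMap<>();
        mem.put(new Key(1, 2, 3), 42);
        System.out.println(mem.get(new Key(1, 2, 3)));   // 42
    }
}
```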
If your ArrayList usually contains 3-4 elements, I would not worry much about performance. Your approach is OK. But as others pointed out, your key is then mutable, which is a bad idea.
Another approach is to join all elements of the ArrayList using some separator (say #), so that the key is a string like 123#555#66678 instead of an ArrayList of these 3 integers. You can also just call Arrays.toString(int[]) and get a decent string key out of an array of integers.
I would choose the second approach.
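A minimal sketch of the Arrays.toString key, assuming the memoized state is an int[]. The lattice-path problem and the names StringKeyMemo/paths are illustrative stand-ins for whatever the real recursion computes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Memoization keyed by Arrays.toString(state), e.g. "[3, 3]".
class StringKeyMemo {
    private static final Map<String, Long> mem = new HashMap<>();

    // Illustrative problem: number of monotone lattice paths from (x, y) to (0, 0).
    static long paths(int[] state) {
        int x = state[0], y = state[1];
        if (x == 0 || y == 0) return 1;
        String key = Arrays.toString(state);
        Long cached = mem.get(key);
        if (cached != null) return cached;
        long result = paths(new int[] {x - 1, y}) + paths(new int[] {x, y - 1});
        mem.put(key, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(paths(new int[] {3, 3}));   // 20
    }
}
```

The string is built on every lookup, so this trades some speed for brevity, which fits the "quick and dirty" contest use described in the question.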
If the input array is large, the main problem seems to be the efficiency of lookup. On the other hand, your computation is probably much more expensive than that, so you've got some CPU cycles to spare.
Lookup time will depend both on the hash code calculation and on the brute-force equals needed to pinpoint the key in a hash bucket. This is why using the raw array as a key is out of the question.
The suggestion already given by XpressOneUp, creating a class which wraps the array and provides its own hash code, seems like your best bet, and you can optimize the hash code calculation to involve only some array elements. You'll know best which elements are the most salient.
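A sketch of that optimization, under stated assumptions: the class name SampledKey and the choice of the first SAMPLE elements as the "salient" ones are hypothetical. The hash is computed once and cached, while equals() still compares every element, so correctness does not depend on which elements are sampled.

```java
import java.util.Arrays;

// Hypothetical key: hashes only the first SAMPLE elements (assumed salient),
// caches the result, and still checks full contents in equals().
final class SampledKey {
    private static final int SAMPLE = 4;   // assumption: tune for your data
    private final int[] values;
    private final int hash;

    SampledKey(int[] values) {
        this.values = values.clone();
        int h = 1;
        for (int i = 0; i < Math.min(SAMPLE, values.length); i++) {
            h = 31 * h + values[i];
        }
        this.hash = 31 * h + values.length;   // equal arrays always get equal hashes
    }

    @Override
    public int hashCode() {
        return hash;   // cached: no recomputation on each lookup
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SampledKey)) return false;
        SampledKey other = (SampledKey) o;
        // Cheap hash check first; full element comparison only if it matches.
        return hash == other.hash && Arrays.equals(values, other.values);
    }
}
```

The trade-off: keys that differ only beyond the sampled prefix share a hash and fall back to the full equals() comparison.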
If the values in the array are small integers, here is a way to do it efficiently:
HashMap<String, Integer> map = new HashMap<>();
public String encode(List<Integer> arr) {
    StringBuilder key = new StringBuilder();
    for (int i = 0; i < arr.size(); i++) {
        key.append(arr.get(i)).append(',');   // separator keeps [1,23] and [12,3] distinct
    }
    return key.toString();
}
Use the encode method to convert your list to a unique string, and use that string to add and look up values in the HashMap.

Prevent TreeMap merging on collision

Edit: I should have probably mentioned that I am extremely new to Java programming. I just started with the language about two weeks ago.
I have tried looking for an answer to this question, but so far I haven't found one, which is why I am asking it here.
I am writing Java code for a Dungeons and Dragons initiative tracker, and I am using a TreeMap for its ability to sort on entry. I am still very new to Java, so I don't know everything that is out there.
My problem is that when two entries have the same key, the map keeps only one of them: the second put() replaces the value already stored under that key. I understand this can be desirable behavior, but in my case I cannot have that happen. I was hoping there would be an elegant solution. So far what I have is this:
TreeMap<Integer,Character> initiativeList = new TreeMap<Integer,Character>(Collections.reverseOrder());
Character [] cHolder = new Character[3];
out.println("Thank you for using the Initiative Tracker Project.");
cHolder[0] = new Character("Fred",2);
cHolder[1] = new Character("Sam",3,23);
cHolder[2] = new Character("John",2,23);
for(int i = 0; i < cHolder.length; ++i)
{
initiativeList.put(cHolder[i].getInitValue(), cHolder[i]);
}
out.println("Initiative List: " + initiativeList);
Character is a class that I have defined that keeps track of a player's character name and initiative values.
Currently the output is this:
Initiative List: {23=John, 3=Fred}
I considered using a TreeMap with some sort of subCollection but I would also run into a similar problem. What I really need to do is just find a way to disable the merge. Thank you guys for any help you can give me.
EDIT: In Dungeons and Dragons, a character rolls a 20-sided die and then adds their initiative modifier to the result to get their total initiative. Sometimes two players get the same value. I've thought about formatting the key like this:
Key = InitiativeValue.InitiativeMod
So Sam's key would be 23.3 and John's would be 23.2. I understand that I would need to change the key type from int to float.
However, even with that, two players could have the same initiative modifier and roll the same initiative value. In reality this happens more than you might think. So for example,
Say both Peter and Scott join the game. They both have an initiative modifier of 2, and they both roll a 10 on the 20-sided die. That would make both of their initiative values 12.
When I put them into the existing map, they both need to show up even though they have the same value.
Initiative List: {23=John, 12=Peter, 12=Scott, 3=Fred}
I hope that helps to clarify what I am needing.
If I understand you correctly, you have a bunch of characters and their initiatives, and want to "invert" this structure to key by initiative, with the value being all characters that have that initiative. This is exactly what a multimap data structure captures; one implementation is Guava's TreeMultimap.
There's nothing magical about this. You could achieve something similar with a
TreeMap<Initiative,List<Character>>
This is not exactly how a Guava multimap is implemented, but it's the simplest data structure that could support what you need.
If I were doing this I would write my own class that wrapped the above TreeMap and provided an add(K key, V value) method that handled the duplicate detection and list management according to your specific requirements.
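A minimal sketch of such a wrapper, assuming int initiatives and highest-first order. The class name InitiativeTracker and the turnOrder method are hypothetical; the example uses plain String names for brevity, whereas in the question's code V would be the custom Character class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical wrapper: initiative -> every combatant with that initiative, highest first.
class InitiativeTracker<V> {
    private final TreeMap<Integer, List<V>> byInitiative =
            new TreeMap<>(Collections.reverseOrder());

    // Ties go into the same list instead of replacing the earlier entry.
    void add(int initiative, V combatant) {
        byInitiative.computeIfAbsent(initiative, k -> new ArrayList<>()).add(combatant);
    }

    // Flattened turn order; tied combatants keep their insertion order.
    List<V> turnOrder() {
        List<V> order = new ArrayList<>();
        for (List<V> tied : byInitiative.values()) {
            order.addAll(tied);
        }
        return order;
    }

    @Override
    public String toString() {
        return byInitiative.toString();
    }

    public static void main(String[] args) {
        InitiativeTracker<String> tracker = new InitiativeTracker<>();
        tracker.add(3, "Fred");
        tracker.add(12, "Peter");
        tracker.add(23, "John");
        tracker.add(12, "Scott");
        System.out.println(tracker);   // {23=[John], 12=[Peter, Scott], 3=[Fred]}
    }
}
```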
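You say you are using "...a TreeMap for its ability to sort on entry...", but maybe you could just use a TreeSet instead. You'll need to implement a suitable compareTo method on your Character class that performs the comparison you want. Note that a TreeSet treats two objects as duplicates whenever compareTo returns 0, so break ties between equal initiatives (for example, by name). I strongly recommend that you implement hashCode and equals too, consistently with compareTo.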
Then, when you iterate through the TreeSet, you'll get the Character objects in the appropriate order. Note that Map classes are intended for lookup purposes, not for ordering.
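A sketch of the TreeSet approach, assuming a simplified stand-in class: Combatant is a hypothetical name (chosen to avoid confusion with java.lang.Character) holding only a name and a total initiative. compareTo orders by initiative, highest first, then by name, so Peter and Scott at 12 both survive; two entries with the same name and the same initiative would still be treated as one.

```java
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical stand-in for the question's Character class.
final class Combatant implements Comparable<Combatant> {
    final String name;
    final int initiative;

    Combatant(String name, int initiative) {
        this.name = name;
        this.initiative = initiative;
    }

    // Highest initiative first; ties broken by name so compareTo returns 0
    // only for equal objects (otherwise TreeSet would drop one of them).
    @Override
    public int compareTo(Combatant o) {
        int byInit = Integer.compare(o.initiative, initiative);
        return byInit != 0 ? byInit : name.compareTo(o.name);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Combatant)) return false;
        Combatant c = (Combatant) o;
        return initiative == c.initiative && name.equals(c.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, initiative);
    }

    @Override
    public String toString() {
        return initiative + "=" + name;
    }

    public static void main(String[] args) {
        TreeSet<Combatant> order = new TreeSet<>();
        order.add(new Combatant("Fred", 3));
        order.add(new Combatant("Peter", 12));
        order.add(new Combatant("Scott", 12));
        order.add(new Combatant("John", 23));
        System.out.println(order);   // [23=John, 12=Peter, 12=Scott, 3=Fred]
    }
}
```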

Java: Sorting different types of arrays to one another

I need to sort an array based on the positions held in another array.
What I have works, but it is kinda slow, is there a faster/better way to implement this?
2 Parts:
Part1
int i = mArrayName.size();
int temp = 0;
for(int j=0;j<i;j++){
temp = mArrayPosition.get(j);
mArrayName.set(temp, mArrayNameOriginal.get(j));
}
In this part, mArrayPosition is the position I would like the mArrayName to be in.
Ex.
input:
mArrayName= (one, two, three)
mArrayPosition = (2,0,1)
output:
mArrayName = (three, one, two)
Part 2
int k=0;
int j=0;
do{
if(!mArrayName.get(k).equals(mArrayNameOriginal.get(j))){ // compare string contents, not references
j++;
}else{
mArrayIdNewOrder.set(k, mArrayId.get(j));
k++;
j=0;
}
}while(k < mArrayName.size());
}
In this part, mArrayName is the reordered name array, mArrayNameOriginal is the original name array.
Ex.
mArrayName = (three, one, two)
mArrayNameOriginal = (one, two, three)
Now I want to compare these two arrays, find which entries are equal and relate that to a new array that has their rowId number in it.
Ex.
input:
mArrayId = (001,002,003)
output:
mArrayIdNewOrder = (003,001,002)
So then I will have mArrayIdNewOrder id's matching up with the correct names in mArrayName.
Like I said, these methods work, but is there a faster/better way to do it? I tried looking at Arrays.sort and comparators, but they only seem to sort alphabetically or numerically. I saw that I can create my own rules inside the comparator, but it would probably end up being similar to what I already have.
Sorry for the confusing question. I'll try to clear up any ambiguities if needed.
The best performance read I've found is Android's Designing For Performance doc. You are violating a couple of the "Android way" guidelines that would help you.
You are using multiple internal getters inside each loop for what looks like a simple value. Redo this by accessing the fields directly.
For extra credit, post your performance comparison results! I'd love to see em!
You could use some form of tuple: a class that holds both the id and the name. You'll just have to write a java.util.Comparator that compares them accordingly; both elements will move together and your code will be cleaner.
This data structure might be convenient for the rest of your program... if not, just take things off it again and you're done.
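A sketch of that tuple idea, with assumptions: Row is a hypothetical class, and its position field follows the question's Part 1 code, where entry j moves to index mArrayPosition.get(j). Sorting by position moves each id together with its name, so no Part 2 matching pass is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical tuple: a row id, its name, and the index it should move to.
final class Row {
    final String id;
    final String name;
    final int position;

    Row(String id, String name, int position) {
        this.id = id;
        this.name = name;
        this.position = position;
    }

    @Override
    public String toString() {
        return id + ":" + name;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(Arrays.asList(
                new Row("001", "one", 2),
                new Row("002", "two", 0),
                new Row("003", "three", 1)));
        // Sorting by target position keeps each id paired with its name.
        rows.sort(Comparator.comparingInt((Row r) -> r.position));
        System.out.println(rows);   // [002:two, 003:three, 001:one]
    }
}
```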
If your position indexes are compact, i.e. from 0 to size - 1, then just fill an array and create the updated list afterwards, something like:
String[] array = new String[size];
for (int j = 0; j < size; j++) {
    array[mArrayPosition.get(j)] = mArrayName.get(j);   // entry j goes straight to its slot
}
List<String> reordered = new ArrayList<>(Arrays.asList(array));   // create ArrayList from array
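Generalizing that array trick into a reusable method (the names Reorder and permute are hypothetical) lets you apply the same position list to both the names and the ids in one O(n) pass each, which replaces the nested search in Part 2. It assumes the positions are a permutation of 0..size-1 and uses the Part 1 convention that entry j goes to index pos.get(j).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Reorder {
    // Places src.get(j) at index pos.get(j); assumes pos is a permutation of 0..size-1.
    static <T> List<T> permute(List<T> src, List<Integer> pos) {
        List<T> out = new ArrayList<>(src.size());
        for (int j = 0; j < src.size(); j++) {
            out.add(null);                      // pre-size the list so set() works
        }
        for (int j = 0; j < src.size(); j++) {
            out.set(pos.get(j), src.get(j));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> pos = Arrays.asList(2, 0, 1);
        // The same permutation applied to both lists keeps names and ids aligned.
        System.out.println(permute(Arrays.asList("one", "two", "three"), pos)); // [two, three, one]
        System.out.println(permute(Arrays.asList("001", "002", "003"), pos));   // [002, 003, 001]
    }
}
```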

Help me understand question related to HashMap in Java

I'm given a task that I'm a little confused about. Here is the question statement:
The following program should read a file and store all its tokens in a member variable.
Your task is to write a single method that returns the number of items in tokenMap, the average length (as double value) of the elements in tokenMap, and the number of tokens starting with character "a".
Here tokenMap is an object of type HashMap<String, Integer>.
I have some idea about HashMap, but what I want to know is whether the key I should store in tokenMap is a single character or the whole word.
Also, how can I compute the average length?
Looks like you have to use the entire word as the key.
The average length of tokens can be computed by summing the lengths of each token and dividing by the number of tokens.
In Java, you can find the number of tokens in the HashMap by tokenMap.size().
You can write loops that visit each token in the map like this (the tokens are the keys; the values are Integers):
for (String t : tokenMap.keySet()) {
    // t is a token
}
and if you look up String in the Java API docs you will see that it is easy to find the length of a String.
To compute the average length of the items in a hash map, you'll have to iterate over them all and count the length and calculate the average.
As for your other question about what to use for a key, how are we supposed to know? A hashmap can use practically any* value for a key.
*The value must be hashable, which is defined differently for different languages.
Reading the question closely, it seems that you have to read a file, extract each word and use it as the key value, and store the length of each key as the integer:
an example line
leads to a HashMap like this
an : 2
example : 7
line : 4
After you've built your map (made of keys mapping to entries, or seemingly elements in the question), you'll need to run some statistics over it to find
the number of keys (look at HashMap)
the average length of all keys (again, simple enough)
the number beginning with "a" (just look at the String)
Then make a value object containing these values and return it from the method that does the statistics.
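A minimal sketch of such a value object, assuming the tokens are the keys of tokenMap as discussed in this thread; the class name TokenStats and its fields are hypothetical. For the int/alpha/beta example map further down, it yields 6 tokens, an average length of 23/6 (about 3.83), and 1 token starting with "a".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value object holding the three statistics the task asks for.
final class TokenStats {
    final int count;
    final double averageLength;
    final int startingWithA;

    private TokenStats(int count, double averageLength, int startingWithA) {
        this.count = count;
        this.averageLength = averageLength;
        this.startingWithA = startingWithA;
    }

    // Treats each key of tokenMap as one token.
    static TokenStats of(Map<String, Integer> tokenMap) {
        int totalLength = 0;
        int startingWithA = 0;
        for (String token : tokenMap.keySet()) {
            totalLength += token.length();
            if (token.startsWith("a")) {
                startingWithA++;
            }
        }
        int count = tokenMap.size();
        double average = count == 0 ? 0.0 : (double) totalLength / count;
        return new TokenStats(count, average, startingWithA);
    }

    public static void main(String[] args) {
        Map<String, Integer> tokenMap = new HashMap<>();
        tokenMap.put("int", 2);
        tokenMap.put(";", 3);
        tokenMap.put("alpha", 1);
        tokenMap.put("beta", 1);
        tokenMap.put("float", 1);
        tokenMap.put("delta", 1);
        TokenStats s = of(tokenMap);
        System.out.println(s.count + " " + s.averageLength + " " + s.startingWithA);
    }
}
```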
I know I've given more information than you require, but someone else may benefit from a little extra help.
Guys, there is some confusion. I'm not asking for a solution; I'm just confused about one thing.
For the time being, I'm going to use String as the key type.
The only confusion I have is this: once I read the file line by line, should I split each line into words or into individual characters? In other words, should each key be a single-character string or a whole word?
If you go through the question statement, what do you suggest? That's all I'm asking.
should i split it based upon words or
based upon each character
The requirement is to make tokens, so you should split them based on words. Each word becomes a unique String key. It would make sense for the value to be the count of each token.
If the file you are reading has these three lines:
int alpha;
int beta;
float delta;
Then you should have something like
<"int", 2>
<";", 3>
<"alpha", 1>
<"beta", 1>
<"float", 1>
<"delta", 1>
(The semicolon may or may not be considered a token.)
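One way to produce that map, as a sketch under an assumed token definition (a run of word characters, or a single punctuation mark, which counts the semicolon as a token); the class and method names are hypothetical. Map.merge increments the count for a token already present.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokenizer {
    // Assumption: a token is a run of word characters or one punctuation mark.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\s\\w]");

    // Adds every token in the line to tokenMap, counting occurrences.
    static void addTokens(String line, Map<String, Integer> tokenMap) {
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokenMap.merge(m.group(), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> tokenMap = new HashMap<>();
        for (String line : new String[] {"int alpha;", "int beta;", "float delta;"}) {
            addTokens(line, tokenMap);
        }
        System.out.println(tokenMap.get("int") + " " + tokenMap.get(";"));   // 2 3
    }
}
```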
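Averaged over the six distinct keys, your average length would be (3 + 1 + 5 + 4 + 5 + 5) / 6 = 23/6 ≈ 3.83. (If every occurrence counts instead, it would be (3x2 + 1x3 + 5 + 4 + 5 + 5) / 9 = 28/9 ≈ 3.11.)
The number of tokens starting with "a" would be 1 ("alpha", of length 5).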
Look elsewhere on this forum for keySet and you should be good to go.

What is the fastest way to find an array within another array in Java?

Is there any equivalent of String.indexOf() for arrays? If not, is there any faster way to find an array within another array than a linear search?
Regardless of the elements of your arrays, I believe this is not much different than the string search problem.
This article provides a general intro to the various known algorithms.
Rabin-Karp and KMP might be your best options.
You should be able to find Java implementations of these algorithms and adapt them to your problem.
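As one example of adapting those algorithms, here is a minimal Knuth-Morris-Pratt sketch for int arrays. ArraySearch.indexOf is a hypothetical helper, not a JDK API; it returns the index of the first occurrence of pattern in text, or -1, in O(text.length + pattern.length) time, which avoids the worst case of repeated near-matches.

```java
// Knuth-Morris-Pratt search over int arrays.
class ArraySearch {

    static int indexOf(int[] text, int[] pattern) {
        if (pattern.length == 0) return 0;
        int[] fail = failureTable(pattern);
        int k = 0;                                  // length of the current partial match
        for (int i = 0; i < text.length; i++) {
            while (k > 0 && text[i] != pattern[k]) {
                k = fail[k - 1];                    // fall back without re-reading text
            }
            if (text[i] == pattern[k]) k++;
            if (k == pattern.length) return i - k + 1;
        }
        return -1;
    }

    // fail[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix.
    private static int[] failureTable(int[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = fail[k - 1];
            }
            if (pattern[i] == pattern[k]) k++;
            fail[i] = k;
        }
        return fail;
    }

    public static void main(String[] args) {
        int[] text = {1, 1, 1, 1, 2, 3};
        System.out.println(indexOf(text, new int[] {1, 1, 2}));   // 2
    }
}
```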
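List<String> list = new ArrayList<>(Arrays.asList(myArray)); // assuming a String[]; the copy leaves myArray's order intact
Collections.sort(list);
int index = Collections.binarySearch(list, find); // index in the sorted copy, not in myArray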
OR
public static int indexOf(Object[][] array, Object[] find){
for (int i = 0; i < array.length; i ++){
if (Arrays.equals(array[i], find)){
return i;
}
}
return -1;
}
OR
public static int indexOf(Object[] array, Object find){
for (int i = 0; i < array.length; i ++){
if (array[i].equals(find)){
return i;
}
}
return -1;
}
OR
Object[] array = ...
int index = Arrays.asList(array).indexOf(find);
As far as I know, there is NO way to find an array within another without a linear search. String.indexOf uses a linear search, just inside a library.
You could write a little library method called indexOf that takes two arrays; then you will have code that looks just like String.indexOf.
But no matter how you do it, it's a linear search under the covers.
edit:
After looking at #ahmadabolkader's answer I kind of take this back. Although it's still a linear search, it's not as simple as just "implement it" unless you are restricted to fairly small test sets/results.
The problem comes when you want to see if ...aaaaaaaaaaaaaaaaaab fits into a string of (x1000000)...aaaaaaaaab (in other words, strings that tend to match most places in the search string).
My thought was that as soon as you found a first-character match you'd just check all subsequent characters one by one, but performance would degrade terrifyingly when most of the characters matched most of the time. The rolling hash method in #a12r's answer sounded much better if this is a real-world problem and not just an assignment.
I'm just going to vote for #a12r's answer because of those awesome Wikipedia references.
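The short answer is no: there is no faster way to find an array within an array using an existing construct in Java. Based on what you described, consider creating a HashSet of content-comparable keys (for example List<Integer>, or a wrapper whose equals/hashCode use Arrays.equals/Arrays.hashCode) instead of an array of arrays, since raw arrays in a HashSet are compared by reference.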
Normally, there are two ways to find things in collections in Java:
(1) put them in a HashMap (dictionary) and look them up by their hash;
(2) loop through each object and test its equality.
(1) won't work for you because an array object's hash won't tell you that the contents are the same. You could write some sort of wrapper that would create a hashcode based on the contents (you'd also have to make sure equals returned values consistent with that).
(2) also will require a bit of work because object equality for arrays will only test that the objects are the same. You'd need to wrap the arrays with a test of the contents.
So basically, not unless you write it yourself.
Do you mean you have an array whose elements are themselves arrays? If so, and the elements are sorted, you might be able to use binarySearch from java.util.Arrays (the overload that takes a Comparator).
