Improving the speed of the HashMap - java

I just coded my own implementation of a HashMap with open addressing; the key type is int and the value type is long. But it works more slowly than the existing Java implementation, even when I just add new values. What's the way to make it faster?
public class MyHashMap {
    private int maxPutedId = 0;
    private int size;
    private int[] keys;
    private long[] values;
    private boolean[] change;

    public MyHashMap(int size) {
        this.size = size;
        keys = new int[size];
        values = new long[size];
        change = new boolean[size];
    }

    public MyHashMap() {
        this.size = 100000;
        keys = new int[size];
        values = new long[size];
        change = new boolean[size];
    }

    public boolean put(int key, long value) {
        int k = 0;
        boolean search = true;
        for (int i = 0; i < maxPutedId + 2; i++) {
            if (search && !change[i] && keys[i] == 0 && values[i] == 0) {
                k = i;
                search = false;
            }
            if (change[i] && keys[i] == key) {
                return false;
            }
        }
        keys[k] = key;
        values[k] = value;
        change[k] = true;
        maxPutedId = k;
        return true;
    }

    public Long get(int key) {
        for (int i = 0; i < size; i++) {
            if (change[i] && keys[i] == key) {
                return values[i];
            }
        }
        return null;
    }

    public int size() {
        int s = 0;
        for (boolean x : change) {
            if (x) s++;
        }
        return s;
    }
}

You have not implemented a hash table; there is no hashing going on. For example, your get() method is doing a linear traversal through the key array. A hash table implementation is supposed to be able to compute the array entry where the key is most likely to be found (and will in fact be found if it exists and there were no hash collisions).
A simple hash table would look like this: we first compute a hash from the key. Then we look at that slot in the table. Ideally, that's where the key will be found. However, if the key is not there, it could be due to collisions, so then we scan (assuming open addressing) looking for the key in subsequent slots - until we've looked through the whole table or found an unoccupied slot.
I wrote 'get' since it seemed simpler :-)
This is 'off the top of my head' code so you will need to check it carefully.
Long get(int key) {
    int h = hash(key);
    // look in principal location for this key
    if (change[h] && keys[h] == key)
        return values[h];
    // nope, scan table (wrapping around at the end)
    // and stop when we have found the key, scanned
    // the whole table, or met an empty slot
    int h0 = h; // save original position
    while ((h = (h + 1) % size) != h0 && change[h])
        if (keys[h] == key)
            return values[h];
    return null;
}
I probably should have written 'put' first to be more instructive.
The hash function, for int keys, could be computed as key % size. Whether that's a good hash depends on the distribution of your keys; you want a hash that avoids collisions.
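For completeness, here is a rough sketch of what a matching hash() and put() could look like, reusing the fields from the question's class (keys, values, change, size); Math.floorMod is used so negative keys still map to a valid index. This is again untested, off-the-top-of-the-head code, so check it carefully.
private int hash(int key) {
    return Math.floorMod(key, size);
}

public boolean put(int key, long value) {
    int h = hash(key);
    int h0 = h;                          // remember where we started
    do {
        if (change[h] && keys[h] == key) {
            return false;                // key already present, nothing inserted
        }
        if (!change[h]) {                // empty slot: claim it
            keys[h] = key;
            values[h] = value;
            change[h] = true;
            return true;
        }
        h = (h + 1) % size;              // linear probe to the next slot
    } while (h != h0);
    return false;                        // table is full
}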

How to make my linear probe hash function more efficient?

I am struggling here to see if my linear probing technique is correct and if it is efficient at all. Is there any way for me to make it more efficient?
static void enterValues(int values[], int hashTable[])
{
    for (int i = 0; i < values.length; i++) {
        int k = hashFunction(values[i]);
        if (hashTable[k] == 0)
            hashTable[k] = values[i];
        else {
            boolean b = false;
            int counter = k % hashTable.length + 1;
            if (counter >= hashTable.length)
                counter = 0;
            while (!b) {
                if (hashTable[counter] == 0) {
                    hashTable[counter] = values[i];
                    b = true;
                } else {
                    counter = counter % hashTable.length + 1;
                }
            }
        }
    }
}

static int hashFunction(int value)
{
    return value % 10;
}
int values[] = {4371,1323,6173,4199,4344,9679,1989};
The output for the size-10 hash table is: [9679, 4371, 1989, 1323, 6173, 4344, 0, 0, 0, 4199]
Thank you for taking a look
Why does your hash function just return value % 10? It's better to return value % hashTable.length. I'd suggest you use Integer.hashCode and then take the result modulo hashTable.length.
int counter = k % hashTable.length + 1;
if (counter >= hashTable.length)
    counter = 0;
These two statements can be replaced by: int counter = (k + 1) % hashTable.length;
To make it more efficient, you can keep additional book-keeping that stores the unfilled indices of the hash table; when you have clashes, you query this structure (an array, or better a sorted tree, to make searching, insertion and deletion faster) to find the next free slot.
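As a rough sketch of that book-keeping idea, the version below keeps the free indices in a TreeSet (the names and structure are illustrative, not part of the original code). On a clash it jumps straight to the next free index at or after k, which is the same slot plain linear probing would eventually reach, just without stepping through the occupied ones.
import java.util.TreeSet;

static void enterValues(int[] values, int[] hashTable) {
    TreeSet<Integer> freeSlots = new TreeSet<>();
    for (int i = 0; i < hashTable.length; i++) {
        freeSlots.add(i);                      // every slot starts out free
    }
    for (int value : values) {
        if (freeSlots.isEmpty()) {
            throw new IllegalStateException("hash table is full");
        }
        int k = value % hashTable.length;
        Integer slot = freeSlots.contains(k)
                ? k                            // home slot is free
                : freeSlots.ceiling(k);        // otherwise the next free slot >= k
        if (slot == null) {
            slot = freeSlots.first();          // wrap around to the lowest free slot
        }
        hashTable[slot] = value;
        freeSlots.remove(slot);
    }
}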
It is incorrect. Consider what happens if values[i] is zero:
if (hashTable[k] == 0) {
    hashTable[k] = values[i];
} else { .... }
Since you are using zero in the hashtable to mean that the entry is not used, and you are also assigning values directly into the table, your code cannot distinguish a zero value from an empty entry.
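One rough way around that (the occupied array below is my own addition for illustration, not part of the original code) is to track occupancy in a separate boolean array instead of treating 0 as "empty":
static void enterValues(int[] values, int[] hashTable, boolean[] occupied) {
    for (int value : values) {
        int k = value % hashTable.length;
        while (occupied[k]) {                 // linear probe past occupied slots
            k = (k + 1) % hashTable.length;   // assumes the table is not full
        }
        hashTable[k] = value;                 // a stored 0 no longer looks "empty"
        occupied[k] = true;
    }
}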

Removes a key at a given index r and its associated value from this symbol table

I need help understanding how to go about this... I need to remove the key at index r and its associated value from this list. Anything that would point me in the right direction would be greatly appreciated! Inside the method I have written what I thought would help, but it's not really helping me.
public class SortedArrayST<Key extends Comparable<Key>, Value> {
    private static final int MIN_SIZE = 2;
    private Key[] keys;   // the keys array
    private Value[] vals; // the values array
    private int N = 0;    // size of the symbol table

    public SortedArrayST(int size) {
        keys = (Key[]) (new Comparable[size]);
        vals = (Value[]) (new Object[size]);
    }

    public int size() {
        return N;
    }

    private void remove(int r) {
        if (keys == null) return;
        for (int i = 0; i < size(); i++) {
            // iterate through the list
            // if key is at index r and if key is at associated value
            // remove from list
        }
    }
}
There are a variety of ways. One is to shift the array values along, starting at the index you want to remove, then set the last item to null and decrease N.
Also, bear in mind that you're not setting N in your constructor.
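A rough sketch of that shifting approach, using the fields from the question's class (untested, so check it against your own code):
private void remove(int r) {
    if (r < 0 || r >= N) return;          // nothing to remove at an invalid index
    for (int i = r; i < N - 1; i++) {     // shift everything after r one slot to the left
        keys[i] = keys[i + 1];
        vals[i] = vals[i + 1];
    }
    keys[N - 1] = null;                   // clear the now-duplicate last slot
    vals[N - 1] = null;                   // so the old objects can be garbage collected
    N--;                                  // the table now holds one fewer pair
}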

How to implement a bag-of-words in java

Basically, I need a bidirectional map, which can retrieve the value from the key and the inverse. I have checked this link, BUT it should also be sorted according to the values AND it should allow multiple keys to share a value (I cannot guarantee that there won't be an identical frequency for different keys).
So is there any structure with those criteria out there?
Below is the specific problem that imposed this need (maybe I have something missing in my implementation, but if you know an answer to the above question then you can probably skip it):
I want to implement a bag-of-words method for some features. The idea is to keep only the top k bins with the greatest frequency of occurrence.
To make it more specific, let's say I have a codebook double[10000][d] codebook and a set of features double[][] features. For each line in features, which represents a feature, I compute the distance to each line in codebook and assign it to the bin whose line is the closest centroid.
Then I increment the count of this bin by 1, until all features have been used.
Finally, I would like to keep only the top k bins as results.
The part where I am a bit stuck is keeping only the top-k bins. I use a BoundedPriorityQueue<Feature> collection to achieve this, but I am not sure if there is a simpler approach.
public static BoundedPriorityQueue<Feature> boWquantizerLargerK(double[][] codebook, double[][] features, int featureLength, int maxNumWords) {
    HashMap<Integer, Integer> boWMap = new HashMap<Integer, Integer>();
    BoundedPriorityQueue<Feature> nn = new BoundedPriorityQueue<Feature>(new Feature(), maxNumWords);
    for (int i = 0; i < features.length; i++) {
        double[] distCodebook = new double[codebook.length];
        for (int j = 0; j < codebook.length; j++) {
            double[] dist = new double[featureLength];
            for (int k = 0; k < featureLength; k++)
                dist[k] = (codebook[j][k] - features[i][k]) * (codebook[j][k] - features[i][k]);
            distCodebook[j] = MathUtils.sum(dist);
        }
        Integer index = MathUtils.indexOfMin(distCodebook) + 1;
        Integer freq;
        if ((freq = boWMap.get(index)) == null) {
            boWMap.put(index, 1);
            nn.offer(new Feature(1, index));
        } else {
            boWMap.put(index, ++freq);
            nn.offer(new Feature(freq, index));
        }
    }
    return nn;
}
The Feature class is a simple implementation of Comparator:
public class Feature implements Comparator<Feature> {
    private Integer freq;
    private Integer word;

    public Feature() {}

    public Feature(Integer freq, Integer word) {
        this.freq = freq;
        this.word = word;
    }

    public int compare(Feature o1, Feature o2) {
        if (o1.getFrequency() > o2.getFrequency())
            return -1;
        else if (o1.getFrequency() < o2.getFrequency())
            return 1;
        else
            return 0;
    }

    public double getFrequency() {
        return freq;
    }
}
To summarize the problem: I have a collection whose members are pairs of values, the first representing the bin and the second the frequency. This collection is updated until all features have been processed, at which point I just want to keep the bins with the greatest values.
I am using a HashMap<Integer, Integer> structure for the collection and a BoundedPriorityQueue<Feature> for the top k bins.
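One possibly simpler route, sketched below under the assumption that the plain HashMap<Integer, Integer> holds the final counts (the helper name topKBins is mine): count everything first, then extract the top k entries once at the end with a small min-heap, instead of offering a new Feature on every increment.
import java.util.*;

static List<Map.Entry<Integer, Integer>> topKBins(Map<Integer, Integer> boWMap, int k) {
    // min-heap ordered by frequency: the least frequent of the current top k sits on top
    PriorityQueue<Map.Entry<Integer, Integer>> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
    for (Map.Entry<Integer, Integer> e : boWMap.entrySet()) {
        heap.offer(e);
        if (heap.size() > k) {
            heap.poll();                     // drop the least frequent bin seen so far
        }
    }
    List<Map.Entry<Integer, Integer>> result = new ArrayList<>(heap);
    result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue())); // most frequent first
    return result;
}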

Finding elements in array with binary search

I am trying to use Arrays.binarySearch() to search for an entry in an array and return its index. However, each time I call Arrays.binarySearch() I get the following exception:
Exception in thread "main" java.lang.NullPointerException
at java.util.Arrays.binarySearch0(Unknown Source)
at java.util.Arrays.binarySearch(Unknown Source)
at project.ArrayDirectory.lookupNumber(ArrayDirectory.java:97)
at project.test.main(test.java:12)
Here is my ArrayDirectory class -
public class ArrayDirectory implements Directory {
    static Entry[] directory = new Entry[50];

    @Override
    public void addEntry(String surname, String initials, int extension) {
        int n = 0;
        for (int i = 0; i < directory.length; i++) { // counting number of entries in array
            if (directory[i] != null) {
                n++;
            }
        }
        if (n == directory.length) {
            Entry[] temp = new Entry[directory.length * 2]; // if array is full, double the length
            for (int i = 0; i < directory.length; i++)
                temp[i] = directory[i];
            directory = temp;
        }
        int position = -1;
        for (int i = 0; i < directory.length; i++) {
            position = i;
            if (directory[i] != null) { // sorting the array into alphabetical order by surname
                int y = directory[i].getSurname().compareTo(surname);
                if (y > 0) {
                    break;
                }
            } else if (directory[i] == null) {
                break;
            }
        }
        System.arraycopy(directory, position, directory, position + 1,
                directory.length - position - 1);
        directory[position] = new Entry(initials, surname, extension); // placing new entry in correct position
    }

    @Override
    public int lookupNumber(String surname, String initials) {
        // TODO Auto-generated method stub
        Entry lookup = new Entry(surname, initials);
        int index = Arrays.binarySearch(directory, lookup);
        return index;
    }
}
Any ideas how I can use binary search to find the correct index?
Thank you for your help.
edit -
I have also overridden compareTo in my Entry class -
public int compareTo(Entry other) {
    return this.surname.compareTo(other.getSurname());
}
In your invocation of
int index = Arrays.binarySearch(directory, lookup);
directory contains null elements: the array is created with length 50 but is only partially filled, so the search ends up calling compareTo on a null entry. Check that you are initializing elements correctly, and only search the portion of the array that actually holds entries.
I note three things:
static Entry[] directory = new Entry[1];
First, that code allocates space for one Entry in the array. It doesn't actually instantiate an Entry; that is, directory[0] is null. Secondly, a binary search on an array with one entry is crazy: there is only one element, so it must be directory[0]. Finally, you should sort your array before doing a binary search on it.
The basic concept behind a binary search is the recursion of the following steps (note that the search assumes the list or array of elements is sorted in some form and that the element exists in it):
1. Go to the middle element of the array.
2. Check if the searched element is equal to the element at the middle. If it is, return its index.
3. If not, check whether the searched element is 'smaller' or 'larger' than the element in the middle.
4. If it is smaller, go to step 1 using only the lower/first half of the array instead of the whole.
5. Otherwise, go to step 1 using only the upper/last half of the array instead of the whole.
As the array is continuously divided in two, it will eventually reach a size of 1, giving the result.
Now, suppose you are looking for an integer in an int array. Here is what the code would be like:
public static int binarySearch(int number, int[] array)
{
    boolean isHere = false;
    Integer index = 0;
    for (int i = 0; i < array.length; i++)
    {
        if (array[i] == number)
        {
            isHere = true;
            i = array.length;
        }
    }
    if (!isHere)
    {
        index = -1;
    }
    else
    {
        int arrayStart = 0;
        int arrayEnd = array.length;
        index = binarySearch(number, arrayStart, arrayEnd, array);
    }
    return index;
}

private static int binarySearch(int number, int start, int end, int[] array)
{
    // this formula ensures the index number will be preserved even if
    // the array is divided later
    int middle = (start + end) / 2;
    if (array[middle] == number)
    {
        return middle;
    }
    else
    {
        if (number < array[middle])
        {
            // searches the first half of the array
            return binarySearch(number, start, middle, array);
        }
        else
        {
            // searches the last half of the array
            return binarySearch(number, middle, end, array);
        }
    }
}
You can use the compareTo() method instead of <,>, & == operators in your example. The logic should still be the same.
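As a rough sketch of how that could look for the directory example (assuming Entry implements Comparable<Entry> via the compareTo shown in the edit, that the filled part of the array is kept sorted by surname, and an Entry constructor matching the one used in addEntry), you could search only the non-null prefix so the nulls are never compared:
@Override
public int lookupNumber(String surname, String initials) {
    int n = 0;
    while (n < directory.length && directory[n] != null) {
        n++;                                         // count the filled entries
    }
    Entry lookup = new Entry(initials, surname, 0);  // extension 0 is a placeholder; only the surname is compared
    // binary search only over the sorted, non-null prefix of the array
    return java.util.Arrays.binarySearch(directory, 0, n, lookup);
}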

Hash by Chaining VS Double Probing

I'm trying to compare chaining and double probing.
I need to insert 40 integers into a table of size 100.
When I measure the time with System.nanoTime() (in Java), I find that double probing is faster.
That's because in the insert method of chaining I create a LinkedListEntry every time, and that adds time.
How can it be that chaining is faster than double probing? (That's what I read on Wikipedia.)
Thanks!
This is the code for chaining:
public class LastChain
{
    int tableSize;
    Node[] st;

    LastChain(int size) {
        tableSize = size;
        st = new Node[tableSize];
        for (int i = 0; i < tableSize; i++)
            st[i] = null;
    }

    private class Node
    {
        int key;
        Node next;

        Node(int key, Node next)
        {
            this.key = key;
            this.next = next;
        }
    }

    public void put(Integer key)
    {
        int i = hash(key);
        Node first = st[i];
        for (Node x = st[i]; x != null; x = x.next)
            if (key.equals(x.key))
            {
                return;
            }
        st[i] = new Node(key, first);
    }

    private int hash(int key)
    {
        return key % tableSize;
    }
}
and this is the relevant code from double probing:
public class HashDouble1 {
    private Integer[] hashArray;
    private int arraySize;
    private Integer bufItem; // for deleted items

    HashDouble1(int size) {
        arraySize = size;
        hashArray = new Integer[arraySize];
        bufItem = new Integer(-1);
    }

    public int hashFunc1(int key) {
        return key % arraySize;
    }

    public int hashFunc2(int key) {
        return 7 - key % 7;
    }

    public void insert(Integer key) {
        int hashVal = hashFunc1(key); // hash the key
        int stepSize = hashFunc2(key); // get step size
        // until empty cell or -1
        while (hashArray[hashVal] != null && hashArray[hashVal] != -1) {
            hashVal += stepSize; // add the step
            hashVal %= arraySize; // for wraparound
        }
        hashArray[hashVal] = key; // insert item
    }
}
Measured this way, insert with double probing is faster than with chaining.
How can I fix it?
Chaining works best with high load factors. Try using 90 strings (not a well-placed selection of integers) in a table of 100.
Also, chaining is much easier to implement removal/deletion for.
Note: in HashMap, an Entry object is created whether it is chained or not, so there is no saving there.
Java has the special "feature" that objects take up a lot of memory.
Thus, for large datasets (where this will have any relevance) double probing will be good.
But as a very first thing, please change your Integer[] into int[]: the memory usage will be roughly one fourth and the performance will jump nicely.
But as always with performance questions: measure, measure, measure, as your case will always be special.
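A rough sketch of what the int[] version could look like (the EMPTY sentinel is my own choice, needed because a primitive array cannot hold null; keys equal to the sentinel would need extra handling):
public class HashDoubleInt {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel for unused slots
    private final int[] hashArray;
    private final int arraySize;

    HashDoubleInt(int size) {
        arraySize = size;
        hashArray = new int[arraySize];
        java.util.Arrays.fill(hashArray, EMPTY);
    }

    public void insert(int key) {
        int hashVal = key % arraySize;        // primary hash, as in the question
        int stepSize = 7 - key % 7;           // secondary hash, as in the question
        while (hashArray[hashVal] != EMPTY) { // probe until an unused slot is found
            hashVal = (hashVal + stepSize) % arraySize;
        }
        hashArray[hashVal] = key;             // insert the item
    }
}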
