Closed addressing hash tables. How are they resized? - java

Reading about hopscotch hashing and trying to understand how it could be coded, I realized that in linear probing hash table variants we need a recursive approach to resizing, as follows:
1. create a backup array of the existing buckets
2. allocate a new array of the requested capacity
3. go over the backup array and rehash each element to get the new position of the element in the new array, and insert it in the new array
4. when done, release the backup array
And the code structure would be like:
public V put(Object key, Object value) {
    // code
    // we need to resize
    if (condition) {
        resize(2 * keys.length);
        return put(key, value);
    }
    // other code
}

private void resize(int newCapacity) {
    // step 1: back up the existing buckets
    // step 2: allocate the new array
    // go over each element of the backup
    for (Object key : oldKeys) {
        put(key, value); // value taken from the backup as well
    }
}
I don't like this structure, as we recursively call put inside resize.
Is this the standard approach to resizing a hash table when using linear probing variants?

Good question! Usually, in closed hashing (open addressing) schemes like hopscotch hashing, cuckoo hashing, or static perfect hashing, where there's a chance that a rehash can fail, a single "rehash" step might have to sit in a loop, trying to assign everything into a new table until it finds an arrangement that works.
You might want to consider having three methods: put, the externally visible function; rehash, an internal function; and tryPut, which tries to add an element but might fail. You can then implement the functions as below; the code is primarily for exposition and can definitely be optimized a bit:
public V put(Object key, Object value) {
    V oldValue = get(key);
    while (!tryPut(key, value)) {
        rehash();
    }
    return oldValue;
}

private void rehash() {
    increaseCapacity();
    boolean success;
    do {
        success = true;
        reallocateSpace();
        for (each old key/value pair) {
            if (!tryPut(key, value)) {
                success = false;
                break;
            }
        }
    } while (!success);
}

private boolean tryPut(Object key, Object value) {
    // Try adding the key/value pair using a
    // hashtable specific implementation, returning
    // true if it works and false otherwise.
}
There's no longer any risk of a weird recursion here, because tryPut never calls anything else.
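As a concrete illustration, here is a minimal sketch of what tryPut might look like for plain linear probing with a bounded probe length; the values array, the MAX_PROBES constant and the slot-reuse policy are assumptions for exposition, not part of any particular implementation:
private static final int MAX_PROBES = 32; // give up and let the caller rehash after this many slots

private boolean tryPut(Object key, Object value) {
    int start = (key.hashCode() & 0x7fffffff) % keys.length;
    for (int probe = 0; probe < MAX_PROBES; probe++) {
        int i = (start + probe) % keys.length;
        if (keys[i] == null || keys[i].equals(key)) {
            keys[i] = key;      // claim an empty slot or overwrite an existing mapping
            values[i] = value;
            return true;
        }
    }
    return false; // no slot within the probe bound; put() will call rehash() and retry
}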
Hope this helps!

Related

What is the use of LinkedHashMap.removeEldestEntry?

I am aware the answer to this question is easily available on the internet. I need to know what happens if I choose not to remove the eldest entry. Below is my code:
package collection;

import java.util.*;

public class MyLinkedHashMap {

    private static final int MAX_ENTRIES = 2;

    public static void main(String[] args) {
        LinkedHashMap lhm = new LinkedHashMap(MAX_ENTRIES, 0.75F, false) {
            protected boolean removeEldestEntry(Map.Entry eldest) {
                return false;
            }
        };

        lhm.put(0, "H");
        lhm.put(1, "E");
        lhm.put(2, "L");
        lhm.put(3, "L");
        lhm.put(4, "O");

        System.out.println("" + lhm);
    }
}
Even though I am not allowing removeEldestEntry to remove anything, my code works fine.
So, internally, what is happening?
removeEldestEntry is always checked after an element has been inserted. For example, if you override the method to always return true, the LinkedHashMap will always be empty, since after every put or putAll insertion the eldest element will be removed, no matter what. The JavaDoc shows a very sensible example of how to use it:
protected boolean removeEldestEntry(Map.Entry eldest){
    return size() > MAX_SIZE;
}
Alternatively, you might only want to remove an entry if it is unimportant:
protected boolean removeEldestEntry(Map.Entry eldest){
    if(size() > MAX_ENTRIES){
        if(isImportant(eldest)){
            //Handle an important entry here, like reinserting it to the back of the list
            this.remove(eldest.getKey());
            this.put(eldest.getKey(), eldest.getValue());
            //removeEldestEntry will be called again, now with the next entry,
            //so the size should not exceed the MAX_ENTRIES value
            //WARNING: If every element is important, this will loop indefinitely!
        } else {
            return true; //Element is unimportant
        }
    }
    return false; //Size not reached or eldest element was already handled otherwise
}
Why can't people just answer the OP's simple question!
If removeEldestEntry returns false then no items will ever be removed from the map and it will essentially behave like a normal Map.
Expanding on the answer by DavidNewcomb:
I'm assuming that you are learning how to implement a cache.
The method LinkedHashMap.removeEldestEntry is very commonly used in cache data structures, where the size of the cache is limited to a certain threshold. In such cases, removeEldestEntry can be set to automatically remove the oldest entry when the size exceeds the threshold (defined by the MAX_ENTRIES attribute), as in the example provided here.
On the other hand, when you override the removeEldestEntry method this way, you are ensuring that nothing ever happens when the MAX_ENTRIES threshold is exceeded. In other words, the data structure would not behave like a cache, but rather like a normal map.
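For comparison, here is a minimal LRU-style cache sketch built on LinkedHashMap; the class name and the MAX_ENTRIES value are illustrative:
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private static final int MAX_ENTRIES = 100;

    public LruCache() {
        // accessOrder = true, so iteration order is least-recently-accessed first
        super(16, 0.75f, true);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry once the cache grows past MAX_ENTRIES
        return size() > MAX_ENTRIES;
    }
}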
Your removeEldestEntry method is identical to the default implementation of LinkedHashMap.removeEldestEntry, so your LinkedHashMap will simply behave like a normal LinkedHashMap with no overridden methods, retaining whatever values and keys you put into it unless and until you explicitly remove them by calling remove, removeAll, clear, etc. The advantage of using LinkedHashMap is that the collection views (keySet(), values(), entrySet()) always return Iterators that traverse the keys and/or values in the order they were added to the Map.

Efficient search in datastructure ArrayList

I have an ArrayList which contains my nodes. A node has a source, a target and a cost. Currently I have to iterate over the whole ArrayList, and for over 1000 nodes that takes a while. I therefore tried to sort the list by source and use binary search to find the corresponding pair, but unfortunately that only works if I compare either source or target, and I have to compare both to get the right pair. Is there another way to search an ArrayList efficiently?
Unfortunately, no. ArrayLists are not made to be searched efficiently. They are used to store data, not to search it. If you merely want to know whether an item is contained, I would suggest you use a HashSet, as the lookup will have a time complexity of O(1) instead of O(n) for the ArrayList (assuming that you have implemented a functioning equals method for your objects).
If you want to do fast searches for objects, I recommend using a Dictionary-like implementation such as HashMap. If you can afford the space requirement, you can have multiple maps, each with a different key, to have a fast lookup of your object no matter what key you have to search for. Keep in mind that the lookup also requires implementing a correct equals method. Unfortunately, this requires that each key be unique, which may not be a brilliant idea in your case.
However, you can use a HashMap to store, for each source, a list of the nodes that have that source. You can do the same for cost and target. That way you can reduce the number of nodes you need to iterate over substantially. This should prove to be a good solution for a sparsely connected network.
private HashMap<Source, ArrayList<Node>> sourceMap = new HashMap<Source, ArrayList<Node>>();
private HashMap<Target, ArrayList<Node>> targetMap = new HashMap<Target, ArrayList<Node>>();
private HashMap<Cost, ArrayList<Node>> costMap = new HashMap<Cost, ArrayList<Node>>();

/** Look for a node with a given source */
for( Node node : sourceMap.get(keySource) )
{
    /** Test the node for equality with a given node. Equals method below */
    if(node.equals(nodeYouAreLookingFor)) { return node; }
}
In order to be sure that your code will work, be sure to override the equals method. I know I have said so already, but this is a very common mistake.
@Override
public boolean equals(Object object)
{
    if(object instanceof Node)
    {
        Node node = (Node) object;
        if(source.equals(node.getSource()) && target.equals(node.getTarget()))
        {
            return true;
        }
    }
    return false;
}
If you don't, the test will simply compare references which may or may not be equal depending on how you handle your objects.
Edit: Just read what you base your equality upon. The equals method should be implemented in your node class. However, for it to work, you need to implement and override the equals method for the source and target too, if they are objects. Be watchful though: if they are Nodes too, this may result in quite a few tests spanning all of the network.
Update: Added code to reflect the purpose of the code in the comments.
// retainAll returns a boolean, so copy the source list first and intersect it in place
ArrayList<Node> matchingNodes = new ArrayList<Node>( sourceMap.get(desiredSource) );
matchingNodes.retainAll( targetMap.get(desiredTarget) );
Now you have a list of all nodes that match the source and target criteria. Provided that you are willing to sacrifice a bit of memory, the lookup above has a complexity of roughly O(|sourceList| × |targetList|), where these are the per-key lists, rather than a linear scan over all nodes, O(|allNodeList|). If your network is big enough (and with 1000 nodes I think it is), you could benefit considerably. If your network resembles a naturally occurring network then, as Albert-László Barabási has shown, it is likely scale-free. This means that splitting your network into lists keyed by source and target will likely (I have no proof for this) result in a scale-free size distribution of these lists. Therefore, I believe the cost of looking up by source and target will be substantially reduced, as |sourceList| and |targetList| should be much smaller than |allNodeList|.
You'll need to combine the source and target into a single comparator, e.g.
public int compare(Node o1, Node o2) {
    // assumes source and target are primitive ints; use compareTo for reference types
    if(o1.source < o2.source) { return -1; }
    else if(o1.source > o2.source) { return 1; }
    // else o1.source == o2.source
    else if(o1.target < o2.target) { return -1; }
    else if(o1.target > o2.target) { return 1; }
    else return 0;
}
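A quick usage sketch of that comparator with Collections.binarySearch; the Node constructor, the int source/target fields and the allNodes list are assumptions for illustration:
// sort a copy of the node list by (source, target) once
Comparator<Node> sourceThenTarget = new Comparator<Node>() {
    public int compare(Node o1, Node o2) {
        if (o1.source != o2.source) { return o1.source < o2.source ? -1 : 1; }
        if (o1.target != o2.target) { return o1.target < o2.target ? -1 : 1; }
        return 0;
    }
};
ArrayList<Node> sorted = new ArrayList<Node>(allNodes);
Collections.sort(sorted, sourceThenTarget);

// look up a (source, target) pair in O(log n)
Node probe = new Node(desiredSource, desiredTarget, 0); // cost is ignored by the comparator
int pos = Collections.binarySearch(sorted, probe, sourceThenTarget);
Node match = pos >= 0 ? sorted.get(pos) : null;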
You can use the .compareTo() method to compare your nodes.
You can create two ArrayLists. The first sorted by source, the second sorted by target.
Then you can search by source or target using binarySearch on the corresponding List.
You can make a helper class to store source-target pairs:
class SourceTarget {
    public final Source source; // public fields are OK when they're final and immutable.
    public final Target target; // you can use getters but I'm lazy
    // (don't give this object setters. Map keys should ideally be immutable)

    public SourceTarget( Source s, Target t ){
        source = s;
        target = t;
    }

    @Override
    public boolean equals( Object other ){
        // only equal when both source and target are equal
        if( !(other instanceof SourceTarget) ){
            return false;
        }
        SourceTarget that = (SourceTarget) other;
        return source.equals( that.source ) && target.equals( that.target );
    }

    @Override
    public int hashCode(){
        // consistent with equals
        return 31 * source.hashCode() + target.hashCode();
    }
}
Then store your things in a HashMap<SourceTarget, List<Node>>, with each source-target pair mapped to the list of nodes that have exactly that source-target pair.
To retrieve just use
List<Node> results = map.get( new SourceTarget( node.source, node.target ) );
Alternatively to making a helper class, you can use the comparator in Zim-Zam's answer and a TreeMap<Node,List<Node>> with a representative Node object acting as the SourceTarget pair.
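A small sketch of that alternative, assuming the (source, target) comparator from Zim-Zam's answer is available as sourceThenTarget and a probe Node can be constructed just for lookups:
TreeMap<Node, List<Node>> map = new TreeMap<Node, List<Node>>(sourceThenTarget);
// ...populate map so each representative Node maps to all nodes sharing its source and target...
List<Node> results = map.get(new Node(desiredSource, desiredTarget, 0));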

Limited SortedSet

I'm looking for an implementation of SortedSet with a limited number of elements, so that if more elements are added than the specified maximum, the comparator decides whether to add the item and remove the last one from the set.
SortedSet<Integer> t1 = new LimitedSet<Integer>(3);
t1.add(5);
t1.add(3);
t1.add(1);
// [1,3,5]
t1.add(2);
// [1,2,3]
t1.add(9);
// [1,2,3]
t1.add(0);
// [0,1,2]
Is there an elegant way in the standard API to accomplish this?
I've written a JUnit test for checking implementations:
@Test
public void testLimitedSortedSet() {
    final LimitedSortedSet<Integer> t1 = new LimitedSortedSet<Integer>(3);
    t1.add(5);
    t1.add(3);
    t1.add(1);
    System.out.println(t1);
    // [1,3,5]
    t1.add(2);
    System.out.println(t1);
    // [1,2,3]
    t1.add(9);
    System.out.println(t1);
    // [1,2,3]
    t1.add(0);
    System.out.println(t1);
    // [0,1,2]
    Assert.assertTrue(3 == t1.size());
    Assert.assertEquals(Integer.valueOf(0), t1.first());
}
With the standard API you'd have to do it yourself, i.e. extend one of the sorted set classes and add the logic you want to the add() and addAll() methods. Shouldn't be too hard.
Btw, I don't fully understand your example:
t1.add(9);
// [1,2,3]
Shouldn't the set contain [1,2,9] afterwards?
Edit: I think now I understand: you want to only keep the smallest 3 elements that were added to the set, right?
Edit 2: An example implementation (not optimised) could look like this:
class LimitedSortedSet<E> extends TreeSet<E> {

    private int maxSize;

    LimitedSortedSet( int maxSize ) {
        this.maxSize = maxSize;
    }

    @Override
    public boolean addAll( Collection<? extends E> c ) {
        boolean added = super.addAll( c );
        if( size() > maxSize ) {
            E firstToRemove = (E)toArray()[maxSize];
            removeAll( tailSet( firstToRemove ) );
        }
        return added;
    }

    @Override
    public boolean add( E o ) {
        boolean added = super.add( o );
        if( size() > maxSize ) {
            E firstToRemove = (E)toArray()[maxSize];
            removeAll( tailSet( firstToRemove ) );
        }
        return added;
    }
}
Note that tailSet() returns the subset including the parameter (if it is in the set). This means that if you can't calculate the next higher value (it doesn't need to be in the set) you'll have to re-add that element. This is done in the code above.
If you can calculate the next value, e.g. if you have a set of integers, doing something like tailSet( lastElement + 1 ) would be sufficient and you wouldn't have to re-add the last element.
Alternatively you can iterate over the set yourself and remove all elements that follow the last one you want to keep.
Another alternative, although it might be more work, would be to check the size before inserting an element and remove accordingly, as sketched below.
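A minimal sketch of that check-before-insert variant, written as an alternative add() inside the LimitedSortedSet above; it keeps the maxSize smallest elements and, like the original, bends the Collection#add() contract discussed further down:
@Override
public boolean add( E o ) {
    if( size() < maxSize || contains( o ) ) {
        return super.add( o );
    }
    E largest = last();
    // refuse o if it would not displace the current largest element
    if( compare( o, largest ) >= 0 ) {
        return false;
    }
    remove( largest );
    return super.add( o );
}

@SuppressWarnings("unchecked")
private int compare( E a, E b ) {
    return comparator() == null ? ((Comparable<E>) a).compareTo( b )
                                : comparator().compare( a, b );
}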
Update: as msandiford correctly pointed out, the first element that should be removed is the one at index maxSize. Thus there's no need to re-add the last wanted element.
Important note:
As @DieterDP correctly pointed out, the implementation above violates the Collection#add() API contract, which states that if a collection refuses to add an element for any reason other than it being a duplicate, an exception must be thrown.
In the example above the element is first added but might be removed again due to size constraints, or other elements might be removed, so this violates the contract.
To fix that you might want to change add() and addAll() to throw exceptions in those cases (or maybe in any case, in order to make them unusable) and provide alternate methods to add elements which don't violate any existing API contract.
In any case the above example should be used with care, since using it with code that isn't aware of the violations might result in unwanted and hard-to-debug errors.
I'd say this is a typical application for the decorator pattern, similar to the decorator collections exposed by the Collections class: unmodifiableXXX, synchronizedXXX, singletonXXX etc. I would take Guava's ForwardingSortedSet as base class, and write a class that decorates an existing SortedSet with your required functionality, something like this:
public final class SortedSets {

    public <T> SortedSet<T> maximumSize(
            final SortedSet<T> original, final int maximumSize){

        return new ForwardingSortedSet<T>() {

            @Override
            protected SortedSet<T> delegate() {
                return original;
            }

            @Override
            public boolean add(final T e) {
                if(original.size() < maximumSize){
                    return original.add(e);
                } else return false;
            }

            // implement other methods accordingly
        };
    }
}
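A quick usage sketch; the factory method would typically be made static, but it is shown here as declared in the answer, and note that this decorator simply refuses further adds once the maximum size is reached rather than evicting the largest element:
SortedSet<Integer> limited = new SortedSets().maximumSize(new TreeSet<Integer>(), 3);
limited.add(5);
limited.add(3);
limited.add(1);
boolean added = limited.add(2); // false: the set is already at its maximum size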
No, there is nothing like that in the existing Java library.
But yes, you can build one like the class below using composition. I believe it will be easy.
public class LimitedSet<E> implements SortedSet<E> {

    private final TreeSet<E> treeSet = new TreeSet<E>();
    private final int expectedSize;

    public LimitedSet(int expectedSize) {
        this.expectedSize = expectedSize;
    }

    public boolean add(E e) {
        boolean result = treeSet.add(e);
        if(treeSet.size() > expectedSize) {
            // remove the one you like ;)
        }
        return result;
    }

    // all other methods delegate to the "treeSet"
}
UPDATE
After reading your comment:
As you always need to remove the last element, you can consider maintaining a stack internally. It will increase memory complexity by O(n), but it makes it possible to retrieve the last element in just O(1) constant time. It should do the trick, I believe.
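For reference, one way to fill in the removal step without an extra stack is TreeSet.pollLast(), which removes and returns the largest element in O(log n); this is a sketch of the add() above with that choice:
public boolean add(E e) {
    boolean result = treeSet.add(e);
    if(treeSet.size() > expectedSize) {
        // drop the largest element so only the expectedSize smallest remain
        treeSet.pollLast();
    }
    return result;
}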

Simple database-like collection class in Java

The problem: Maintain a bidirectional many-to-one relationship among java objects.
Something like the Google/Commons Collections bidi maps, but I want to allow duplicate values on the forward side, and have sets of the forward keys as the reverse side values.
It would be used something like this:
// maintaining disjoint areas on a gameboard. Location is a space on the
// gameboard; Regions refer to disjoint collections of Locations.
MagicalManyToOneMap<Location, Region> forward = // the game universe
Map<Region, Set<Location>> inverse = forward.getInverse(); // live, not a copy
Location parkplace = Game.chooseSomeLocation(...);
Region mine = forward.get(parkplace); // assume !null; should be O(log n)
Region other = Game.getSomeOtherRegion(...);
// moving a Location from one Region to another:
forward.put(parkplace, other);
// or equivalently:
inverse.get(other).add(parkplace); // should also be O(log n) or so
// expected consistency:
assert ! inverse.get(mine).contains(parkplace);
assert forward.get(parkplace) == other;
// and this should be fast, not iterate every possible location just to filter for mine:
for (Location l : mine) { /* do something clever */ }
The simple java approaches are: 1. To maintain only one side of the relationship, either as a Map<Location, Region> or a Map<Region, Set<Location>>, and collect the inverse relationship by iteration when needed; Or, 2. To make a wrapper that maintains both sides' Maps, and intercept all mutating calls to keep both sides in sync.
1 is O(n) instead of O(log n), which is becoming a problem. I started in on 2 and was in the weeds straightaway. (Know how many different ways there are to alter a Map entry?)
This is almost trivial in the sql world (Location table gets an indexed RegionID column). Is there something obvious I'm missing that makes it trivial for normal objects?
I might misunderstand your model, but if your Location and Region have correct equals() and hashCode() implemented, then the Location -> Region side is just a classical Map implementation (multiple distinct keys can point to the same object value). The Region -> Set of Locations side is a Multimap (available in Google Collections). You could compose your own class with the proper add/remove methods to manipulate both submaps.
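A minimal sketch of that composition using Guava's HashMultimap; the ManyToOne class and its method names are illustrative, not an existing library API:
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import com.google.common.collect.HashMultimap;
import com.google.common.collect.SetMultimap;

public class ManyToOne<L, R> {

    private final Map<L, R> forward = new HashMap<L, R>();
    private final SetMultimap<R, L> inverse = HashMultimap.create();

    public void put(L l, R r) {
        R old = forward.put(l, r);
        if (old != null) {
            inverse.remove(old, l); // undo the previous mapping so both sides stay in sync
        }
        inverse.put(r, l);
    }

    public R get(L l) {
        return forward.get(l);
    }

    public Set<L> getInverse(R r) {
        return inverse.get(r);
    }
}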
Maybe an overkill, but you could also use in-memory sql server (HSQLDB, etc). It allows you to create index on many columns.
I think you could achieve what you need with the following two classes. While it does involve two maps, they are not exposed to the outside world, so there shouldn't be a way for them to get out of sync. As for storing the same "fact" twice, I don't think you'll get around that in any efficient implementation, whether the fact is stored twice explicitly as it is here, or implicitly as it would be when your database creates an index to make joins more efficient on your 2 tables. You can add new things to the MagicSet and it will update both mappings, or you can add things to the MagicMapper, which will then update the inverse map automatically. The girlfriend is calling me to bed now so I cannot run this through a compiler - it should be enough to get you started. What puzzle are you trying to solve?
public class MagicSet<L, R> {

    private Map<L, R> forward;
    private R r;
    private Set<L> set;

    public MagicSet(Map<L, R> forward, R r) {
        this.forward = forward;
        this.r = r;
        this.set = new HashSet<L>();
    }

    public void add(L l) {
        set.add(l);
        forward.put(l, r);
    }

    public void remove(L l) {
        set.remove(l);
        forward.remove(l);
    }

    public int size() {
        return set.size();
    }

    public boolean contains(L l) {
        return set.contains(l);
    }

    // caution, do not use the remove method from this iterator. if this class was going
    // to be reused often you would want to return a wrapped iterator that handled the
    // remove method properly. In fact, if you did that, i think you could then extend
    // AbstractSet and MagicSet would then fully implement java.util.Set.
    public Iterator<L> iterator() {
        return set.iterator();
    }
}
public class MagicMapper<L, R> { // note that it doesn't implement Map, though it could with some extra work. I don't get the impression you need that though.

    private Map<L, R> forward;
    private Map<R, MagicSet<L, R>> inverse;

    public MagicMapper() {
        forward = new HashMap<L, R>();
        inverse = new HashMap<R, MagicSet<L, R>>();
    }

    public R getForward(L key) {
        return forward.get(key);
    }

    public MagicSet<L, R> getBackward(R key) {
        return inverse.get(key); // this assumes you want a null if you try to use a key
                                 // that has no mapping. otherwise you'd return a blank MagicSet
    }

    public void put(L l, R r) {
        R oldVal = forward.get(l);
        // if the L had already belonged to an R, we need to undo that mapping
        MagicSet<L, R> oldSet = inverse.get(oldVal);
        if (oldSet != null) { oldSet.remove(l); }
        // now get the set the R belongs to, and add the L to it.
        MagicSet<L, R> newSet = inverse.get(r);
        if (newSet == null) {
            newSet = new MagicSet<L, R>(forward, r);
            inverse.put(r, newSet);
        }
        newSet.add(l); // magically updates the "forward" map
    }
}
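A quick usage sketch with the names from the question (parkplace, mine, other):
MagicMapper<Location, Region> universe = new MagicMapper<Location, Region>();
universe.put(parkplace, mine);   // parkplace starts out in region "mine"
universe.put(parkplace, other);  // moving it updates both directions at once

assert universe.getForward(parkplace) == other;
assert !universe.getBackward(mine).contains(parkplace);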

On using Enum based Singleton to cache large objects (Java)

Is there any better way to cache some very large objects that can only be created once, and therefore need to be cached? Currently, I have the following:
public enum LargeObjectCache {
    INSTANCE;

    private Map<String, LargeObject> map = new HashMap<...>();

    public LargeObject get(String s) {
        if (!map.containsKey(s)) {
            map.put(s, new LargeObject(s));
        }
        return map.get(s);
    }
}
There are several classes that can use the LargeObjects, which is why I decided to use a singleton for the cache, instead of passing LargeObjects to every class that uses it.
Also, the map doesn't contain many keys (one or two, but the key can vary in different runs of the program), so is there another, more efficient map to use in this case?
You may need thread-safety to ensure you don't have two instances with the same name.
It doesn't matter much for small maps, but you can avoid one call, which can make it faster.
public LargeObject get(String s) {
    synchronized(map) {
        LargeObject ret = map.get(s);
        if (ret == null)
            map.put(s, ret = new LargeObject(s));
        return ret;
    }
}
As it has been pointed out, you need to address thread-safety. Simply using Collections.synchronizedMap() doesn't make it completely correct, as the code entails compound operations. Synchronizing the entire block is one solution. However, using ConcurrentHashMap will result in a much more concurrent and scalable behavior if it is critical.
public enum LargeObjectCache {
    INSTANCE;

    private final ConcurrentMap<String, LargeObject> map = new ConcurrentHashMap<...>();

    public LargeObject get(String s) {
        LargeObject value = map.get(s);
        if (value == null) {
            value = new LargeObject(s);
            LargeObject old = map.putIfAbsent(s, value);
            if (old != null) {
                value = old;
            }
        }
        return value;
    }
}
You'll need to use it exactly in this form to have the correct and the most efficient behavior.
If you must ensure only one thread gets to even instantiate the value for a given key, then it becomes necessary to turn to something like the computing map in Google Collections or the memoizer example in Brian Goetz's book "Java Concurrency in Practice".
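For reference, a rough sketch of that memoizer idea adapted to this cache (illustrative, not the book's exact code); it guarantees that LargeObject is constructed at most once per key even when several threads race on the same key:
import java.util.concurrent.*;

public enum LargeObjectCache {
    INSTANCE;

    private final ConcurrentMap<String, FutureTask<LargeObject>> tasks =
            new ConcurrentHashMap<String, FutureTask<LargeObject>>();

    public LargeObject get(final String s) {
        FutureTask<LargeObject> task = tasks.get(s);
        if (task == null) {
            FutureTask<LargeObject> newTask = new FutureTask<LargeObject>(
                    new Callable<LargeObject>() {
                        public LargeObject call() {
                            return new LargeObject(s);
                        }
                    });
            task = tasks.putIfAbsent(s, newTask);
            if (task == null) { // this thread won the race, so it runs the construction
                task = newTask;
                task.run();
            }
        }
        try {
            return task.get(); // blocks until the single construction finishes
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}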
