How do I get a ConcurrentHashMap with weak keys and identity hashes in Java? I think Google Guava Collections can give such a thing, but can I get it from the standard library? What other options do I have?
I think Google Guava Collections can give such a thing, but can I get it from the standard library?
The short answer to that is No. Java SE does not implement this particular combination.
You could instantiate a java.util.concurrent.ConcurrentHashMap with WeakReference keys, and do some extra work to implement removal of map entries for broken references, but that won't give you identity hash semantics.
You could instantiate a java.util.IdentityHashMap with WeakReference keys, and do some extra work to implement removal of map entries for broken references, but that won't give you concurrent behaviour.
Using a java.util.WeakHashMap won't give you either concurrency or identity hashing.
You could (in theory) wrap the key class in something that overrides the natural equals and hashCode methods. But that is most likely to be unusable.
I don't think it would be possible to do this by overriding methods in either ConcurrentHashMap or IdentityHashMap.
Maybe the only viable option would be to change the key class's equals and hashCode methods to be identity based. But that won't work for "built in" key types (especially final ones), or for cases where you need value-based equals/hashCode in other parts of the application.
The Google Guava implementation appears to be the easiest way to go. One may initialize the required map with new MapMaker().weakKeys().makeMap() and use it just as one would use java.util.concurrent.ConcurrentHashMap. See the apidoc for more details.
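For illustration, a minimal sketch of that usage (class and variable names are just examples). Guava documents that weakKeys() also switches key comparison to identity (==), which is exactly the combination asked for here:

```java
import com.google.common.collect.MapMaker;
import java.util.concurrent.ConcurrentMap;

public class WeakIdentityCacheDemo {
    public static void main(String[] args) {
        // weakKeys() makes keys compared with == and eligible for
        // collection once they are no longer strongly reachable.
        ConcurrentMap<Object, String> cache = new MapMaker()
                .weakKeys()
                .makeMap();

        Object key = new Object();
        cache.put(key, "value");
        System.out.println(cache.get(key)); // value

        key = null; // the entry may now be removed after a GC cycle
    }
}
```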
If your application uses the Spring Framework (version 3.2 or greater), you can consider using org.springframework.util.ConcurrentReferenceHashMap. Below is its description:
A ConcurrentHashMap that uses soft or weak references for both keys and values.
This class can be used as an alternative to Collections.synchronizedMap(new WeakHashMap<K, Reference<V>>()) in order to support better performance when accessed concurrently. This implementation follows the same design constraints as ConcurrentHashMap with the exception that null values and null keys are supported.
NOTE: The use of references means that there is no guarantee that items placed into the map will be subsequently available. The garbage collector may discard references at any time, so it may appear that an unknown thread is silently removing entries.
If not explicitly specified, this implementation will use soft entry references.
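A minimal sketch, assuming spring-core is on the classpath (note that, unlike the Guava weakKeys() map above, this map compares keys with equals, not identity):

```java
import java.util.concurrent.ConcurrentMap;
import org.springframework.util.ConcurrentReferenceHashMap;
import org.springframework.util.ConcurrentReferenceHashMap.ReferenceType;

public class SpringReferenceMapDemo {
    public static void main(String[] args) {
        // Defaults to SOFT references; pass ReferenceType.WEAK for weak ones.
        ConcurrentMap<String, Object> cache =
                new ConcurrentReferenceHashMap<>(16, ReferenceType.WEAK);

        cache.put("key", new Object());
        System.out.println(cache.containsKey("key")); // true
    }
}
```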
Search for ConcurrentWeakIdentityHashMap and you will find many examples. I wrote an implementation myself, because I think the hashCode of org/ehcache/core/internal/util/ConcurrentWeakIdentityHashMap$WeakReference is so bad; a sketch of the key wrapper involved appears after the links below.
Example of ehcache3
Example I wrote
Pull Request to fix the ehcache3 ConcurrentWeakIdentityHashMap Key hashCode
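For reference, a minimal, hypothetical sketch of the kind of key wrapper such an implementation revolves around. The important detail is that the identity hash must be captured eagerly, because the referent may already have been collected when the hash is next needed:

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Illustrative weak key preserving identity semantics (names are examples).
final class WeakIdentityKey<K> extends WeakReference<K> {
    private final int hash;

    WeakIdentityKey(K referent, ReferenceQueue<K> queue) {
        super(referent, queue);
        // Capture the identity hash now; get() may return null later.
        this.hash = System.identityHashCode(referent);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof WeakIdentityKey)) {
            return false;
        }
        K self = get();
        // Two keys match only if both referents are alive and identical.
        return self != null && self == ((WeakIdentityKey<?>) other).get();
    }
}
```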
We all know that ConcurrentHashMap performs better, but is there any scenario where Hashtable is better?
I'll say this before answering the question: do not ever use Hashtable anymore. Hashtable is a legacy class from Java 1.0, predating the superior Collections Framework introduced with Java 2.
If you require a simple hash map, use HashMap. If you require performant thread-safety, use ConcurrentHashMap. If you require plain basic, non-performance-critical thread-safety, wrap a HashMap in Collections.synchronizedMap(Map).
Now to actually answer the question as asked, which purely compares two specific classes without considering the full spectrum of possibilities:
Yes, such scenarios exist
From the Hashtable documentation:
If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
So yes, Hashtable is appropriate for scenarios where you need a thread-safe implementation but do not "desire" a highly-concurrent one. This, again, is strictly within the exclusive comparison between ConcurrentHashMap and Hashtable.
Also, if you need an Enumeration[1], Hashtable supports it directly, while you have to go through Collections.enumeration(...) for other Maps.
1. Enumeration is also a Java 1.0 class. Switch to Iterator (if using Java 2 to 7) or Stream (if using Java 8+)
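For illustration, a minimal example of both routes (class names are arbitrary):

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class EnumerationDemo {
    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("a", 1);

        // Hashtable exposes Enumeration directly:
        Enumeration<String> direct = table.keys();
        while (direct.hasMoreElements()) {
            System.out.println(direct.nextElement());
        }

        // Any other Map has to go through the Collections adapter:
        Map<String, Integer> map = new HashMap<>(table);
        Enumeration<String> adapted = Collections.enumeration(map.keySet());
        while (adapted.hasMoreElements()) {
            System.out.println(adapted.nextElement());
        }
    }
}
```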
Possible Duplicate:
Use cases for IdentityHashMap
What could be a practical use of the IdentityHashMap introduced in Java 5?
Have a look at the Java Docs :-)
A typical use of this class is topology-preserving object graph transformations, such as serialization or deep-copying. To perform such a transformation, a program must maintain a "node table" that keeps track of all the object references that have already been processed. The node table must not equate distinct objects even if they happen to be equal. Another typical use of this class is to maintain proxy objects. For example, a debugging facility might wish to maintain a proxy object for each object in the program being debugged.
On a side note: it's available since version 1.4, not Java 5 or 6...
For adding dynamic fields to objects.
Some languages directly support dynamic fields: anyone can add any field to any object at any time.
This is handy when you want to associate information with objects in ways the objects' designer could not foresee.
Java doesn't have real dynamic fields. We can simulate them by using an identity map to associate an object with some information of some kind.
WeakHashMap is better for this purpose, since it doesn't add an additional strong reference to the object, so it is much closer to the dynamic field concept (note, though, that WeakHashMap compares keys with equals/hashCode, not identity, so strictly speaking a weak identity map is what's wanted).
Concurrency is the remaining problem. If two threads access the same dynamic field on two different objects, there should be no dependency between the two threads. We can solve this with some kind of concurrent weak hash map. However, the performance isn't ideal compared to normal field access.
Think about java.lang.ThreadLocal, which adds a dynamic field to threads, and java.lang.ClassValue, which adds a dynamic field to classes. They aren't strictly necessary - we could achieve the same thing with concurrent weak maps. They exist for performance reasons: the JDK can "hack" into Thread/Class to support faster lookup.
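As an illustration, a minimal sketch of the idea (the DynamicField class is hypothetical, not a JDK API; a coarse synchronized wrapper stands in for a proper concurrent weak map):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper: associates extra data with arbitrary objects
// without keeping those objects alive.
public final class DynamicField<T> {
    private final Map<Object, T> values =
            Collections.synchronizedMap(new WeakHashMap<>());

    public void set(Object owner, T value) {
        values.put(owner, value);
    }

    public T get(Object owner) {
        return values.get(owner);
    }
}
```

Usage would look like new DynamicField<String>().set(someObject, "note"); once someObject becomes unreachable, the entry can be collected.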
When serializing mutable objects you want to keep track of the objects you have already serialized and their reference IDs. You cannot use equals-based equality, as you cannot trust mutable objects to use identity checks in equals or to remain unchanged. E.g. Date is mutable and its equals compares contents.
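A small demonstration of why equals-based tracking fails there; with IdentityHashMap, two equal but distinct Date instances get separate reference IDs:

```java
import java.util.Date;
import java.util.IdentityHashMap;
import java.util.Map;

public class NodeTableDemo {
    public static void main(String[] args) {
        Map<Object, Integer> nodeTable = new IdentityHashMap<>();
        Date d1 = new Date(0);
        Date d2 = new Date(0); // d1.equals(d2) is true, but d1 != d2

        nodeTable.put(d1, nodeTable.size());
        nodeTable.put(d2, nodeTable.size());

        // 2 with IdentityHashMap; a HashMap would have merged them into 1.
        System.out.println(nodeTable.size());
    }
}
```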
It is used rarely. It implements the Map interface, but only for the rare cases where reference-equality semantics are required.
When I implement a collection that uses hashes for optimizing access, should I cache the hash values or assume an efficient implementation of hashCode()?
On the other hand, when I implement a class that overrides hashCode(), should I assume that the collection (e.g. HashSet) caches the hash?
This question is only about performance vs. memory overhead. I know that the hash value of an object should not change.
Clarification:
A mutable object would of course have to clear the cached value when it is changed, whereas the collection relies on objects not changing. But this is not relevant for my question.
When designing Guava's ImmutableSet and ImmutableMap classes, we opted not to cache hash codes. This way, you'll get better performance from hash code caching when and only when you care enough to do the caching yourself. If we cached them ourselves, we'd be costing you extra time and memory even in the case that you care deeply about speed and space!
It's true that HashMap does this caching, but it was HashMap's author (Josh Bloch) who strongly suggested we not follow that precedent!
Edit: oh, also, if your hashCode() is slow, the caching by the collection only addresses half of the problem anyway, as hashCode() still must be invoked on the object passed in to get() no matter what.
Considering that java.lang.String caches its hash, I guess that hashCode() is supposed to be fast.
So as a first approach, I would not cache hashes in my collection.
In my own objects, I would not cache the hash code unless it is obviously slow, and would only do it if profiling tells me so.
If my objects will be used by others, I would probably consider caching hash codes sooner (but that needs measurement anyway).
On the other hand, when I implement a class that overrides hashCode(), should I assume that the collection (e.g. HashSet) caches the hash?
No, you should not make any assumptions beyond the scope of the class you are writing.
Of course you should try to make your hashCode cheap. If it isn't, and your class is immutable, create the hashCode on initialization or lazily upon the first request (see java.lang.String). If your class is not immutable, I don't see any other option than to re-calculate the hashCode every time.
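For instance, a minimal sketch of the lazy idiom String uses (the Point class is just an illustration):

```java
// Lazily caching a hash code in an immutable class, in the style of
// java.lang.String: 0 doubles as the "not yet computed" sentinel.
public final class Point {
    private final int x;
    private final int y;
    private int hash; // benign race: every thread computes the same value

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {      // recompute if uncached (or if the hash is really 0)
            h = 31 * x + y;
            hash = h;
        }
        return h;
    }
}
```

The unsynchronized write to hash is safe here because the computed value is deterministic; a true hash of 0 merely gets recomputed, the same trade-off String makes.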
I'd say in most cases you can rely on efficient implementations of hashCode(). AFAIK, that method is only invoked on lookup methods (like contains, get etc.) or methods that change the collection (add/put, remove etc.).
Thus, in most cases there shouldn't be any need to cache hashes yourself.
Why do you want to cache it? You need to ask objects for their hash codes while you're working with them, to allocate them to a hash bucket (and to compare against any objects in the same bucket that may have the same hash code), but then you can forget it.
You could store objects in a wrapper HashNode or something, but I would try implementing it first without caching (just like HashSet et al. do) and see if you need the added performance and complexity before going there.
I need a concurrent hash map with weak or soft keys where the equality is equals and not ==.
For this kind of key, Google Collections chooses == by default.
Is there a way to override this choice? How should I proceed?
Best regards,
Nicolas.
You can't do that in google-collections. You can't do it in guava either, currently. However, they have added an Equivalence interface and the implementations you'd expect for it (equals, null-aware equals and ==) recently and it seems like they might allow you to specify what Equivalence should be used for keys/values in the future (see this issue). MapMaker code seems to be undergoing some changes at this time.
You can use java.util.WeakHashMap, wrapped with a call to Collections.synchronizedMap()
It won't be as fast as a ConcurrentHashMap if thread contention is significant. But it has the behaviour you want.
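A minimal sketch of that approach (the keys here are compared with equals(), as requested):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class SynchronizedWeakMapDemo {
    public static void main(String[] args) {
        // WeakHashMap compares keys with equals()/hashCode(); the wrapper
        // adds coarse-grained locking around every operation.
        Map<Object, String> cache =
                Collections.synchronizedMap(new WeakHashMap<>());

        Object key = new Object();
        cache.put(key, "value");
        System.out.println(cache.get(key)); // value

        key = null; // the entry may now be cleared after a GC cycle
    }
}
```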
The hash structures I am aware of are Hashtable, HashSet and HashMap.
Do they all use the bucket structure - i.e. when two hash codes are exactly the same, one element does not overwrite the other; instead they are placed in the same bucket associated with that hash code?
In Sun's current implementation of the Java library, IdentityHashMap and the internal implementation in ThreadLocal use probing structures.
The general problem with probing hash tables in Java is that hashCode and equals may be relatively expensive. Therefore you want to cache the hash value. You can't have an array that mixes references and primitives, so you'd need to do something relatively complicated. On the other hand, if you are using == to check matches, then you can check many references without a performance problem.
IIRC, Azul had a fast concurrent quadratic probing hash map.
A linked list is used at each bucket to deal with hash collisions. Note that the Java HashSet is actually implemented by a HashMap underneath (all keys being mapped to the same singleton value across all HashSets) and hence uses the same bucket structure.
When an element is added, it is checked for equality against all items in the linked list (via .equals) before being appended at the end. Hence hash collisions are particularly bad, as this check grows more expensive as the linked list becomes larger.
I believe Java's hash structures all use a form of chaining to deal with collisions when hashing - which places items that have the same hash into a list.
I do not believe that Java uses open addressing for its hash-based data structures (open addressing recomputes hashes based on retry sequences until it finds an open slot in the table).
No -- open addressing is an alternate method of representing hash tables, where objects are stored directly in the table, instead of residing in a linked list. Only one object can be stored at a given index, so resolving collisions is more complicated.
When adding an object for which another object already resides at the same index, a probing sequence is used to determine the new index at which to store the new object. Removing objects is also more complicated, since if you remove an object, you need to leave a marker that says "there used to be an object here"; for more details, see Wikipedia.
Open addressing is preferable when the objects being stored are small and will rarely be deleted. Open addressing has improved cache performance, since you don't need to go through an extra level of indirection walking a linked list.
The classes you mentioned -- Hashtable, HashSet, and HashMap -- don't use open addressing, but you could easily create new classes that implement open addressing and provide the same APIs as those classes.
The APIs define the behaviour; the internals of how hash collisions are managed don't affect the guarantees of the API. The performance impact of bad hash value computation is another story. Let's just hash everything to 42 and see how it behaves.
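A quick sketch of that experiment (BadKey is a deliberately terrible, hypothetical key class): behaviour stays correct, only performance degrades:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Every instance hashes to 42, so all entries land in the same bucket.
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && name.equals(((BadKey) o).name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new BadKey("k" + i), i);
        }
        System.out.println(map.size());                  // 1000: nothing lost
        System.out.println(map.get(new BadKey("k500"))); // 500: still correct
    }
}
```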
Map and Set are the interfaces that determine the behavior of a HashMap or HashSet. A HashSet is a Set, and so it behaves like a Set (i.e. duplicates are not allowed). A HashMap acts like a Map: it will not overwrite a key that merely has the same hash code, but it will overwrite an entry if the exact same key is used again. This will be the same regardless of what data structure is backing the Map internally. See the javadoc for Set and HashMap for more.
Did you mean to ask something about the specific implementation of one of these structures?
Except for HashSet: a Set by definition contains unique elements.
This was a mistake.