I need a concurrent hash map with weak or soft keys where equality is equals() and not ==.
For this kind of key, Google Collections chooses == by default.
Is there a way to override this choice? How should I proceed?
Best regards,
Nicolas.
You can't do that in google-collections, and you can't do it in Guava either, currently. However, they have recently added an Equivalence interface and the implementations you'd expect for it (equals, null-aware equals, and ==), and it seems they might allow you to specify which Equivalence should be used for keys/values in the future (see this issue). The MapMaker code seems to be undergoing some changes at this time.
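For reference, here is a minimal sketch of how that Equivalence API ended up looking in later Guava releases (the demo class is hypothetical; the question predates the API stabilizing):

import com.google.common.base.Equivalence;

public class EquivalenceDemo {
    public static void main(String[] args) {
        Equivalence<Object> byEquals = Equivalence.equals();     // null-aware equals()
        Equivalence<Object> byIdentity = Equivalence.identity(); // ==

        String a = new String("key");
        String b = new String("key");
        System.out.println(byEquals.equivalent(a, b));   // true
        System.out.println(byIdentity.equivalent(a, b)); // false
    }
}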
You can use java.util.WeakHashMap, wrapped with a call to Collections.synchronizedMap()
It won't be as fast as a ConcurrentHashMap if thread contention is significant. But it has the behaviour you want.
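A minimal sketch of that approach; per the Collections.synchronizedMap javadoc, iteration over the views must still be synchronized manually on the wrapper:

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class SynchronizedWeakMapDemo {
    // Weak keys, compared with equals(), made thread-safe by a single lock.
    static final Map<String, Object> MAP =
            Collections.synchronizedMap(new WeakHashMap<String, Object>());

    public static void main(String[] args) {
        MAP.put("key", new Object());
        synchronized (MAP) { // manual synchronization required for iteration
            for (Map.Entry<String, Object> e : MAP.entrySet()) {
                System.out.println(e.getKey());
            }
        }
    }
}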
Related
The CacheBuilder methods weakValues() and softValues() both contain this line in their javadocs:
Note: when this method is used, the resulting cache will use identity (==) comparison to determine equality of values.
How exactly does this affect behaviour? As far as I can tell there are no public methods on the Cache or LoadingCache interface which would require testing for value equality. Does it affect the asMap() view?
Guava internally wraps those soft/weak values in references. If two weak/soft value references are identical, then the contents they point to must be equal too.
Guava only compares those values in methods like contains(...), remove(...) or replace(...), where the comparison is used to find a specific entry and check whether it really exists, or to remove it. So Guava first looks up the specific entry and, if found, removes it.
I think the main purpose is to detect an explicit removal of an entry: if the value the reference points to is already null, then it was not an explicit removal, but the entry was removed by Guava internally.
It's nothing to worry about; it's just the way Guava handles the removal of old entries.
There is no way to change this, and you really don't have to care about it: Guava will handle it for you :D
I guess it uses Google's ConcurrentMap internally and uses value comparison for remove and replace.
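To make the effect concrete, here is a small hypothetical demo showing where the identity comparison becomes visible through the asMap() view:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class WeakValuesIdentityDemo {
    public static void main(String[] args) {
        Cache<String, String> cache = CacheBuilder.newBuilder().weakValues().build();
        String stored = new String("v");
        cache.put("k", stored);

        // equals()-equal but a different instance: the conditional remove fails,
        // because weakValues() switches value comparison to identity (==).
        System.out.println(cache.asMap().remove("k", new String("v"))); // false
        System.out.println(cache.asMap().remove("k", stored));          // true
    }
}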
How do I get a ConcurrentHashMap with weak keys and identity hashes in Java? I think Google Guava Collections can give such a thing, but can I get it from the standard library? What other options do I have?
I think Google Guava Collections can give such a thing, but can I get it from the standard library?
The short answer to that is No. Java SE does not implement this particular combination.
You could instantiate a java.util.concurrent.ConcurrentHashMap with WeakReference keys, and do some extra work to implement removal of map entries for broken references, but that won't give you identity hash semantics.
You could instantiate a java.util.IdentityHashMap with WeakReference keys, and do some extra work to implement removal of map entries for broken references, but that won't give you concurrent behaviour.
Using a java.util.WeakHashMap won't give you either concurrency or identity hashing.
You could (in theory) wrap the key class in something that overrides the natural equals and hashCode methods (see the sketch after this list). But that forces every caller to wrap and unwrap keys, so it is most likely to be unusable on its own.
I don't think it would be possible to do this by overriding methods in either ConcurrentHashMap or IdentityHashMap.
Maybe the only viable option would be to change the key class's equals and hashCode methods to be identity based. But that won't work for "built in" key types (especially final ones) or for cases where you need value-based equals/hashCode in other parts of the application.
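If you do want to roll it yourself, here is a minimal sketch of the wrapper approach, assuming it is acceptable to drain the reference queue on each access; all class and method names are illustrative, not a standard API:

import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class ConcurrentWeakIdentityMap<K, V> {

    // Key wrapper: weakly references the key, hashes and compares by identity.
    private static final class Key<K> extends WeakReference<K> {
        private final int hash; // cached identityHashCode of the referent

        Key(K referent, ReferenceQueue<K> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Object a = get(), b = ((Key<?>) o).get();
            return a != null && a == b; // identity comparison of referents
        }
    }

    private final ConcurrentMap<Key<K>, V> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();

    public V put(K key, V value) {
        expunge();
        return map.put(new Key<>(key, queue), value);
    }

    public V get(K key) {
        expunge();
        return map.get(new Key<>(key, null)); // lookup key, never enqueued
    }

    // Drop entries whose keys have been garbage-collected.
    private void expunge() {
        Object stale;
        while ((stale = queue.poll()) != null) {
            map.remove(stale);
        }
    }
}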
The Google Guava implementation appears to be the easiest way to go. One may initialize the required map with new MapMaker().weakKeys().makeMap() and use it just as one would use java.util.concurrent.ConcurrentHashMap. See the apidoc for more details.
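A minimal usage sketch; note that, as discussed above, weakKeys() implies identity (==) key comparison:

import com.google.common.collect.MapMaker;
import java.util.concurrent.ConcurrentMap;

// Weak keys (identity comparison), fully concurrent like ConcurrentHashMap.
ConcurrentMap<Object, Object> map = new MapMaker().weakKeys().makeMap();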
If your application is built on the Spring Framework (version 3.2 or later), you can consider using org.springframework.util.ConcurrentReferenceHashMap. Below is its description:
A ConcurrentHashMap that uses soft or weak references for both keys and values.
This class can be used as an alternative to Collections.synchronizedMap(new WeakHashMap<K, V>()) in order to support better performance when accessed concurrently. This implementation follows the same design constraints as ConcurrentHashMap with the exception that null values and null keys are supported.
NOTE: The use of references means that there is no guarantee that items placed into the map will be subsequently available. The garbage collector may discard references at any time, so it may appear that an unknown thread is silently removing entries.
If not explicitly specified, this implementation will use soft entry references.
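A minimal usage sketch, assuming Spring is on the classpath; the nested ReferenceType enum selects the reference strength:

import org.springframework.util.ConcurrentReferenceHashMap;
import org.springframework.util.ConcurrentReferenceHashMap.ReferenceType;
import java.util.concurrent.ConcurrentMap;

// Soft references by default; pass ReferenceType.WEAK for weak ones.
ConcurrentMap<String, Object> softMap = new ConcurrentReferenceHashMap<>();
ConcurrentMap<String, Object> weakMap =
        new ConcurrentReferenceHashMap<>(16, ReferenceType.WEAK);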
Search for ConcurrentWeakIdentityHashMap and you will find many examples. I wrote an implementation myself, because I think the hashCode of org/ehcache/core/internal/util/ConcurrentWeakIdentityHashMap$WeakReference is so bad.
Example from ehcache3
Example I wrote
Pull Request to fix the ehcache3 ConcurrentWeakIdentityHashMap key hashCode
When I implement a collection that uses hashes for optimizing access, should I cache the hash values or assume an efficient implementation of hashCode()?
On the other hand, when I implement a class that overrides hashCode(), should I assume that the collection (i.e. HashSet) caches the hash?
This question is only about performance vs. memory overhead. I know that the hash value of an object should not change.
Clarification:
A mutable object would of course have to clear the cached value when it is changed, whereas the collection relies on objects not changing. But this is not relevant for my question.
When designing Guava's ImmutableSet and ImmutableMap classes, we opted not to cache hash codes. This way, you'll get better performance from hash code caching when and only when you care enough to do the caching yourself. If we cached them ourselves, we'd be costing you extra time and memory even in the case that you care deeply about speed and space!
It's true that HashMap does this caching, but it was HashMap's author (Josh Bloch) who strongly suggested we not follow that precedent!
Edit: oh, also, if your hashCode() is slow, the caching by the collection only addresses half of the problem anyway, as hashCode() still must be invoked on the object passed in to get() no matter what.
Considering that java.lang.String caches its hash, I guess that hashCode() is supposed to be fast.
So as a first approach, I would not cache hashes in my collection.
In the objects I use, I would not cache the hash code unless it is obviously slow to compute, and would only do it if profiling tells me so.
If my objects will be used by others, I would probably consider caching hash codes sooner (but that needs measurements anyway).
On the other hand, when I implement a class that overrides hashCode(), should I assume that the collection (i.e. HashSet) caches the hash?
No, you should not make any assumptions beyond the scope of the class you are writing.
Of course you should try to make your hashCode cheap. If it isn't, and your class is immutable, create the hashCode on initialization or lazily upon the first request (see java.lang.String). If your class is not immutable, I don't see any other option than to re-calculate the hashCode every time.
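A minimal sketch of that lazy pattern, modeled on java.lang.String; the Point class and its hash formula are purely illustrative:

public final class Point {
    private final int x, y;
    private int hash; // 0 doubles as "not yet computed", as in java.lang.String

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() {
        int h = hash;
        if (h == 0) {          // benign race: recomputation is idempotent
            h = 31 * x + y;    // a hash of 0 is simply recomputed each call
            hash = h;
        }
        return h;
    }
}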
I'd say in most cases you can rely on efficient implementations of hashCode(). AFAIK, that method is only invoked by lookup methods (like contains, get etc.) or methods that change the collection (add/put, remove etc.).
Thus, in most cases there shouldn't be any need to cache hashes yourself.
Why do you want to cache it? You need to ask an object for its hash code while you're working with it, to allocate it to a hash bucket (and to compare it against any objects in the same bucket that may share the same hash code), but then you can forget it.
You could store objects in a wrapper HashNode or something, but I would try implementing it first without caching (just like HashSet et al do) and see whether you need the added performance and complexity before going there.
I have a hash-based collection of objects, such as HashSet or HashMap. What issues can I run into when the implementation of hashCode() is such that it can change with time because it's computed from some mutable fields?
How does it affect Hibernate? Is there any reason why having hashCode() return the object's ID by default is bad? All not-yet-persisted objects have id=0, if that matters.
What is the reasonable implementation of hashCode for Hibernate-mapped entities? Once set the ID is immutable, but it's not true for the moment of saving an entity to database.
I'm not worried about performance of a HashSet with a dozen entities with key=0. What I care about is whether it's safe for my application and Hibernate to use ID as hash code, because ID changes as it is generated on persist.
If the hash code of the same object changes over time, the results are basically unpredictable. Hash collections use the hash code to assign objects to buckets -- if your hash code suddenly changes, the collection obviously doesn't know, so it can fail to find an existing object because it hashes to a different bucket now.
Returning an object's ID by itself isn't bad, but if many of them have id=0 as you mentioned, it will reduce the performance of the hash table: all objects with the same hash code go into the same bucket, so your hash table is now no better than a linear list.
Update: Theoretically, your hash code can change as long as nobody else is aware of it -- this implies exactly what #bestsss mentioned in his comment, which is to remove your object from any collections that may be holding it and insert it again once the hash code has changed. In practice, a better alternative is to generate your hash code from the actual content fields of your object rather than relying on the database ID.
If you add an object to a hash-based collection, then mutate its state so as to change its hash code (and by implication probably its behaviour in .equals() calls), you may see effects including but not limited to (a short demo follows below):
- Stuff you put in the collection seeming not to be there any more
- Getting something out which is different from what you asked for
This is surely not what you want. So, I recommend computing the hash code only from immutable fields. This is usually done by making the fields final and setting their values in the constructor.
http://community.jboss.org/wiki/EqualsandHashCode
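A small self-contained demo of the failure mode described above (the MutableKey class is hypothetical):

import java.util.HashSet;
import java.util.Set;

public class MutableHashDemo {
    static final class MutableKey {
        int value;
        MutableKey(int value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value == value;
        }
        @Override public int hashCode() { return value; }
    }

    public static void main(String[] args) {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);
        key.value = 2; // hash code changes while the key is in the set
        System.out.println(set.contains(key));            // false: wrong bucket searched
        System.out.println(set.iterator().next() == key); // true: still physically there
    }
}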
Don't change the hash code of elements in a hash-based collection after insertion.
Many programmers fall into this pitfall.
You can think of the hash code as a kind of address within the collection, so you can't change an element's address after it has been put into the collection.
The Javadoc specifically says that the built-in collections don't support this. So don't do it.
Can the following piece of code be rewritten w/o using Collections.synchronizedMap() yet maintaining correctness at concurrency?
Collections.synchronizedMap(new WeakHashMap<Class, Object>());
i.e. is there something from java.util.concurrent one can use instead? Note that merely replacing with
new ConcurrentHashMap<Class, Object>(new WeakHashMap<Class, Object>());
obviously won't work: the copy constructor merely copies the entries once, so the result is an ordinary ConcurrentHashMap with no weak-reference semantics.
Guava's CacheBuilder class allows you to do this easily.
CacheBuilder.newBuilder().weakKeys().build()
Note that this changes key equality semantics to be == instead of .equals(), which will not matter in your case of using Class instances, but is a potential pitfall.
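Expanding slightly, the cache's asMap() view gives you a ConcurrentMap directly, matching the shape of the code in the question (a sketch):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.ConcurrentMap;

Cache<Class<?>, Object> cache = CacheBuilder.newBuilder().weakKeys().build();
ConcurrentMap<Class<?>, Object> map = cache.asMap(); // thread-safe view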
I don't believe there is. In fact the javadoc suggests using Collections.synchronizedMap()
"Like most collection classes, this class is not synchronized. A synchronized WeakHashMap may be constructed using the Collections.synchronizedMap method."
Caffeine is a popular competitor of the Guava cache.
- keys automatically wrapped in weak references
- values automatically wrapped in weak or soft references
usage:
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
.weakKeys()
.weakValues()
.build(key -> createExpensiveGraph(key));
Does wrapping the WeakHashMap in a synchronized map still work correctly for what you want to do, since the garbage collector can modify the weak references directly at any time, bypassing the synchronized map wrapper? I think WeakHashMap only truly works in a single-threaded model.
As mentioned above, the documentation for WeakHashMap at https://docs.oracle.com/javase/7/docs/api/java/util/WeakHashMap.html specifically says:
"A synchronized WeakHashMap may be constructed using the
Collections.synchronizedMap method"
Which implies to me that this technique must work in tandem with the garbage collector's behavior (unless the documentation is buggy!)
If you are using Java 7 or above, this use case is solved in a thread-safe manner by ClassValue: https://docs.oracle.com/javase/7/docs/api/java/lang/ClassValue.html If you require the use of remove, think carefully about concurrency and read the doc thoroughly.
If you are using Java 6 or below: no, you have to synchronize a WeakHashMap.
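A minimal sketch of the ClassValue approach (the computed value here is illustrative):

public class ClassValueDemo {
    private static final ClassValue<String> PER_CLASS = new ClassValue<String>() {
        @Override
        protected String computeValue(Class<?> type) {
            return "value for " + type.getName(); // computed once per Class
        }
    };

    public static void main(String[] args) {
        // Thread-safe, lazily computed, and does not prevent class unloading.
        System.out.println(PER_CLASS.get(String.class));
    }
}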
If you happen to have the Spring Framework in your classpath already, then one option is ConcurrentReferenceHashMap:
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/util/ConcurrentReferenceHashMap.html
You can choose between using weak or soft references (for both the keys and values).