Guava Cache - what is the implication of comparing values by identity

The CacheBuilder methods weakValues() and softValues() both contain this line in their javadocs:
Note: when this method is used, the resulting cache will use identity (==) comparison to determine equality of values.
How exactly does this affect behaviour? As far as I can tell there are no public methods on the Cache or LoadingCache interface which would require testing for value equality. Does it affect the asMap() view?

Guava internally holds those values through soft/weak references. If two weak/soft value references point to the same object, the contents must be equal too.
Guava only compares values in methods such as containsValue(...), remove(key, value) or replace(key, oldValue, newValue) on the asMap() view. There the comparison is used to find a specific entry and check that it really exists before removing or replacing it. So Guava first looks up the entry for the key, and only if the stored value is the same instance (==) as the one passed in does it remove or replace the entry.
I think the main purpose is to detect an explicit removal of an entry: if the value the reference points to is already null, it was not an explicit removal, but was removed by Guava internally (i.e. the value was garbage-collected).
It's nothing to worry about; it's just the way Guava handles the removal of old entries.
There is no way to change this, and you really don't have to care about it; Guava will handle it :D
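To make the difference concrete, here is a JDK-only sketch (it does not use Guava) contrasting the equals-based conditional remove of ConcurrentMap.remove(key, value) with an identity-based one, which is roughly what the javadoc note means for a cache's conditional map operations. The helper removeIfSameInstance is hypothetical, written just for this illustration; it is not a Guava API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IdentityRemoveDemo {
    // Hypothetical helper (not a Guava API): removes the entry only if the
    // mapped value is the very same instance (==), not merely equals().
    static <K, V> boolean removeIfSameInstance(Map<K, V> map, K key, V expected) {
        boolean[] removed = {false};
        map.computeIfPresent(key, (k, current) -> {
            if (current == expected) {
                removed[0] = true;
                return null; // returning null removes the mapping
            }
            return current; // different instance: keep the mapping unchanged
        });
        return removed[0];
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        String original = new String("value");
        String equalCopy = new String("value"); // equals(original), but a different object

        // Equality-based (the normal ConcurrentMap contract): an equal copy is enough.
        map.put("k", original);
        System.out.println(map.remove("k", equalCopy)); // true

        // Identity-based: only the same instance matches.
        map.put("k", original);
        System.out.println(removeIfSameInstance(map, "k", equalCopy)); // false
        System.out.println(removeIfSameInstance(map, "k", original));  // true
    }
}
```

In practice this only matters if you call the conditional methods of asMap() with a value object you did not get from the cache itself.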

I guess it uses Guava's own ConcurrentMap implementation internally and uses value comparison for remove(key, value) and replace(key, oldValue, newValue).

Related

Collections in java - how to choose the appropriate one

I'm learning about collections and trying to ascertain the best one to use for my practice exercise. I've done a lot of reading on them, but still can't find the best approach. This may sound a bit woolly, but any guidance at all would be appreciated.
I need to associate a list of Travellers with a list of Boarding Passes. Both classes contain a mutable boolean field that will be modified during my programme; all other fields are immutable. That boolean field must exist. I'll need to create a collection of 10 travellers, and then, when all criteria have been met, instantiate a boarding pass and associate it with them.
There won't be any duplicates of either due to each object having a unique reference variable associated with them, created through an object factory.
From doing some reading I understand that Sets must contain immutable objects, and don't allow duplicate elements, whereas Lists are the opposite.
Because I need to associate them with each other, I was thinking a Map, but I now know that the keys are stored in a set, which would be problematic due to the aforementioned reasons....
Could I override the hashCode() method so that it doesn't take the boolean field into consideration, so that as long as all of my other fields are immutable it should be fine? Or is that bad practice?
I also thought about creating a list of Travellers, and then trying to associate a Boarding Pass another way, but couldn't think of how that could be achieved....
Please don't give me any code - just some sort of a steer in the right direction would be really helpful.

If you are looking for a best practice, you need to think about what you plan to do with the data now and in the (near) future. Once you know that, check which of the collection types (List, Set and Map) works best for you. If you want to compare the three, have a look here

You've been misled about the mutability requirements of set members and map keys.
When you do a lookup in a HashMap, you do it based on the key's hashCode. If you have mutable objects as keys, and mutating the object modifies the hashCode value, then this is a problem.
If a key was inserted into the table when it had a hashCode of 123, but later it's modified to have a hashCode of 345, you won't be able to find it again later since it's stored in the 123 bucket.
If the mutable boolean field does not influence your hashCode values (e.g., you didn't override hashCode or equals on your key class), then there's no issue.
That said, since you say you'll only have one unique instance of each passenger, Boris's suggestion in the comments about using an IdentityHashMap is probably the way to go. The IdentityHashMap gives the same behavior as a HashMap whose keys all use the default (identity-based) implementations for hashCode and equals. This way you'll get the expected behavior whether or not you've overridden equals and/or hashCode for other purposes.
(Note that you need to take equality into account as well as the hashCode.)
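A small sketch of the problem described above, using a hypothetical key class whose hashCode and equals depend on a mutable boolean field: after the field changes, a HashMap can no longer find the key (the stored hash no longer matches), while an IdentityHashMap still can, because it ignores the overridden methods and uses ==.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: hashCode/equals include a mutable field (a bad idea for map keys).
class MutableKey {
    final String name;
    boolean checkedIn; // mutable state

    MutableKey(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MutableKey)) return false;
        MutableKey other = (MutableKey) o;
        return name.equals(other.name) && checkedIn == other.checkedIn;
    }

    @Override public int hashCode() { return Objects.hash(name, checkedIn); }
}

class MutableKeyDemo {
    // Inserts a key, mutates it, then asks the map whether it still finds it.
    static boolean foundAfterMutation(Map<MutableKey, String> map) {
        MutableKey key = new MutableKey("alice");
        map.put(key, "boarding pass");
        key.checkedIn = true; // changes hashCode() after insertion
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation(new HashMap<>()));         // false
        System.out.println(foundAfterMutation(new IdentityHashMap<>())); // true
    }
}
```

If checkedIn were left out of both equals and hashCode, the HashMap lookup would succeed as well.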

Using multiple alternatives of hashCode() and equals() for sets

Suppose I have a simple POJO class Class1, and it has 2 fields of type int.
I've implemented the hashCode() and equals() methods of it to handle exactly those 2 fields, in order to put instances of the class into a set.
So far so good.
Now, I want to have a different set, which considers instances of Class1 to be equal if the first field is equal, making the equality condition weaker. I might even want to have another set which considers only the second field as the one that checks for equality.
Is it possible? If so, how?

You can get that effect by using a TreeSet when providing a custom Comparator that only inspects the fields you're interested in.
Note, however, that strictly speaking such a TreeSet is no longer a "correct" Set, because it effectively ignores the equals() method of your objects:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
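The TreeSet approach can be sketched like this, assuming a stand-in Class1 with two int fields and getters (the getter names are illustrative). Each set gets its own Comparator, and that Comparator alone decides what counts as a duplicate:

```java
import java.util.Comparator;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Stand-in for the question's Class1: equals/hashCode use both fields.
class Class1 {
    final int first;
    final int second;

    Class1(int first, int second) { this.first = first; this.second = second; }

    int getFirst() { return first; }
    int getSecond() { return second; }

    @Override public boolean equals(Object o) {
        return o instanceof Class1 && ((Class1) o).first == first && ((Class1) o).second == second;
    }

    @Override public int hashCode() { return Objects.hash(first, second); }
}

class ComparatorSets {
    // Elements with the same `first` are treated as duplicates; equals() is ignored.
    static Set<Class1> byFirstField() {
        return new TreeSet<Class1>(Comparator.comparingInt(Class1::getFirst));
    }

    // Elements with the same `second` are treated as duplicates.
    static Set<Class1> bySecondField() {
        return new TreeSet<Class1>(Comparator.comparingInt(Class1::getSecond));
    }

    public static void main(String[] args) {
        Set<Class1> set = byFirstField();
        set.add(new Class1(1, 10));
        System.out.println(set.add(new Class1(1, 20))); // false: same first field
        set.add(new Class1(2, 10));
        System.out.println(set.size()); // 2
    }
}
```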

The standard Java libraries don't support this.
And (surprisingly) there doesn't appear to be a Map or Set class in the Apache Commons Collections or Guava libraries that supports this.
There are probably other libraries that support this if you look hard enough.
Alternatively, you could write your own ... starting with the standard HashMap code.
A cheap-and-cheerful alternative is to create a lightweight wrapper class for your element type that delegates most methods to the wrapped class and provides a different equals / hashCode pair to the original. There is a small runtime penalty in doing this, but it is worth considering.
Joachim's suggestion is good too, unless your sets are likely to be particularly big. (TreeSet has O(log N) lookup, compared with O(1) for a properly implemented hash table.)
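A minimal sketch of the wrapper idea, assuming a stand-in Class1 with two int fields. ByFirst is a hypothetical wrapper that only exposes the wrapped value (rather than delegating other methods) and defines equality on the first field, so an ordinary HashSet<ByFirst> keeps O(1) lookups:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Stand-in for the question's Class1; its own equals/hashCode use both fields.
class Class1 {
    final int first;
    final int second;

    Class1(int first, int second) { this.first = first; this.second = second; }

    @Override public boolean equals(Object o) {
        return o instanceof Class1 && ((Class1) o).first == first && ((Class1) o).second == second;
    }

    @Override public int hashCode() { return Objects.hash(first, second); }
}

// Hypothetical wrapper: two wrappers are equal when the wrapped objects share `first`.
final class ByFirst {
    final Class1 value;

    ByFirst(Class1 value) { this.value = Objects.requireNonNull(value); }

    @Override public boolean equals(Object o) {
        return o instanceof ByFirst && ((ByFirst) o).value.first == value.first;
    }

    @Override public int hashCode() { return Integer.hashCode(value.first); }
}

class WrapperSets {
    public static void main(String[] args) {
        Set<ByFirst> set = new HashSet<>();
        set.add(new ByFirst(new Class1(1, 10)));
        System.out.println(set.add(new ByFirst(new Class1(1, 20)))); // false
        System.out.println(set.size()); // 1
    }
}
```

A second wrapper (say, a hypothetical BySecond) would give the other equality rule; the cost is one extra object per element.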

ConcurrentHashMap with weak keys and identity hash?

How do I get a ConcurrentHashMap with weak keys and identity hashes in Java? I think Google Guava Collections can give such a thing, but can I get it from the standard library? What other options do I have?

I think Google Guava Collections can give such a thing, but can I get it from the standard library?
The short answer to that is No. Java SE does not implement this particular combination.
You could instantiate a java.util.concurrent.ConcurrentHashMap with WeakReference keys, and do some extra work to implement removal of map entries for broken references, but that won't give you identity hash semantics.
You could instantiate a java.util.IdentityHashMap with WeakReference keys, and do some extra work to implement removal of map entries for broken references, but that won't give you concurrent behaviour.
Using a java.util.WeakHashMap won't give you either concurrency or identity hashing.
You could (in theory) wrap the key class in something that overrides the natural equals and hashCode methods. But that is most likely to be unusable.
I don't think it would be possible to do this by overriding methods in either ConcurrentHashMap or IdentityHashMap.
Maybe the only viable option would be to change the key classes' equals and hashCode methods to be identity-based. But that won't work for "built-in" key types (especially final ones) or for cases where you need value-based equals/hashCode in other parts of the application.

The Google Guava implementation appears to be the easiest way to go. You can create the required map with new MapMaker().weakKeys().makeMap() and use it just as you would use java.util.concurrent.ConcurrentHashMap; with weakKeys(), the map compares keys by identity (==), which is the combination asked for. See the apidoc for more details.

If your application uses the Spring Framework (version 3.2 or later), you could consider using org.springframework.util.ConcurrentReferenceHashMap. Below is its description:
A ConcurrentHashMap that uses soft or weak references for both keys and values.
This class can be used as an alternative to Collections.synchronizedMap(new WeakHashMap<K, Reference<V>>()) in order to support better performance when accessed concurrently. This implementation follows the same design constraints as ConcurrentHashMap with the exception that null values and null keys are supported.
NOTE: The use of references means that there is no guarantee that items placed into the map will be subsequently available. The garbage collector may discard references at any time, so it may appear that an unknown thread is silently removing entries.
If not explicitly specified, this implementation will use soft entry references.

Search for ConcurrentWeakIdentityHashMap and you will find many examples. I wrote my own implementation, because I think the hashCode of org/ehcache/core/internal/util/ConcurrentWeakIdentityHashMap$WeakReference is poor.
Example of ehcache3
Example I wrote
Pull Request to fix the ehcache3 ConcurrentWeakIdentityHashMap key hashCode
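The core idea behind such classes can be sketched in plain Java, assuming a ConcurrentHashMap keyed by a WeakReference subclass whose hashCode is System.identityHashCode of the referent (cached at construction, so it stays stable after the referent is cleared) and whose equals compares referents with ==. WeakIdentityMap and IdentityWeakRef are hypothetical names, not the ehcache or Guava classes, and only put/get/remove/size are implemented:

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical minimal map: weak keys, identity (==) key comparison, concurrent access.
final class WeakIdentityMap<K, V> {
    private final ConcurrentMap<IdentityWeakRef<K>, V> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();

    V put(K key, V value) {
        expungeStaleEntries();
        return map.put(new IdentityWeakRef<>(key, queue), value);
    }

    V get(K key) {
        expungeStaleEntries();
        // Lookup keys are temporary, so they are not registered with the queue.
        return map.get(new IdentityWeakRef<>(key, null));
    }

    V remove(K key) {
        expungeStaleEntries();
        return map.remove(new IdentityWeakRef<>(key, null));
    }

    int size() {
        expungeStaleEntries();
        return map.size();
    }

    // Removes entries whose keys have been garbage-collected (their references were enqueued).
    private void expungeStaleEntries() {
        Reference<? extends K> ref;
        while ((ref = queue.poll()) != null) {
            map.remove(ref); // matches the stored reference itself via this == o
        }
    }

    private static final class IdentityWeakRef<T> extends WeakReference<T> {
        // Cached so the hash stays the same after the referent is cleared.
        private final int hash;

        IdentityWeakRef(T referent, ReferenceQueue<? super T> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IdentityWeakRef)) return false;
            Object referent = get();
            return referent != null && referent == ((IdentityWeakRef<?>) o).get();
        }
    }
}
```

Stale entries are purged lazily on each call; null keys and null values are not supported, and iteration or the conditional ConcurrentMap methods would still need to be added for real use.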

Java - Why does Map.put() overwrite while Set.add() does not?

I am wondering what the rationale is behind having Java's Map.put(key, value) method overwrite equivalently key'd values that are already in the collection, while Set.add(value) does not overwrite a pre-existing equivalent value that is already in the collection?
Edit:
It looks like the majority viewpoint is that objects in a set that evaluate as equal should be equal in every respect, so it shouldn't matter whether Set.add(Object) overwrites equivalently valued objects or not. If two objects evaluate as equal but in fact hold different data, then a Map-type collection is a more appropriate container.
I somewhat disagree with this viewpoint.
Example: A set holding a group of "Person" objects. In order to update some information about that person, you might want to pass the set a new, updated, person object to overwrite the old, outdated person object. In this case, a Person would hold a primary key that identifies that individual and the set would identify and compare people based only on their primary keys. This primary key is part of the person's identity as opposed to an external reference such as a Map would imply.

The Map behavior allows changing the values associated with equivalent keys. That is a pretty common use case: a : b becomes a : c.
Yes, overwriting Set contents with add could change something (the reference stored in the set), but that seems like a pretty narrow use case (which can be accomplished anyway: remove before adding, s.remove(o); s.add(o);) compared with what one would get in most cases: CPU cycles spent for nothing.
edit:
the one potential use I could see for that behavior, is having a constrained memory budget, lots of heavy-but-equivalent objects being created, and having references to different equal versions in various places, preventing garbage collection of the duplicate ones. Having run into that problem before, however, I don't think this behavior is even the best way to solve it.
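The remove-then-add workaround, sketched with a hypothetical Person class whose equality is based only on its primary key (as in the question's example): a plain add keeps the old instance, while removing first lets the updated instance replace it. Note that the two calls are not atomic, so a concurrently shared set would need extra locking.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Person: equality is based only on the primary key `id`.
final class Person {
    final int id;
    final String email;

    Person(int id, String email) { this.id = id; this.email = email; }

    @Override public boolean equals(Object o) {
        return o instanceof Person && ((Person) o).id == id;
    }

    @Override public int hashCode() { return Integer.hashCode(id); }
}

class ReplaceInSet {
    // Replaces an equal element with the given instance: remove first, then add.
    static <T> void replace(Set<T> set, T element) {
        set.remove(element);
        set.add(element);
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person(1, "old@example.com"));
        Person updated = new Person(1, "new@example.com");

        System.out.println(people.add(updated));            // false: old instance kept
        System.out.println(people.iterator().next().email); // old@example.com

        replace(people, updated);
        System.out.println(people.iterator().next().email); // new@example.com
    }
}
```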

In my opinion, there is no point in overwriting something in Set, since nothing will change.
However when you update a map, the key might be the same, but the value might be different.

Note that Map isn't actually so different... it may always change the value, but (at least in Sun's implementations) the key will remain the same even if later calls to put() use a different instance that compares as equal to the original.
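A quick way to observe this, as a sketch: put two equal but distinct String keys and check which instance ends up stored. The Map.put contract does not specify which key instance is retained, so this checks the behaviour of the JDK's HashMap and TreeMap implementations rather than a guarantee of the interface.

```java
import java.util.HashMap;
import java.util.Map;

class KeyRetention {
    // Puts two equal-but-distinct keys; reports whether the value was replaced
    // while the first key instance is still the one stored in the map.
    static boolean keepsFirstKey(Map<String, String> map) {
        String firstKey = new String("key");
        String secondKey = new String("key"); // equals(firstKey), different instance
        map.put(firstKey, "a");
        map.put(secondKey, "b"); // overwrites the value
        return "b".equals(map.get("key")) && map.keySet().iterator().next() == firstKey;
    }

    public static void main(String[] args) {
        System.out.println(keepsFirstKey(new HashMap<>())); // true
    }
}
```

HashSet.add is implemented on top of such a map, which is why it likewise keeps the element that was added first.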

I disagree with the premise of your question. Both Map and Set are abstract interfaces. Whether they overwrite or not is an implementation detail.
You could create an implementation of Map that does not overwrite.
You could create a mutable singleton set - adding stuff to the set overwrites the existing singleton value.

Java hashmaps without the value?

Let's say I want to put words in a data structure and I want to have constant time lookups to see if the word is in this data structure. All I want to do is to see if the word exists. Would I use a HashMap (containsKey()) for this? HashMaps use key->value pairings, but in my case I don't have a value. Of course I could use null for the value, but even null takes space. It seems like there ought to be a better data structure for this application.
The collection could potentially be used by multiple threads, but since the objects contained by the collection would not change, I do not think I have a synchronization/concurrency requirement.
Can anyone help me out?

Use HashSet instead. It's a hash implementation of Set, which is used primarily for exactly what you describe (an unordered set of items).

You'd generally use an implementation of Set, and most usually HashSet. If you did need concurrent access, a Set backed by ConcurrentHashMap (ConcurrentHashMap.newKeySet() since Java 8, or Collections.newSetFromMap(new ConcurrentHashMap<>())) provides a drop-in replacement with safe, concurrent access, including safe iteration over the set.
I'd recommend in any case referring to it as simply a Set throughout your code, except in the one place where you construct it; that way, it's easier to drop in one implementation for the other if you later require it.
Even if the set is read-only, if it's used by a thread other than the one that creates it, you do need to think about safe publication (that is, making sure that any other thread sees the set in a consistent state: remember that memory writes, even in constructors, aren't guaranteed to become visible to other threads when, or in the order, you expect, unless you take steps to ensure this). You can achieve this by doing both of the following:
making sure the only reference(s) to the set are in final fields;
making sure that it really is true that no thread modifies the set.
You can help ensure the latter by using the Collections.unmodifiableSet() wrapper. This gives you an unmodifiable view of the given set, so provided no other "normal" reference to the set escapes, you're safe.
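A minimal sketch of both points, assuming a hypothetical WordList class: the set lives in a final field, is filled only inside the constructor, and is exposed only through an unmodifiable view. Under the Java memory model's final-field rules, another thread that obtains a WordList reference after construction sees the fully populated set, provided `this` does not escape from the constructor.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical read-only word set, safe to share between threads after construction.
final class WordList {
    // final field + no mutation after construction = safe publication
    private final Set<String> words;

    WordList(String... words) {
        // Copy into a fresh HashSet so callers cannot modify it afterwards.
        this.words = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }

    boolean contains(String word) {
        return words.contains(word); // expected constant-time lookup
    }

    Set<String> words() {
        return words; // read-only view; add/remove throw UnsupportedOperationException
    }
}
```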

You probably want to use a java.util.Set. Implementations include java.util.HashSet, which is the Set equivalent of HashMap.
Even if the objects contained in the collection do not change, you may need to do synchronization. Do new objects need to be added to the Set after the Set is passed to a different thread? If so, you can use Collections.synchronizedSet() to make the Set thread-safe.
If you have a Map with values, and you have some code that just wants to treat the Map as a Set of its keys, you can use Map.keySet() (though keep in mind that keySet returns a live Set view of the keys in the Map; if the Map is mutable, the Map can be changed through the set returned by keySet, for example by removing keys).
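Both suggestions sketched with JDK classes only (the SetViews helper names are illustrative): iteration over a Collections.synchronizedSet must be wrapped in a synchronized block on the set itself, as its javadoc requires, and removing a key through keySet() removes the mapping from the backing map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SetViews {
    // Individual calls on a synchronizedSet are thread-safe, but iteration
    // must be guarded manually by locking on the set itself.
    static int countUnderLock(Set<String> syncSet) {
        synchronized (syncSet) {
            int n = 0;
            for (String ignored : syncSet) {
                n++;
            }
            return n;
        }
    }

    // keySet() is a live view: removing a key also removes its mapping.
    static void removeViaKeySet(Map<String, Integer> map, String key) {
        map.keySet().remove(key);
    }

    public static void main(String[] args) {
        Set<String> shared = Collections.synchronizedSet(new HashSet<>());
        shared.add("alpha");
        shared.add("beta");
        System.out.println(countUnderLock(shared)); // 2

        Map<String, Integer> counts = new HashMap<>();
        counts.put("alpha", 1);
        counts.put("beta", 2);
        removeViaKeySet(counts, "alpha");
        System.out.println(counts); // {beta=2}
    }
}
```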

You want to use a Collection implementing the Set interface, probably HashSet to get the performance you stated. See http://java.sun.com/javase/6/docs/api/java/util/Set.html

Other than Sets, in some circumstances you might want to convert a Map into a Set with Collections.newSetFromMap(Map<E,Boolean>) (some Maps disallow null values, hence the Boolean).
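For example, Collections.newSetFromMap turns any empty Map<E, Boolean> into a Set with that map's semantics. A short sketch (the helper names are illustrative): a concurrent set backed by ConcurrentHashMap, and an identity-based set backed by IdentityHashMap, which treats equal but distinct instances as different elements.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SetsFromMaps {
    // Thread-safe set with ConcurrentHashMap's semantics (no null elements).
    static <E> Set<E> concurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
    }

    // Set that compares elements by identity (==) instead of equals().
    static <E> Set<E> identitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<E, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> ids = identitySet();
        ids.add(new String("word"));
        ids.add(new String("word")); // equal but distinct instance: added again
        System.out.println(ids.size()); // 2

        Set<String> words = concurrentSet();
        words.add("word");
        words.add(new String("word")); // equal: not added
        System.out.println(words.size()); // 1
    }
}
```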

As everyone said, HashSet is probably the simplest solution, but you won't have guaranteed constant-time lookup in a HashSet (entries that land in the same bucket are chained, so lookups are constant time only on average), and you will store a reference to a dummy object (always the same one) for every entry...
For reference, here is a list of data structures; maybe you'll find one that better fits your needs.

We Keep Coding
