Java - Why does Map.put() overwrite while Set.add() does not?

I am wondering what the rationale is behind having Java's Map.put(key, value) method overwrite equivalently keyed values that are already in the collection, while Set.add(value) does not overwrite a pre-existing equivalent value that is already in the collection?
Edit:
It looks like the majority viewpoint is that objects in a set that evaluate to equality should be equal in every respect, thus it shouldn't matter if Set.add(Object) overwrites equivalently valued objects or not. If two objects evaluate to equality, but do in fact hold different data, then a Map-type collection is a more appropriate container.
I somewhat disagree with this viewpoint.
Example: A set holding a group of "Person" objects. In order to update some information about that person, you might want to pass the set a new, updated, person object to overwrite the old, outdated person object. In this case, a Person would hold a primary key that identifies that individual and the set would identify and compare people based only on their primary keys. This primary key is part of the person's identity as opposed to an external reference such as a Map would imply.
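For illustration, a rough sketch of that scenario (assuming a hypothetical Person class whose equals() and hashCode() are based only on its id): add() leaves the old element in place, so the update has to be done as a remove followed by an add.

Set<Person> people = new HashSet<>();
people.add(new Person(42, "Alice", "old address"));   // hypothetical Person(id, name, address)

Person updated = new Person(42, "Alice", "new address");
people.add(updated);     // returns false: the outdated element is NOT replaced
people.remove(updated);  // removes the old element (equal by id), then...
people.add(updated);     // ...the updated object is finally stored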

The Map behavior allows changing the values associated with equivalent keys. That is a pretty common use case: a : b becomes a : c.
Yes, overwriting Set contents with add could change something (the reference value), but that seems like a pretty narrow use case, and one that can be accomplished anyway by removing before adding (s.remove(o); s.add(o);). In most cases you would be paying extra cycles to get nothing.
Edit:
The one potential use I could see for that behavior is a constrained memory budget, lots of heavy-but-equivalent objects being created, and references to different but equal versions held in various places, preventing garbage collection of the duplicates. Having run into that problem before, however, I don't think this behavior would even be the best way to solve it.

In my opinion, there is no point in overwriting something in Set, since nothing will change.
However when you update a map, the key might be the same, but the value might be different.

Note that Map isn't actually so different... it may always change the value, but (at least in Sun's implementations) the key will remain the same even if later calls to put() use a different instance that compares as equal to the original.
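A small sketch of that behaviour, as observed with the OpenJDK HashMap (the two keys are equal but are distinct instances):

Map<String, Integer> map = new HashMap<>();
String first = new String("key");
String second = new String("key");        // equals(first), but a different instance

map.put(first, 1);
map.put(second, 2);                        // the value is replaced...

System.out.println(map.get("key"));                           // 2
System.out.println(map.keySet().iterator().next() == first);  // true: the original key instance is kept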

I disagree with the premise of your question. Both Map and Set are abstract interfaces. Whether they overwrite or not is an implementation detail.
For example, you could imagine an implementation of Map that does not overwrite existing values.
You could create a mutable singleton set - adding stuff to the set overwrites the existing singleton value.


Should a HashSet be allowed to be added to itself in Java?

According to the contract for a Set in Java, "it is not permissible for a set to contain itself as an element" (source). However, this is possible in the case of a HashSet of Objects, as demonstrated here:
Set<Object> mySet = new HashSet<>();
mySet.add(mySet);
assertThat(mySet.size(), equalTo(1));
This assertion passes, but I would expect the behavior to be either that the resulting set has size 0 or that an exception is thrown. I realize the underlying implementation of a HashSet is a HashMap, but it seems like there should be an equality check before adding an element to avoid violating that contract, no?
Others have already pointed out why it is questionable from a mathematical point of view, by referring to Russell's paradox.
This does not answer your question on a technical level, though.
So let's dissect this:
First, once more the relevant part from the JavaDoc of the Set interface:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Interestingly, the JavaDoc of the List interface makes a similar, although somewhat weaker, and at the same time more technical statement:
While it is permissible for lists to contain themselves as elements, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a list.
And finally, the crux is in the JavaDoc of the Collection interface, which is the common ancestor of both the Set and the List interface:
Some collection operations which perform recursive traversal of the collection may fail with an exception for self-referential instances where the collection directly or indirectly contains itself. This includes the clone(), equals(), hashCode() and toString() methods. Implementations may optionally handle the self-referential scenario, however most current implementations do not do so.
(Emphasis by me)
The bold part is a hint at why the approach that you proposed in your question would not be sufficient:
it seems like there should be an equality check before adding an element to avoid violating that contract, no?
This would not help you here. The key point is that you'll always run into problems when the collection will directly or indirectly contain itself. Imagine this scenario:
Set<Object> setA = new HashSet<Object>();
Set<Object> setB = new HashSet<Object>();
setA.add(setB);
setB.add(setA);
Obviously, neither of the sets contains itself directly. But each of them contains the other - and therefore, itself indirectly. This could not be avoided by a simple referential equality check (using == in the add method).
Avoiding such an "inconsistent state" is basically impossible in practice. Of course it is possible in theory, using reachability computations over the references. In fact, the garbage collector basically has to do exactly that!
But it becomes impossible in practice when custom classes are involved. Imagine a class like this:
class Container {
    Set<Object> set;

    @Override
    public int hashCode() {
        return set.hashCode();
    }
}
And messing around with this and its set:
Set<Object> set = new HashSet<Object>();
Container container = new Container();
container.set = set;
set.add(container);
The add method of the Set basically has no way of detecting whether the object that is added there has some (indirect) reference to the set itself.
Long story short:
You cannot prevent the programmer from messing things up.
Adding the collection into itself once causes the test to pass. Adding it twice causes the StackOverflowError which you were seeking.
From a personal developer standpoint, it doesn't make any sense to enforce a check in the underlying code to prevent this. The fact that you get a StackOverflowError in your code if you attempt to do this too many times, or calculate the hashCode - which would cause an instant overflow - should be enough to ensure that no sane developer would keep this kind of code in their code base.
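A quick sketch of what that looks like in practice:

Set<Object> set = new HashSet<>();
set.add(set);      // succeeds: the set's hash code is computed while it is still empty
set.hashCode();    // StackOverflowError: hashCode() now recurses through the element (itself)
// set.add(set);   // a second add fails the same way, because add() hashes the element first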
You need to read the full doc and quote it fully:
The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
The actual restriction is in the first sentence. The behavior is unspecified if an element of a set is mutated.
Since adding a set to itself mutates it, and adding it again mutates it again, the result is unspecified.
Note that the restriction is that the behavior is unspecified, and that a special case of that restriction is adding the set to itself.
So the doc says, in other words, that adding a set to itself results in unspecified behavior, which is what you are seeing. It's up to the concrete implementation to deal with (or not).
I agree with you that, from a mathematical perspective, this behavior really doesn't make sense.
There are two interesting questions here: first, to what extent were the designers of the Set interface trying to implement a mathematical set? Secondly, even if they weren't, to what extent does that exempt them from the rules of set theory?
For the first question, I will point you to the documentation of the Set:
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
It's worth mentioning here that current formulations of set theory don't permit sets to be members of themselves. (See the Axiom of regularity). This is due in part to Russell's Paradox, which exposed a contradiction in naive set theory (which permitted a set to be any collection of objects - there was no prohibition against sets including themselves). This is often illustrated by the Barber Paradox: suppose that, in a particular town, a barber shaves all of the men - and only the men - who do not shave themselves. Question: does the barber shave himself? If he does, it violates the second constraint; if he doesn't, it violates the first constraint. This is clearly logically impossible, but it's actually perfectly permissible under the rules of naive set theory (which is why the newer "standard" formulation of set theory explicitly bans sets from containing themselves).
There's more discussion in this question on Math.SE about why sets cannot be an element of themselves.
With that said, this brings up the second question: even if the designers hadn't been explicitly trying to model a mathematical set, would this be completely "exempt" from the problems associated with naive set theory? I think not - I think that many of the problems that plagued naive set theory would plague any kind of a collection that was insufficiently constrained in ways that were analogous to naive set theory. Indeed, I may be reading too much into this, but the first part of the definition of a Set in the documentation sounds suspiciously like the intuitive concept of a set in naive set theory:
A collection that contains no duplicate elements.
Admittedly (and to their credit), they do place at least some constraints on this later (including stating that you really shouldn't try to have a Set contain itself), but you could question whether it's really "enough" to avoid the problems with naive set theory. This is why, for example, you have a "turtles all the way down" problem when trying to calculate the hash code of a HashSet that contains itself. This is not, as some others have suggested, merely a practical problem - it's an illustration of the fundamental theoretical problems with this type of formulation.
As a brief digression, I do recognize that there are, of course, some limitations on how closely any collection class can really model a mathematical set. For example, Java's documentation warns against the dangers of including mutable objects in a set. Some other languages, such as Python, at least attempt to ban many kinds of mutable objects entirely:
The set classes are implemented using dictionaries. Accordingly, the requirements for set elements are the same as those for dictionary keys; namely, that the element defines both __eq__() and __hash__(). As a result, sets cannot contain mutable elements such as lists or dictionaries. However, they can contain immutable collections such as tuples or instances of ImmutableSet. For convenience in implementing sets of sets, inner sets are automatically converted to immutable form, for example, Set([Set(['dog'])]) is transformed to Set([ImmutableSet(['dog'])]).
Two other major differences that others have pointed out are
Java sets are mutable
Java sets are finite. Obviously, this will be true of any collection class: apart from concerns about actual infinity, computers only have a finite amount of memory. (Some languages, like Haskell, have lazy infinite data structures; however, in my opinion, a lawlike choice sequence seems like a more natural way to model these than classical set theory, but that's just my opinion.)
TL;DR No, it really shouldn't be permitted (or, at least, you should never do that) because sets can't be members of themselves.

Collections in java - how to choose the appropriate one

I'm learning about collections and trying to ascertain the best one to use for my practice exercise.....I've done a lot of reading on them, but still can't find the best approach.....this may sound a bit woolly but any guidance at all would be appreciated....
I need to associate a list of Travellers with a list of Boarding Passes. Both classes contain a mutable boolean field that will be modified during my programme; otherwise all other fields are immutable. That boolean field must exist. I'll need to create a collection of 10 travellers, and then, when all criteria have been met, instantiate a boarding pass and associate it with them.
There won't be any duplicates of either due to each object having a unique reference variable associated with them, created through an object factory.
From doing some reading I understand that Sets must contain immutable objects, and don't allow duplicate elements, whereas Lists are the opposite.
Because I need to associate them with each other, I was thinking a Map, but I now know that the keys are stored in a set, which would be problematic due to the aforementioned reasons....
Could I override the hashCode() method so that it doesn't take the boolean field into consideration, and therefore, as long as all of my other fields are immutable, it should be fine? Or is that bad practice?
I also thought about creating a list of Travellers, and then trying to associate a Boarding Pass another way, but couldn't think of how that could be achieved....
Please don't give me any code - just some sort of a steer in the right direction would be really helpful.
If you are looking for a best practice, you need to think about what you are planning to do with the data now and in the (near) future. Once you know that, check which of the three structures (list, set, or map) works best for you, comparing them side by side.
You've been misled about the mutability requirements of set members and map keys.
When you do a lookup in a HashMap, you do it based on the key's hashCode. If you have mutable objects as keys, and mutating the object modifies the hashCode value, then this is a problem.
If a key was inserted into the table when it had a hashCode of 123, but later it's modified to have a hashCode of 345, you won't be able to find it again later since it's stored in the 123 bucket.
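As a sketch of the problem (Key is a hypothetical mutable class whose hashCode() is derived from its id field):

Map<Key, String> map = new HashMap<>();
Key key = new Key(123);            // hashCode() currently derived from 123
map.put(key, "value");

key.setId(345);                    // hashCode() changes

map.get(key);                      // null: the lookup probes based on the new hash code,
                                   // but the entry was filed away under the old one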
If the mutable boolean field does not influence your hashCode values (e.g., you didn't override hashCode or equals on your key class), then there's no issue.
That said, since you say you'll only have one unique instance of each passenger, Boris's suggestion in the comments about using an IdentityHashMap is probably the way to go. The IdentityHashMap gives the same behavior as a HashMap whose keys all use the default (identity-based) implementations for hashCode and equals. This way you'll get the expected behavior whether or not you've overridden equals and/or hashCode for other purposes.
(Note that you need to take equality into account as well as the hashCode.)
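A minimal sketch of that suggestion (Traveller, BoardingPass, and setCheckedIn() stand in for the poster's own classes, given an existing traveller and boardingPass):

Map<Traveller, BoardingPass> passes = new IdentityHashMap<>();
passes.put(traveller, boardingPass);   // keyed on reference identity (==), not equals()/hashCode()

traveller.setCheckedIn(true);          // mutating the traveller cannot break the lookup,
passes.get(traveller);                 // because the mapping never depended on its contents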

Java Collections with Mutable Objects

How does a TreeSet, HashSet or LinkedHashSet behave when the objects are mutable? I cannot imagine that they would work in any sense?
If I modify an object after I have added it, what is the behaviour of the collection?
Is there a better option for dealing with a collection of mutable objects (which I need to sort/index/etc) other than a linked list or an array and simply iterating through them each time?
The Set interface addresses this issue directly: "Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element."
Addendum:
Is there a better option for dealing with a collection of mutable objects?
When trying to decide which collection implementation is most suitable, it may be worth looking over the core collection interfaces. For Set implementations in particular, as long as equals() and hashCode() are implemented correctly, any unrelated attributes may be mutable. By analogy with a database relation, any attribute may change, but the primary key must be inviolate.
Being mutable is only a problem for the collection if the objects' hashCode or the behaviour of the compare/equals methods changes after insertion.
The way to handle this is to remove the object from the collection before such a change and re-add it afterwards.
In essence, this makes the object immutable from the collection's point of view.
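A sketch of that pattern, assuming a hypothetical Task class that is ordered by a mutable priority field:

TreeSet<Task> queue = new TreeSet<>(Comparator.comparingInt(Task::getPriority));

// given some task already in the set, never mutate the ordering/hashing
// fields while the element is inside the collection:
queue.remove(task);        // remove while the old ordering is still in effect
task.setPriority(5);       // now it is safe to mutate
queue.add(task);           // re-insert so it lands in the right place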
Another less performant way could be to keep a set containing all objects and creating a TreeSet/HashSet when you need the set to be sorted or indexed. This is no real solution for a situation where the objects change constantly and you need map access at the same time.
The "best" way to deal with this situation is to keep ancillary data structures for lookup, a bit like indexes in a database. Then all of your modifications need to make sure the indexes are updated. Good examples would be maps or multimaps - before an update, remove the entry from any indexes, and then after an update add them back in with the new values. Obviously this needs care with concurrency etc.

Java hashmaps without the value?

Let's say I want to put words in a data structure and I want to have constant time lookups to see if the word is in this data structure. All I want to do is to see if the word exists. Would I use a HashMap (containsKey()) for this? HashMaps use key->value pairings, but in my case I don't have a value. Of course I could use null for the value, but even null takes space. It seems like there ought to be a better data structure for this application.
The collection could potentially be used by multiple threads, but since the objects contained by the collection would not change, I do not think I have a synchronization/concurrency requirement.
Can anyone help me out?
Use HashSet instead. It's a hash implementation of Set, which is used primarily for exactly what you describe (an unordered set of items).
You'd generally use an implementation of Set, and most usually HashSet. If you did need concurrent access, note that there is no ConcurrentHashSet class in the JDK, but an equivalent drop-in replacement can be obtained via Collections.newSetFromMap(new ConcurrentHashMap<>()) or, on Java 8+, ConcurrentHashMap.newKeySet(); both provide safe, concurrent access, including safe iteration over the set.
I'd recommend in any case referring to it as simply a Set throughout your code, except in the one place where you construct it; that way, it's easier to drop in one implementation for the other if you later require it.
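A minimal sketch of that approach for the word-lookup use case:

Set<String> words = new HashSet<>();          // declared as Set, constructed as HashSet
words.add("apple");
words.add("banana");

boolean present = words.contains("apple");    // expected constant-time membership test, no values needed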
Even if the set is read-only, if it's used by a thread other than the one that creates it, you do need to think about safe publication (that is, making sure that any other thread sees the set in a consistent state: remember that any memory writes, even in constructors, aren't guaranteed to be made available to other threads when or in the order you expect, unless you take steps to ensure this). This can be achieved by doing both of the following:
making sure the only reference(s) to the set are in final fields;
making sure that it really is true that no thread modifies the set.
You can help to ensure the latter by using the Collections.unmodifiableSet() wrapper. This gives you an unmodifiable view of the given set-- so provided no other "normal" reference to the set escapes, you're safe.
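A sketch of both points together (the Dictionary class and its fields are made up for illustration):

class Dictionary {
    // final field: threads that see a Dictionary also see its fully constructed set
    private final Set<String> words;

    Dictionary(Collection<String> source) {
        // copy into a new HashSet and expose only an unmodifiable view,
        // so no code can modify the set after construction
        this.words = Collections.unmodifiableSet(new HashSet<>(source));
    }

    boolean contains(String word) {
        return words.contains(word);
    }
}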
You probably want to use a java.util.Set. Implementations include java.util.HashSet, which is the Set equivalent of HashMap.
Even if the objects contained in the collection do not change, you may need to do synchronization. Do new objects need to be added to the Set after the Set is passed to a different thread? If so, you can use Collections.synchronizedSet() to make the Set thread-safe.
If you have a Map with values, and you have some code that just wants to treat the Map as a Set, you can use Map.keySet() (though keep in mind that keySet() returns a Set view of the keys backed by the Map; if the Map is mutable, changes to the Map show through the view, and removals from the view remove the corresponding mappings).
You want to use a Collection implementing the Set interface, probably HashSet to get the performance you stated. See http://java.sun.com/javase/6/docs/api/java/util/Set.html
Other than Sets, in some circumstances you might want to convert a Map into a Set with Collections.newSetFromMap(Map<E,Boolean>) (some Maps disallow null values, hence the Boolean).
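For example, this is handy when you need a Set backed by a particular Map implementation, such as a concurrent one:

Set<String> concurrentSet =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
concurrentSet.add("word");    // thread-safe Set view backed by the ConcurrentHashMap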
As everyone has said, HashSet is probably the simplest solution, but lookups are only expected constant time rather than guaranteed (colliding entries may be chained), and you will store a reference to a dummy object (always the same one) for every entry...
For reference, it may be worth browsing a list of data structures; maybe you'll find one that better fits your needs.

Any disadvantage to using arbitrary objects as Map keys in Java?

I have two kinds of objects in my application where every object of one kind has exactly one corresponding object of the other kind.
The obvious choice to keep track of this relationship is a Map<type1, type2>, like a HashMap. But somehow, I'm suspicious. Can I use an object as a key in the Map, pass it around, have it sitting in another collection, too, and retrieve its partner from the Map any time?
After an object is created, all I'm passing around is an identifier, right? So probably no problem there. What if I serialize and deserialize the key?
Any other caveats? Should I use something else to correlate the object pairs, like a number I generate myself?
1. The key needs to implement .equals() and .hashCode() correctly.
2. The key must not be changed in any way that changes its .hashCode() value while it's used as the key.
Ideally, any object used as a key in a HashMap should be immutable. This would automatically ensure that 2. always holds true.
Also note that objects which could otherwise be GCed might be kept around when they are used as a key and/or value.
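A minimal sketch of a key class that satisfies those points (the name PairKey is made up; it is immutable, with equals() and hashCode() derived from its only field):

final class PairKey {
    private final int id;     // immutable, so point 2 can never be violated

    PairKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof PairKey && ((PairKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}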
I have two kinds of objects in my application where every object of one kind has exactly one corresponding object of the other kind.
This really sounds like a has-a relationship and thus could be implemented using a simple attribute.
It depends on the implementation of the map you choose:
HashMap uses equals() and hashCode(). By default (in Object) these are based on the object identity, which will work OK unless you serialize/deserialize. With a proper implementation of equals() and hashCode() based on the content of the object you will have no problems, as long as you don't modify it while it is a key in a hash map.
TreeMap uses compareTo(). There is no default implementation, so you need to provide one. The same limitations apply as for implementing hashCode() and equals() above.
You could use a standard Map, but doing so you will keep strong references to your objects in the Map. If your objects are referenced in another structure and you need the Map just to link them together consider using a WeakHashMap.
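A sketch of that idea (Type1 and Type2 stand in for the poster's two classes; a and b are existing instances):

Map<Type1, Type2> partners = new WeakHashMap<>();
partners.put(a, b);    // the entry is dropped automatically once 'a' is no longer
                       // strongly referenced anywhere else in the application

Note that only the keys are held weakly; if the value object keeps a strong reference back to its key, the entry will never be collected.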
And BTW you don't have to override equals and hashCode unless you have to consider several instances of an object as equal...
Can I use an object as a key in the Map, pass it around, have it sitting in another collection, too, and retrieve its partner from the Map any time?
Yes, no problem here at all.
After an object is created, all I'm passing around is an identifier, right? So probably no problem there. What if I serialize and deserialize the key?
That's right, you are only passing a reference around - they will all point to the same actual object. If you serialize or deserialize the object, that would create a new object. However, if your object implements equals and hashCode properly, you should still be able to use the new deserialized object to retrieve items from the map.
Any other caveats? Should I use something else to correlate the object pairs, like a number I generate myself?
As for caveats: yes, you can't change anything that would cause the hashCode of the object to change while the object is in the Map.
Any object can be a map key. The important thing here is to make sure that you override .equals() and .hashCode() for any objects that will be used as map keys.
The reason you do this is that if you don't, equals will be understood as object equality, and the only way you'll be able to find "equal" map keys is to have a handle to the original object itself.
You override hashCode because it needs to be consistent with equals. This is so that objects that you've defined as equal hash identically.
The failure points are the hashcode and equals functions. If they don't produce consistent and proper return values, the Map will behave strangely. Effective Java has a whole section on them and is highly, highly recommended.
You might consider Google Collection's BiMap.
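If you go that route, a small sketch using Guava's HashBiMap (Type1 and Type2 again stand in for the two classes, with a and b being an existing pair):

BiMap<Type1, Type2> pairs = HashBiMap.create();
pairs.put(a, b);

Type2 partner = pairs.get(a);             // forward lookup
Type1 original = pairs.inverse().get(b);  // reverse lookup through the inverse view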
