How does a TreeSet, HashSet or LinkedHashSet behave when the objects are mutable? I cannot imagine that they would work in any sense?
If I modify an object after I have added it; what is the behaviour of the list?
Is there a better option for dealing with a collection of mutable objects (which I need to sort/index/etc) other than a linked list or an array and simply iterating through them each time?
The Set interface addresses this issue directly: "Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element."
Addendum:
Is there a better option for dealing with a collection of mutable objects?
When trying to decide which collection implementation is most suitable, it may be worth looking over the core collection interfaces. For Set implementations in particular, as long as equals() and hashCode() are implemented correctly, any unrelated attributes may be mutable. By analogy with a database relation, any attribute may change, but the primary key must be inviolate.
Being mutable is only a problem for the collection if the objects' hashCode and behaviour of compare methods change after it is inserted.
The way you could handle this is to remove the objects from the collection and re-adding them after such a change so that the object.
In essence this results in a inmutable object from the collections' point of view.
Another less performant way could be to keep a set containing all objects and creating a TreeSet/HashSet when you need the set to be sorted or indexed. This is no real solution for a situation where the objects change constantly and you need map access at the same time.
The "best" way to deal with this situation is to keep ancillary data structures for lookup, a bit like indexes in a database. Then all of your modifications need to make sure the indexes are updated. Good examples would be maps or multimaps - before an update, remove the entry from any indexes, and then after an update add them back in with the new values. Obviously this needs care with concurrency etc.
Related
I'm learning about collections and trying to ascertain the best one to use for my practice exercise.....I've done a lot of reading on them, but still can't find the best approach.....this may sound a bit woolly but any guidance at all would be appreciated....
I need to associate a list of Travellers, with a list of Boarding Passes. Both classes contain a mutable boolean field that will be modified during my programme, else all other fields are immutable. That boolean field must exist. I'll need to create a collection of 10 travellers, and then when all criteria has been met, instantiate a boarding pass, and associate it with them.
There won't be any duplicates of either due to each object having a unique reference variable associated with them, created through an object factory.
From doing some reading I understand that Sets must contain immutable objects, and don't allow duplicate elements, whereas Lists are the opposite.
Because I need to associate them with each other, I was thinking a Map, but I now know that the keys are stored in a set, which would be problematic due to the aforementioned reasons....
Could I override the hashcode() method so that it doesn't taken into consideration the boolean field and therefore as long as all of my other fields are immutable it should be fine? Or is that bad practice?
I also thought about creating a list of Travellers, and then trying to associate a Boarding Pass another way, but couldn't think of how that could be achieved....
Please don't give me any code - just some sort of a steer in the right direction would be really helpful.
If you are looking for a best practice, you need to think what you are planning to do with the data now and in the (near) future. When you know
what this is, you need to check which of the methods (list, set and map) works best for you. If you want to compare the three, have a look here
You've been mislead about the mutability requirements of set members and map keys.
When you do a lookup in a HashMap, you do it based on the key's hashCode. If you have mutable objects as keys, and mutating the object modifies the hashCode value, then this is a problem.
If a key was inserted into the table when it had a hashCode of 123, but later it's modified to have a hashCode of 345, you won't be able to find it again later since it's stored in the 123 bucket.
If the mutable boolean field does not influence your hashCode values (e.g., you didn't override hashCode or equals on your key class), then there's no issue.
That said, since you say you'll only have one unique instance of each passenger, Boris's suggestion in the comments about using an IdentityHashMap is probably the way to go. The IdentityHashMap gives the same behavior as a HashMap whose keys all use the default (identity-based) implementations for hashCode and equals. This way you'll get the expected behavior whether or not you've overridden equals and/or hashCode for other purposes.
(Note that you need to take equality into account as well as the hashCode.)
We are learning about the Collection Interface and I was wondering if you all have any good advice for it's general use? What can you do with an Collection that you cannot do with an array? What can you do with an array that you cannot do with a Collection(besides allowing duplicates)?
The easy way to think of it is: Collections beat object arrays in basically every single way. Consider:
A collection can be mutable or immutable. A nonempty array must always be mutable.
A collection can allow or disallow null elements. An array must always permit null elements.
A collection can be thread-safe; even concurrent. An array is never safe to publish to multiple threads.
A list or set's equals, hashCode and toString methods do what users expect; on an array they are a common source of bugs.
A collection is type-safe; an array is not. Because arrays "fake" covariance, ArrayStoreException can result at runtime.
A collection can hold a non-reifiable type (e.g. List<Class<? extends E>> or List<Optional<T>>). An array will generate a warning for this.
A collection can have views (unmodifiable, subList...). No such luck for an array.
A collection has a full-fledged API; an array has only set-at-index, get-at-index, length and clone.
Type-use annotations like #Nullable are very confusing with arrays. I promise you can't guess what #A String #B [] #C [] means.
Because of all the reasons above, third-party utility libraries should not bother adding much additional support for arrays, focusing only on collections, so you also have a network effect.
Object arrays will never be first-class citizens in Java APIs.
A couple of the reasons above are covered -- but in much greater detail -- in Effective Java, Third Edition, Item 28, from page 126.
So, why would you ever use object arrays?
You're very tightly optimizing something
You have to interact with an API that uses them and you can't fix it
so convert to/from a List as close to that API as you can
Because varargs (but varargs is overused)
so ... same as previous
Obviously some collection implementations must be using them
I can't think of any other reasons, they suck bad
It's basically a question of the desired level of abstraction.
Most collections can be implemented in terms of arrays, but they provide many more methods on top of it for your convenience. Most collection implementations I know of for instance, can grow and shrink according to demand, or perform other "high-level" operations which basic arrays can't.
Suppose for instance that you're loading strings from a file. You don't know how many new-line characters the file contains, thus you don't know what size to use when allocating the array. Therefore an ArrayList is a better choice.
The details are in the sub interfaces of Collection, like Set, List, and Map. Each of those types has semantics. A Set typically cannot contain duplicates, and has no notion of order (although some implementations do), following the mathematical concept of a Set. A List is closest to an Array. A Map has specific behavior for push and get. You push an object by its key, and you retrieve with the same key.
There are even more details in the implementations of each collection type. For example, any of the hash based collections (e.g. HashSet, HasMap) are based on the hashcode() method that exists on any Java object.
You could simulate the semantics of any collection type based of an array, but you would have to write a lot of code to do it. For example, to back a Map with an array, you would need to write a method that puts any object entered into your Map into a specific bucket in the array. You would need to handle duplicates. For an array simulating a Set, you would need to write code to not allow duplicates.
The Collection interface is just a base interface for specialised collections -- I am not aware yet of a class that simply just implements Collection; instead classes implement specialized interfaces which extend Collection. These specialized interfaces and abstract classes provide functionality for working with sets (unique objects), growing arrays (e.g. ArrayList), key-value maps etc -- all of which you cannot do out of the box with an array.
However, iterating through an array and setting/reading items from an array remains one of the fastest methods of dealing with data in Java.
One advantage is the Iterator interface. That is all Collections implement an Iterator. An Iterator is an object that knows how to iterate over the given collection and present the programmer with a uniformed interface regardless of the underlying implementation. That is, a linked list is traversed differently from a binary tree, but the iterator hides these differences from the programmer making it easier for the programmer to use one or the other collection.
This also leads to the ability to use various implementations of Collections interchangeably if the client code targets the Collection interface iteself.
I have a hash-based collection of objects, such as HashSet or HashMap. What issues can I run into when the implementation of hashCode() is such that it can change with time because it's computed from some mutable fields?
How does it affect Hibernate? Is there any reason why having hashCode() return object's ID by default is bad? All not-yet-persisted objects have id=0, if that matters.
What is the reasonable implementation of hashCode for Hibernate-mapped entities? Once set the ID is immutable, but it's not true for the moment of saving an entity to database.
I'm not worried about performance of a HashSet with a dozen entities with key=0. What I care about is whether it's safe for my application and Hibernate to use ID as hash code, because ID changes as it is generated on persist.
If the hash code of the same object changes over time, the results are basically unpredictable. Hash collections use the hash code to assign objects to buckets -- if your hash code suddenly changes, the collection obviously doesn't know, so it can fail to find an existing object because it hashes to a different bucket now.
Returning an object's ID by itself isn't bad, but if many of them have id=0 as you mentioned, it will reduce the performance of the hash table: all objects with the same hash code go into the same bucket, so your hash table is now no better than a linear list.
Update: Theoretically, your hash code can change as long as nobody else is aware of it -- this implies exactly what #bestsss mentioned in his comment, which is to remove your object from any collections that may be holding it and insert it again once the hash code has changed. In practice, a better alternative is to generate your hash code from the actual content fields of your object rather than relying on the database ID.
If you add an object to a hash-based collection, then mutate its state so as to change its hashcode (and by implication probably the behaviour in .equals() calls), you may see effects including but not limited to:
Stuff you put in the collection seeming to not be there any more
Getting something out which is different to what you asked for
This is surely not what you want. So, I recommend making the hashcode only out of immutable fields. This is usually done by making the fields final and setting their values in the constructor.
http://community.jboss.org/wiki/EqualsandHashCode
Don’t change hashcode of elements in hash based collection after put.
Many programmers fall into the pitfall.
You could think hashcode is kind of address in collection, so you couldn’t change address of an element after it’s put in the collection.
The Javadoc spefically says that the built-in Collections don't support this. So don't do it.
Let's say I want to put words in a data structure and I want to have constant time lookups to see if the word is in this data structure. All I want to do is to see if the word exists. Would I use a HashMap (containsKey()) for this? HashMaps use key->value pairings, but in my case I don't have a value. Of course I could use null for the value, but even null takes space. It seems like there ought to be a better data structure for this application.
The collection could potentially be used by multiple threads, but since the objects contained by the collection would not change, I do not think I have a synchronization/concurrency requirement.
Can anyone help me out?
Use HashSet instead. It's a hash implementation of Set, which is used primarily for exactly what you describe (an unordered set of items).
You'd generally use an implementation of Set, and most usually HashSet. If you did need concurrent access, then ConcurrentHashSet provides a drop-in replacement that provides safe, concurrent access, including safe iteration over the set.
I'd recommend in any case referring to it as simply a Set throughout your code, except in the one place where you construct it; that way, it's easier to drop in one implementation for the other if you later require it.
Even if the set is read-only, if it's used by a thread other than the one that creates it, you do need to think about safe publication (that is, making sure that any other thread sees the set in a consistent state: remember any memory writes, even in constructors, aren't guaranteed to be made available to other threads when or in the otder you expect, unless you take steps to ensure this). This can be done by both of the following:
making sure the only reference(s) to the set are in final fields;
making sure that it really is true that no thread modifies the set.
You can help to ensure the latter by using the Collections.unmodifiableSet() wrapper. This gives you an unmodifiable view of the given set-- so provided no other "normal" reference to the set escapes, you're safe.
You probably want to use a java.util.Set. Implementations include java.util.HashSet, which is the Set equivalent of HashMap.
Even if the objects contained in the collection do not change, you may need to do synchronization. Do new objects need to be added to the Set after the Set is passed to a different thread? If so, you can use Collections.synchronizedSet() to make the Set thread-safe.
If you have a Map with values, and you have some code that just wants to treat the Map as a Set, you can use Map.entrySet() (though keep in mind that entrySet returns a Set view of the keys in the Map; if the Map is mutable, the Map can be changed through the set returned by entrySet).
You want to use a Collection implementing the Set interface, probably HashSet to get the performance you stated. See http://java.sun.com/javase/6/docs/api/java/util/Set.html
Other than Sets, in some circumstances you might want to convert a Map into a Set with Collections.newSetFromMap(Map<E,Boolean>) (some Maps disallow null values, hence the Boolean).
as everyone said HashSet is probably the simplest solution but you won't have constant time lookup in a HashSet (because entries may be chained) and you will store a dummy object (always the same) for every entry...
For information here a list of data structures maybe you'll find one that better fits your needs.
Is there any practical difference between a Set and Collection in Java, besides the fact that a Collection can include the same element twice? They have the same methods.
(For example, does Set give me more options to use libraries which accept Sets but not Collections?)
edit: I can think of at least 5 different situations to judge this question. Can anyone else come up with more? I want to make sure I understand the subtleties here.
designing a method which accepts an argument of Set or Collection. Collection is more general and accepts more possibilities of input. (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Collection.)
designing a method which returns a Set or Collection. Set offers more guarantees than Collection (even if it's just the guarantee not to include one element twice). (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Set.)
designing a class that implements the interface Set or Collection. Similar issues as #2. Users of my class/interface get more guarantees, subclassers/implementers have more responsibility.
designing an interface that extends the interface Set or Collection. Very similar to #3.
writing code that uses a Set or Collection. Here I might as well use Set; the only reasons for me to use Collection is if I get back a Collection from someone else's code, or if I have to handle a collection that contains duplicates.
Collection is also the supertype of List, Queue, Deque, and others, so it gives you more options. For example, I try to use Collection as a parameter to library methods that shouldn't explicitly depend on a certain type of collection.
Generally, you should use the right tool for the job. If you don't want duplicates, use Set (or SortedSet if you want ordering, or LinkedHashSet if you want to maintain insertion order). If you want to allow duplicates, use List, and so on.
I think you already have it figured out- use a Set when you want to specifically exclude duplicates. Collection is generally the lowest common denominator, and it's useful to specify APIs that accept/return this, which leaves you room to change details later on if needed. However if the details of your application require unique entries, use Set to enforce this.
Also worth considering is whether order is important to you; if it is, use List, or LinkedHashSet if you care about order and uniqueness.
See Java's Collection tutorial for a good walk-through of Collection usage. In particular, check out the class hierarchy.
As #mmyers states, Collection includes Set, as well as List.
When you declare something as a Set, rather than a Collection, you are saying that the variable cannot be a List or a Map. It will always be a Collection, though. So, any function that accepts a Collection will accept a Set, but a function that accepts a Set cannot take a Collection (unless you cast it to a Set).
One other thing to consider... Sets have extra overhead in time, memory, and coding in order to guarantee that there are no duplicates. (Time and memory because sets are usually backed by a HashMap or a Tree, which adds overhead over a list or an array. Coding because you have to implement the hashCode() and equals() methods.)
I usually use sets when I need a fast implementation of contains() and use Collection or List otherwise, even if the collection shouldn't have duplicates.
You should use a Set when that is what you want.
For example, a List without any order or duplicates. Methods like contains are quite useful.
A collection is much more generic. I believe that what mmyers wrote on their usage says it all.
The practical difference is that Set enforces the set logic, i.e. no duplicates and unordered, while Collection does not. So if you need a Collection and you have no particular requirement for avoiding duplicates then use a Collection. If you have the requirement for Set then use Set. Generally use the highest interface possibble.
As Collection is a super type of Set and SortedSet these can be passed to a method which expects a Collection. Collection just means it may or may not be sorted, order or allow duplicates.