As the documentation of LinkedHashSet states, it is
Hash table and linked list implementation of the Set interface, with
predictable iteration order. This implementation differs from HashSet
in that it maintains a doubly-linked list running through all of its
entries.
So it's essentially a HashSet with a FIFO queue of keys implemented by a linked list. Considering that LinkedList is a Deque and permits, in particular, insertion at the beginning, I wonder why LinkedHashSet doesn't have an addFirst(E e) method in addition to the methods present in the Set interface. It doesn't seem hard to implement.
As Eliott Frisch said, the answer is in the next sentence of the paragraph you quoted:
… This linked list defines the iteration ordering, which is the order
in which elements were inserted into the set (insertion-order). …
An addFirst method would break the insertion order and thereby the design idea of LinkedHashSet.
If I may add a bit of guesswork too, other possible reasons might include:
It’s not as simple to implement as it appears, since a LinkedHashSet is really implemented as a LinkedHashMap where the values mapped to are not used. At the very least you would have to change that class too (which in turn would also break its insertion order and thereby its design idea).
As that other guy may have intended in a comment, they didn’t find it useful.
That said, you are asking the question the wrong way around. They designed a class with a functionality for which they saw a need. They moved on to implement it using a hash table and a linked list. You are starting out from the implementation and using it as a basis for a design discussion. While that may occasionally add something useful, generally it’s not the way to good designs.
While I can in theory follow your point that there might be a situation where you want a double-ended queue with set property (duplicates are ignored/eliminated), I have a hard time imagining when a Deque would not fulfil your needs in this case (Eliott Frisch mentioned the under-used ArrayDeque). You need pretty large amounts of data and/or pretty strict performance requirements before the linear complexity of contains and remove would be prohibitive. And in that case you may already be better off custom designing your own data structure.
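A minimal sketch of the Deque-based approach mentioned above, assuming the linear cost of contains is acceptable for your data sizes. The class name UniqueDeque and its methods are hypothetical and not part of the JDK.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: a double-ended queue that silently ignores duplicates. */
final class UniqueDeque<E> {
    private final Deque<E> items = new ArrayDeque<>();

    /** Adds e at the front unless it is already present; contains() is O(n). */
    boolean addFirst(E e) {
        if (items.contains(e)) {
            return false;
        }
        items.addFirst(e);
        return true;
    }

    /** Adds e at the back unless it is already present; contains() is O(n). */
    boolean addLast(E e) {
        if (items.contains(e)) {
            return false;
        }
        items.addLast(e);
        return true;
    }

    /** Returns a copy of the current contents, front to back. */
    List<E> toList() {
        return new ArrayList<>(items);
    }
}
```

The trade-off is the one described above: insertion at either end stays cheap, but every add pays for a linear scan.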
I have recently tried to understand the so-called Iterator pattern.
I think I have understood its purpose, but I'm still not sure, so please correct me on this:
The purpose of the iterator pattern is to abstract away the underlying structure in which the data
are kept. The data structure can be an array, a tree, a list ...
Its important methods are next() (returns an object), hasNext() (returns a boolean) and remove().
The methods are implemented in a way that is appropriate for the data structure used. So the developer
who uses an iterator-implementing class doesn't have to care and just uses the provided methods, which are
the same for every iterator-implementing class.
Have I got it right?
What you have summarized is correct. The Iterator design pattern hides the underlying complexity of a collection/aggregate by providing an iterator interface between the collection and the data retriever.
Next, we can take this one level higher by defining an abstraction for the iterator. This means we can write iterators to traverse the same collection in multiple ways. For example, if we have a binary tree collection, we can write three iterators for inorder, postorder and preorder traversals.
Lastly, the Iterator pattern allows us to have an abstraction for the collection being iterated as well. This implies that one can implement a family of iterators for a family of collections.
One more important thing to note is that an iterator knows enough about the inner structure of the collection to be able to iterate over it, and that it is the responsibility of the collection instance to create the correct iterator (out of the possible family of iterators) for itself and return it to the client.
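To illustrate these points, here is a small sketch in which one tree hands out two different iterators. IntTree and preOrderIterator are hypothetical names made up for this example; only Iterable, Iterator, ArrayDeque and NoSuchElementException come from the JDK. It assumes Java 8 or later, where Iterator.remove() has a default implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical binary tree whose clients traverse it only through Iterator. */
final class IntTree implements Iterable<Integer> {
    private static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;

    /** Inserts as in an unbalanced binary search tree; duplicates are ignored. */
    void insert(int value) {
        if (root == null) { root = new Node(value); return; }
        Node cur = root;
        while (true) {
            if (value < cur.value) {
                if (cur.left == null) { cur.left = new Node(value); return; }
                cur = cur.left;
            } else if (value > cur.value) {
                if (cur.right == null) { cur.right = new Node(value); return; }
                cur = cur.right;
            } else {
                return;
            }
        }
    }

    /** Default iterator created by the collection itself: in-order traversal. */
    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private final Deque<Node> stack = new ArrayDeque<>();
            { pushLeft(root); }

            private void pushLeft(Node n) {
                for (; n != null; n = n.left) stack.push(n);
            }

            @Override public boolean hasNext() { return !stack.isEmpty(); }

            @Override public Integer next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Node n = stack.pop();
                pushLeft(n.right);
                return n.value;
            }
        };
    }

    /** A second iterator over the same tree: pre-order traversal. */
    Iterator<Integer> preOrderIterator() {
        return new Iterator<Integer>() {
            private final Deque<Node> stack = new ArrayDeque<>();
            { if (root != null) stack.push(root); }

            @Override public boolean hasNext() { return !stack.isEmpty(); }

            @Override public Integer next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Node n = stack.pop();
                if (n.right != null) stack.push(n.right);
                if (n.left != null) stack.push(n.left);
                return n.value;
            }
        };
    }
}
```

Client code only ever sees hasNext() and next(); the stacks and node links stay private to the tree, and `for (int v : tree)` works because the tree implements Iterable.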
If you are interested in reading more about the iterator pattern, I have explained the above points in depth in a writeup on my blog: http://www.javabrahman.com/design-patterns/iterator-design-pattern-in-java/
Yes, your understanding covers most of it.
For a more technical treatment, see the Iterator design pattern page: https://sourcemaking.com/design_patterns/iterator
Java has tons of different Collections designed for concurrency and thread safety, and I'm at a loss as to which one to choose for my situation.
Multiple threads may be calling .add() and .remove(), and I will be copying this list frequently with something like List<T> newList = new ArrayList<T>(concurrentList). I will never be looping over the concurrent list.
I thought about something like CopyOnWriteArrayList, but I've read that it can be very inefficient because it copies itself every time it's modified. I'm hoping to find a good compromise between safety and efficiency.
What is the best list (or set) for this situation?
As #SpiderPig said, the best case scenario with a List would be an immutable, singly-linked list.
However, looking at what's being done here, a List is unnecessary (#bhspencer's comment). A ConcurrentSkipListSet will work most efficiently (#augray).
This Related Thread's accepted answer offers more insight on the pros and cons of different concurrent collections.
You might want to look into whether a ctrie would be appropriate for your use case - it has thread-safe add and remove operations, and "copying" (in actuality, taking a snapshot of) the data structure runs in O(1). I'm aware of two JVM implementations of the data structure: implementation one, implementation two.
Collections.newSetFromMap(new ConcurrentHashMap<...>())
This is typically how a normal Set is done (HashSet is really a modified wrapper over HashMap). It offers the performance and concurrency advantages of ConcurrentHashMap, and does not carry the extra features of ConcurrentSkipListSet (ordering), COW lists (copying on every modification), or concurrent queues (FIFO/LIFO ordering).
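A short sketch of how this could look for the use case in the question (concurrent add/remove plus frequent copying). The class and method names are hypothetical; note that a copy taken while other threads are still writing is only weakly consistent, so it may or may not include in-flight changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentSetDemo {
    /** Thread-safe set view backed by a ConcurrentHashMap. */
    static <T> Set<T> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());
    }

    /** Copies the set into a plain list; safe to call while writers are active. */
    static <T> List<T> snapshot(Set<T> concurrentSet) {
        return new ArrayList<>(concurrentSet);
    }

    /** Adds 0..1999 from two threads at once and returns the size of a snapshot copy. */
    static int addFromTwoThreads() {
        Set<Integer> set = newConcurrentSet();
        Thread first = new Thread(() -> { for (int i = 0; i < 1000; i++) set.add(i); });
        Thread second = new Thread(() -> { for (int i = 1000; i < 2000; i++) set.add(i); });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return snapshot(set).size();
    }
}
```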
Edit: I didn't see #bhspencer's comment on the original post, apologies for stealing the spotlight.
A HashSet, being hash-based, would be better than a List.
Adding at the end and removing from the front work well with LinkedList.
Access by index is fast in ArrayList because it is array-backed; searching by value is still linear.
In an interview, I was asked the following question:
Your application requires to store objects such that the order of
entries returned while iterating through the structure is
deterministic. In other words, if you iterate over the same
structure twice, the order of elements returned in both iterations
will be the same. Which of the following classes would you use?
Assume that structure is not mutated. (Check ANY that apply)
HashMap
LinkedHashSet
Hashtable
LinkedHashMap
TreeSet
TreeMap
I suggested using a LinkedHashSet. Was this the correct answer? Why or why not?
A deterministic order just means that it's consistently reproducible: the same input will always provide the same iteration order. In this case, the answer is "all of the above". Although the ordering of most Sets and Maps can't be relied on, it is still deterministic, and will remain the same until the underlying implementation is changed (e.g., if you change or upgrade JVMs).
A predictable order is something more, though - it means that the collection guarantees the order items are returned when iterating the collection. Both "linked" types you mentioned above do that - the order that items were inserted to the collection is the order they will be returned when iterating over it. The "tree" types also guarantee a deterministic order of iteration - a sorted one.
As noted by #Elliot Frisch, they are all "deterministic" and will iterate in the same order if nothing has changed. That said, to paraphrase Animal Farm, some collections are more deterministic than others. :-)
Hash... collections have a deterministic iteration order which the JVM can "predict", but which is very challenging for a human to predict and not worth the effort. In practice, they are not "predictable". As #Mureinik points out, the order is officially "unspecified" and subject to change if you change JVMs. The API docs describe this as "generally chaotic ordering" and all sane programmers would agree.
Linked... collections have "predictable iteration order" in that they iterate in the order elements were inserted, with the important caveat that if you insert the same element twice it retains the original order. i.e.
add("Tom");
add("Fred");
add("Tom");
would iterate "Tom", "Fred", not "Fred", "Tom"
This is clearly "more predictable" than Hash..., but still a bit challenging if elements get inserted multiple times and ordering is crucial. For stuff like properties files, XML, or JSON, Linked... collections are generally a good choice as they maintain the original order for nicer human viewing and comparison.
Tree... collections iterate the "most predictably", using the ordering provided by a Comparator at construction time, or else the "natural ordering" if the elements are Comparable. Assuming you have a predictable comparison method, they are completely predictable. In the Tom/Fred example, it would always iterate as "Fred", "Tom", unless your Comparator is unusual.
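The Tom/Fred example can be run against all three families with a few lines. This is only a sketch; IterationOrderDemo and fill are names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class IterationOrderDemo {
    /** Adds "Tom", "Fred", "Tom" and returns the elements in iteration order. */
    static List<String> fill(Set<String> set) {
        set.add("Tom");
        set.add("Fred");
        set.add("Tom"); // already present: ignored, and its position does not change
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // Insertion order: Tom, Fred
        System.out.println(fill(new LinkedHashSet<String>()));
        // Natural (alphabetical) order: Fred, Tom
        System.out.println(fill(new TreeSet<String>()));
        // Repeatable on a given JVM, but the specification promises no particular order
        System.out.println(fill(new HashSet<String>()));
    }
}
```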
When answering this type of question, I would highly suggest doing so according to the Java API Specification and not based on your assumptions about the implementations. So, for example, even though you could argue that all of those collections have a deterministic iteration order provided that they are not mutated between repeated iterations (because you think it would not make sense to implement them any other way), the only real answer among the options you listed, strictly adhering to the Java API Specification, would be all except HashMap and Hashtable.
The reason for this answer is that, of all of them, the only classes that give no guarantee on the iteration order according to the Java API Specification are the two I mentioned (HashMap and Hashtable). So in general, when you program in Java, you should never assume a specific implementation of the API, the JVM, or anything else that is based on a specification unless that specification guarantees it.
So, as an example, what could be the problem with assuming a particular implementation of the Hashtable collection? The JDK used to run your program (which is not necessarily the one you use for development) could implement Hashtable with certain optimizations, as long as they don't violate the Java API Specification. For example, if a Hashtable has not been used for 10 minutes, it could call the rehash function automatically in order to reorganize and optimize access to its entries. This is a possible scenario because, as noted at https://docs.oracle.com/javase/7/docs/api/java/util/Hashtable.html, "The exact details as to when and whether the rehash method is invoked are implementation-dependent".
What is the most efficient way of maintaining a list that does not allow duplicates, but maintains insertion order and also allows the retrieval of the last inserted element in Java?
Try LinkedHashSet, which keeps the order of input.
Note that re-inserting an element that is already present does not change its position in the insertion order. If you want a re-inserted element to become the last one, check whether it is already contained in the set and remove it before adding it again.
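A rough sketch of how the "last inserted element" could be read from a LinkedHashSet on JDK versions that offer no direct accessor for it. The helper names last and addOrMoveToEnd are hypothetical. Per the LinkedHashSet documentation, insertion order is not affected when an element is re-inserted, so moving an element to the end takes an explicit remove followed by add.

```java
import java.util.LinkedHashSet;

final class LastInsertedDemo {
    /** Walks the whole set, so this is O(n). Returns null for an empty set. */
    static <E> E last(LinkedHashSet<E> set) {
        E last = null;
        for (E e : set) {
            last = e;
        }
        return last;
    }

    /** Makes e the most recently inserted element, whether or not it was present. */
    static <E> void addOrMoveToEnd(LinkedHashSet<E> set, E e) {
        set.remove(e); // no-op if e is absent
        set.add(e);
    }
}
```

If the linear walk in last is too slow, tracking the most recent element in a separate field is an option, at the cost of extra bookkeeping when that element is removed.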
Edit:
You could also try the Apache Commons Collections class ListOrderedSet, which according to the JavaDoc (if I didn't misread anything again :) ) decorates a set in order to keep insertion order and provides a get(index) method.
Thus, it seems you can get what you want by using new ListOrderedSet(new HashSet());
Unfortunately this class doesn't provide a generic parameter, but it might get you started.
Edit 2:
Here's a project that seems to represent commons collections with generics, i.e. it has a ListOrderedSet<E> and thus you could for example call new ListOrderedSet<String>(new HashSet<String>());
I don't think there's anything in the JDK which does this.
However, LinkedHashMap, which is used as the basis for LinkedHashSet, comes close: it maintains a circular doubly-linked list of the entries in the map. It only tracks the head of the list not the tail, but because the list is circular, header.before is the tail (the most recently inserted element).
You could therefore implement what you need on top of this. LinkedHashMap has not been designed for extension, so this is somewhat awkward. You could copy the code into your own class and add a suitable last() method (be aware of licensing issues here), or you could extend the existing class, and add a method which uses reflection to get at the private header and before fields.
That would get you a Map, rather than a Set. However, HashSet is already a wrapper which makes a Map look like a Set. Again, it is not designed for general extension, but you could write a subclass whose constructor calls the super constructor, then uses more reflection to replace the superclass's value of map with an instance of your new map. From there on, the class should do exactly what you want.
As an aside, the library classes here were all written by Josh Bloch and Neal Gafter. Those guys are two of the giants of Java. And yet the code in there is largely horrible. Never meet your heroes.
Just use a TreeSet.
Is there any practical difference between a Set and Collection in Java, besides the fact that a Collection can include the same element twice? They have the same methods.
(For example, does Set give me more options to use libraries which accept Sets but not Collections?)
edit: I can think of at least 5 different situations to judge this question. Can anyone else come up with more? I want to make sure I understand the subtleties here.
1. designing a method which accepts an argument of Set or Collection. Collection is more general and accepts more possibilities of input. (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Collection.)
2. designing a method which returns a Set or Collection. Set offers more guarantees than Collection (even if it's just the guarantee not to include one element twice). (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Set.)
3. designing a class that implements the interface Set or Collection. Similar issues as #2. Users of my class/interface get more guarantees, subclassers/implementers have more responsibility.
4. designing an interface that extends the interface Set or Collection. Very similar to #3.
5. writing code that uses a Set or Collection. Here I might as well use Set; the only reasons for me to use Collection are if I get back a Collection from someone else's code, or if I have to handle a collection that contains duplicates.
Collection is also the supertype of List, Queue, Deque, and others, so it gives you more options. For example, I try to use Collection as a parameter to library methods that shouldn't explicitly depend on a certain type of collection.
Generally, you should use the right tool for the job. If you don't want duplicates, use Set (or SortedSet if you want ordering, or LinkedHashSet if you want to maintain insertion order). If you want to allow duplicates, use List, and so on.
I think you already have it figured out: use a Set when you want to specifically exclude duplicates. Collection is generally the lowest common denominator, and it's useful to specify APIs that accept/return it, which leaves you room to change details later on if needed. However, if the details of your application require unique entries, use Set to enforce this.
Also worth considering is whether order is important to you; if it is, use List, or LinkedHashSet if you care about order and uniqueness.
See Java's Collection tutorial for a good walk-through of Collection usage. In particular, check out the class hierarchy.
As #mmyers states, Collection includes Set, as well as List.
When you declare something as a Set, rather than a Collection, you are saying that the variable cannot be a List or a Map. It will always be a Collection, though. So, any function that accepts a Collection will accept a Set, but a function that accepts a Set cannot take an arbitrary Collection (unless the object actually is a Set and you cast it).
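A small sketch of that asymmetry. The class and method names are invented for the example. Copying into a new HashSet is shown as an alternative to casting, since a cast only succeeds when the object really is a Set at runtime.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CollectionVsSetDemo {
    /** Accepts any Collection: a List, a Set, a Queue, and so on. */
    static int countNonEmpty(Collection<String> items) {
        int count = 0;
        for (String s : items) {
            if (!s.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Requires a Set, so the signature itself promises there are no duplicates. */
    static boolean sameMembers(Set<String> a, Set<String> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "a");
        Set<String> set = new HashSet<>(list); // copying drops the duplicate "a"

        System.out.println(countNonEmpty(list)); // a List is a Collection
        System.out.println(countNonEmpty(set));  // so is a Set

        // sameMembers(list, set) would not compile: a List is not a Set.
        System.out.println(sameMembers(new HashSet<>(list), set));
    }
}
```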
One other thing to consider... Sets have extra overhead in time, memory, and coding in order to guarantee that there are no duplicates. (Time and memory because sets are usually backed by a HashMap or a Tree, which adds overhead over a list or an array. Coding because you have to implement the hashCode() and equals() methods.)
I usually use sets when I need a fast implementation of contains() and use Collection or List otherwise, even if the collection shouldn't have duplicates.
You should use a Set when that is what you want.
For example, a List without any order or duplicates. Methods like contains are quite useful.
A collection is much more generic. I believe that what mmyers wrote on their usage says it all.
The practical difference is that Set enforces the set logic, i.e. no duplicates and unordered, while Collection does not. So if you need a Collection and you have no particular requirement for avoiding duplicates then use a Collection. If you have the requirement for Set then use Set. Generally use the highest interface possible.
As Collection is a supertype of Set and SortedSet, these can be passed to a method which expects a Collection. Collection just means it may or may not be sorted, ordered, or allow duplicates.