Google Collections ImmutableMap iteration order

I need a combination of Google Collections' ImmutableMap and LinkedHashMap: an immutable map with a defined iteration order. It seems that ImmutableMap itself actually has a defined iteration order; at least its documentation says:
An immutable, hash-based Map with reliable user-specified iteration order.
However, there are no more details. A quick test shows that this might be true, but I want to make sure.
My question is: can I rely on the iteration order of ImmutableMap? If I do ImmutableMap.copyOf(linkedHashMap), will it have the same iteration order as the original linked hash map? What about immutable maps created by a builder? A link to an authoritative answer would help, since Google didn't find anything useful. (And no, links to the sources don't count.)

To be more precise, the ImmutableMap factory methods and builder return instances that follow the iteration order of the inputs provided when the map is constructed. However, an ImmutableSortedMap, which is a subclass of ImmutableMap, sorts the keys.

I've actually found discussion about this, with answers from library authors:
Kevin Bourrillion: What we mean by "user-specified" is "it can be whatever order you want it to
be"; in other words, whatever order you provide the entries to us in the
first place, that's the order we use.
Jared Levy: You can also copy a TreeMap or LinkedHashMap that have the desired order.
Yes, I should have believed the javadoc, although I think the javadoc could be clearer in this case. It seems I'm not the first who was confused by it. If nothing else, this Q/A will help Google next time someone searches for "ImmutableMap iteration" :-)

You should believe the javadoc. If it is not enough, read the source code or report a bug.
A quick look at the source code shows that the map is backed by an array, and iteration is done through an ImmutableSet that is also backed by an array. So I think the documentation is correct and the order of the elements will be kept as it is.
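
For comparison, here is a minimal JDK-only sketch of the same idea: an unmodifiable map that keeps the iteration order of the map it was copied from. This is not Guava's ImmutableMap; it swaps in Collections.unmodifiableMap over a private LinkedHashMap copy, and the class and method names (OrderedImmutableCopy, orderedCopy) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedImmutableCopy {
    // Hypothetical helper: copy into a LinkedHashMap (keeps the source's
    // iteration order), then wrap it so callers cannot modify it.
    static <K, V> Map<K, V> orderedCopy(Map<? extends K, ? extends V> source) {
        return Collections.unmodifiableMap(new LinkedHashMap<K, V>(source));
    }

    public static void main(String[] args) {
        Map<String, Integer> source = new LinkedHashMap<>();
        source.put("zebra", 1);
        source.put("apple", 2);
        source.put("mango", 3);

        Map<String, Integer> copy = orderedCopy(source);
        source.put("later", 4); // the copy is detached, so this is not visible in it

        List<String> keys = new ArrayList<>(copy.keySet());
        System.out.println(keys); // [zebra, apple, mango]
    }
}
```

Because the LinkedHashMap copy is never exposed, nothing can reorder or modify it after construction, which is the property the question asks for.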

Related

Why doesn't LinkedHashSet have addFirst method?

As the documentation of LinkedHashSet states, it is
Hash table and linked list implementation of the Set interface, with
predictable iteration order. This implementation differs from HashSet
in that it maintains a doubly-linked list running through all of its
entries.
So it's essentially a HashSet with a FIFO queue of keys implemented by a linked list. Considering that LinkedList is a Deque and permits, in particular, insertion at the beginning, I wonder why LinkedHashSet doesn't have an addFirst(E e) method in addition to the methods present in the Set interface. It doesn't seem hard to implement.

As Eliott Frisch said, the answer is in the next sentence of the paragraph you quoted:
… This linked list defines the iteration ordering, which is the order
in which elements were inserted into the set (insertion-order). …
An addFirst method would break the insertion order and thereby the design idea of LinkedHashSet.
If I may add a bit of guesswork too, other possible reasons might include:
It’s not as simple to implement as it appears, since a LinkedHashSet is really implemented as a LinkedHashMap where the values mapped to are not used. At least you would have to change that class too (which in turn would also break its insertion order and thereby its design idea).
As that other guy may have intended in a comment, they didn’t find it useful.
That said, you are asking the question the wrong way around. They designed a class with a functionality for which they saw a need. They moved on to implement it using a hash table and a linked list. You are starting out from the implementation and using it as a basis for a design discussion. While that may occasionally add something useful, generally it’s not the way to good designs.
While I can in theory follow your point that there might be a situation where you want a double-ended queue with set property (duplicates are ignored/eliminated), I have a hard time imagining when a Deque would not fulfil your needs in this case (Eliott Frisch mentioned the under-used ArrayDeque). You need pretty large amounts of data and/or pretty strict performance requirements before the linear complexity of contains and remove would be prohibitive. And in that case you may already be better off custom designing your own data structure.
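
A rough sketch of the "Deque with set property" idea mentioned above, assuming the linear cost of contains is acceptable. The helper name addFirstIfAbsent is hypothetical; it is not part of the JDK.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class UniqueDeque {
    // Hypothetical helper: insert at the front only if the element is absent.
    // contains() on an ArrayDeque is a linear scan, as noted in the answer.
    static <E> boolean addFirstIfAbsent(Deque<E> deque, E element) {
        if (deque.contains(element)) {
            return false;
        }
        deque.addFirst(element);
        return true;
    }

    public static void main(String[] args) {
        Deque<String> deque = new ArrayDeque<>();
        deque.addLast("b");
        deque.addLast("c");
        addFirstIfAbsent(deque, "a"); // inserted at the front
        addFirstIfAbsent(deque, "c"); // ignored, already present
        System.out.println(deque);    // [a, b, c]
    }
}
```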

Most Efficient but Thread-Safe List/Set

Java has tons of different Collections designed for concurrency and thread safety, and I'm at a loss as to which one to choose for my situation.
Multiple threads may be calling .add() and .remove(), and I will be copying this list frequently with something like List<T> newList = new ArrayList<T>(concurrentList). I will never be looping over the concurrent list.
I thought about something like CopyOnWriteArrayList, but I've read that it can be very inefficient because it copies itself every time it's modified. I'm hoping to find a good compromise between safety and efficiency.
What is the best list (or set) for this situation?

As #SpiderPig said, the best case scenario with a List would be an immutable, singly-linked list.
However, looking at what's being done here, a List is unnecessary (#bhspencer's comment). A ConcurrentSkipListSet will work most efficiently (#augray).
This Related Thread's accepted answer offers more insight on the pros and cons of different concurrent collections.

You might want to look into whether a ctrie would be appropriate for your use case - it has thread-safe add and remove operations, and "copying" (in actuality, taking a snapshot of) the data structure runs in O(1). I'm aware of two JVM implementations of the data structure: implementation one, implementation two.

Collections.newSetFromMap(new ConcurrentHashMap<...>())
This is typically how a normal Set is built (HashSet is really a thin wrapper over HashMap). It offers the performance and concurrency advantages of ConcurrentHashMap, and does not carry the extra features of ConcurrentSkipListSet (ordering), COW lists (copying on every modification), or concurrent queues (FIFO/LIFO ordering).
Edit: I didn't see #bhspencer's comment on the original post, apologies for stealing the spotlight.
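
A minimal sketch of this approach under the question's usage pattern (concurrent add calls, then a copy into an ArrayList). The class and method names are made up for illustration; only Collections.newSetFromMap and ConcurrentHashMap come from the answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentSetSnapshot {
    // Hypothetical helper: several threads add disjoint ranges of integers
    // to one shared concurrent set, then the set is returned.
    static Set<Integer> fillConcurrently(int threads, int perThread) throws InterruptedException {
        Set<Integer> set = Collections.newSetFromMap(new ConcurrentHashMap<Integer, Boolean>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    set.add(base + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return set;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Integer> set = fillConcurrently(4, 1000);
        // Copying is safe without external locking, but the result has no defined order.
        List<Integer> snapshot = new ArrayList<>(set);
        System.out.println(snapshot.size()); // 4000
    }
}
```

If a copy is taken while other threads are still writing, it reflects some elements present during the copy rather than an exact point-in-time snapshot.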

A HashSet, being hash-based, would be better than a List.
Adding at the end and removing from the front work well with LinkedList.
Lookup by index is fast in ArrayList, since it is array-based.

Which java data structures have a deterministic order of iteration?

In an interview, I was asked the following question:
Your application requires to store objects such that the order of
entries returned while iterating through the structure is
deterministic. In other words, if you iterate over the same
structure twice, the order of elements returned in both iterations
will be the same. Which of the following classes would you use?
Assume that structure is not mutated. (Check ANY that apply)
HashMap
LinkedHashSet
Hashtable
LinkedHashMap
TreeSet
TreeMap
I suggested using a LinkedHashSet. Was this the correct answer? Why or why not?

A deterministic order just means that it's consistently reproducible: the same input will always produce the same iteration order. In this case, the answer is "all of the above". Although the ordering of most Sets and Maps can't be trusted, it is still deterministic, and will remain the same until the underlying implementation is changed (e.g., if you change or upgrade JVMs).
A predictable order is something more, though - it means that the collection guarantees the order items are returned when iterating the collection. Both "linked" types you mentioned above do that - the order that items were inserted to the collection is the order they will be returned when iterating over it. The "tree" types also guarantee a deterministic order of iteration - a sorted one.

As noted by #Elliot Frisch, they are all "deterministic" and will iterate in the same order if nothing has changed. That said, to paraphrase Animal House, some collections are more deterministic than others. :-)
Hash... collections have a deterministic iteration order which the JVM can "predict", but which is very challenging for a human to predict and not worth the effort. In practice, they are not "predictable". As #Mureinik points out, the order is officially "unspecified" and subject to change if you change JVMs. The API docs describe this as "generally chaotic ordering" and all sane programmers would agree.
Linked... collections have "predictable iteration order" in that they iterate in the order elements were inserted, with the important caveat that if you insert the same element twice it retains the original order. i.e.
add("Tom");
add("Fred");
add("Tom");
would iterate "Tom", "Fred", not "Fred", "Tom"
This is clearly "more predictable" than Hash..., but still a bit challenging if elements get inserted multiple times and ordering is crucial. For stuff like properties files, XML, or JSON, Linked... collections are generally a good choice as they maintain the original order for nicer human viewing and comparison.
Tree... collections iterate the "most predictably", using the ordering provided by a Comparator at construction time, or else the "natural ordering" if the elements are Comparable. Assuming you have a predictable comparison method, they are completely predictable. In the Tom/Fred example, it would always iterate as "Fred", "Tom", unless your Comparator is unusual.
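
The Tom/Fred example can be run against all three families with a short sketch like the following (the class and helper names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IterationOrderDemo {
    // Hypothetical helper: perform the same three inserts on any Set and
    // return the elements in that Set's iteration order.
    static List<String> insertInto(Set<String> set) {
        set.add("Tom");
        set.add("Fred");
        set.add("Tom"); // re-insertion, ignored by every Set
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(insertInto(new LinkedHashSet<>())); // [Tom, Fred]
        System.out.println(insertInto(new TreeSet<>()));       // [Fred, Tom]
        // HashSet: repeatable on a given JVM, but the order is unspecified.
        System.out.println(insertInto(new HashSet<>()));
    }
}
```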

When answering this type of question, I would highly suggest doing so according to the Java API Specification and not based on your assumptions about the implementations. So, for example, even though you could argue that all of those collections would have a deterministic iteration order provided that they are not mutated between repeated iterations (because you think it would not make sense to implement them any other way), the only real answer that strictly adheres to the Java API Specification, given the options you listed, would be all except HashMap and Hashtable.
The reason for this answer is that, of all of them, according to the Java API Specification, the only classes that give no guarantee on the iteration order are the two I mentioned (HashMap and Hashtable). So in general, when you program in Java, you should never assume a specific implementation of the API, the JVM, or anything else that is based on a specification, unless that specification guarantees it.
So, as an example, what could be a problem with assuming a particular implementation of the Hashtable collection? The JDK used to run your program (which is not necessarily the one you use for development) could implement Hashtable with certain optimizations, as long as they don't violate the Java API Specification. Such an optimization could be, for example, that if a Hashtable is not used for 10 minutes, it calls the rehash function automatically in order to reorganize and optimize access to its entries. And this is a possible scenario because, as noted here https://docs.oracle.com/javase/7/docs/api/java/util/Hashtable.html "The exact details as to when and whether the rehash method is invoked are implementation-dependent".

How to store list of countries in Java

I need a Java structure which can store list of all countries. Which Java data structure will you recommend?

You may use a Set implementation such as HashSet, which avoids duplicates (if you only need names). If you want to keep both the country code and the name, then maybe a Map.

Are you going to iterate over the collection?
If so, java.util.ArrayList.
Are you going to use it to do some kind of look up? Like a 'does this exist' scenario?
If so, java.util.HashSet
Do you need to attach additional information to each country?
If so, java.util.HashMap
Do you need a lookup and an ordered iteration?
If so, java.util.TreeSet.
There's also concurrency to be concerned about, but I didn't see any mention of it, so I'll leave off those guys.
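
A small sketch of three of these choices side by side. The three countries are sample data only, and the class and helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CountryStructures {
    // Hypothetical helper: TreeSet gives lookup plus ordered (alphabetical) iteration.
    static List<String> sortedNames(Collection<String> names) {
        return new ArrayList<>(new TreeSet<String>(names));
    }

    public static void main(String[] args) {
        // HashMap: attach additional information (here, a name per ISO code).
        Map<String, String> nameByCode = new HashMap<>();
        nameByCode.put("NO", "Norway");
        nameByCode.put("CL", "Chile");
        nameByCode.put("JP", "Japan");

        // HashSet: a plain "does this exist" lookup.
        Set<String> names = new HashSet<>(nameByCode.values());
        System.out.println(names.contains("Chile")); // true
        System.out.println(nameByCode.get("JP"));    // Japan
        System.out.println(sortedNames(names));      // [Chile, Japan, Norway]
    }
}
```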

You should use a HashSet which provides both the uniqueness of elements in a set and the constant time of key search.

Try using Dictionary<k,v>; I recommend it.

I think a HashSet should be suitable so long as you don't expect to have duplicates. HashSets provide constant-time lookup, which will speed up searches. You can consider thread-safe variants like Collections.synchronizedSet or CopyOnWriteArrayList if you expect the data structure to be accessed and modified by multiple threads. You'll need to provide more details on the use case to narrow down your options.

I did something very similar recently, as I have a list of languages in my app. I just use an Enum for that. The list of countries is fairly stable, so you shouldn't need to recompile this class too often ;)

Is there a way to create a List/Set which keeps insertion order and does not allow duplicates in Java?

What is the most efficient way of maintaining a list that does not allow duplicates, but maintains insertion order and also allows the retrieval of the last inserted element in Java?

Try LinkedHashSet, which keeps the order of input.
Note that re-inserting an element that is already present does not change its position in the iteration order, so you do not need to check whether the element is already contained in the set before adding it.
Edit:
You could also try the Apache Commons Collections class ListOrderedSet, which according to the JavaDoc (if I didn't misread anything again :) ) decorates a set in order to keep insertion order and provides a get(index) method.
Thus, it seems you can get what you want by using new ListOrderedSet(new HashSet());
Unfortunately this class doesn't provide a generic parameter, but it might get you started.
Edit 2:
Here's a project that seems to represent commons collections with generics, i.e. it has a ListOrderedSet<E> and thus you could for example call new ListOrderedSet<String>(new HashSet<String>());
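
Staying with plain LinkedHashSet, here is a minimal sketch of one way to also get the last inserted element: wrap the set and remember the most recent successful add. LastTrackingSet is a hypothetical name, and the sketch deliberately supports only add, since removal would require recomputing the last element.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

class LastTrackingSet<E> {
    private final Set<E> elements = new LinkedHashSet<>();
    private E last; // most recently added element that was not a duplicate

    // Adds the element if absent; duplicates leave both order and 'last' unchanged.
    boolean add(E element) {
        boolean added = elements.add(element);
        if (added) {
            last = element;
        }
        return added;
    }

    // Returns the last inserted element in O(1).
    E last() {
        if (elements.isEmpty()) {
            throw new NoSuchElementException("set is empty");
        }
        return last;
    }

    // Snapshot of the elements in insertion order.
    List<E> toList() {
        return new ArrayList<>(elements);
    }

    public static void main(String[] args) {
        LastTrackingSet<String> set = new LastTrackingSet<>();
        set.add("a");
        set.add("b");
        set.add("a"); // duplicate, ignored
        System.out.println(set.toList()); // [a, b]
        System.out.println(set.last());   // b
    }
}
```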

I don't think there's anything in the JDK which does this.
However, LinkedHashMap, which is used as the basis for LinkedHashSet, comes close: it maintains a circular doubly-linked list of the entries in the map. It only tracks the head of the list not the tail, but because the list is circular, header.before is the tail (the most recently inserted element).
You could therefore implement what you need on top of this. LinkedHashMap has not been designed for extension, so this is somewhat awkward. You could copy the code into your own class and add a suitable last() method (be aware of licensing issues here), or you could extend the existing class, and add a method which uses reflection to get at the private header and before fields.
That would get you a Map, rather than a Set. However, HashSet is already a wrapper which makes a Map look like a Set. Again, it is not designed for general extension, but you could write a subclass whose constructor calls the super constructor, then uses more reflection to replace the superclass's value of map with an instance of your new map. From there on, the class should do exactly what you want.
As an aside, the library classes here were all written by Josh Bloch and Neal Gafter. Those guys are two of the giants of Java. And yet the code in there is largely horrible. Never meet your heroes.

Just use a TreeSet.
