Order of elements in HashSet [duplicate] - java

This question already has answers here:
Is the order of values retrieved from a HashMap the insertion order
(6 answers)
Closed 7 years ago.
I have read in the Java 1.7 docs that HashSet "makes no guarantees as to the iteration order of the set".
What is the meaning of this?
I created a HashSet and printed its elements 1000 times, but every time I got the same fixed order.
However, the order is not the same as the insertion order of the elements.
Set<String> hashSet = new HashSet<>();
for (int i = 0; i < 10; i++) {
    hashSet.add("Item+" + i);
}
for (String s : hashSet) {
    System.out.println(s);
}

You should try adding a lot more elements (say, 10,000) to the set. A HashSet has a default capacity of 16, but once you add enough elements, its internal table is rebuilt at a larger size. When that happens, the iteration order may change.
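To make that concrete, here is a minimal sketch (the class name is illustrative, and the exact orderings depend on the JDK version) that snapshots the iteration order of ten elements, forces the table to grow, and then compares the order of the original ten:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RehashDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>(); // starts with the default table size of 16
        for (int i = 0; i < 10; i++) {
            set.add("Item" + i);
        }
        List<String> before = new ArrayList<>(set); // snapshot of the current iteration order

        // Force one or more internal resizes by adding many more elements.
        for (int i = 10; i < 10_000; i++) {
            set.add("Item" + i);
        }
        // Remove the extras again; the table does not shrink back, so the
        // original ten elements now sit in a much larger table.
        set.removeIf(s -> !before.contains(s));

        List<String> after = new ArrayList<>(set);
        System.out.println("before: " + before);
        System.out.println("after:  " + after); // the relative order may now differ
    }
}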

It means that you cannot be sure that the order will be the same, for instance if you run the same code on another JVM.
The fact that the order is always the same on your machine, using one specific JVM, is irrelevant. If order is important, consider using a TreeSet: a TreeSet guarantees that the order is always the same, no matter where you run your code.
Of course, a TreeSet requires that the items can be ordered in some way (e.g. alphabetically). If you want to preserve the order in which elements were added, you may prefer a List such as an ArrayList.
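For example, a quick sketch contrasting the two guaranteed orderings (sorted order versus insertion order):
import java.util.*;

Set<String> sorted = new TreeSet<>();         // iterates in natural (alphabetical) order
List<String> byInsertion = new ArrayList<>(); // iterates in insertion order
for (String s : new String[] {"pear", "apple", "fig"}) {
    sorted.add(s);
    byInsertion.add(s);
}
System.out.println(sorted);      // [apple, fig, pear]
System.out.println(byInsertion); // [pear, apple, fig]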

The order of the entries in a HashMap or HashSet is, in theory, predictable for current and older implementations.
However, the prediction depends on at least:
the hash values of the keys,
the initial capacity of the set or map,
the precise sequence in which the keys were added to and removed from the set / map,
the specific implementation of HashSet or HashMap used (the behaviour is Java version dependent, and possibly dependent on the patch level), and
for Java 8 and later, whether or not the keys are Comparable.
If you have all of that information (and you are prepared to emulate the insertion / removal sequence), you can accurately predict the iteration order. However, it would be tricky to implement, and expensive to run ...
In your example, the hash values are the same, the initial HashSet capacity is the same, the insertion order is the same, and the HashSet implementation is the same. In those circumstances (and given the precise algorithms used) the iteration order is going to be repeatable ... even though it would be difficult to predict.
In this case, the order is not "random" because there is no randomness in the process that builds the HashSet. Just calculations that are complicated and opaque ... but deterministic.
I have read in the Java 1.7 docs that "It makes no guarantees as to the iteration order of the set". What is the meaning of this?
What it means is that the javadoc is not committing to any specific behaviour vis-a-vis the ordering. Certainly, there is no commitment to portable behaviour.
See also: Order of values retrieved from a HashMap

In hash collections, entries are ordered by the result of some internal hashing function.
For the same set of entries added to the same collection in the same order, the iteration order will always be the same, since the hash function values also remain the same, unless the internal structure is reorganized between calls (e.g. by expanding or shrinking the collection). On reorganization, the values of the internal hashing function are recalculated and the entries take different places in the internal hash table.
BTW, the entry iterator of a hash collection guarantees only that you will receive all the entries you put there and did not remove.

You may appear to see the same "sorting", but it is not real; it is up to the JVM. So, if you want an ordered collection:
If you have a logical sorting, use Collections.sort() or implement your own Comparator, as in the sketch below.
If you want the collection ordered by insertion order, use a List and an Iterator.
List iterators guarantee first and foremost that you get the list's elements in the internal order of the list (aka insertion order). More specifically, the elements come in the order you inserted them or in whatever order you have manipulated the list into. Sorting can be seen as a manipulation of the data structure, and there are several ways to sort a list.
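Here is a minimal sketch of both options, using string length as an example of a "logical sorting":
import java.util.*;

List<String> items = new ArrayList<>(Arrays.asList("banana", "fig", "apple"));
System.out.println(items); // [banana, fig, apple] - the List keeps insertion order

// Logical sorting with a custom Comparator (here: by length):
items.sort(Comparator.comparingInt(String::length));
System.out.println(items); // [fig, apple, banana]

// Or natural (alphabetical) ordering via Collections.sort:
Collections.sort(items);
System.out.println(items); // [apple, banana, fig]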

Related

Is the order of HashMap elements reproducible?

First of all, I want to make it clear that I would never use a HashMap to do things that require some kind of order in the data structure, and that this question is motivated by my curiosity about the inner details of the Java HashMap implementation.
You can read about the hashCode method in the Java documentation for Object.
I understand from there that the hashCode implementation for classes such as String and the basic type wrappers (Integer, Long, ...) is predictable once the value contained by the object is given. An example of that would be that a call to hashCode for any String object containing the value "hello" should always return 99162322.
Suppose an algorithm always inserts the same values, in the same order, into an empty Java HashMap with Strings as keys. Then the order of its elements at the end should always be the same, am I wrong?
Since the hash code for a concrete value is always the same, if there are no collisions the order should be the same.
On the other hand, if there are collisions, I think (I don't know the facts) that collision resolution should result in the same order for exactly the same input elements.
So, isn't it right that two HashMap objects with the same elements, inserted in the same order, should be traversed (by an iterator) giving the same sequence of elements?
As far as I know, the order (assuming we call "order" the order of elements as returned by the values() iterator) of the elements in a HashMap is kept until a rehash of the map is performed. We can influence the probability of that event by providing a capacity and/or loadFactor to the constructor.
Nevertheless, we should never rely on this, because the internal implementation of HashMap is not part of its public contract and is subject to change in the future.
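A quick empirical check along these lines (it demonstrates the behaviour on one particular JVM; it does not prove a guarantee):
import java.util.*;

Map<String, Integer> a = new HashMap<>();
Map<String, Integer> b = new HashMap<>();
for (String key : new String[] {"hello", "world", "foo", "bar"}) {
    a.put(key, key.length());
    b.put(key, key.length());
}
// Same keys, same insertion order, same implementation: on a given JVM,
// the two key sequences come out identical.
System.out.println(new ArrayList<>(a.keySet()).equals(new ArrayList<>(b.keySet()))); // true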
I think you are asking "Is HashMap non-deterministic?". The answer is "probably not" (look at the source code of your favourite implementation to find out).
However, bear in mind that because the Java standard does not guarantee a particular order, the implementation is free to alter at any time (e.g. in newer JRE versions), giving a different (yet deterministic) result.
Whether or not that is true is entirely dependent upon the implementation. What's more important is that it isn't guaranteed. If order is important to you, there are options. You could create your own implementation of Map that preserves order, you could use a SortedMap or LinkedHashMap, or you could use something like the Apache commons-collections OrderedMap: http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/OrderedMap.html.
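For instance, a minimal sketch of the LinkedHashMap option, which keeps insertion order without sorting anything:
import java.util.*;

Map<String, Integer> ordered = new LinkedHashMap<>();
ordered.put("first", 1);
ordered.put("second", 2);
ordered.put("third", 3);
// Unlike HashMap, the iteration order here is part of the class contract:
System.out.println(ordered.keySet()); // [first, second, third]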

Searching LinkedHashMap, faster method than sequential?

I am wondering if there is a more efficient method for getting objects out of my LinkedHashMap with timestamps greater than a specified time. I.e. something better than the following:
Iterator<Foo> it = foo_map.values().iterator();
Foo foo;
while (it.hasNext()) {
    foo = it.next();
    if (foo.get_timestamp() < minStamp) continue;
    break;
}
In my implementation, each of my objects has essentially three values: an "id", a "timestamp", and "data". The objects are inserted in order of their timestamps, so when I iterate over the set I get ordered results (as required by the LinkedHashMap contract). The map is keyed on the object's id, so I can quickly look them up by id.
When I look them up by a timestamp condition, however, I get an iterator over sorted results. This is an improvement over a generic HashMap, but I still need to iterate sequentially over much of the range until I find the first entry with a higher timestamp than the specified one.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection) to that can search it faster than sequentially? If I went with a TreeMap as an alternative, would it offer overall speed advantages, or is it doing essentially the same thing in the background? Since the collection is already sorted by insertion order, I'm thinking a TreeMap has a lot of overhead I don't need.
There is no faster way ... if you just use a LinkedHashMap.
If you want faster access, you need to use a different data structure. For example, a TreeSet with an appropriate comparator might be a better solution for this aspect of your problem. For example, if your TreeSet is ordered by date, then calling tailSet with an appropriate dummy value can give you all elements greater than or equal to a given date.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection) to that can search it faster than sequentially?
Not for a LinkedHashMap.
However, if the ordered list was an ArrayList instead, then you could use "binary search" on the list ... provided that you could lock it to prevent concurrent modifications while you are searching. (Actually, concurrency is a potential issue to consider no matter how you implement this ... including your current linear search.)
If you want to keep the ability to do id lookups, then you need two data structures; e.g. a TreeSet and a HashMap which share their element objects. A TreeSet will probably be more efficient than trying to maintain an ArrayList in order assuming that there are random insertions and/or random deletions.
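A sketch of that two-structure idea using a TreeMap keyed by timestamp (Foo and the field names are illustrative stand-ins for the question's types, and it assumes timestamps are unique; otherwise map each timestamp to a list):
import java.util.*;

public class TimestampIndex {
    static class Foo { // stand-in for the question's Foo
        final String id;
        final long timestamp;
        Foo(String id, long timestamp) { this.id = id; this.timestamp = timestamp; }
    }

    public static void main(String[] args) {
        NavigableMap<Long, Foo> byTime = new TreeMap<>(); // sorted by timestamp
        Map<String, Foo> byId = new HashMap<>();          // keeps the O(1) id lookup

        for (Foo f : new Foo[] {new Foo("a1", 1000L), new Foo("a2", 2000L), new Foo("a3", 3000L)}) {
            byTime.put(f.timestamp, f);
            byId.put(f.id, f);
        }

        long minStamp = 1500L;
        // Locates the first qualifying entry in O(log n) instead of scanning linearly:
        for (Foo f : byTime.tailMap(minStamp, true).values()) {
            System.out.println(f.id + " @ " + f.timestamp); // a2 @ 2000, a3 @ 3000
        }
    }
}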

How to test if a Java iterator always uses the same order (reproducible ordering)?

I have code in which for-each loops over a Set need to rely on the fact that the iterator always returns the elements in the same order, e.g.
for (ParameterObject parameter : parameters) {
    /* ... */
}
The iterators returned by HashSet are not guaranteed to have this property, however it is documented that the iterators of LinkedHashSet do have this property. So my code uses a LinkedHashSet and everything works fine.
However, I am wondering if I could endow my code with a check that the set passed to it conforms to the requirement. It appears that this is not possible (except for a direct instanceof test on LinkedHashSet). There is no interface implemented by LinkedHashSet which I could test for, and there is no interface implemented by LinkedHashSet.iterator() which I could test for. It would be nice if there were an interface like OrderConsistentCollection or OrderConsistentIterator.
(I need this property here).
There isn't a way you can check for it -- but you can ensure it anyway, by simply copying the set into a collection that does have that property. A LinkedHashSet would do the trick, but if all you need is the iteration, an ArrayList would probably serve you better.
List<Foo> parameters = new ArrayList<>(parametersSet);
Now parameters will always return an iterator with the same ordering.
That said, you'd probably be fine with Evgeniy Dorofeev's suggestion, which points out that even the sets that don't guarantee a particular ordering usually do have a stable ordering (even if they don't guarantee it). HashSet acts that way, for instance. You'd actually have to have a pretty funky set, or take active randomization measures, not to have a stable ordering.
HashSet's ordering is not guaranteed, but it depends on the hash codes of its elements as well as the order in which they were inserted; the maintainers don't want to guarantee anything because they don't want to lock themselves into any one strategy, and even a contract that loose would amount to essentially random order if the objects' hash codes came from Object.hashCode(). Rather than specifying an ordering with complex implications, and then saying it's subject to change, they just said there are no guarantees. But those are the two factors for the ordering, and if the set isn't being modified, those two factors are going to be stable from one iteration to the next.
'HashSet.iterator does not return elements in any particular order' means that the elements returned by the iterator are not sorted or ordered as in a List or a LinkedHashSet. But HashSet.iterator will always return the elements in one and the same order as long as the HashSet itself stays the same.
The HashSet iterator is actually fairly predictable; see this:
Set<Integer> set = new HashSet<>();
set.add(9);
set.add(2);
set.add(5);
set.add(1);
System.out.println(set);
I can foretell the output: it will be [1, 2, 5, 9], because the elements end up ordered by hash code.
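Why that works, as a minimal sketch (assuming the default table capacity of 16): Integer.hashCode() is the int value itself, HashMap's internal hash spreading (h ^ (h >>> 16)) leaves small values unchanged, and the bucket index is just hash & (capacity - 1), so iterating the buckets in index order yields ascending values:
for (int value : new int[] {9, 2, 5, 1}) {
    System.out.println(value + " -> bucket " + (value & 15)); // 16-slot table: index = hash & 15
}
// 9 -> bucket 9, 2 -> bucket 2, 5 -> bucket 5, 1 -> bucket 1
// The iterator walks buckets 0..15 in index order, hence [1, 2, 5, 9].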

How do HashSets in Java work? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How does Java hashmap work?
Can someone explain to me how HashSets in Java work and why they are faster than using ArrayLists?
A HashSet is actually a HashMap where the value is always the same.
The way a HashMap works is described in many places (it is often referred to as a "hashtable" as well). In short: it generates hashes of the keys (objects) and positions them in a table. Then, each time you look up a key, its hash is computed and the bucket in the table is referenced directly. This means you need just one operation (best case) to access the map.
A HashSet simply contains the keys, so .contains(..) is O(1). That and remove(..) are the only operations where a HashSet is faster than an ArrayList (where they are O(n)). Iteration is the same, addition is the same.
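A toy illustration of that difference (a single unwarmed measurement, not a proper benchmark, so treat the numbers as indicative only):
import java.util.*;

public class ContainsDemo {
    public static void main(String[] args) {
        int n = 100_000;
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
            set.add(i);
        }

        long t0 = System.nanoTime();
        boolean inList = list.contains(n - 1); // O(n): scans the backing array
        long t1 = System.nanoTime();
        boolean inSet = set.contains(n - 1);   // O(1): hashes the key, probes one bucket
        long t2 = System.nanoTime();

        System.out.println("list.contains -> " + inList + " in " + (t1 - t0) + " ns");
        System.out.println("set.contains  -> " + inSet + " in " + (t2 - t1) + " ns");
    }
}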
First, HashSet, unlike ArrayList, is a Set: it cannot contain duplicates, while an ArrayList can - so they are built for different purposes. It also does not guarantee ordering - again, unlike a list.
Second - a HashSet is built on the hash table data structure, that allows O(1) seek time for an element.
Note that many times a HashSet is slower than an ArrayList - if you want to iterate over the elements, for example - usually doing that on an ArrayList will be faster than on a HashSet [because of the bad cache performance of hash tables, among other reasons].
These are 2 different data structures.
The concept behind HashSet is key probing.
I.e. you use a transformation of the input key to get an index of the location of the value in an array.
This is a constant O(1) operation since an array allows random access.
An ArrayList also offers O(1) access, since it too is backed by an array.
But only for random access and insertion.
Search, though, is an O(n) operation for an ArrayList, since you have to go through all the elements in the list to get to the value, unlike a HashSet, where you just transform the key and access the array. Search in a HashSet is O(1).
As a matter of fact, iterating over and appending to an ArrayList, for example, is faster.
And heck, you cannot even sort a HashSet.
But the fastest of all is the NoOp. There is nothing just remotely as fast as the NoOp. Granted, it doesn't do much, the NoOp. But it's really fast at that!
You need to be more precise in what you consider to be "faster than".

TreeMap or HashMap? [duplicate]

This question already has answers here:
Difference between HashMap, LinkedHashMap and TreeMap
(17 answers)
What is the difference between a HashMap and a TreeMap? [duplicate]
(8 answers)
Closed 8 years ago.
When should I use a HashMap, and when a TreeMap?
I know that I can use a TreeMap to iterate over the elements when I need them to be sorted.
But is it just that? Is there no optimization when I just want to consult the map, or are there specific uses where one is optimal?
TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately.
Unless you need the entries to be sorted, I'd stick with HashMap. Or there's ConcurrentHashMap of course. I can't remember the details of the differences between all of them, but HashMap is a perfectly reasonable "default" option :)
For completeness, I should point out that there was a discussion on Stack Overflow a month or so ago about the internals of various maps. See the comments in this question, which I will copy into this answer if bestsss is happy for me to do so.
Hashtables (usually) perform search operations (lookups) bounded within the complexity O(1) <= T(n) <= O(n), with an average case complexity of O(1 + n/k); binary search trees (BSTs), however, perform search operations (lookups) bounded within the complexity O(log_2(n)) <= T(n) <= O(n), with an average case complexity of O(log_2(n)). The implementation of each (and every) data structure should be known (by you) in order to understand the advantages, drawbacks, time complexity of operations, and code complexity.
For example, a hashtable often has some fixed number of slots (some of which may not be filled at all), with lists of collisions hanging off them. Trees, on the other hand, usually have two pointers (references) per node, though this can be more if the implementation allows more than two child nodes per node; this allows the tree to grow as nodes are added, but it may not allow duplicates. (The default implementation of a Java TreeMap does not allow duplicates.)
There are special cases to consider as well. For example, what if the number of elements in a particular data structure increases without bound or approaches the limit of an underlying part of the data structure? What about amortized operations that perform some rebalancing or cleanup work?
For example, in a hashtable, when the number of elements in the table becomes sufficiently large, an arbitrary number of collisions can occur. Trees, on the other hand, usually require some rebalancing procedure after an insertion (or deletion).
So, if you have something like a cache (e.g. the number of elements is bounded, or the size is known), then a hashtable is probably your best bet; however, if you have something more like a dictionary (e.g. populated once and looked up many times), then I'd use a tree.
This is only the general case, however (no further information was given). You have to understand the processes that happen, and how they happen, to make the right choice when deciding which data structure to use.
When I need a multi-map (ranged lookup) or sorted flattening of a collection, then it can't be a hashtable.
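For example, a ranged lookup that a plain hashtable cannot answer without a full scan, sketched with TreeMap's subMap view:
import java.util.*;

NavigableMap<String, Integer> scores = new TreeMap<>();
scores.put("alice", 3);
scores.put("bob", 5);
scores.put("carol", 8);
scores.put("dave", 2);
// All entries whose keys fall in the range ["b", "d"), in sorted key order:
System.out.println(scores.subMap("b", "d")); // {bob=5, carol=8}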
The largest difference between the two is the underlying structure used in the implementation.
HashMaps use an array and a hashing function to store elements. When you try to insert or delete an item, the hashing function converts the key into an index into the array where the object is/should be stored (ignoring conflicts). While hashmaps are generally very fast, because they don't need to iterate over large amounts of data, they slow down when they fill up, because they then need to copy all the keys/values into a new, larger array.
TreeMaps store the data in a sorted tree structure. While this means that they never have to allocate more space and copy over to it, operations require that part of the stored data be traversed, and sometimes large parts of the structure have to be changed.
Of the two, HashMaps will generally have the better performance when you don't need sorting.
Inserting new elements into a HashMap will, on average, be a good deal faster than inserting elements into a TreeMap. Unless you need your elements sorted, I'd go with the HashMap.
Don't forget there is also LinkedHashMap which is nearly as fast as HashMap for add/contains/remove operations but also maintains the insertion order.
