What does deterministic mean? - java

I am reading the Java HashMap documentation, but I don't understand this sentence.
Note that the iteration order for HashMap is non-deterministic. If you want deterministic iteration, use LinkedHashMap.
What does deterministic mean?

The simplest definition:
Given the same inputs, you always get the same outputs.
Above, it's saying that iterating through the exact same HashMap may give different results at different times, even when you haven't changed anything. Usually that doesn't matter, but if it does, you should use a LinkedHashMap.

In an order which can be "determined" in advance.
Because of the way hashing works, the elements in the map are "scrambled" into arbitrary locations. The scrambling positions cannot easily be determined in advance -- they aren't determinable -- you don't know the resulting order.

In simpler terms: When you call keySet(), values() or entrySet() you get back a collection over which you can iterate. That line is saying you can't expect the iterator to return objects in any particular order. In particular, it can differ from both the insertion order and the natural ordering by key values.
If you want the iterator to work in insertion order, use a LinkedHashMap. If you want to iterate by key value, use a TreeMap. Be aware that both of these have slightly worse performance than a plain HashMap, as they both have to do extra work to keep track of the order.
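A quick sketch of the three behaviours (the HashMap output shown is only one possibility; the other two orders are guaranteed):
import java.util.*;

public class OrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();          // no order promise
        Map<String, Integer> linked = new LinkedHashMap<>();  // insertion order
        Map<String, Integer> tree = new TreeMap<>();          // sorted by key
        for (String k : new String[] {"banana", "apple", "cherry"}) {
            hash.put(k, k.length());
            linked.put(k, k.length());
            tree.put(k, k.length());
        }
        System.out.println(hash);   // arbitrary, e.g. {banana=6, cherry=6, apple=5}
        System.out.println(linked); // {banana=6, apple=5, cherry=6}
        System.out.println(tree);   // {apple=5, banana=6, cherry=6}
    }
}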

Strictly speaking, HashMap iteration order is almost certainly not non-deterministic. Like the vast majority of computational processes, if you go through it exactly the same way, the results will be exactly the same. A truly non-deterministic system would incorporate some external random element, which is highly unlikely to be the case here. At least in most cases.
What they really mean, I think, is that just because the map contains a particular set of elements, you shouldn't expect that when you iterate over them they will come out in a particular order. That doesn't mean the order of iteration is random, it just means that as a developer you shouldn't bank on knowing what it is.
In most cases, the reason for this is that there will be some dependency on some implementation details that can vary from platform to platform, and/or on order of access. And the latter may in turn be determined by thread scheduling and event timing, which are innately unpredictable.
In most cases, on any individual platform and with the most common threading model -- a single threaded application -- if you always insert and delete a particular set of things in sequence X you will always get them out in sequence Y. It's just that Y will be so exactly dependent on X, and on the platform, that there's no point even thinking about what it's going to be.
Basically, even though it isn't random, it might just as well be.

deterministic : can be determined
non-deterministic : can't be determined

A deterministic algorithm is one that, given a particular input, will always produce the same output.
A good example I found:
Consider a shopping list: a list of items to buy.
It can be interpreted in two ways:
* The instruction to buy all of those items, in any order. This is a nondeterministic algorithm.
* The instruction to buy all of those items, in the order given. This is a deterministic algorithm.

Deterministic means the result is predictable / foreseeable.

Non-deterministic means that there isn't one single result that you can figure out beforehand. An arithmetical expression, like 1 + 2 or log e, is deterministic. There's exactly one correct answer and you can figure it out upfront. Throw a handful of sand in the air, and where each grain will fall is effectively non-deterministic to any meaningful degree of accuracy.
This probably isn't precisely correct, as you could look at the source code of the underlying library and JVM implementation, and there would probably be some way you could determine the ordering that would result. It might be more correct for them to say, "No particular order is guaranteed," or something of that sort.
What's relevant in this case is that you can't rely on the ordering.

This is the property of HashMap whereby elements are not iterated in the same order in which they were inserted, since HashMap does not store elements in insertion order. Hence the line in the documentation quoted above.

Non-deterministic means there is no well-defined behaviour.
In the case of HashMap, depending on how you inserted elements, you might get one or another order of iteration.
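For instance, the strings "Aa" and "BB" happen to have the same hashCode (2112), so they land in the same bucket, and the within-bucket order can reflect insertion order (a sketch; the exact output is an implementation detail):
import java.util.*;

public class CollisionDemo {
    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("Aa", 1);
        first.put("BB", 2);

        Map<String, Integer> second = new HashMap<>();
        second.put("BB", 2);
        second.put("Aa", 1);

        System.out.println(first);  // e.g. {Aa=1, BB=2}
        System.out.println(second); // e.g. {BB=2, Aa=1}
    }
}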

HashMap doesn't maintain the order in which you add entries. If you want the output to be in insertion order, use a LinkedHashMap; that is what deterministic means here: the entries come out in the order you put them in.
Here is an example:
1. Non-deterministic:
HashMap<String, Integer> map = new HashMap<String, Integer>();
map.put("a", 5);
map.put("b", 16);
map.put("c", 46);
System.out.println(map); // output: {a=5, c=46, b=16}
2. Deterministic:
Map<String, Integer> map = new LinkedHashMap<String, Integer>();
map.put("a", 5);
map.put("b", 16);
map.put("c", 46);
System.out.println(map); // output: {a=5, b=16, c=46}

Related

Order of elements in HashSet [duplicate]

I have read in the Java 1.7 docs that "It makes no guarantees as to the iteration order of the set".
What is the meaning of this?
I created a HashSet and printed its elements 1000 times, but every time I get a fixed order.
However, the order is not the same as the insertion order of the elements.
Set<String> hashSet = new HashSet<>();
for (int i = 0; i < 10; i++) {
    hashSet.add("Item+" + i);
}
for (String s : hashSet) {
    System.out.println(s);
}
You should try adding a lot more elements (say, 10,000) to the set. The HashSet has a default capacity of 16, but once you add more elements to the set, it is internally resized and rebuilt. In that case, the order might change.
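A quick way to try this (a sketch; whether the relative order actually changes after resizing is an implementation detail):
import java.util.*;

public class RehashDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            set.add("Item+" + i);
        }
        List<String> before = new ArrayList<>(set); // snapshot of current order

        // Force the table to grow far beyond its default capacity of 16.
        for (int i = 10; i < 10_000; i++) {
            set.add("Item+" + i);
        }

        // Collect the original ten items in their new relative order.
        List<String> after = new ArrayList<>();
        for (String s : set) {
            if (before.contains(s)) {
                after.add(s);
            }
        }
        System.out.println(before.equals(after)); // quite possibly false
    }
}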
It means that you cannot be sure that the order will be the same, for instance if you run the same code on another JVM.
The fact that the order is always the same on your machine, using one specific JVM, is irrelevant. If order is important, consider using a TreeSet; a TreeSet will guarantee that the order is always the same, no matter where you run your code.
Of course: a TreeSet requires that the items can be ordered in some way (e.g. alphabetically). If you want to preserve the order in which elements are added, you may prefer a List such as an ArrayList.
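A small sketch of the two alternatives:
import java.util.*;

public class SortedVsInsertion {
    public static void main(String[] args) {
        // TreeSet always iterates in sorted (here, alphabetical) order.
        Set<String> tree = new TreeSet<>(Arrays.asList("pear", "apple", "plum"));
        System.out.println(tree); // [apple, pear, plum]

        // ArrayList preserves exactly the order in which items were added.
        List<String> list = new ArrayList<>(Arrays.asList("pear", "apple", "plum"));
        System.out.println(list); // [pear, apple, plum]
    }
}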
The order of the entries in a HashMap or HashSet is predictable in theory for current generation and older implementations.
However, the prediction depends on at least:
* the hash values of the keys,
* the initial capacity of the set or map,
* the precise sequence in which the keys were added to and removed from the set / map,
* the specific implementation of HashSet or HashMap used (the behaviour is Java version dependent, and possibly dependent on patch level), and
* for Java 8 and later, whether or not the keys are Comparable.
If you have all of that information (and you are prepared to emulate the insertion / removal sequence), you can accurately predict the iteration order. However, it would be tricky to implement, and expensive to run ...
In your example, the hash values are the same, the initial HashSet capacity is the same, the insertion order is the same, and the HashSet implementation is the same. In those circumstances (and given the precise algorithms used) the iteration order is going to be repeatable ... even though it would be difficult to predict.
In this case, the order is not "random" because there is no randomness in the process that builds the HashSet. Just calculations that are complicated and opaque ... but deterministic.
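To illustrate the dependence on initial capacity (a sketch; the actual printed orders are whatever your JDK produces, and the two maps may or may not disagree on any given run):
import java.util.*;

public class CapacityDemo {
    public static void main(String[] args) {
        // Same entries, same insertion order, different initial capacities.
        Map<String, Integer> small = new HashMap<>(4);
        Map<String, Integer> large = new HashMap<>(256);
        for (String k : new String[] {"one", "two", "three", "four", "five"}) {
            small.put(k, k.length());
            large.put(k, k.length());
        }
        // The bucket index depends on the hash and the table size, so the
        // two maps can iterate in different orders despite equal contents.
        System.out.println(small);
        System.out.println(large);
    }
}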
I have read in the Java 1.7 docs that "It makes no guarantees as to the iteration order of the set". What is the meaning of this?
What it means is that the javadoc is not committing to any specific behaviour vis-a-vis the ordering. Certainly, there is no commitment to portable behaviour.
See also: Order of values retrieved from a HashMap
In hash collections, entries appear in an order determined by an internal hash function.
For the same set of entries added to the same collection in the same order, the returned order will always be the same, since the hash function values also remain the same; the exception is if the internal structure is reorganized between calls (e.g. by expansion or shrinking of the collection). On reorganization, the internal hash table is rebuilt and entries take new places in it.
By the way, the entry iterator of a hash collection guarantees only that you will receive all entries you've put there that weren't removed.
You may see what looks like consistent "sorting", but this is not real; it's up to the JVM. So, if you want a sorted collection:
* If you have a logical sort order, use Collections.sort() or implement your own Comparator.
* If you want the collection sorted by insertion order, use a List and an Iterator.
List iterators guarantee first and foremost that you get the list's elements in the internal order of the list (aka insertion order). More specifically, it is the order in which you've inserted the elements, or the result of how you've manipulated the list. Sorting can be seen as a manipulation of the data structure, and there are several ways to sort a list.

Is the order of HashMap elements reproducible?

First of all, I want to make it clear that I would never use a HashMap to do things that require some kind of order in the data structure and that this question is motivated by my curiosity about the inner details of Java HashMap implementation.
You can read about the hashCode method in the Java documentation for Object.
I understand from there that the hashCode implementation for classes such as String and the primitive type wrappers (Integer, Long, ...) is predictable once the value contained by the object is given. An example of that would be that calls to hashCode for any String object containing the value hello should always return 99162322.
Suppose an algorithm always inserts the same values, with Strings used as keys, into an empty Java HashMap in the same order. Then the order of its elements at the end should always be the same, am I wrong?
Since the hash code for a concrete value is always the same, if there are no collisions the order should be the same.
On the other hand, if there are collisions, I think (I don't know the facts) that collision resolution should result in the same order for exactly the same input elements.
So, isn't it right that two HashMap objects with the same elements, inserted in the same order, should be traversed (by an iterator) giving the same sequence of elements?
As far as I know, the order (assuming we call "order" the order of elements as returned by the values() iterator) of the elements in a HashMap is kept until a rehash is performed. We can influence the probability of that event by providing a capacity and/or loadFactor to the constructor.
Nevertheless, we should never rely on this, because the internal implementation of HashMap is not part of its public contract and is subject to change in the future.
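For example, a sketch of presizing so that a rehash is unlikely (the resize threshold is roughly capacity * loadFactor, an implementation detail you still should not rely on):
import java.util.*;

public class PresizeDemo {
    public static void main(String[] args) {
        // Capacity 1024 with load factor 0.75 allows about 768 entries
        // before any rehash, so iteration order should stay undisturbed
        // while we remain below that.
        Map<String, Integer> map = new HashMap<>(1024, 0.75f);
        for (int i = 0; i < 500; i++) {
            map.put("key" + i, i); // no resize expected for these entries
        }
    }
}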
I think you are asking "Is HashMap non-deterministic?". The answer is "probably not" (look at the source code of your favourite implementation to find out).
However, bear in mind that because the Java standard does not guarantee a particular order, the implementation is free to alter at any time (e.g. in newer JRE versions), giving a different (yet deterministic) result.
Whether or not that is true is entirely dependent upon the implementation. What's more important is that it isn't guaranteed. If order is important to you, there are options. You could create your own implementation of Map that does preserve order, you can use a SortedMap/LinkedHashMap, or you can use something like the Apache commons-collections OrderedMap: http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/OrderedMap.html.

Is map/collection order stable between calls?

If I have a hash map and iterate over the objects repeatedly, is it correct that I'm not guaranteed the same order for every call? For example, could the following print two lines that differ from each other:
Map<String,Integer> map = new HashMap<String,Integer>()
{{ put("a", 1); put("b", 2); put("c", 3); }};
System.out.println(map);
System.out.println(map);
And is this the case for sets and collections in general? If so, what's the best way in case you have to iterate twice over the same collection in the same order (regardless of what order that is)? I guess converting to a list.
The contracts of Map and Set do not make any guarantees about iteration order, but those of SortedSet and SortedMap (implemented by TreeMap and TreeSet) do.
Furthermore, even the non-sorted implementations generally behave deterministically and have a repeatable iteration order for each specific instance, as long as it's not modified in any way. However, that's an implementation detail and should not be relied upon.
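If you need the same order across two passes, one approach (a sketch of the "convert to a list" idea from the question) is to take a snapshot:
import java.util.*;

public class SnapshotDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        // The list fixes one order; iterating it twice is guaranteed to
        // produce the same sequence, whatever the map itself would do.
        List<Map.Entry<String, Integer>> snapshot = new ArrayList<>(map.entrySet());
        System.out.println(snapshot);
        System.out.println(snapshot);
    }
}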
You're correct, the order of a map is not guaranteed. You might want to look at something like TreeMap if you need the order to stay the same between calls.
Having said that, chances are that code will print out the same thing twice; it's just not guaranteed.
When using a HashMap, there's no guarantee that the iteration order will be the same on each iteration.
Instead, consider a LinkedHashMap, which is a hash table with a predictable iteration order.
I don't think any of the existing answers are quite speaking to exactly what you're asking about. Sure, a HashMap with certain contents may not iterate in the same way as a different one with equal contents, or in a different VM invocation, or running on different JDK versions, etc. But you're just asking whether that exact instance, if not modified, will iterate the same way as itself.
This is a good example of a de facto specification. It's true that it is not present in the letter of the spec that this will be the case. However, every single JDK collection behaves this way (provided, in the case of an access-ordered LinkedHashMap, that you iterate all the way through each time). And it is difficult to conceive of a collection implementation that would not have this property. I've implemented and reviewed many collections and there was just once that I considered a collection that would iterate differently each time; it was an extremely strange case and I ended up throwing out the whole idea because iterating differently each time was just too damn weird (that is, violated that de facto specification I mentioned).
So I say go ahead and depend on it. That's not a blanket recommendation to depend on any old unspecified behavior you want. But in this case, it's just not going to hurt you.
While the library spec does not guarantee that the order remains the same over time, it probably will be identical as long as the underlying data structure (i.e., the arrays that implement the hash table) is not altered. So as long as you don't insert or remove items from the hash table, it's not unreasonable to assume that the entry order will not change.
Looking at a typical implementation of HashMap shows this to be the case, such as the one at:
http://www.docjar.com/html/api/java/util/HashMap.java.html
That being said, it's not something your code should rely on.

Efficient EnumSet + List

Does someone know a nice solution for EnumSet + List?
I mean, I need to store enum values, I need to preserve the order, and I need to be able to access the index of an enum value in the collection in O(1) time.
The closest thing I can come to think of, present in the API is the LinkedHashSet:
From http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashSet.html:
Hash table and linked list implementation of the Set interface, with predictable iteration order.
I doubt it's possible to do what you want. Basically, you want to look up indexes in constant time, even after modifying the order of the list. Unless you allow remove / reorder operations to take O(n) time, I believe you can't get away with lower than O(log n) (which can be achieved by a heap structure).
The only way I can see to satisfy ordering and O(1) access is to duplicate the data in a List and an array of indexes (wrapped in a nice little OrderedEnumSet, of course).
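A minimal sketch of that idea; OrderedEnumSet is the hypothetical wrapper named in this answer, and removal is deliberately left out because reindexing would make it O(n):
import java.util.*;

// A List keeps the order; an EnumMap gives O(1) index lookups.
final class OrderedEnumSet<E extends Enum<E>> {
    private final List<E> order = new ArrayList<>();
    private final EnumMap<E, Integer> index;

    OrderedEnumSet(Class<E> type) {
        this.index = new EnumMap<>(type);
    }

    boolean add(E e) {
        if (index.containsKey(e)) {
            return false; // set semantics: no duplicates
        }
        index.put(e, order.size());
        order.add(e);
        return true;
    }

    int indexOf(E e) { // O(1), unlike List.indexOf
        Integer i = index.get(e);
        return i == null ? -1 : i;
    }

    List<E> elements() {
        return Collections.unmodifiableList(order);
    }
}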

Is it OK to have a Java Comparator where order can change dynamically?

I have a set of time-stamped values I'd like to place in a sorted set.
import java.util.Date;

public class TimedValue {
    public Date time;
    public double value;

    public TimedValue(Date time, double value) {
        this.time = time;
        this.value = value;
    }
}
The business logic for sorting this set says that values must be ordered in descending value order, unless it's more than 7 days older than the newest value.
So as a test, I came up with the following code...
DateFormat dateFormatter = new SimpleDateFormat("MM/dd/yyyy");
TreeSet<TimedValue> mySet = new TreeSet<TimedValue>(new DateAwareComparator());
mySet.add(new TimedValue(dateFormatter.parse("01/01/2009"), 4.0 )); // too old
mySet.add(new TimedValue(dateFormatter.parse("01/03/2009"), 3.0)); // Most relevant
mySet.add(new TimedValue(dateFormatter.parse("01/09/2009"), 2.0));
As you can see, initially the first value is more relevant than the second, but once the final value is added to the set, the first value has expired and should be the least relevant.
My initial tests say that this should work... that the TreeSet will dynamically reorder the entire list as more values are added.
But even though I see it, I'm not sure I believe it.
Will a sorted collection reorder the entire set as each element is added? Are there any gotchas to using a sorted collection in this manner (i.e. performance)? Would it be better to manually sort the list after all values have been added (I'm guessing it would be)?
Follow-up:
As many (and even I to a certain extent) suspected, the sorted collection does not support this manner of "dynamic reordering". I believe my initial test was "working" quite by accident. As I added more elements to the set, the "order" broke down quite rapidly. Thanks for all the great responses, I refactored my code to use approaches suggested by many of you.
I don't see how your comparator can even detect the change, unless it remembers the newest value it's currently seen - and that sounds like an approach which is bound to end in tears.
I suggest you do something along the following lines (a sketch follows the list):
* Collect your data in an unordered set (or list).
* Find the newest value.
* Create a comparator based on that value, such that all comparisons using that comparator will be fixed (i.e. it will never return a different result based on the same input values; the comparator itself is immutable, although it depends on the value originally provided in the constructor).
* Create a sorted collection using that comparator (in whatever way seems best depending on what you then want to do with it).
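A minimal sketch of such a frozen comparator, reusing the DateAwareComparator name from the question (the exact cutoff logic is an assumption about the business rule):
import java.util.*;

class DateAwareComparator implements Comparator<TimedValue> {
    private static final long SEVEN_DAYS_MS = 7L * 24 * 60 * 60 * 1000;
    private final Date newest; // captured once, so results never change

    DateAwareComparator(Date newest) {
        this.newest = newest;
    }

    @Override
    public int compare(TimedValue a, TimedValue b) {
        boolean aExpired = newest.getTime() - a.time.getTime() > SEVEN_DAYS_MS;
        boolean bExpired = newest.getTime() - b.time.getTime() > SEVEN_DAYS_MS;
        if (aExpired != bExpired) {
            return aExpired ? 1 : -1; // expired values sort last
        }
        int byValue = Double.compare(b.value, a.value); // descending value
        return byValue != 0 ? byValue : a.time.compareTo(b.time); // tie-break
    }
}
The key point is to collect all values first, find the newest timestamp, and only then build the sorted set with new TreeSet<>(new DateAwareComparator(newest)).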
I would advise against this for a few reasons:
* Since it's basically a red-black tree behind the scenes (which doesn't necessarily have to be rebuilt from scratch on every insertion), you might easily end up with values in the wrong part of the tree (invalidating most of the TreeSet API).
* The behavior is not defined in the spec, and thus may change later even if it's working now.
* In the future, when anything goes strangely wrong in anything remotely touching this code, you'll spend time suspecting that this is the cause.
I would recommend either recreating/resorting the TreeSet before searching it or (my preference) iterating through the set before the search and removing any of the objects that are too old. You could even, if you wanted to trade some memory for speed, keep a second list ordered by date and backed by the same objects, so that all you would have to do to filter your TreeSet is remove objects from the TreeSet based on the time-sorted list.
I don't believe the JDK libraries or even 3rd party libraries are written to handle a comparator whose results are not consistent. I wouldn't depend on this working. I would worry more if your Comparator can return not-equal for two values when called one time and can return equal for the same two values if called later.
Read carefully the contract of Comparator.compare(). Does your Comparator satisfy those constraints?
To elaborate, if your Comparator returns that two values are not equals when you call it once, but then later returns that the two values are equal because a later value was added to the set and has changed the output of the Comparator, the definition of "Set" (no duplicates) becomes undone.
Jon Skeet's advice in his answer is excellent advice and will avoid the need to worry about these sorts of problems. Truly, if your Comparator does not return values consistent with equals() then you can have big problems. Whether or not a sorted set will re-sort each time you add something, I wouldn't depend on, but the worst thing that would occur from order changing is your set would not remain sorted.
No, this won't work.
If you are using comparable keys in a collection, the results of the comparison between two keys must remain the same over time.
When storing keys in a binary tree, each fork in the path is chosen as the result of the comparison operation. If a later comparison returns a different result, a different fork will be taken, and the previously stored key will not be found.
I am 99% certain this will not work. If a value in the Set suddenly changes its comparison behaviour, it is possible (quite likely, actually) that it will not be found anymore; i.e. set.contains(value) will return false, because the search algorithm will at one point do a comparison and continue in the wrong subtree because that comparison now returns a different result than it did when the value was inserted.
I think the non-changing nature of a Comparator is supposed to be on a per-sort basis, so as long as you are consistent for the duration of a given sorting operation, you are ok (so long as none of the items cross the 7 day boundary mid-sort).
However, you might want to make it more obvious that you are asking specifically about a TreeSet, which I imagine re-uses information from previous sorts to save time when you add a new item so this is a bit of a special case. The TreeSet javadocs specifically defer to the Comparator semantics, so you are probably not officially supported, but you'd have to read the code to get a good idea of whether or not you are safe.
I think you'd be better off doing a complete sort when you need the data sorted, using a single time as "now" so that you don't risk jumping that boundary if your sort takes long enough to make it likely.
It's possible that a record will change from <7 days to >7 days mid-sort, so what you're doing violates the rules for a comparator. Of course this doesn't mean it won't work: lots of things that are documented as "unpredictable" in fact work if you know exactly what is happening internally.
I think the textbook answer is: This is not reliable with the built-in sorts. You would have to write your own sort function.
At the very least, I would say that you can't rely on a TreeSet or any "sorted structure" magically resorting itself when dates roll over the boundary. At best this might work if you re-sort just before displaying, and don't rely on anything remaining correct between updates.
At worst, inconsistent comparisons might break the sorts badly. You have no assurance that this won't put you into an infinite loop or some other deadly black hole.
So I'd say: Read the source code from Sun for whatever classes or functions you plan to use, and see if you can figure out what will happen. Testing is good, but there are potentially tricky cases that are difficult to test. The most obvious is: What if, while it's in the process of sorting, a record rolls over the date boundary? That is, it might look at a record once and say it's <7 but the next time it sees it it's >7. That could be bad, bad news.
One obvious trick that occurs to me: Convert the date to an age at the time you add the record to the structure, rather than dynamically. That way it can't change within the sort. If the structure is going to live for more than a few minutes, recalculate the ages at some appropriate time and then re-sort. I doubt someone will say your program is incorrect because you said a record was less than 7 days old when really it's 7 days, 0 hours, 0 minutes, and 2 seconds old. Even if someone noticed, how accurate is their watch?
As already noted, the Comparator cannot do this for you, because the transitivity is violated. Basically, in order to be able to sort the items, you must be able to compare each two of them (independent of the rest), which obviously you cannot do. So your scenario basically either would not work or would produce not consistent result.
Maybe something simpler would be good enough for you:
* apply a simple Comparator that orders by value, as you need, and
* simply remove from your list/collection all elements that are 7 days older than the newest. Basically, whenever a new item is added, you check if it is the newest, and if it is, remove those that are 7 days older than it (see the sketch below).
This would not work if you also remove items from the list, in which case you would need to keep all those you removed in a separate list (which, by the way, you would sort by date) and add them back to the original list in case the MAX(date) is smaller after removal.
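A rough sketch of that prune-on-insert idea, using the TimedValue class from the question (the value-only comparator and the addAndPrune helper are illustrative names, not from the original code):
import java.util.*;

public class PruneDemo {
    static final long SEVEN_DAYS_MS = 7L * 24 * 60 * 60 * 1000;

    // Order by descending value; tie-break by time so the TreeSet does not
    // treat distinct entries with equal values as duplicates. Unlike a
    // date-aware comparator, this one is consistent over time.
    static final Comparator<TimedValue> BY_VALUE_DESC =
            Comparator.comparingDouble((TimedValue t) -> t.value)
                      .reversed()
                      .thenComparing(t -> t.time);

    static void addAndPrune(TreeSet<TimedValue> set, TimedValue v) {
        set.add(v);
        // Find the newest timestamp currently in the set ...
        Date newest = set.stream()
                         .map(t -> t.time)
                         .max(Date::compareTo)
                         .get();
        // ... and evict everything more than 7 days older than it.
        set.removeIf(t -> newest.getTime() - t.time.getTime() > SEVEN_DAYS_MS);
    }
}
Usage would be along the lines of TreeSet<TimedValue> set = new TreeSet<>(PruneDemo.BY_VALUE_DESC); followed by addAndPrune(set, value) for each incoming value.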
