Why does Iterator not enforce a hasNext() call? - Java

It seems like it would be quite a good approach if hasNext() and next() worked like this:
boolean hasNextCalled = false;

public boolean hasNext() {
    hasNextCalled = true;
    // ... the actual check for remaining elements ...
}

public E next() {
    assert hasNextCalled;
    // ... return the next element ...
}
This way we would never end up in a case where we get a NoSuchElementException.
Is there any practical reason why a hasNext() call is not enforced?

What would be the benefit? You're simply replacing a NoSuchElementException with an AssertionError, plus introducing a tiny bit of overhead. Also, since Iterator is an interface, you couldn't implement this once; it would have to go in every implementation of Iterator. Plus the documentation doesn't impose a requirement to call hasNext before calling next, so your proposal would break the current contract. Such a change would break any code that was written to rely on a NoSuchElementException. Finally, assertions can be turned off in production code, so you would still need the NoSuchElementException mechanism.

NoSuchElementException is a runtime exception, and reflects programmer error...exactly like your approach does. It's not obligatory to call hasNext() because maybe you don't need to -- you know the size of the collection in advance, for example, and know how many calls to next() you can make.
The point is that you're exchanging one way of reporting programmer error for...another way of reporting programmer error that can disable some useful approaches.

Maybe we already know that there are elements left. For example, maybe we're iterating over two equally-sized lists in lockstep, and we only need to call hasNext on one iterator to check for both. Also, asserting the hasNext call doesn't actually prevent anyone from calling next without hasNext, especially if assertions are off.
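A sketch of that lockstep case (the list names and the register method are made up for illustration):
Iterator<String> names = nameList.iterator();
Iterator<Integer> ages = ageList.iterator();
while (names.hasNext()) {                      // one check covers both iterators,
    register(names.next(), ages.next());      // since the lists are equally sized
}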

You may know there's a next() without checking: for example, if elements always come in pairs, one call to hasNext() allows two calls to next().
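A sketch of the pairs case (the tokens collection and the process method are made up for illustration):
Iterator<String> it = tokens.iterator();
while (it.hasNext()) {
    String key = it.next();
    String value = it.next();  // safe by the pairing invariant; no second hasNext() needed
    process(key, value);
}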

My 2 cents: here the API would make an assumption about client usage and force it on everyone. Say I am always sure that I get back exactly one result; then it is better to bypass the hasNext() call and retrieve the element directly with just next().

Firstly, the assert you suggest would only run when assertions are enabled.
But the main issue is that you consider only one use-case. The library is designed to support programmers in their work with minimal restrictions, and each class and method has to fit into a cohesive and coherent whole.
Other posters give good reasons also (as I was typing), especially that Iterator is an interface and has many implementations.

Related

Are !list.isEmpty() and list.size() > 0 equivalent?

I've seen code as below:
if (!substanceList.isEmpty() && (substanceList.size() > 0))
{
    substanceText = createAmountText(substanceList);
}
Would the following be a valid refactoring?
if (!substanceList.isEmpty())
{
    substanceText = createAmountText(substanceList);
}
I would be grateful for an explanation of the above code, and of whether the second version may cause errors.
If in doubt, read the Javadoc:
Collection.isEmpty():
Returns true if this collection contains no elements.
Collection.size():
Returns the number of elements in this collection
So, assuming the collection is implemented correctly:
collection.isEmpty() <=> collection.size() == 0
Or, conversely:
!collection.isEmpty() <=> collection.size() != 0
Since the number of elements is never negative, this means that:
!collection.isEmpty() <=> collection.size() > 0
So yes, the two forms are equivalent.
Caveat: actually, they're only equivalent if your collection isn't being modified from another thread at the same time.
This:
!substanceList.isEmpty() && (substanceList.size() > 0)
is equivalent to, by the logic I present above:
!substanceList.isEmpty() && !substanceList.isEmpty()
You can only simplify this to
!substanceList.isEmpty()
if you can guarantee that its value doesn't change in between evaluations of substanceList.isEmpty().
Practically, it is unlikely that you need to care about the difference between these cases, at least at this point in the code. You might need to care about the list being changed in another thread, however, if it can become empty before (or while) executing createAmountText. But that's not something that was introduced by this refactoring.
TL;DR: using if (!substanceList.isEmpty()) { does practically the same thing, and is clearer to read.
The only difference between the first and the second approach is that the first performs a redundant check. Nothing else.
Thus, you should avoid the redundant check and go with the second approach.
Actually, you can read the source code that ships with the JDK:
/**
 * Returns <tt>true</tt> if this list contains no elements.
 *
 * @return <tt>true</tt> if this list contains no elements
 */
public boolean isEmpty() {
    return size == 0;
}
I think this settles the question.
Implementation of isEmpty() in AbstractCollection is as follows:
public boolean isEmpty() {
    return size() == 0;
}
So you can safely assume that !list.isEmpty() is equivalent to list.size() > 0.
As for "what is better code", if you want to check if the list is empty or not, isEmpty() is definitely more expressive.
Subclasses might also override isEmpty() from AbstractCollection and implement it in a more efficient manner than size() == 0. So (purely theoretically) isEmpty() might be more efficient.
The javadocs for Collection.size() and Collection.isEmpty() say:
boolean isEmpty()
Returns true if this collection contains no elements.
int size()
Returns the number of elements in this collection
Since "contains no elements" implies that the number of elements in the collection is zero, it follows that list.isEmpty() and list.size() == 0 will evaluate to the same value.
I want some explanation of the above code
The second version is correct. The first version looks like it was written either by an automatic code generator, or a programmer who doesn't really understand Java. There is no good reason to write the code that way.
(Note: if some other thread could be concurrently modifying the list, then both versions are problematic unless there is proper synchronization. If the list operations are not synchronized, there may be memory hazards. But in the first version, there is also the possibility of a race condition ... where the list appears to be empty and yet has a non-zero size!)
and want to know whether the second way may cause some error.
It won't.
Incidentally list.isEmpty() is preferable to list.size() == 0 for a couple of reasons:
It is more concise (fewer characters).
It expresses the intent of your code more precisely.
It may be more efficient. Some collection implementations may need to count the elements in the collection to compute the size. That may be an O(N) operation, and could have other undesirable effects. For example, if a collection is a lazy list that only gets reified as you iterate the elements, then calling size() may result in excessive memory use.
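As a concrete case of this, ConcurrentLinkedQueue documents its size() as an O(n) traversal, while its isEmpty() only inspects the head of the queue:
import java.util.concurrent.ConcurrentLinkedQueue;

public class EmptyCheckDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.add("x");
        System.out.println(!queue.isEmpty());  // true; only looks at the head
        System.out.println(queue.size() > 0);  // true, but traverses every node
    }
}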
Yes, it can be refactored as you did. The issue with both approaches is that you would do the check every time you want to call createAmountText on a list. That repeats the logic; a better way is to apply the DRY (Don't Repeat Yourself) principle and move the check into the method itself.
So the method's body should be wrapped by this check.
It should look like:
private String createAmountText(List<?> substanceList) {
    if (substanceList != null && !substanceList.isEmpty()) {
        // ... the method's logic ...
    }
    return null;
}
Sure - the two methods can be used to express the same thing.
But worth adding here: going for size() > 0 is somehow a more direct violation of the Tell, don't ask principle: you access an "implementation detail", to then make a decision based on that.
In that sense, isEmpty() should be your preferred choice here!
Of course, you are still violating TDA when using isEmpty() - because you are again fetching status from some object to then make a decision on it.
So the really best choice would be to write code that doesn't need at all to make such a query to internal state of your collection to then drive decisions on it. Instead, simply make sure that createAmountText() properly deals with you passing in an empty list! Why should users of this list, or of that method need to care whether the list is empty or not?!
Long story short: maybe that is "overthinking" here - but again: not using these methods would lead you to write less code! And that is usually a sign of a good idea.

Thread-safe iteration over a collection

We all know when using Collections.synchronizedXXX (e.g. synchronizedSet()) we get a synchronized "view" of the underlying collection.
However, the documentation for these wrapper methods states that we have to synchronize explicitly on the collection when iterating over it using an iterator.
Which option do you choose to solve this problem?
I can only see the following approaches:
Do it as the documentation states: synchronize on the collection
Clone the collection before calling iterator()
Use a collection whose iterator is thread-safe (I am only aware of CopyOnWriteArrayList/Set)
And as a bonus question: when using a synchronized view - is the use of foreach/Iterable thread-safe?
You've already answered your bonus question really: no, using an enhanced for loop isn't safe - because it uses an iterator.
As for which is the most appropriate approach - it really depends on your context:
Are writes very infrequent? If so, CopyOnWriteArrayList may be most appropriate.
Is the collection reasonably small, and the iteration quick? (i.e. you're not doing much work in the loop) If so, synchronizing may well be fine - especially if this doesn't happen too often (i.e. you won't have much contention for the collection). See the sketch after this list.
If you're doing a lot of work and don't want to block other threads working at the same time, the hit of cloning the collection may well be acceptable.
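For reference, the iteration pattern that the Collections.synchronizedList Javadoc prescribes looks like this (process is a placeholder for your loop body):
List<String> list = Collections.synchronizedList(new ArrayList<>());
// ...
synchronized (list) {      // must hold the lock for the whole iteration
    for (String s : list) {
        process(s);
    }
}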
Depends on your access model. If you have low concurrency and frequent writes, 1 will have the best performance. If you have high concurrency and infrequent writes, 3 will have the best performance. Option 2 is going to perform badly in almost all cases.
foreach calls iterator(), so exactly the same things apply.
You could use one of the newer collections added in Java 5.0 which support concurrent access while iterating. Another approach is to take a copy using toArray which is thread safe (during the copy).
Collection<String> words = ...
// enhanced for loop over a snapshot array
for (String word : words.toArray(new String[0])) {
    // ...
}
I might be totally off with your requirements, but in case you are not aware of it, check out google-collections with "favor immutability" in mind.
I suggest dropping Collections.synchronizedXXX and handle all locking uniformly in the client code. The basic collections don't support the sort of compound operations useful in threaded code, and even if you use java.util.concurrent.* the code is more difficult. I suggest keeping as much code as possible thread-agnostic. Keep difficult and error-prone thread-safe (if we are very lucky) code to a minimum.
All three of your options will work. Choosing the right one for your situation will depend on what your situation is.
CopyOnWriteArrayList will work if you want a list implementation and you don't mind the underlying storage being copied every time you write. This is pretty good for performance as long as you don't have very big collections.
ConcurrentHashMap or a "ConcurrentHashSet" (via Collections.newSetFromMap) will work if you need a Map or Set interface; obviously you don't get random access this way. One great thing about these two is that they work well with large data sets - when mutated, they only touch small pieces of the underlying data storage.
It does depend on the result one needs to achieve. Cloning/copying - toArray(), new ArrayList(...) and the like - obtains a snapshot and does not lock the collection.
Using synchronized(collection) and iterating through it ensures that by the end of the iteration there has been no modification, i.e. it effectively locks the collection.
Side note: toArray() is usually preferred, with some exceptions when it internally needs to create a temporary ArrayList. Also note that anything but toArray() should be wrapped in synchronized(collection) as well, provided you are using Collections.synchronizedXXX.
This question is rather old (sorry, I am a bit late) but I still want to add my answer.
I would choose your second option (i.e. clone the collection before calling iterator()), but with a major twist.
Assuming you want to iterate using an iterator, you do not have to copy the collection before calling .iterator(), somewhat negating (I am using the term "negating" loosely) the idea of the iterator pattern; instead you can write a "ThreadSafeIterator".
It would work on the same premise - copying the collection - but without letting the iterating class know that you did just that. Such an iterator might look like this:
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;

class ThreadSafeIterator<T> implements Iterator<T> {
    private final Queue<T> clients;
    private T currentElement;
    private final Collection<T> source;

    ThreadSafeIterator(final Collection<T> collection) {
        // snapshot the collection's contents at construction time
        this.clients = new LinkedList<>(collection);
        this.source = collection;
    }

    @Override
    public boolean hasNext() {
        // caveat: this ends iteration early if the collection
        // legitimately contains null elements
        return clients.peek() != null;
    }

    @Override
    public T next() {
        currentElement = clients.poll();
        return currentElement;
    }

    @Override
    public void remove() {
        synchronized (source) {
            source.remove(currentElement);
        }
    }
}
Taking this a step further, you might use the Semaphore class to ensure thread-safety or something similar. But take the remove method with a grain of salt.
The point is, by using such an iterator, no one - neither the iterating nor the iterated class - has to worry about thread safety.

Thread safety in Java

All,
I started learning Java threads in the past few days, and have read about scenarios where, even after using synchronized methods/blocks, the code/class remains vulnerable to concurrency issues. Can anyone please provide a scenario where synchronized blocks/methods fail? And what should the alternative be in these cases to ensure thread safety?
Proper behaviour under concurrent access is a complex topic, and it's not as simple as just slapping synchronized on everything, as now you have to think about how operations might interleave.
For instance, imagine you have a class like a list, and you want to make it threadsafe. So you make all the methods synchronized and continue. Chances are, clients might be using your list in the following way:
int index = ...; // this gets set somewhere, maybe passed in as an argument

// Check that the list has enough elements for this call to make sense
if (list.size() > index)
{
    return list.get(index);
}
else
{
    return DEFAULT_VALUE;
}
In a single-threaded environment this code is perfectly safe. However, if the list is being accessed (and possibly modified) concurrently, it's possible for the list's size to change after the call to size(), but before the call to get(). So the list could "impossibly" throw an IndexOutOfBoundsException (or similar) in this case, even though the size was checked beforehand.
There's no shortcut to fixing this - you simply need to think carefully about the use-cases for your class/interface, and ensure that you can actually guarantee them when interleaved with any other valid operations. Often this might require some additional complexity, or simply more specifics in the documentation. If the hypothetical list class specified that it always synchronized on its own monitor, then that specific situation could be fixed as
synchronized(list)
{
    if (list.size() > index)
    {
        return list.get(index);
    }
    else
    {
        return DEFAULT_VALUE;
    }
}
but under other synchronization schemes, this would not work. Or it might be too much of a bottleneck. Or forcing the clients to make the multiple calls within the same lexical scope may be an unacceptable constraint. It all depends on what you're trying to achieve, as to how you can make your interface safe, performant and elegant.
Scenario 1, a classic deadlock:
private final Object mutex1 = new Object();
private final Object mutex2 = new Object();

public void method1() {
    synchronized (mutex1) {
        synchronized (mutex2) {
            // ...
        }
    }
}

public void method2() {
    // locks acquired in the opposite order to method1: deadlock risk
    synchronized (mutex2) {
        synchronized (mutex1) {
            // ...
        }
    }
}
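One standard cure (my addition, not part of the scenario above) is to fix a single global lock order, so that every method acquires mutex1 before mutex2:
public void method2Fixed() {
    synchronized (mutex1) {   // same order as method1: the cycle is broken
        synchronized (mutex2) {
            // ...
        }
    }
}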
Other scenarios include anything with a shared resource, even a variable, because one thread could change the variable's contents, or even make it point to null, without the other thread knowing. Writing to IO has similar issues: try writing to a file using two threads, or writing out to a socket.
Very good articles about concurrency and the Java Memory Model can be found at Angelika Langer's website.
"vulnerable to concurrency issues" is very vague. It would help to know what you have actually read and where. Two things that come to mind:
Just slapping on "synchronized" somewhere does not mean the code is synchronized correctly - it can be very hard to do correctly, and developers frequently miss some problematic scenarios even when they think they're doing it right.
Even if the synchronization correctly prevents non-deterministic changes to the data, you can still run into deadlocks.
Synchronized methods prevent other methods/blocks requiring the same monitor from executing while you execute them.
But if you have two methods, let's say int get() and void set(int val), and somewhere else a method which does
obj.set(1 + obj.get());
and this method runs in two threads, you can end up with the value increased by one or by two, depending on unpredictable factors.
Therefore you must somehow protect the use of such method combinations too (but only if it's needed).
By the way: use each monitor for as few functions/blocks as possible, so that only those which can wrongly influence each other are synchronized.
And try to expose as few methods requiring further protection as possible.
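A minimal sketch of that protection, using a made-up Counter class: the compound operation gets its own synchronized method, so the read and the write happen under one lock:
class Counter {
    private int value;

    public synchronized int get() { return value; }
    public synchronized void set(int val) { value = val; }

    // Without this method, obj.set(1 + obj.get()) lets two threads read
    // the same value and both write value + 1, losing one increment.
    public synchronized void increment() { value++; }
}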

Why should pop() take an argument?

Quick background
I'm a Java developer who's been playing around with C++ in my free/bored time.
Preface
In C++, you often see pop taking an argument by reference:
void pop(Item& removed);
I understand that it is nice to "fill in" the parameter with what you removed. That totally makes sense to me. This way, the person who asked to remove the top item can have a look at what was removed.
However, if I were to do this in Java, I'd do something like this:
Item pop() throws StackException;
This way, after the pop we either return null, return an Item, or an exception is thrown.
My C++ textbook shows me the example above, but I see plenty of stack implementations taking no arguments (the STL stack, for example).
The Question
How should one implement the pop function in C++?
The Bonus
Why?
To answer the question: you should not implement the pop function in C++, since it is already implemented by the STL. The std::stack container adapter provides the method top to get a reference to the top element on the stack, and the method pop to remove the top element. Note that the pop method alone cannot be used to perform both actions, as you asked about.
Why should it be done that way?
Exception safety: Herb Sutter gives a good explanation of the issue in GotW #82.
Single-responsibility principle: also mentioned in GotW #82. top takes care of one responsibility and pop takes care of the other.
Don't pay for what you don't need: For some code, it may suffice to examine the top element and then pop it, without ever making a (potentially expensive) copy of the element. (This is mentioned in the SGI STL documentation.)
Any code that wishes to obtain a copy of the element can do this at no additional expense:
Foo f(s.top());
s.pop();
Also, this discussion may be interesting.
If you were going to implement pop to return the value, it doesn't matter much whether you return by value or write it into an out parameter. Most compilers implement RVO, which will optimize the return-by-value method to be just as efficient as the copy-into-out-parameter method. Just keep in mind that either of these will likely be less efficient than examining the object using top() or front(), since in that case there is absolutely no copying done.
The problem with the Java approach is that its pop() method has at least two effects: removing an element, and returning an element. This violates the single-responsibility principle of software design, which in turn opens door for design complexities and other issues. It also implies a performance penalty.
In the STL way of things the idea is that sometimes when you pop() you're not interested in the item popped. You just want the effect of removing the top element. If the function returns the element and you ignore it then that's a wasted copy.
If you provide two overloads, one which takes a reference and another which doesn't then you allow the user to choose whether he (or she) is interested in the returned element or not. The performance of the call will optimal.
The STL doesn't overload the pop() functions but rather splits these into two functions: back() (or top() in the case of the std::stack adapter) and pop(). The back() function just returns the element, while the pop() function just removes it.
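For comparison, a rough Java rendering of the same split (this interface is illustrative, not a standard API):
interface SplitStack<E> {
    E top();     // inspect the top element; does not modify the stack
    void pop();  // remove the top element; returns nothing
}
Callers who want both simply call top() and then pop(), exactly like the Foo f(s.top()); s.pop(); idiom above.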
Using C++0x makes the whole thing hard again.
As
stack.pop(item); // move top data to item without copying
makes it possible to efficiently move the top element from the stack. Whereas
item = stack.top(); // make a copy of the top element
stack.pop(); // delete top element
doesn't allow such optimizations.
The only reason I can see for using this syntax in C++:
void pop(Item& removed);
is if you're worried about unnecessary copies taking place.
if you return the Item, it may require an additional copy of the object, which may be expensive.
In reality, C++ compilers are very good at copy elision, and almost always implement return value optimization (often even when you compile with optimizations disabled), which makes the point moot, and may even mean the simple "return by value" version becomes faster in some cases.
But if you're into premature optimization (if you're worried that the compiler might not optimize away the copy, even though in practice it will do it), you might argue for "returning" parameters by assigning to a reference parameter.
IMO, a good signature for the equivalent of Java's pop function in C++ would be something like:
boost::optional<Item> pop();
Using option types is the best way to return something that may or may not be available.
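In modern Java the analogous signature would use java.util.Optional; a minimal sketch (OptionalStack is a made-up class, not a standard one):
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

class OptionalStack<E> {
    private final Deque<E> items = new ArrayDeque<>();

    void push(E e) { items.push(e); }

    // Returns the removed element, or Optional.empty() if the stack was
    // empty; no null returns and no exceptions.
    Optional<E> pop() {
        return Optional.ofNullable(items.poll());
    }
}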

Why does Java toString() loop infinitely on indirect cycles?

This is more a gotcha I wanted to share than a question: when printing with toString(), Java will detect direct cycles in a Collection (where the Collection refers to itself), but not indirect cycles (where a Collection refers to another Collection which refers to the first one - or with more steps).
import java.util.*;

public class ShonkyCycle {
    public static void main(String[] args) {
        List a = new LinkedList();
        a.add(a);               // direct cycle
        System.out.println(a);  // works: [(this Collection)]

        List b = new LinkedList();
        a.add(b);
        b.add(a);               // indirect cycle
        System.out.println(a);  // shonky: causes infinite loop!
    }
}
This was a real gotcha for me, because it occurred in debugging code to print out the Collection (I was surprised when it caught a direct cycle, so I assumed incorrectly that they had implemented the check in general). There is a question: why?
The explanation I can think of is that checking for a collection that refers directly to itself is very inexpensive: you only need to compare elements against the collection itself, which you already have. For longer cycles, you would need to remember every collection you encounter, starting from the root; worse, you might not be able to tell for sure what the root is, so you'd have to track every collection you traverse and do a hash lookup on every element printed. That is expensive for the relatively rare case of cycles (in most programming). (I think) the only reason it checks for direct cycles is that it is so cheap (one reference comparison).
OK... I've kinda answered my own question - but have I missed anything important? Anyone want to add anything?
Clarification: I now realize the problem I saw is specific to printing a Collection (i.e. the toString() method). There's no problem with cycles per se (I use them myself and need to have them); the problem is that Java can't print them. Edit Andrzej Doyle points out it's not just collections, but any object whose toString is called.
Given that it's constrained to this method, here's an algorithm to check for it:
the root is the object that the first toString() is invoked on (to determine this, you need to maintain state on whether a toString is currently in progress or not; so this is inconvenient).
as you traverse each object, you add it to an IdentityHashMap, along with a unique identifier (e.g. an incremented index).
but if this object is already in the Map, write out its identifier instead.
This approach also correctly renders multirefs (a node that is referred to more than once).
The memory cost is the IdentityHashMap (one reference and index per object); the complexity cost is a hash lookup for every node in the directed graph (i.e. each object that is printed).
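Here is a minimal sketch of that algorithm, restricted to lists for brevity (SafePrinter is hypothetical, not JDK code):
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class SafePrinter {
    static String print(Object root) {
        return print(root, new IdentityHashMap<>(), new int[] {0});
    }

    private static String print(Object o, Map<Object, Integer> seen, int[] nextId) {
        if (!(o instanceof List)) {
            return String.valueOf(o);    // leaf: fall back to normal toString
        }
        Integer id = seen.get(o);
        if (id != null) {
            return "(ref #" + id + ")";  // already visited: print its identifier
        }
        seen.put(o, nextId[0]++);
        StringBuilder sb = new StringBuilder("[");
        List<?> list = (List<?>) o;
        for (int i = 0; i < list.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(print(list.get(i), seen, nextId));
        }
        return sb.append(']').toString();
    }
}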
I think fundamentally it's because while the language tries to stop you from shooting yourself in the foot, it shouldn't really do so in a way that's expensive. So while it's almost free to compare object pointers (e.g. does obj == this) anything beyond that involves invoking methods on the object you're passing in.
And at this point the library code doesn't know anything about the objects you're passing in. For one, the generics implementation doesn't know if they're instances of Collection (or Iterable) themselves, and while it could find this out via instanceof, who's to say whether it's a "collection-like" object that isn't actually a collection, but still contains a deferred circular reference? Secondly, even if it is a collection there's no telling what its actual implementation and thus behaviour are like. Theoretically one could have a collection containing all the Longs which is going to be used lazily; but since the library doesn't know this it would be hideously expensive to iterate over every entry. Or in fact one could even design a collection with an Iterator that never terminated (though this would be difficult to use in practice because so many constructs/library classes assume that hasNext will eventually return false).
So it basically comes down to an unknown, possibly infinite cost in order to stop you from doing something that might not actually be an issue anyway.
I'd just like to point out that this statement:
when printing with toString(), Java will detect direct cycles in a collection
is misleading.
Java (the JVM, the language itself, etc) is not detecting the self-reference. Rather this is a property of the toString() method/override of java.util.AbstractCollection.
If you were to create your own Collection implementation, the language/platform wouldn't automatically save you from a self-reference like this - unless you extend AbstractCollection, you have to cover this logic yourself.
I might be splitting hairs here but I think this is an important distinction to make. Just because one of the foundation classes in the JDK does something doesn't mean that "Java" as an overall umbrella does it.
Here is the relevant source code in AbstractCollection.toString(), with the key line commented:
public String toString() {
    Iterator<E> i = iterator();
    if (! i.hasNext())
        return "[]";

    StringBuilder sb = new StringBuilder();
    sb.append('[');
    for (;;) {
        E e = i.next();
        // self-reference check:
        sb.append(e == this ? "(this Collection)" : e);
        if (! i.hasNext())
            return sb.append(']').toString();
        sb.append(", ");
    }
}
The problem with the algorithm that you propose is that you need to pass the IdentityHashMap to all Collections involved. This is not possible using the published Collection APIs. The Collection interface does not define a toString(IdentityHashMap) method.
I imagine that whoever at Sun put the self reference check into the AbstractCollection.toString() method thought of all of this, and (in conjunction with his colleagues) decided that a "total solution" is over the top. I think that the current design / implementation is correct.
It is not a requirement that Object.toString implementations be bomb-proof.
You are right, you already answered your own question. Checking for longer cycles (especially really long ones like period length 1000) would be too much overhead and is not needed in most cases. If someone wants it, he has to check it himself.
The direct cycle case, however, is easy to check and will occur more often, so it's done by Java.
You can't really detect indirect cycles cheaply; in general you would have to track every object you visit, as described above, so it is a cost trade-off rather than something you get for free.
