First of all, I am unable to find an interface for the Stack data structure. A Stack class exists, but it extends Vector, which I would like to avoid. So, if you really need a stack, would you recommend implementing my own stack class that has an ArrayDeque internally, or would you recommend using the Stack class that extends Vector? I am quite disappointed that a good Stack interface does not exist in Java.
Secondly, Queue provides the add(e), remove() and element() methods, and on top of those, the offer(e), poll() and peek() methods. The former throw exceptions on failure, while the latter return a special value (false or null). Which set would you use if the Queue is for a non-concurrent case?
To answer your first question: "Is there a drop-in replacement for Java Stack that is not synchronized?"
And the second question: (I hate to have to say it, but) RTFD. Seriously.
public interface Queue<E> extends Collection<E>
... Each of these methods exists in two forms: one throws an exception if the operation fails, the other returns a special value (either null or false, depending on the operation). The latter form of the insert operation is designed specifically for use with capacity-restricted Queue implementations; in most implementations, insert operations cannot fail.
http://download.oracle.com/javase/7/docs/api/java/util/Queue.html
Neither set of methods has anything to do with concurrency. They simply allow you to choose between two programming styles (and hopefully, you're consistent!): do you want to have to check return values, or catch exceptions?
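To make the two styles concrete, here is a sketch using a capacity-bounded ArrayBlockingQueue (chosen only because it can actually fill up; any bounded Queue would behave the same way):

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class AddVsOffer {
    public static void main(String[] args) {
        Queue<String> q = new ArrayBlockingQueue<>(1); // capacity of one

        // Style 1: check return values.
        System.out.println(q.offer("a")); // true, queue had room
        System.out.println(q.offer("b")); // false, queue is full

        // Style 2: catch exceptions.
        try {
            q.add("b"); // same failed insert, reported as an exception
        } catch (IllegalStateException e) {
            System.out.println("add failed: " + e.getMessage());
        }
    }
}
```

Both styles convey exactly the same information; the difference is purely in how the caller is expected to handle failure.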
Deque<E> queue = new LinkedList<E>();
queue.add(e);
queue.remove();
queue.element();
Deque<E> stack = new LinkedList<E>();
stack.push(e);
stack.pop();
stack.peek();
I think I would rather use these methods in the common scenarios.
And when I have to worry about the success of the operation at runtime (for example, in a concurrent case), I will go for the timed offer(e, timeout, unit) and poll(timeout, unit) variants on BlockingQueue.
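For reference, the timed variants live on BlockingQueue and take the timeout as a value plus a TimeUnit; a minimal sketch (queue type and timeouts are arbitrary here):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class TimedOps {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(1);

        // Waits up to 100 ms for space, then gives up and returns false.
        boolean accepted = q.offer("first", 100, TimeUnit.MILLISECONDS);
        System.out.println("accepted: " + accepted); // true, queue was empty

        // Waits up to 100 ms for an element, returns null on timeout.
        String head = q.poll(100, TimeUnit.MILLISECONDS);
        System.out.println("head: " + head); // first
    }
}
```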
Try the Apache Commons Collections API if the JDK collections API doesn't exactly meet your needs
From Apache ArrayStack documentation which seems to match your requirements:
An implementation of the Stack API that is based on an ArrayList instead of a Vector, so it is not synchronized to protect against multi-threaded access. This implementation therefore operates faster in environments where you do not need to worry about multiple thread contention.
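If you'd rather avoid a third-party dependency, the unsynchronized route from the question is just as simple; a sketch using the JDK's own ArrayDeque directly as a stack:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeAsStack {
    public static void main(String[] args) {
        // ArrayDeque is unsynchronized, so single-threaded LIFO use
        // avoids the locking overhead of Stack/Vector.
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        System.out.println(stack.pop());  // 3 (LIFO order)
        System.out.println(stack.peek()); // 2, still on the stack
    }
}
```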
Related
Question 1:
When reading the JDK source code I found that the method boolean add(E e); is defined in the interfaces Collection<E>, Queue<E> and BlockingQueue<E>.
I cannot understand this. In my understanding, if a super-interface has defined a method, then a sub-interface extending it need not define this method again.
So why has this method been defined three times?
Question 2:
Also, I noticed that unlike the boolean add(E e); method, which is declared in interface Queue<E> and then re-declared in interface BlockingQueue<E>, the E poll(); method is declared only in interface Queue<E> and is not re-declared in interface BlockingQueue<E>.
So why are they treated differently?
Question 1: When reading the JDK source code I found boolean add(E e); defined in the interfaces Collection, Queue and BlockingQueue.
I cannot understand this. In my understanding, if a super-interface has defined a method, then a sub-interface extending it need not define this method again.
Yes, you don't need to redefine it. I'd guess it's probably because you need to repeat the declaration in order to attach new JavaDoc comments, i.e. each of the interfaces gives a specific description of what add(E) etc. do.
Question 2: also I noticed that unlike the boolean add(E e); method, which is declared in interface Queue and then re-declared in interface BlockingQueue, the E poll(); method is declared only in interface Queue and is not re-declared in interface BlockingQueue.
So why are they treated differently?
It's probably the same reason as above. poll() works the same way for general queues and blocking ones, hence no need for separate documentation.
The difference here is only in the documentation. The E poll() method signature is not repeated in BlockingQueue because the contract is the same for both Queue and BlockingQueue, and the documentation for both describes the same behaviour.
If you look at the documentation of Queue.add() and BlockingQueue.add(), you will see an additional sentence in the BlockingQueue.add() documentation.
Queue.add():
Inserts the specified element into this queue if it is possible to do
so immediately without violating capacity restrictions, returning true
upon success and throwing an IllegalStateException if no space is
currently available.
BlockingQueue.add():
Inserts the specified element into this queue if it is possible to do
so immediately without violating capacity restrictions, returning true
upon success and throwing an IllegalStateException if no space is
currently available. When using a capacity-restricted queue, it is
generally preferable to use offer.
This is to help developers when using the relevant object.
Technically, the add() methods are the same, so BlockingQueue never needed to repeat that method signature explicitly.
I believe this is more to do with documenting how the method is expected to behave. Technically, there is no need to re-declare add in Queue or BlockingQueue, but since the behaviour of add can differ depending on whether the Collection is a Set or a Queue, it is much better to re-declare it so that people understand exactly what is expected of Queue's add method.
So, as long as the behaviour is not supposed to change or be refined, there is no need to re-declare the method in the sub-interface; but if you expect the behaviour to be more specific, you should go ahead and re-declare it.
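The redeclaration-for-documentation pattern can be sketched with a pair of hypothetical interfaces (Container and Buffer are made-up names, not JDK types):

```java
// Hypothetical interfaces illustrating re-declaration purely for documentation.
interface Container<E> {
    /** Adds an element; the general contract. */
    boolean add(E e);
}

interface Buffer<E> extends Container<E> {
    /**
     * Adds an element, failing when the buffer is full.
     * Same signature as Container.add, narrower documented contract.
     */
    @Override
    boolean add(E e);
}

public class Redeclare {
    public static void main(String[] args) {
        // A class implementing Buffer still implements exactly one add
        // method; the re-declaration changes nothing at runtime.
        Buffer<String> b = new Buffer<String>() {
            private final java.util.List<String> items = new java.util.ArrayList<>();
            public boolean add(String e) { return items.add(e); }
        };
        System.out.println(b.add("x")); // true
    }
}
```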
I was going through a FIFO implementation in Java and came across the java.util.Queue interface. Deque extends it, and LinkedList in turn implements Deque.
I wrote the following code
public class FIFOTest {
    public static void main(String args[]) {
        Queue<String> myQueue = new LinkedList<String>();
        myQueue.add("US");
        myQueue.offer("Canada");
        for (String element : myQueue) {
            System.out.println("Element : " + element);
        }
    }
}
Both seem to do the same thing: add data to the tail of the queue. What is the difference between these two methods? Any special cases in which one would be more beneficial than the other?
LinkedList#offer(E) is implemented as
public boolean offer(E e) {
    return add(e);
}
In this case, they are the same thing. They are just needed to satisfy the interfaces. LinkedList implements Deque and List. The LinkedList#add(E) method will not throw an Exception as it will always take more elements, but in another Queue implementation that has limited capacity or takes only certain kinds of elements, add(E) might throw an exception while offer(E) will simply return false.
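A sketch contrasting the two (ArrayBlockingQueue is used here as an example of a capacity-limited Queue; it is not what the question's code uses):

```java
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class OfferVsAdd {
    public static void main(String[] args) {
        // Unbounded LinkedList: add and offer never fail.
        Queue<String> unbounded = new LinkedList<>();
        System.out.println(unbounded.add("US"));   // true
        System.out.println(unbounded.offer("CA")); // true

        // Bounded queue of capacity 2: the two methods diverge when full.
        Queue<String> bounded = new ArrayBlockingQueue<>(2);
        bounded.add("US");
        bounded.offer("CA");
        System.out.println(bounded.offer("MX"));   // false: quietly refused
        try {
            bounded.add("MX");                     // throws instead
        } catch (IllegalStateException e) {
            System.out.println("add threw: " + e.getMessage());
        }
    }
}
```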
According to the docs the main difference is that when the operation fails, one (add) throws an exception and the other (offer) returns a special value (false):
Each of these methods exists in two forms: one throws an exception if the operation fails, the other returns a special value (either null or false, depending on the operation). The latter form of the insert operation is designed specifically for use with capacity-restricted Queue implementations; in most implementations, insert operations cannot fail.
What is the difference between these two methods?
Queue.add - throws an exception if the operation fails,
Queue.offer- returns a special value (either null or false, depending on the operation).
Any special cases in which either would be more beneficial than other?
According to docs, The Queue.offer form of the insert operation is designed specifically for
use with capacity-restricted Queue implementations; in most
implementations, insert operations cannot fail.
For details, read this docs.
add() comes from Collection Interface.
offer() comes from Queue Interface.
The Documentation of offer() method of Queue says
Inserts the specified element into this queue if it is possible to do
so immediately without violating capacity restrictions.
When using a capacity-restricted queue, this method is generally
preferable to {#link #add}, which can fail to insert an element only
by throwing an exception.
The Documentation of add() method of Queue says
Inserts the specified element into this queue if it is possible to do so
immediately without violating capacity restrictions, returning
<tt>true</tt> upon success and throwing an <tt>IllegalStateException</tt>
if no space is currently available.
It seems like it would be quite a good design if hasNext() and next() worked like this:
boolean hasNextCalled = false;
boolean hasNext() {
    hasNextCalled = true;
    return ...; // whether another element exists
}
E next() {
    assert hasNextCalled;
    hasNextCalled = false;
    // ... return the next element
}
This way we would never end up in a case where we get a NoSuchElementException.
Any practical reason why a hasNext() call is not enforced?
What would be the benefit? You're simply replacing a NoSuchElementException with an AssertionError, plus introducing a tiny bit of overhead. Also, since Iterator is an interface, you couldn't implement this once; it would have to go in every implementation of Iterator. Plus the documentation doesn't impose a requirement to call hasNext before calling next, so your proposal would break the current contract. Such a change would break any code that was written to rely on a NoSuchElementException. Finally, assertions can be turned off in production code, so you would still need the NoSuchElementException mechanism.
NoSuchElementException is a runtime exception, and reflects programmer error...exactly like your approach does. It's not obligatory to call hasNext() because maybe you don't need to -- you know the size of the collection in advance, for example, and know how many calls to next() you can make.
The point is that you're exchanging one way of reporting programmer error for...another way of reporting programmer error that can disable some useful approaches.
Maybe we already know that there are elements left. For example, maybe we're iterating over two equally-sized lists in lockstep, and we only need to call hasNext on one iterator to check for both. Also, asserting the hasNext call doesn't actually prevent anyone from calling next without hasNext, especially if assertions are off.
You may know there's a next(), for example if you always have pairs of elements, one call to hasNext() will allow two calls to next().
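For example (a sketch assuming the list always holds complete key/value pairs):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PairIteration {
    public static void main(String[] args) {
        // Flat list of alternating keys and values (assumed even-sized).
        List<String> flat = Arrays.asList("name", "Ada", "lang", "Java");
        Iterator<String> it = flat.iterator();
        while (it.hasNext()) {         // one check...
            String key = it.next();    // ...two next() calls,
            String value = it.next();  // safe because elements come in pairs
            System.out.println(key + " = " + value);
        }
    }
}
```

Forcing a hasNext() call before every next() would make this perfectly valid pattern illegal.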
My 2 cents. Here the API would be making an assumption about client usage and forcing it. Let us say I am always sure that I get back only one result; then it is better to bypass the hasNext() call and retrieve the element directly by calling just next().
Firstly, the assert you suggest would only run when assertions are enabled.
But the main issue is that you consider only one use-case. The library is designed to support programmers in their work with minimal restrictions, and each class and method has to fit into a cohesive and coherent whole.
Other posters give good reasons also (as I was typing), especially that Iterator is an Interface, and has many implementations.
We all know that when using Collections.synchronizedXXX (e.g. synchronizedSet()) we get a synchronized "view" of the underlying collection.
However, the documentation of these wrapper methods states that we have to explicitly synchronize on the collection when iterating over it with an iterator.
Which option do you choose to solve this problem?
I can only see the following approaches:
Do it as the documentation states: synchronize on the collection
Clone the collection before calling iterator()
Use a collection whose iterator is thread-safe (I am only aware of CopyOnWriteArrayList/Set)
And as a bonus question: when using a synchronized view - is the use of foreach/Iterable thread-safe?
You've already answered your bonus question really: no, using an enhanced for loop isn't safe - because it uses an iterator.
As for which is the most appropriate approach - it really depends on your context:
Are writes very infrequent? If so, CopyOnWriteArrayList may be most appropriate.
Is the collection reasonably small, and the iteration quick? (i.e. you're not doing much work in the loop) If so, synchronizing may well be fine - especially if this doesn't happen too often (i.e. you won't have much contention for the collection).
If you're doing a lot of work and don't want to block other threads working at the same time, the hit of cloning the collection may well be acceptable.
Depends on your access model. If you have low concurrency and frequent writes, 1 will have the best performance. If you have high concurrency with infrequent writes, 3 will have the best performance. Option 2 is going to perform badly in almost all cases.
foreach calls iterator(), so exactly the same things apply.
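Following the documentation's advice, the same external lock covers both explicit iterators and the enhanced for loop; a minimal sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SafeIteration {
    public static void main(String[] args) {
        List<String> list =
            Collections.synchronizedList(new ArrayList<String>());
        list.add("a");
        list.add("b");

        // The enhanced for loop uses an iterator under the hood, so it
        // needs the same external lock as an explicit iterator would.
        // The lock object must be the synchronized wrapper itself.
        synchronized (list) {
            for (String s : list) {
                System.out.println(s);
            }
        }
    }
}
```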
You could use one of the newer collections added in Java 5.0 that support concurrent access while iterating. Another approach is to take a copy using toArray, which is thread-safe (during the copy).
Collection<String> words = ...
// enhanced for loop over an array.
for(String word: words.toArray(new String[0])) {
}
I might be totally off with your requirements, but if you are not aware of them, check out google-collections with "Favor immutability" in mind.
I suggest dropping Collections.synchronizedXXX and handle all locking uniformly in the client code. The basic collections don't support the sort of compound operations useful in threaded code, and even if you use java.util.concurrent.* the code is more difficult. I suggest keeping as much code as possible thread-agnostic. Keep difficult and error-prone thread-safe (if we are very lucky) code to a minimum.
All three of your options will work. Choosing the right one for your situation will depend on what your situation is.
CopyOnWriteArrayList will work if you want a list implementation and you don't mind the underlying storage being copied every time you write. This is pretty good for performance as long as you don't have very big collections.
ConcurrentHashMap or "ConcurrentHashSet" (using Collections.newSetFromMap) will work if you need a Map or Set interface; obviously you don't get random access this way. One great thing about these two is that they work well with large data sets - mutations only lock small portions of the underlying storage.
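The "ConcurrentHashSet" construction mentioned above looks like this (a sketch):

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSet {
    public static void main(String[] args) {
        // There is no ConcurrentHashSet class in the JDK; instead,
        // back a Set with a ConcurrentHashMap.
        Set<String> set =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        set.add("a");
        set.add("b");

        // The iterator is weakly consistent: mutating during iteration
        // never throws ConcurrentModificationException.
        for (String s : set) {
            set.add("c");
        }
        System.out.println("iterated without ConcurrentModificationException");
    }
}
```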
It depends on the result one needs to achieve. Cloning/copying/toArray(), new ArrayList(..) and the like obtain a snapshot and do not lock the collection.
Using synchronized(collection) and iterating inside the block ensures that no modification happens before the end of the iteration, i.e. it effectively locks the collection.
Side note: toArray() is usually preferred, with some exceptions when internally it needs to create a temporary ArrayList. Also note that anything but toArray() should be wrapped in synchronized(collection) as well, provided you are using Collections.synchronizedXXX.
This question is rather old (sorry, I am a bit late...) but I still want to add my answer.
I would choose your second option (i.e. clone the collection before calling iterator()), but with a major twist.
Assuming you want to iterate using an iterator, you do not have to copy the collection before calling .iterator(), sort of negating (I am using the term "negating" loosely) the idea of the iterator pattern; instead you could write a "ThreadSafeIterator".
It would work on the same premise, copying the collection, but without letting the iterating class know that you did just that. Such an iterator might look like this:
class ThreadSafeIterator<T> implements Iterator<T> {
    private final Queue<T> clients;
    private T currentElement;
    private final Collection<T> source;

    ThreadSafeIterator(final Collection<T> collection) {
        this.clients = new LinkedList<>(collection);
        this.source = collection;
    }

    @Override
    public boolean hasNext() {
        return clients.peek() != null;
    }

    @Override
    public T next() {
        currentElement = clients.poll();
        return currentElement;
    }

    @Override
    public void remove() {
        synchronized (source) {
            source.remove(currentElement);
        }
    }
}
Taking this a step further, you might use the Semaphore class to ensure thread safety or something. But take the remove method with a grain of salt.
The point is, by using such an iterator, no one, neither the iterating nor the iterated class, has to worry about thread safety.
This is more a gotcha I wanted to share than a question: when printing with toString(), Java will detect direct cycles in a Collection (where the Collection refers to itself), but not indirect cycles (where a Collection refers to another Collection which refers to the first one - or with more steps).
import java.util.*;
public class ShonkyCycle {
    static public void main(String[] args) {
        List a = new LinkedList();
        a.add(a);              // direct cycle
        System.out.println(a); // works: [(this Collection)]

        List b = new LinkedList();
        a.add(b);
        b.add(a);              // indirect cycle
        System.out.println(a); // shonky: causes infinite loop!
    }
}
This was a real gotcha for me, because it occurred in debugging code to print out the Collection (I was surprised when it caught a direct cycle, so I assumed incorrectly that they had implemented the check in general). There is a question: why?
The explanation I can think of is that it is very inexpensive to check for a collection that refers to itself, as you only need to store the collection (which you have already), but for longer cycles, you need to store all the collections you encounter, starting from the root. Additionally, you might not be able to tell for sure what the root is, and so you'd have to store every collection in the system - which you do anyway - but you'd also have to do a hash lookup on every collection element. It's very expensive for the relatively rare case of cycles (in most programming). (I think) the only reason it checks for direct cycles is because it so cheap (one reference comparison).
OK... I've kinda answered my own question - but have I missed anything important? Anyone want to add anything?
Clarification: I now realize the problem I saw is specific to printing a Collection (i.e. the toString() method). There's no problem with cycles per se (I use them myself and need to have them); the problem is that Java can't print them. Edit Andrzej Doyle points out it's not just collections, but any object whose toString is called.
Given that it's constrained to this method, here's an algorithm to check for it:
the root is the object that the first toString() is invoked on (to determine this, you need to maintain state on whether a toString is currently in progress or not; so this is inconvenient).
as you traverse each object, you add it to an IdentityHashMap, along with a unique identifier (e.g. an incremented index).
but if this object is already in the Map, write out its identifier instead.
This approach also correctly renders multirefs (a node that is referred to more than once).
The memory cost is the IdentityHashMap (one reference and index per object); the complexity cost is a hash lookup for every node in the directed graph (i.e. each object that is printed).
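A sketch of that algorithm as a standalone helper (deepToString is hypothetical, not part of the JDK):

```java
import java.util.Collection;
import java.util.IdentityHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class CycleSafePrint {
    // Hypothetical helper: prints any collection graph, replacing
    // repeated references with "#<id>" instead of recursing forever.
    static String deepToString(Object o) {
        return print(o, new IdentityHashMap<Object, Integer>());
    }

    private static String print(Object o, Map<Object, Integer> seen) {
        if (!(o instanceof Collection)) {
            return String.valueOf(o);
        }
        Integer id = seen.get(o);
        if (id != null) {
            return "#" + id;      // already visited: emit its identifier
        }
        seen.put(o, seen.size()); // first visit: remember by identity
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Object e : (Collection<?>) o) {
            if (!first) sb.append(", ");
            sb.append(print(e, seen));
            first = false;
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<Object> a = new LinkedList<>();
        List<Object> b = new LinkedList<>();
        a.add(b);
        b.add(a); // the indirect cycle from the question
        System.out.println(deepToString(a)); // prints [[#0]] instead of looping
    }
}
```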
I think fundamentally it's because while the language tries to stop you from shooting yourself in the foot, it shouldn't really do so in a way that's expensive. So while it's almost free to compare object pointers (e.g. does obj == this) anything beyond that involves invoking methods on the object you're passing in.
And at this point the library code doesn't know anything about the objects you're passing in. For one, the generics implementation doesn't know if they're instances of Collection (or Iterable) themselves, and while it could find this out via instanceof, who's to say whether it's a "collection-like" object that isn't actually a collection, but still contains a deferred circular reference? Secondly, even if it is a collection there's no telling what it's actual implementation and thus behaviour is like. Theoretically one could have a collection containing all the Longs which is going to be used lazily; but since the library doesn't know this it would be hideously expensive to iterate over every entry. Or in fact one could even design a collection with an Iterator that never terminated (though this would be difficult to use in practice because so many constructs/library classes assume that hasNext will eventually return false).
So it basically comes down to an unknown, possibly infinite cost in order to stop you from doing something that might not actually be an issue anyway.
I'd just like to point out that this statement:
when printing with toString(), Java will detect direct cycles in a collection
is misleading.
Java (the JVM, the language itself, etc) is not detecting the self-reference. Rather this is a property of the toString() method/override of java.util.AbstractCollection.
If you were to create your own Collection implementation, the language/platform wouldn't automatically save you from a self-reference like this - unless you extend AbstractCollection, you would have to cover this logic yourself.
I might be splitting hairs here but I think this is an important distinction to make. Just because one of the foundation classes in the JDK does something doesn't mean that "Java" as an overall umbrella does it.
Here is the relevant source code in AbstractCollection.toString(), with the key line commented:
public String toString() {
    Iterator<E> i = iterator();
    if (! i.hasNext())
        return "[]";

    StringBuilder sb = new StringBuilder();
    sb.append('[');
    for (;;) {
        E e = i.next();
        // self-reference check:
        sb.append(e == this ? "(this Collection)" : e);
        if (! i.hasNext())
            return sb.append(']').toString();
        sb.append(", ");
    }
}
The problem with the algorithm that you propose is that you need to pass the IdentityHashMap to all Collections involved. This is not possible using the published Collection APIs. The Collection interface does not define a toString(IdentityHashMap) method.
I imagine that whoever at Sun put the self reference check into the AbstractCollection.toString() method thought of all of this, and (in conjunction with his colleagues) decided that a "total solution" is over the top. I think that the current design / implementation is correct.
It is not a requirement that Object.toString implementations be bomb-proof.
You are right, you already answered your own question. Checking for longer cycles (especially really long ones like period length 1000) would be too much overhead and is not needed in most cases. If someone wants it, he has to check it himself.
The direct cycle case, however, is easy to check and will occur more often, so it's done by Java.
You can't really detect indirect cycles from within a single toString() call: doing so in general requires tracking every object visited, and the toString() contract gives you nowhere to keep that state.