The documentation of BlockingQueue says its bulk operations are not thread-safe, though it doesn't explicitly mention the drainTo() method.
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control. However, the bulk Collection operations addAll, containsAll, retainAll and removeAll are not necessarily performed atomically unless specified otherwise in an implementation. So it is possible, for example, for addAll(c) to fail (throwing an exception) after adding only some of the elements in c.
The documentation of the drainTo() method specifies that the behaviour is undefined if the collection into which the elements of the BlockingQueue are drained is modified while the operation is in progress. But it doesn't say anything about the drainTo() operation itself being thread-safe.
Removes all available elements from this queue and adds them to the given collection. This operation may be more efficient than repeatedly polling this queue. A failure encountered while attempting to add elements to collection c may result in elements being in neither, either or both collections when the associated exception is thrown. Attempts to drain a queue to itself result in IllegalArgumentException. Further, the behavior of this operation is undefined if the specified collection is modified while the operation is in progress.
So, is the drainTo() method thread-safe? In other words, if one thread has invoked drainTo() on a blocking queue and another is calling add() or put() on the same queue, is the queue's state consistent at the end of both operations?
I think you are confusing the terms "thread-safe" and "atomic". They do not mean the same thing. A method can be thread-safe without being atomic, and can be atomic (for a single thread) without being thread-safe.
Thread-safe is a rubbery term that is hard to define without being circular. According to Goetz, a good working model is that a method is thread-safe if it is "as correct" when used in a multi-threaded context as when it is run in a single-threaded context. The rubberiness is in the fact that correctness is subjective unless you have a formal specification to measure against.
By contrast, atomic is easy to define. It simply means that the operation either happens completely or it doesn't happen at all.
So the answer to your question is that drainTo() is thread-safe, but not atomic. It is not atomic because it could throw an exception half way through draining. However, modulo that, the queue will still be in a consistent state, whether or not other threads were doing things to the queue at the same time.
(It is implicit in the above discussion that the specific implementation of the BlockingQueue interface implements the interface correctly. If it doesn't, all bets are off.)
drainTo() is thread safe in the sense that any operation on the queue that happens at the same time will not change the result nor will it corrupt the state of the queue. Otherwise, the method would be pretty pointless.
You could run into problems if the target collection (the one to which the results are added) does something "clever". But since you usually drain the queue to a collection to which only a single thread has access, it's more of a theoretical problem.
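To make that concrete, here is a minimal sketch of the usual pattern: producers add() or put() concurrently, and a single consumer drains into a collection that only it touches. The class and field names are made up for illustration; this is not code from the JDK or the question.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainExample {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // Producer threads call this concurrently; BlockingQueue.add is thread-safe.
    public void submit(Runnable task) {
        queue.add(task);
    }

    // A single consumer thread calls this; the batch list is local, so no other
    // thread can modify it while drainTo is running.
    public void drainAndRun() {
        List<Runnable> batch = new ArrayList<>();
        queue.drainTo(batch);
        for (Runnable task : batch) {
            task.run();
        }
    }
}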
I stumbled upon this question and felt like adding some implementation info.
From the Java 8 source of PriorityBlockingQueue:
/**
 * @throws UnsupportedOperationException {@inheritDoc}
 * @throws ClassCastException            {@inheritDoc}
 * @throws NullPointerException          {@inheritDoc}
 * @throws IllegalArgumentException      {@inheritDoc}
 */
public int drainTo(Collection<? super E> c, int maxElements) {
    if (c == null)
        throw new NullPointerException();
    if (c == this)
        throw new IllegalArgumentException();
    if (maxElements <= 0)
        return 0;
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        int n = Math.min(size, maxElements);
        for (int i = 0; i < n; i++) {
            c.add((E) queue[0]); // In this order, in case add() throws.
            dequeue();
        }
        return n;
    } finally {
        lock.unlock();
    }
}
You can see that a ReentrantLock is used to guard the critical section. The methods poll() and offer() use the same lock, so in this PriorityBlockingQueue implementation drainTo() cannot interleave with the other queuing operations and is indeed thread-safe.
Related
I came across the example below of a Java class which was claimed to be thread-safe. Could anyone please explain how it could be thread-safe? I can clearly see that the last method in the class is not guarded against concurrent access by any reader thread. Or am I missing something here?
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class Account {

    private Lock lock = new ReentrantLock();
    private int value = 0;

    public void increment() {
        lock.lock();
        value++;
        lock.unlock();
    }

    public void decrement() {
        lock.lock();
        value--;
        lock.unlock();
    }

    public int getValue() {
        return value;
    }
}
The code is not thread-safe.
Suppose that one thread calls decrement and then a second thread calls getValue. What happens?
The problem is that there is no "happens before" relationship between the decrement and the getValue. That means there is no guarantee that the getValue call will see the result of the decrement. Indeed, getValue could "miss" the results of an indefinite sequence of increment and decrement calls.
Actually, unless we see the code that uses the Account class, the question of thread-safety is ill-defined. The conventional notion of thread-safety1 of a program is about whether the code behaves correctly irrespective of thread-related non-determinacy. In this case, we don't have a specification of what "correct" behaviour is, or indeed an executable program to test or examine.
But my reading of the code2 is that there is an implied API requirement / correctness criterion that getValue returns the current value of the account. That cannot be guaranteed if there are multiple threads, therefore the class is not thread-safe.
Related links:
http://blogs.msdn.com/b/ericlippert/archive/2009/10/19/what-is-this-thing-you-call-thread-safe.aspx
1 - The Concurrency in Practice quote in @CKing's answer is also appealing to a notion of "correctness" by mentioning "invalid state" in the definition. However, the JLS sections on the memory model don't specify thread-safety. Instead, they talk about "well-formed executions".
2 - This reading is supported by the OP's comment below. However, if you don't accept that this requirement is real (e.g. because it is not stated explicitly), then the flip side is that the behaviour of the "account" abstraction depends on how code outside of the Account class uses it, which makes this a "leaky abstraction".
This is not thread-safe, purely because there are no guarantees about how the compiler can reorder. Since value is not volatile, here is your classic example:
while(account.getValue() != 0){
}
Because value is read without any synchronization, the JIT is free to hoist the read out of the loop, so this can end up looking like

if (account.getValue() != 0) {
    while (true) {
    }
}
I can imagine there are other permutations of compiler fun that can cause this to fail subtly. The point is that accessing getValue from multiple threads without synchronization can result in failure.
There are several distinct issues here:
Q: If multiple threads make overlapped calls to increment() and decrement(), and then they stop, and then enough time passes with no threads calling increment() or decrement(), will getValue() return the correct number?
A: Yes. The locking in the increment and decrement methods ensures that each increment and decrement operation happens atomically. They cannot interfere with one another.
Q: How long is enough time?
A: That's hard to say. The Java language specification does not guarantee that a thread calling getValue() will ever see the latest value written by some other thread because getValue() accesses the value without any synchronization at all.
If you change getValue() to lock and unlock the same lock object, or if you declare value to be volatile, then zero time would be enough.
Q: Can a call to getValue() return an invalid value?
A: No. It can only ever return the initial value, the result of a complete increment() call, or the result of a complete decrement() call.
But, the reason for this has nothing to do with the lock. The lock does not prevent any thread from calling getValue() while some other thread is in the middle of incrementing or decrementing the value.
The thing that prevents getValue() from returning a completely invalid value is that value is an int, and the JLS guarantees that updates and reads of int variables are always atomic.
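For completeness, here is a sketch of the two fixes mentioned earlier (this is not the code from the question; it just shows the two options for establishing the needed happens-before edge):

// Option 1: read under the same lock that guards the writes.
public int getValue() {
    lock.lock();
    try {
        return value;
    } finally {
        lock.unlock();
    }
}

// Option 2: declare the field volatile, so every read sees the latest write.
// private volatile int value = 0;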
The short answer:
By definition, Account is a thread-safe class even though the getValue method is not guarded.
The long answer:
From Java Concurrency in Practice, a class is said to be thread-safe when:
No set of operations performed sequentially or concurrently on instances of a thread-safe class can cause an instance to be in an invalid state.
Since the getValue method cannot put an Account instance into an invalid state at any given time, your class is said to be thread-safe.
The documentation for Collections#synchronizedCollection echoes this sentiment:
Returns a synchronized (thread-safe) collection backed by the specified collection. In order to guarantee serial access, it is critical that all access to the backing collection is accomplished through the returned collection. It is imperative that the user manually synchronize on the returned collection when iterating over it:
Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized (c) {
    Iterator i = c.iterator(); // Must be in the synchronized block
    while (i.hasNext())
        foo(i.next());
}
Notice how the documentation says that the collection (which is an object of an inner class named SynchronizedCollection in the Collections class) is thread-safe and yet asks the client code to guard the collection while iterating over it. In fact, the iterator method in SynchronizedCollection is not synchronized. This is very similar to your example, where Account is thread-safe but client code still needs to ensure atomicity when calling getValue.
It's completely thread safe.
Nobody can simultaneously increment and decrement value so you won't lose or gain a count in error.
The fact that getValue() will return different values through time is something that will happen anyway: simultaneity is not relevant.
You do not have to protect getValue. Accessing it from multiple threads at the same time does not lead to any negative effects. The object's state cannot become invalid no matter when or from how many threads you call this method (because it does not modify the state).
Having said that, you can write non-thread-safe code that uses this class.
For example something like
if (acc.getValue()>0) acc.decrement();
is potentially dangerous because it can lead to race conditions. Why?
Let's say you have a business rule "never decrement below 0", your current value is 1, and there are two threads executing this code. There's a chance that they'll do it in the following order:
Thread 1 checks that acc.getValue() is > 0. Yes!
Thread 2 checks that acc.getValue() is > 0. Yes!
Thread 1 calls decrement(); value is 0.
Thread 2 calls decrement(); value is now -1.
What happened? Each call made sure it was not going below zero, but together they managed to do exactly that. This is called a race condition.
To avoid this you must protect not just the elementary operations, but rather any piece of code that must be executed without interruption.
So, this class is thread-safe but only for very limited use.
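For example, if there were a business rule like "never decrement below zero", the check and the decrement would have to form one guarded unit inside the class. A rough sketch (the decrementIfPositive name is made up, not part of the original class):

public boolean decrementIfPositive() {
    lock.lock();
    try {
        if (value > 0) {    // check ...
            value--;        // ... and act, with no chance of interleaving in between
            return true;
        }
        return false;
    } finally {
        lock.unlock();
    }
}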
This may sound like a silly question; I just started with Java concurrency.
I have a LinkedList that acts as a task queue and is accessed by multiple threads. Some threads removeFirst() a task and execute it, while other threads add() more tasks. A task can also have its thread put it back into the queue.
I notice that when there are a lot of tasks and they are put back into the queue a lot, the number of tasks I initially add to the queue is not what comes out; 1, or sometimes 2, is missing.
I checked everything and I synchronized every critical section + notifyAll().
I already marked the LinkedList as volatile.
The exact numbers are 384 tasks, each put back 3072 times.
The problem doesn't occur with a small number of tasks and put-backs. Also, if I System.out.println() all the steps then it doesn't happen anymore, so I can't debug it.
Could it be possible that LinkedList.add() is not fast enough so the threads somehow miss it?
Simplified code:
public void callByAllThreads() {
    Task executedTask = null;
    do {
        // access by multiple threads
        synchronized (asyncQueue) {
            executedTask = asyncQueue.poll();
            if (executedTask == null) {
                inProcessCount.incrementAndGet(); // mark that there is some processing going on
            }
        }
        if (executedTask != null) {
            executedTask.callMethod(); // subclass of task can override this method
            synchronized (asyncQueue) {
                inProcessCount.decrementAndGet();
                asyncQueue.notifyAll();
            }
        }
    } while (executedTask != null);
}
The Task can override callMethod:
public void callMethodOverride() {
    synchronized (getAsyncQueue()) {
        getAsyncQueue().add(this);
        getAsyncQueue().notifyAll();
    }
}
From the docs for LinkedList:
Note that this implementation is not synchronized. If multiple threads access a linked list concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
i.e. you should synchronize access to the list. You say you are, but if you are seeing items get "lost" then you probably aren't synchronizing properly. Instead of trying to do that, you could use a framework class that does it for you ...
... If you are always removing the next available (first) item (effectively a producer/consumer implementation) then you could use a BlockingQueue implementation. This is guaranteed to be thread-safe and has the advantage of blocking the consumer until an item is available. An example is ArrayBlockingQueue.
For non-blocking thread-safe queues you can look at ConcurrentLinkedQueue.
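As an illustration only (the names are made up and this is a sketch rather than a drop-in replacement for your code), the producer/consumer version with a BlockingQueue needs no synchronized blocks or notifyAll() at all:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class TaskPipeline {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    // Producer threads (and tasks that re-queue themselves) call this.
    public void submit(Runnable task) {
        tasks.add(task);
    }

    // Consumer threads run this loop; take() blocks until a task is available.
    public void consumeLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            Runnable task = tasks.take();
            task.run();
        }
    }
}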
Marking the list instance variable volatile has nothing to do with your list being synchronized for mutation methods like add or removeFirst. volatile is simply to do with ensuring that read/write for that instance variable is communicated correctly between, and ordered correctly within, threads. Note I said that variable, not the contents of that variable (see the Java Tutorials > Atomic Access)
LinkedList is definitely not thread safe; you cannot use it safely with multiple threads. It's not a question of "fast enough," it's a question of changes made by one thread being visible to other threads. Marking it volatile doesn't help; that only affects references to the LinkedList being changed, not changes to the contents of the LinkedList.
Consider ConcurrentLinkedQueue or ConcurrentLinkedDeque.
LinkedList is not thread safe, so yes, multiple threads accessing it simultaneously will lead to problems. Synchronizing critical sections can solve this, but as you are still having problems you probably made a mistake somewhere. Try wrapping it in a Collections.synchronizedList() to synchronize all method calls.
LinkedList is not thread-safe. You can use ConcurrentLinkedQueue if it fits your need, which it seems it possibly can.
As the documentation says:
An unbounded thread-safe queue based on linked nodes. This queue orders elements FIFO (first-in-first-out). The head of the queue is that element that has been on the queue the longest time. The tail of the queue is that element that has been on the queue the shortest time. New elements are inserted at the tail of the queue, and the queue retrieval operations obtain elements at the head of the queue. A ConcurrentLinkedQueue is an appropriate choice when many threads will share access to a common collection. This queue does not permit null elements.
You increment your inProcessCount when executedTask == null which is obviously the opposite of what you want to do. So it’s no wonder that it will have inconsistent values.
But there are other issues as well. You call notifyAll() at several places, but as long as no one is calling wait(), it has no effect.
Note further that if you access an integer variable consistently from inside synchronized blocks only throughout the code, there is no need to make it an AtomicInteger. On the other hand, if you use it, e.g. because it will be accessed at other places without additional synchronization, you can move the code updating the AtomicInteger outside the synchronized block.
Also, a method which calls a method like getAsyncQueue() three times looks suspicious to a reader. Just call it once and remember the result in a local variable; then everyone can be confident that it is the same reference on all three uses. Generally, you have to ensure that all code is using the same list, hence the appropriate modifier for the variable holding it is final, not volatile.
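Applied to the override from the question, a possible shape would be the following (assuming getAsyncQueue() returns the same LinkedList of Task objects used elsewhere; the declared type here is an assumption):

public void callMethodOverride() {
    final LinkedList<Task> queue = getAsyncQueue(); // read the reference once
    synchronized (queue) {
        queue.add(this);
        queue.notifyAll();
    }
}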
I was going through the FIFO implementations in Java and came across the java.util.Queue interface. Deque extends it, and LinkedList in turn implements Deque.
I wrote the following code
import java.util.LinkedList;
import java.util.Queue;

public class FIFOTest {
    public static void main(String[] args) {
        Queue<String> myQueue = new LinkedList<String>();
        myQueue.add("US");
        myQueue.offer("Canada");

        for (String element : myQueue) {
            System.out.println("Element : " + element);
        }
    }
}
Both seem to do the same thing: add data to the tail of the queue. What is the difference between these two methods? Are there any special cases in which one would be more beneficial than the other?
LinkedList#offer(E) is implemented as
public boolean offer(E e) {
    return add(e);
}
In this case, they are the same thing. They are just needed to satisfy the interfaces. LinkedList implements Deque and List. The LinkedList#add(E) method will not throw an Exception as it will always take more elements, but in another Queue implementation that has limited capacity or takes only certain kinds of elements, add(E) might throw an exception while offer(E) will simply return false.
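A small sketch of how the two calls differ on a capacity-restricted implementation (ArrayBlockingQueue with capacity 1, purely for illustration):

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class AddVsOffer {
    public static void main(String[] args) {
        Queue<String> q = new ArrayBlockingQueue<>(1);

        q.add("US");                             // succeeds; the queue is now full
        System.out.println(q.offer("Canada"));   // prints false: offer reports failure via its return value
        try {
            q.add("Canada");                     // add reports the same failure by throwing
        } catch (IllegalStateException e) {
            System.out.println("add failed: " + e.getMessage());
        }
    }
}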
According to the docs the main difference is that when the operation fails, one (add) throws an exception and the other (offer) returns a special value (false):
Each of these methods exists in two forms: one throws an exception if the operation fails, the other returns a special value (either null or false, depending on the operation). The latter form of the insert operation is designed specifically for use with capacity-restricted Queue implementations; in most implementations, insert operations cannot fail.
What is the difference between these two methods?
Queue.add - throws an exception if the operation fails.
Queue.offer - returns a special value (either null or false, depending on the operation).
Any special cases in which either would be more beneficial than other?
According to the docs, the Queue.offer form of the insert operation is designed specifically for use with capacity-restricted Queue implementations; in most implementations, insert operations cannot fail.
For details, read the docs.
add() comes from the Collection interface.
offer() comes from the Queue interface.
The documentation of the offer() method of Queue says:
Inserts the specified element into this queue if it is possible to do so immediately without violating capacity restrictions. When using a capacity-restricted queue, this method is generally preferable to add(E), which can fail to insert an element only by throwing an exception.
The documentation of the add() method of Queue says:
Inserts the specified element into this queue if it is possible to do so immediately without violating capacity restrictions, returning true upon success and throwing an IllegalStateException if no space is currently available.
I've taken a look at the OpenJDK source code of CopyOnWriteArrayList, and it seems that all write operations are protected by the same lock while read operations are not protected at all. As I understand it, under the JMM all accesses to a variable (both reads and writes) should be protected by a lock, or reordering effects may occur.
For example, set(int, E) method contains these lines (under lock):
/* 1 */ int len = elements.length;
/* 2 */ Object[] newElements = Arrays.copyOf(elements, len);
/* 3 */ newElements[index] = element;
/* 4 */ setArray(newElements);
The get(int) method, on the other hand, only does return get(getArray(), index);.
In my understanding of JMM, this means that get may observe the array in an inconsistent state if statements 1-4 are reordered like 1-2(new)-4-2(copyOf)-3.
Do I understand JMM incorrectly or is there any other explanations on why CopyOnWriteArrayList is thread-safe?
If you look at the underlying array reference you'll see it's marked as volatile. When a write operation occurs (such as in the above extract) this volatile reference is only updated in the final statement via setArray. Up until this point any read operations will return elements from the old copy of the array.
The important point is that the array update is an atomic operation and hence reads will always see the array in a consistent state.
The advantage of only taking out a lock for write operations is improved throughput for reads: This is because write operations for a CopyOnWriteArrayList can potentially be very slow as they involve copying the entire list.
Getting the array reference is an atomic operation. So, readers will either see the old array or the new array - either way the state is consistent. (set(int, E) computes the new array contents before setting the reference, so the array is consistent when the assignment is made.)
The array reference itself is marked as volatile so that readers do not need to use a lock to see changes to the referenced array. (EDIT: Also, volatile guarantees that the assignment is not re-ordered, which would lead to the assignment being done when the array is possibly in an inconsistent state.)
The write lock is required to prevent concurrent modification, which might result in the array holding inconsistent data or changes being lost.
So, according to Java 1.8, the following are the declarations of the array and the lock in CopyOnWriteArrayList.
/** The array, accessed only via getArray/setArray. */
private transient volatile Object[] array;
/** The lock protecting all mutators */
final transient ReentrantLock lock = new ReentrantLock();
The following is the definition of the add method of CopyOnWriteArrayList:
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}
As @Adamski has already mentioned, the array is volatile and is only updated via the setArray method. Any read-only calls made after that update will get the new value, and hence the array is always seen in a consistent state.
CopyOnWriteArrayList is a concurrent Collection class introduced in Java 5 Concurrency API along with its popular cousin ConcurrentHashMap in Java.
CopyOnWriteArrayList implements the List interface like ArrayList, Vector and LinkedList, but it's a thread-safe collection and it achieves its thread-safety in a slightly different way than Vector or other thread-safe collection classes.
As the name suggests, CopyOnWriteArrayList creates a copy of the underlying array with every mutation operation, e.g. add or set. Normally CopyOnWriteArrayList is very expensive because it involves a costly array copy with every write operation, but it's very efficient if you have a List where iteration outnumbers mutation, e.g. you mostly need to iterate the list and don't modify it too often.
The Iterator of CopyOnWriteArrayList is fail-safe and doesn't throw a ConcurrentModificationException even if the underlying CopyOnWriteArrayList is modified after iteration begins, because the iterator operates on a separate copy of the array. Consequently, none of the updates made to the CopyOnWriteArrayList are visible to that iterator. To get the most up-to-date version, do a new read like list.iterator();
That being said, updating this collection a lot will kill performance. If you try to sort a CopyOnWriteArrayList you'll see that the list throws an UnsupportedOperationException (the sort invokes set on the collection N times). You should only use this collection when you are doing upwards of 90% reads.
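A small sketch of that snapshot behaviour:

import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIteratorExample {
    public static void main(String[] args) {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");

        Iterator<String> it = list.iterator(); // iterates over the array as it is right now
        list.add("b");                         // the mutation copies the array; the iterator keeps the old one

        while (it.hasNext()) {
            System.out.println(it.next());     // prints only "a", and never throws ConcurrentModificationException
        }
        System.out.println(list);              // prints [a, b]; a fresh read sees the update
    }
}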
From the JavaDocs:
A ConcurrentLinkedQueue is an appropriate choice when many threads will share access to a common collection. This queue does not permit null elements.
ArrayBlockingQueue is a classic "bounded buffer", in which a fixed-sized array holds elements inserted by producers and extracted by consumers. This class supports an optional fairness policy for ordering waiting producer and consumer threads
LinkedBlockingQueue typically has higher throughput than array-based queues but less predictable performance in most concurrent applications.
I have 2 scenarios, one requires the queue to support many producers (threads using it) with one consumer and the other is the other way around.
I do not understand which implementation to use. Can somebody explain what the differences are?
Also, what is the 'optional fairness policy' in the ArrayBlockingQueue?
ConcurrentLinkedQueue means no locks are taken (i.e. no synchronized(this) or Lock.lock calls). It will use a CAS - Compare and Swap operation during modifications to see if the head/tail node is still the same as when it started. If so, the operation succeeds. If the head/tail node is different, it will spin around and try again.
LinkedBlockingQueue will take a lock before any modification. So your offer calls would block until they get the lock. You can use the offer overload that takes a TimeUnit to say you are only willing to wait X amount of time before abandoning the add (usually good for message type queues where the message is stale after X number of milliseconds).
Fairness means that the Lock implementation will keep the threads ordered. Meaning if Thread A enters and then Thread B enters, Thread A will get the lock first. With no fairness, it is undefined really what happens. It will most likely be the next thread that gets scheduled.
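For reference, the fairness policy the question asks about is just the second constructor argument of ArrayBlockingQueue; a minimal sketch:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FairQueueExample {
    // Bounded to 100 elements; 'true' requests a fair lock, so waiting producers and
    // consumers are granted access in arrival order, at some cost in throughput.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100, true);
}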
As for which one to use, it depends. I tend to use ConcurrentLinkedQueue because the time it takes my producers to get work to put onto the queue is diverse. I don't have a lot of producers producing at the exact same moment. But the consumer side is more complicated because poll won't go into a nice sleep state. You have to handle that yourself.
Basically the difference between them are performance characteristics and blocking behavior.
Taking the easiest first, ArrayBlockingQueue is a queue of a fixed size. So if you set the size at 10, and attempt to insert an 11th element, the insert statement will block until another thread removes an element. The fairness issue is what happens if multiple threads try to insert and remove at the same time (in other words during the period when the Queue was blocked). A fairness algorithm ensures that the first thread that asks is the first thread that gets. Otherwise, a given thread may wait longer than other threads, causing unpredictable behavior (sometimes one thread will just take several seconds because other threads that started later got processed first). The trade-off is that it takes overhead to manage the fairness, slowing down the throughput.
The most important difference between LinkedBlockingQueue and ConcurrentLinkedQueue is that if you request an element from a LinkedBlockingQueue and the queue is empty, your thread will wait until there is something there. A ConcurrentLinkedQueue will return right away with the behavior of an empty queue.
Which one depends on if you need the blocking. Where you have many producers and one consumer, it sounds like it. On the other hand, where you have many consumers and only one producer, you may not need the blocking behavior, and may be happy to just have the consumers check if the queue is empty and move on if it is.
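Sketched side by side (the task type is illustrative), the two consumer styles look like this:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ConsumerStyles {
    // Blocking style: take() parks the thread until a producer adds something.
    static void blockingConsume(LinkedBlockingQueue<Runnable> queue) throws InterruptedException {
        while (true) {
            Runnable task = queue.take(); // waits here while the queue is empty
            task.run();
        }
    }

    // Non-blocking style: poll() returns null immediately on an empty queue,
    // so the consumer decides what to do in the meantime.
    static void nonBlockingConsume(ConcurrentLinkedQueue<Runnable> queue) {
        Runnable task = queue.poll();
        if (task != null) {
            task.run();
        } else {
            // empty: move on, back off, or do other work
        }
    }
}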
Your question title mentions Blocking Queues. However, ConcurrentLinkedQueue is not a blocking queue.
The BlockingQueues are ArrayBlockingQueue, DelayQueue, LinkedBlockingDeque, LinkedBlockingQueue, PriorityBlockingQueue, and SynchronousQueue.
Some of these are clearly not fit for your purpose (DelayQueue, PriorityBlockingQueue, and SynchronousQueue). LinkedBlockingQueue and LinkedBlockingDeque are identical, except that the latter is a double-ended Queue (it implements the Deque interface).
Since ArrayBlockingQueue is only useful if you want to limit the number of elements, I'd stick to LinkedBlockingQueue.
ArrayBlockingQueue has a lower memory footprint: it reuses its element slots, unlike LinkedBlockingQueue, which has to create a LinkedBlockingQueue$Node object for each new insertion.
SynchronousQueue (taken from another question)
SynchronousQueue is more of a handoff, whereas the LinkedBlockingQueue just allows a single element. The difference being that the put() call to a SynchronousQueue will not return until there is a corresponding take() call, but with a LinkedBlockingQueue of size 1, the put() call (to an empty queue) will return immediately. It's essentially the BlockingQueue implementation for when you don't really want a queue (you don't want to maintain any pending data).
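A tiny sketch of that handoff behaviour:

import java.util.concurrent.SynchronousQueue;

public class HandoffExample {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<String> handoff = new SynchronousQueue<>();

        Thread consumer = new Thread(() -> {
            try {
                System.out.println("got: " + handoff.take());
            } catch (InterruptedException ignored) {
            }
        });
        consumer.start();

        handoff.put("hello"); // does not return until the consumer's take() has accepted the element
        consumer.join();
    }
}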
LinkedBlockingQueue (a linked-list implementation, but not exactly the JDK implementation of LinkedList; it uses a static inner class Node to maintain links between elements)
Constructor for LinkedBlockingQueue
public LinkedBlockingQueue(int capacity) {
    if (capacity <= 0) throw new IllegalArgumentException();
    this.capacity = capacity;
    last = head = new Node<E>(null); // Maintains an underlying linked list. (Use when size is not known.)
}
The Node class used to maintain the links:
static class Node<E> {
    E item;
    Node<E> next;
    Node(E x) { item = x; }
}
ArrayBlockingQueue (array implementation)
Constructor for ArrayBlockingQueue
public ArrayBlockingQueue(int capacity, boolean fair) {
    if (capacity <= 0)
        throw new IllegalArgumentException();
    this.items = new Object[capacity]; // Maintains an underlying array
    lock = new ReentrantLock(fair);
    notEmpty = lock.newCondition();
    notFull = lock.newCondition();
}
IMHO the biggest difference between ArrayBlockingQueue and LinkedBlockingQueue is clear from the constructors: one has an underlying array as its data structure and the other a linked list.
ArrayBlockingQueue uses a single-lock, double-condition algorithm, while LinkedBlockingQueue is a variant of the "two lock queue" algorithm and has two locks and two conditions (takeLock, putLock).
ConcurrentLinkedQueue is lock-free; LinkedBlockingQueue is not. Every time you invoke LinkedBlockingQueue.put() or LinkedBlockingQueue.take(), you need to acquire the lock first. In other words, LinkedBlockingQueue has poorer concurrency. If you care about performance, try ConcurrentLinkedQueue + LockSupport.
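One possible shape of that combination, as a simplified sketch only (a single consumer, made-up names, not a production-grade queue):

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

public class PollAndPark {
    private final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private volatile Thread consumer; // the thread to wake when work arrives

    // Producers: lock-free enqueue, then wake the consumer if it is parked.
    public void produce(Runnable task) {
        queue.offer(task);
        LockSupport.unpark(consumer); // no-op if the consumer hasn't registered yet
    }

    // Single consumer: poll in a loop, parking briefly when the queue is empty.
    public void consumeLoop() {
        consumer = Thread.currentThread();
        while (!Thread.currentThread().isInterrupted()) {
            Runnable task = queue.poll();
            if (task != null) {
                task.run();
            } else {
                LockSupport.parkNanos(1_000_000L); // back off ~1 ms; unpark() ends the wait early
            }
        }
    }
}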