Data buffering in a multithreaded Java application

I have a multithreaded application which has one producer thread and several consumer threads.
The data is stored in a shared thread safe collection and flushed to a database when there is sufficient data in the buffer.
From the javadocs -
BlockingQueue<E>
A Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element.
take()
Retrieves and removes the head of this queue, waiting if necessary until an element becomes available.
My questions -
Is there another collection that has an E[] take(int n) method? That is, BlockingQueue waits until a single element is available; what I want is for it to wait until 100 or 200 elements are available.
Alternatively, is there another method I could use to address the problem without polling?

I think the only way is to either extend some implementation of BlockingQueue or create some kind of utility method using take:
public <E> void take(BlockingQueue<E> queue, List<E> to, int max)
        throws InterruptedException {
    for (int i = 0; i < max; i++)
        to.add(queue.take());
}

The drainTo method isn't exactly what you're looking for, but would it serve your purpose?
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection, int)
EDIT
You could implement a slightly more performant batch blocking take (with a minimum batch size) using a combination of take and drainTo:
public <E> void drainTo(final BlockingQueue<E> queue, final List<E> list, final int min)
        throws InterruptedException {
    int drained = 0;
    do {
        if (queue.size() > 0)
            drained += queue.drainTo(list, min - drained);
        else {
            list.add(queue.take());
            drained++;
        }
    } while (drained < min);
}

I am not sure if there's a similar class in the standard library with a take(int n)-style method, but you should be able to wrap the default BlockingQueue to add that function without too much hassle, don't you think?
An alternative would be to trigger an action where you put elements into the collection, with a threshold you set triggering the flush, as sketched below.
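For example, a minimal sketch of that threshold idea (the BatchFlusher callback, the class names and the threshold value are all placeholders, not part of any standard API):
import java.util.ArrayList;
import java.util.List;

interface BatchFlusher<E> {
    void flush(List<E> batch);          // e.g. write the batch to the database
}

class ThresholdBuffer<E> {
    private final List<E> buffer = new ArrayList<E>();
    private final int threshold;
    private final BatchFlusher<E> flusher;

    ThresholdBuffer(int threshold, BatchFlusher<E> flusher) {
        this.threshold = threshold;
        this.flusher = flusher;
    }

    public synchronized void add(E element) {
        buffer.add(element);
        if (buffer.size() >= threshold) {
            // the producer thread itself triggers the flush once the threshold is reached
            flusher.flush(new ArrayList<E>(buffer));
            buffer.clear();
        }
    }
}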

So this should be a thread-safe queue that lets you block on taking an arbitrary number of elements. More eyes to verify that the threading code is correct would be welcome.
package mybq;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChunkyBlockingQueue<T> {
    protected final LinkedList<T> q = new LinkedList<T>();
    protected final Object lock = new Object();

    public void add(T t) {
        synchronized (lock) {
            q.add(t);
            lock.notifyAll();
        }
    }

    public List<T> take(int numElements) {
        synchronized (lock) {
            while (q.size() < numElements) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            ArrayList<T> l = new ArrayList<T>(numElements);
            l.addAll(q.subList(0, numElements));
            q.subList(0, numElements).clear();
            return l;
        }
    }
}

Related

Alternative to ConcurrentLinkedQueue, do I need to use LinkedList with locks?

I am currently using a ConcurrentLinkedQueue so that I can get natural FIFO ordering in a thread-safe application. I have a requirement to log the size of the queue every minute. Given that this collection does not guarantee an accurate size and the cost of calculating it is O(n), is there any alternative bounded, non-blocking concurrent queue I can use where obtaining the size is not a costly operation and, at the same time, the add/remove operations are not expensive either?
If there is no such collection, do I need to use a LinkedList with locks?
If you really (REALLY) need to log a correct, current size of the queue you are dealing with, you need to block. There is simply no other way. You might think that maintaining a separate LongAdder field would help, maybe by making your own interface as a wrapper around ConcurrentLinkedQueue, something like:
interface KnownSizeQueue<T> {
    T poll();
    long size();
}
And an implementation:
static class ConcurrentKnownSizeQueue<T> implements KnownSizeQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final LongAdder currentSize = new LongAdder();

    @Override
    public T poll() {
        T result = queue.poll();
        if (result != null) {
            currentSize.decrement();
        }
        return result;
    }

    @Override
    public long size() {
        return currentSize.sum();
    }
}
I just encourage you to add one more method, like remove, to the interface and try to reason about the code. You will very quickly realize that such an implementation will still give you a wrong result. So, do not do it.
The only reliable way to get the size, if you really need it, is to block for each operation. This comes at a high price, because ConcurrentLinkedQueue is documented as:
This implementation employs an efficient non-blocking...
You will lose those properties, but if an accurate size is a hard requirement that outweighs them, you could write your own:
static class ParallelKnownSizeQueue<T> implements KnownSizeQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();

    @Override
    public T poll() {
        lock.lock();
        try {
            return queue.poll();
        } finally {
            lock.unlock();
        }
    }

    @Override
    public long size() {
        lock.lock();
        try {
            return queue.size();
        } finally {
            lock.unlock();
        }
    }
}
Or, of course, you can use an already existing structure, like LinkedBlockingDeque or ArrayBlockingQueue, etc - depending on what you need.
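For example, a minimal sketch using LinkedBlockingQueue, which maintains an internal element count so size() is O(1) (though in a concurrent setting the returned value is still only a snapshot):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueSizeExample {
    public static void main(String[] args) {
        // bounded to 10000 elements; ArrayBlockingQueue would work the same way here
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>(10000);

        queue.offer("task-1");               // non-blocking add, returns false when full
        queue.offer("task-2");

        System.out.println(queue.size());    // cheap O(1) call, prints 2
        System.out.println(queue.poll());    // non-blocking remove, prints task-1
    }
}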

Is there an opposite for the DelayQueue?

I would need a queue that will automatically remove elements that are older than a given number of milliseconds - basically, I want the items in the queue to expire after some time.
I see there is a delay queue that seems to be doing the opposite: 'an element can only be taken when its delay has expired.' (I've never used it).
Maybe there is a queue implementation that does what I need? It would be better if it was bounded.
The problem with this is who will remove the expired elements, and at which point. If your concern is the size of the queue not growing beyond certain limits, you will have to have a separate "cleaner" thread removing things from your queue as they expire. You can implement it with a DelayQueue: offer would add to an internal LinkedHashSet and to a DelayQueue, poll would operate on the set, and additionally a cleaner thread would poll the DelayQueue and remove things from the set as they "ripen".
If you do not care all that much about items being removed from the queue as soon as they expire, you can just override the poll method of a standard queue, to check the expiration of the head, and, if it has expired, clear the rest of the queue and return null.
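A minimal sketch of that second idea (here expired elements are simply dropped from the head on each poll rather than clearing the whole queue; all names are placeholders):
import java.util.LinkedList;
import java.util.Queue;

class ExpiringQueue<E> {

    private static class Timestamped<T> {
        final T value;
        final long insertedAt = System.currentTimeMillis();
        Timestamped(T value) { this.value = value; }
    }

    private final Queue<Timestamped<E>> queue = new LinkedList<Timestamped<E>>();
    private final long maxAgeMillis;

    ExpiringQueue(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    public synchronized void offer(E element) {
        queue.offer(new Timestamped<E>(element));
    }

    public synchronized E poll() {
        evictExpired();
        Timestamped<E> head = queue.poll();
        return head == null ? null : head.value;
    }

    // FIFO order means the oldest element sits at the head, so we only need to
    // keep checking the head until we find an element that has not yet expired
    private void evictExpired() {
        long now = System.currentTimeMillis();
        while (!queue.isEmpty() && now - queue.peek().insertedAt > maxAgeMillis) {
            queue.poll();
        }
    }
}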
If you want to remove expired objects you need a DelayQueue and a Thread which will extract expired objects from it, something like this:
static class Wrapper<E> implements Delayed {
    E target;
    long exp = System.currentTimeMillis() + 5000; // 5000 ms delay

    Wrapper(E target) {
        this.target = target;
    }

    E get() {
        return target;
    }

    @Override
    public int compareTo(Delayed o) {
        // order by remaining delay so the head of the DelayQueue is always the next to expire
        return Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(exp - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }
}

public static void main(String[] args) throws Exception {
    final DelayQueue<Wrapper<Integer>> q = new DelayQueue<>();
    q.add(new Wrapper<>(1));
    Thread.sleep(3000);
    q.add(new Wrapper<>(2));
    new Thread() {
        public void run() {
            try {
                for (;;) {
                    Wrapper<Integer> w = q.take();
                    System.out.println(w.get());
                }
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }.start();
}
I guess there isn't a native implementation like that in Java, though I'm not sure. But you can use a cache for this situation; I'm not sure it is the best approach, but you can use Google Guava for that, setting an expiration time for your items so you'll only retrieve values that haven't expired.
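For example, a minimal sketch using Guava's CacheBuilder (the key/value types, the maximum size and the 5-second expiry below are just illustrative):
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class ExpiringCacheExample {
    public static void main(String[] args) {
        Cache<String, Integer> cache = CacheBuilder.newBuilder()
                .maximumSize(1000)                      // keep the structure bounded
                .expireAfterWrite(5, TimeUnit.SECONDS)  // entries expire 5s after insertion
                .build();

        cache.put("item-1", 42);
        Integer value = cache.getIfPresent("item-1");   // null once the entry has expired
        System.out.println(value);
    }
}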
Here are the docs for the google guava cache implementation: Guava Doc
Hope it helps!

Using CopyOnWriteArrayList in Java

I am studying java.util.concurrent now. I am trying to understand CopyOnWriteArrayList.
As I understand it, this class is like ArrayList, but thread-safe. It is very useful if you have a lot of reads and few writes.
Here is my example. How can I use it (just for study purpose)?
Can I use it that way?
package Concurrency;
import java.util.concurrent.*;
class Entry {
private static int count;
private final int index = count++;
public String toString() {
return String.format(
"index:%-3d thread:%-3d",
index,
Thread.currentThread().getId());
}
}
class Reader implements Runnable {
private CopyOnWriteArrayList<Entry> list;
Reader(CopyOnWriteArrayList<Entry> list) { this.list = list; }
public void run() {
try {
while(true) {
if(!list.isEmpty())
System.out.println("-out " + list.remove(0));
TimeUnit.MILLISECONDS.sleep(100);
}
} catch (InterruptedException e) {
return;
}
}
}
class Writer implements Runnable {
private CopyOnWriteArrayList<Entry> list;
Writer(CopyOnWriteArrayList<Entry> list) { this.list = list; }
public void run() {
try {
while(true) {
Entry tmp = new Entry();
System.out.println("+in " + tmp);
list.add(tmp);
TimeUnit.MILLISECONDS.sleep(10);
}
} catch (InterruptedException e) {
return;
}
}
}
public class FourtyOne {
static final int nThreads = 7;
public static void main(String[] args) throws InterruptedException {
CopyOnWriteArrayList<Entry> list = new CopyOnWriteArrayList<>();
ExecutorService exec = Executors.newFixedThreadPool(nThreads);
exec.submit(new Writer(list));
for(int i = 0; i < nThreads; i++)
exec.submit(new Reader(list));
TimeUnit.SECONDS.sleep(1);
exec.shutdownNow();
}
}
Please note in your example your one writer is writing at 10x the speed of a given reader, causing a lot of copies to be made. Also note that your reader(s) are performing a write operation (remove()) upon the list as well.
In this situation you are writing to the list at an astonishingly high rate, causing severe performance issues, as large amounts of memory are used every time you update the list.
CopyOnWriteArrayList is only used when synchronization overheads are an issue and the ratio of reads to structural modifications is high. The cost of a full array copy is amortized by the performance gains seen when one or more readers access the list at the same time. This contrasts with a traditional synchronized list, where each access (read or write) is guarded by a mutex such that only one thread can perform an operation upon the list at once.
If a simple thread-safe list is required, consider a synchronized list as provided by Collections.synchronizedList().
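For instance, a minimal sketch (Entry is the class from the example above; note that compound check-then-act sequences and iteration over a synchronized list still need manual locking on the list itself, as the Collections.synchronizedList javadoc points out):
// a drop-in replacement for the CopyOnWriteArrayList in the example
List<Entry> list = Collections.synchronizedList(new ArrayList<Entry>());

// individual calls (add, remove, get) are thread-safe on their own,
// but compound check-then-act sequences still need the list's own lock:
synchronized (list) {
    if (!list.isEmpty()) {
        System.out.println("-out " + list.remove(0));
    }
}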
Please also note:
if(!list.isEmpty()){
System.out.println("-out " + list.remove(0));
}
is not effective programming, as there is no guarantee the list will still be non-empty by the time remove(0) executes. To guarantee a consistent effect, you'd need to either directly check the return value of list.remove() or wrap the whole segment in a synchronized block (defeating the purpose of using a thread-safe structure).
The remove() call, being a structurally modifying call, should also be replaced with a method like get() to ensure no structural modifications are made while the data is being read.
In all, I believe CopyOnWriteArrayList should only be used in a very specific way and only when traditional synchronization becomes unacceptably slow. Whilst your example may work fine on your own computer, scaling the access up any further will force the GC to do far too much work maintaining the heap.

Why does the iterator.hasNext not work with BlockingQueue?

I was trying to use the iterator methods on a BlockingQueue and discovered that hasNext() is non-blocking - i.e. it will not wait until more elements are added and will instead return false when there are no elements.
So here are the questions:
1) Is this bad design, or wrong expectation?
2) Is there a way to use the blocking methods of the BlockingQueue with its parent Collection class methods (e.g. if some method were expecting a collection, can I pass a blocking queue and hope that its processing will wait until the queue has more elements)?
Here is a sample code block
public class SomeContainer{
public static void main(String[] args){
BlockingQueue bq = new LinkedBlockingQueue();
SomeContainer h = new SomeContainer();
Producer p = new Producer(bq);
Consumer c = new Consumer(bq);
p.produce();
c.consume();
}
static class Producer{
BlockingQueue q;
public Producer(BlockingQueue q) {
this.q = q;
}
void produce(){
new Thread(){
public void run() {
for(int i=0; i<10; i++){
for(int j=0;j<10; j++){
q.add(i+" - "+j);
}
try {
Thread.sleep(30000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
};
}.start();
}
}
static class Consumer{
BlockingQueue q;
public Consumer(BlockingQueue q) {
this.q = q;
}
void consume() {
new Thread() {
public void run() {
Iterator itr = q.iterator();
while (itr.hasNext())
System.out.println(itr.next());
}
}.start();
}
}
}
This code runs the iteration once at most.
Just don't use iterators with Queues. Use peek() or poll() instead or take() if it's a BlockingQueue:
void consume() {
    new Thread() {
        @Override
        public void run() {
            Object value;
            // actually, when using a BlockingQueue,
            // take() would be better than poll()
            while ((value = q.poll()) != null)
                System.out.println(value);
        }
    }.start();
}
A Queue is an Iterable because it is a Collection and hence needs to provide an iterator() method, but that shouldn't ever be used, or you shouldn't be using a Queue in the first place.
1) Is this bad design, or wrong expectation?
Wrong expectations since it would otherwise violate the contract of Iterator which on Iterator.next() says: Throws: NoSuchElementException - iteration has no more elements.
If next() would block the exception would never be thrown.
2) Is there a way to use the blocking methods
Yes, for instance by extending the class and overriding the next and hasNext methods to use blocking routines instead. Note that hasNext would need to always return true in this case - which again violates the contract.
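A minimal sketch of such a wrapper (illustrative only; since hasNext() always returns true, a for-each loop over it will never terminate on its own):
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;

class BlockingQueueIterator<E> implements Iterator<E> {
    private final BlockingQueue<E> queue;

    BlockingQueueIterator(BlockingQueue<E> queue) {
        this.queue = queue;
    }

    @Override
    public boolean hasNext() {
        return true;                       // deliberately violates the Iterator contract
    }

    @Override
    public E next() {
        try {
            return queue.take();           // blocks until an element becomes available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new NoSuchElementException("interrupted while waiting for an element");
        }
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}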
If an iterator blocked on hasNext, the iteration would never finish unless you explicitly broke out of it; that would be quite a strange design.
In any case the LinkedBlockingQueue javadoc has this to say
Returns an iterator over the elements in this queue in proper sequence. The returned Iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
I think that it may be reasonable under certain circumstances to have an Iterable whose iterator() will block, although having a separate BlockingIterator would be foolish. The reason is that it lets you use an enhanced for loop, which can, in some cases, make your code cleaner. (If it would not accomplish that in your particular circumstance, do not do this at all.)
for(Request request:requests) process(request);
However, the iterator is still not free from a termination condition! The iterator should terminate once the queue has been closed to new items, and runs out of elements.
The issue still remains, though, that if the loop is already blocking on the iterator's next() method, the only way to exit when the queue is closed is to throw an exception, which the surrounding code would need to handle correctly. If you choose to do this, make sure you explain very clearly and precisely how your implementation works in the javadoc comments.
The Iterator for LinkedBlockingQueue has this as its hasNext implementation:
private Node<E> current;
public boolean hasNext() {
return current != null;
}
so hasNext() only reflects whether an element was available at the time of the call. You can wrap the check in a while(true) loop if you want to keep waiting for elements while using the standard Java Iterator idiom:
while (true) {
    if (itr.hasNext()) {
        System.out.println(itr.next());
    }
}

Assigning an object to a field defined outside a synchronized block - is it thread safe?

Is there anything wrong with the thread safety of this Java code? Threads 1-10 add numbers via sample.add(), and threads 11-20 call removeAndDouble() and print the results to stdout. I recall someone saying that assigning item the way I do in removeAndDouble(), then using it outside of the synchronized block, may not be thread safe. They said the compiler may reorder the instructions so they occur out of sequence. Is that the case here? Is my removeAndDouble() method unsafe?
Is there anything else wrong from a concurrency perspective with this code? I am trying to get a better understanding of concurrency and the memory model with java (1.6 upwards).
import java.util.*;
import java.util.concurrent.*;
public class Sample {
private final List<Integer> list = new ArrayList<Integer>();
public void add(Integer o) {
synchronized (list) {
list.add(o);
list.notify();
}
}
public void waitUntilEmpty() {
synchronized (list) {
while (!list.isEmpty()) {
try {
list.wait(10000);
} catch (InterruptedException ex) { }
}
}
}
public void waitUntilNotEmpty() {
synchronized (list) {
while (list.isEmpty()) {
try {
list.wait(10000);
} catch (InterruptedException ex) { }
}
}
}
public Integer removeAndDouble() {
// item declared outside synchronized block
Integer item;
synchronized (list) {
waitUntilNotEmpty();
item = list.remove(0);
}
// Would this ever be anything but that from list.remove(0)?
return Integer.valueOf(item.intValue() * 2);
}
public static void main(String[] args) {
final Sample sample = new Sample();
for (int i = 0; i < 10; i++) {
Thread t = new Thread() {
public void run() {
while (true) {
System.out.println(getName()+" Found: " + sample.removeAndDouble());
}
}
};
t.setName("Consumer-"+i);
t.setDaemon(true);
t.start();
}
final ExecutorService producers = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
final int j = i * 10000;
Thread t = new Thread() {
public void run() {
for (int c = 0; c < 1000; c++) {
sample.add(j + c);
}
}
};
t.setName("Producer-"+i);
t.setDaemon(false);
producers.execute(t);
}
producers.shutdown();
try {
producers.awaitTermination(600, TimeUnit.SECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
sample.waitUntilEmpty();
System.out.println("Done.");
}
}
It looks thread safe to me. Here is my reasoning.
Every time you access list you do it synchronized. This is great. Even though you pull a part of the list out into item, that item is not accessed by multiple threads.
As long as you only access list while synchronized, you should be good (in your current design.)
Your synchronization is fine, and will not result in any out-of-order execution problems.
However, I do notice a few issues.
First, your waitUntilEmpty method would be much more timely if you add a list.notifyAll() after the list.remove(0) in removeAndDouble. This will eliminate an up-to 10 second delay in your wait(10000).
Second, your list.notify in add(Integer) should be a notifyAll, because notify only wakes one thread, and it may wake a thread that is waiting inside waitUntilEmpty instead of waitUntilNotEmpty.
Third, none of the above is terminal to your application's liveness, because you used bounded waits, but if you make the two above changes, your application will have better threaded performance (waitUntilEmpty) and the bounded waits become unnecessary and can become plain old no-arg waits.
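A minimal sketch of those two changes applied to the original methods:
public void add(Integer o) {
    synchronized (list) {
        list.add(o);
        list.notifyAll();        // wake every waiter, not just one
    }
}

public Integer removeAndDouble() {
    Integer item;
    synchronized (list) {
        waitUntilNotEmpty();
        item = list.remove(0);
        list.notifyAll();        // lets waitUntilEmpty() notice the removal promptly
    }
    return Integer.valueOf(item.intValue() * 2);
}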
Your code as-is is in fact thread safe. The reasoning behind this has two parts.
The first is mutual exclusion. Your synchronization correctly ensures that only one thread at a time will modify the collections.
The second has to do with your concern about compiler reordering. You're worried that the compiler can reorder the assignment in a way that wouldn't be thread safe. You don't have to worry about it in this case. Synchronizing on the list creates a happens-before relationship. All removes from the list happen-before the write to Integer item. This tells the compiler that it cannot reorder the write to item in that method.
Your code is thread-safe, but not concurrent (as in parallel). As everything is accessed under a single mutual exclusion lock, you are serialising all access, in effect access to the structure is single-threaded.
If you require the functionality as described in your production code, the java.util.concurrent package already provides a BlockingQueue with (fixed size) array and (growable) linked list based implementations. These are very interesting to study for implementation ideas at the very least.
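A minimal sketch of the same add/removeAndDouble exchange on top of one of those built-in implementations (the capacity of 1000 is arbitrary):
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockingQueueSample {
    private final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(1000);

    public void add(Integer o) throws InterruptedException {
        queue.put(o);                        // blocks while the (bounded) queue is full
    }

    public Integer removeAndDouble() throws InterruptedException {
        Integer item = queue.take();         // blocks until an element is available
        return Integer.valueOf(item.intValue() * 2);
    }
}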