Concurrency help on a custom queue-like data structure - java

I am trying to implement an insertion-performance-focused, queue-like data structure that must meet the following requirements:
Must be thread-safe
Must be able to add to queue without synchronizing
Must be able to get a "snapshot" view of the queue
My issue is getting the 'snapshot' view in a way that doesn't require synchronizing the insert. Since I can block removal and elements can only be added to the end, getting the elements shouldn't be an issue. The problem I keep running into is that LinkedList's iterator has an unsuppressable fail-fast concurrent-modification check baked in, and 'LinkedList.get(int)' is O(n).
Below is a pared-down example of where I am with what should be a fairly simple task.
public class SnapshotableQueue<T> {
private final LinkedList<T> queue = new LinkedList<>();
private final Object removeLock = new Object();
public void add(T element) {
queue.add(element);
}
public T remove() {
synchronized(removeLock) {
return queue.remove();
}
}
public List<T> getSnapshot() {
synchronized(removeLock) {
int length = queue.size();
List<T> snapshot = new ArrayList<>(length);
???
return snapshot;
}
}
}
Unacceptable Solution #1
for(int i = 0; i < length; i++)
snapshot.add(queue.get(i));
'LinkedList.get(int)' is O(n)
Unacceptable Solution #2
Iterator<T> iterator = queue.iterator();
for(int i = 0; i < length; i++)
snapshot.add(iterator.next());
Isn't thread-safe (throws ConcurrentModificationException)
Unacceptable Solution #3
Change queue to ArrayList
'ArrayList.remove(0)' is O(n)

Don't reinvent the wheel: use ConcurrentLinkedQueue instead of LinkedList, then use its iterator() to build your snapshot. That iterator is thread-safe (weakly consistent) and never throws ConcurrentModificationException.
Your method getSnapshot will then be simply
public List<T> getSnapshot() {
return new ArrayList<>(queue);
}
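For reference, here is a minimal sketch of what the whole class might look like on top of ConcurrentLinkedQueue. The class name ClqSnapshotQueue is made up for illustration, and remove() uses poll() (returning null on an empty queue) rather than the throwing remove() from the question; that choice is an assumption, not a requirement. The snapshot is weakly consistent: elements added while the copy is being made may or may not appear in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical name; a sketch, not a drop-in replacement.
class ClqSnapshotQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();

    // Lock-free append to the tail.
    public void add(T element) {
        queue.add(element);
    }

    // Assumption: return null when empty instead of throwing.
    public T remove() {
        return queue.poll();
    }

    // Copies the elements present during the traversal; no external lock needed.
    public List<T> getSnapshot() {
        return new ArrayList<>(queue);
    }
}
```

Note that no removeLock is needed any more, since neither the copy nor poll() can corrupt the queue.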

Java has CopyOnWriteArrayList, part of the java.util.concurrent package, which seems to do exactly what you want: its iterator works on a snapshot of the list.
But mind that it copies the whole backing array every time you insert something, so it should only be used in scenarios where you read a lot more than you write.
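To illustrate the snapshot behaviour, here is a small sketch (the class and method names are made up). The iterator keeps seeing the array as it was when iterator() was called, even if the list is modified afterwards. Keep in mind that removing from the front is also a full array copy, so this may not suit a queue with frequent removals.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class.
class CowSnapshotDemo {
    static List<String> snapshotThenMutate() {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        // The iterator is bound to the backing array as of this call.
        Iterator<String> it = list.iterator();

        // This add creates a new backing array; the iterator does not see it.
        list.add("c");

        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen; // contains "a" and "b" only
    }
}
```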

Related

Alternative to ConcurrentLinkedQueue, do I need to use LinkedList with locks?

I am currently using a ConcurrentLinkedQueue so that I get natural FIFO ordering in a thread-safe application. I have a requirement to log the size of the queue every minute. Given that this collection's size() is not guaranteed to be accurate under concurrent modification and costs O(N) to compute, is there an alternative bounded, non-blocking concurrent queue where obtaining the size is not a costly operation and add/remove are not expensive either?
If there is no collection, do I need to use LinkedList with locks?
If you really (REALLY) need to log a correct, current size of the queue you are dealing with, you need to block. There is simply no other way. You might think that maintaining a separate LongAdder field would help, maybe by making your own interface as a wrapper around ConcurrentLinkedQueue, something like:
interface KnownSizeQueue<T> {
T poll();
long size();
}
And an implementation:
static class ConcurrentKnownSizeQueue<T> implements KnownSizeQueue<T> {
private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
private final LongAdder currentSize = new LongAdder();
@Override
public T poll() {
T result = queue.poll();
if(result != null){
currentSize.decrement();
}
return result;
}
@Override
public long size() {
return currentSize.sum();
}
}
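I encourage you to add one more method, such as remove, to the interface and try to reason about the code. You will very shortly realize that such an implementation can still give you a wrong result, because the queue update and the counter update are two separate steps and another thread can read the size between them. So, do not do it.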
The only reliable way to get the size, if you really need it, is to block for each operation. This comes at a high price, because ConcurrentLinkedQueue is documented as:
This implementation employs an efficient non-blocking...
You will lose those properties, but if an exact size is a hard requirement and the cost is acceptable, you could write your own:
static class ParallelKnownSizeQueue<T> implements KnownSizeQueue<T> {
private final Queue<T> queue = new ArrayDeque<>();
private final ReentrantLock lock = new ReentrantLock();
@Override
public T poll() {
// Acquire before try, so finally never unlocks a lock that was not taken.
lock.lock();
try {
return queue.poll();
} finally {
lock.unlock();
}
}
@Override
public long size() {
lock.lock();
try {
return queue.size();
} finally {
lock.unlock();
}
}
}
Or, of course, you can use an already existing structure, like LinkedBlockingDeque or ArrayBlockingQueue, etc - depending on what you need.
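As a rough sketch of that last option: LinkedBlockingQueue can be bounded and, as far as I know, keeps an internal element count, so size() does not traverse the queue. Its offer() and poll() return immediately instead of waiting, although they do use locks internally. The class name, method name, and capacity of 1000 below are just illustrative.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class.
class SizeLoggingDemo {
    static int[] sizes() {
        // Bounded to 1000 elements (arbitrary value for the example).
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);

        // offer() returns false instead of waiting when the queue is full.
        queue.offer("a");
        queue.offer("b");
        queue.offer("c");
        int before = queue.size();

        // poll() returns null instead of waiting when the queue is empty.
        queue.poll();
        int after = queue.size();

        return new int[] {before, after};
    }
}
```

Under concurrent use the logged value is still only a momentary reading; it can be stale by the time it is written to the log.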

Concurrent iteration and thread safety

I'm reading B. Goetz's Java Concurrency in Practice and I'm now at the section about thread-safe collections. He describes so-called "hidden iterators", which may throw ConcurrentModificationException. Here is the example he gives:
public class HiddenIterator{
@GuardedBy("this")
private final Set<Integer> set = new HashSet<Integer>();
public synchronized void add(Integer i){ set.add(i); }
public synchronized void remove(Integer i){ set.remove(i); }
public void addTenThings(){
Random r = new Random();
for(int i = 0; i < 10; i++)
add(r.nextInt());
System.out.println("DEBUG: added ten elements to set " + set);
}
}
Now, it's obvious that addTenThings() may throw ConcurrentModificationException, since printing the set's contents involves iterating over it. But he provides the following suggestion for dealing with it:
If HiddenIterator wrapped the HashSet with a synchronizedSet,
encapsulating the synchronization, this sort of error would not occur.
I don't quite understand it. Even if we wrapped set into a synchronized-wrapper, the class would still remain NotThreadSafe. What did he mean?
This is because Collections.synchronizedSet synchronizes every method, including toString. Indeed, if you tried to iterate over a wrapped set manually, you could get ConcurrentModificationException, so you have to synchronize manual iteration yourself. But methods that do hidden iterations already do it, so you don't have to worry about that at least. Here is the corresponding piece of code from the JDK sources:
public String toString() {
synchronized (mutex) {return c.toString();}
}
Here, mutex is initialized to this in the constructor of the wrapper class, so it's basically synchronized (this).
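To make the difference concrete, here is a small sketch (the class and method names are made up). The string concatenation calls the wrapper's synchronized toString, so the hidden iteration is safe, while the explicit for-each loop has to lock on the wrapper itself, as the Collections.synchronizedSet documentation requires.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class.
class SynchronizedSetDemo {
    private final Set<Integer> set = Collections.synchronizedSet(new HashSet<>());

    void add(Integer i) {
        set.add(i); // synchronized by the wrapper
    }

    String describe() {
        // Hidden iteration: the wrapper's toString() holds the lock for us.
        return "set " + set;
    }

    int sum() {
        int total = 0;
        // Explicit iteration: we must hold the wrapper's lock ourselves.
        synchronized (set) {
            for (int value : set) {
                total += value;
            }
        }
        return total;
    }
}
```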

Data buffering in multithreaded java application

I have a multi threaded application which has one producer thread and several consumer threads.
The data is stored in a shared thread safe collection and flushed to a database when there is sufficient data in the buffer.
From the javadocs -
BlockingQueue<E>
A Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element.
take()
Retrieves and removes the head of this queue, waiting if necessary until an element becomes available.
My questions -
Is there another collection that has an E[] take(int n) method? That is, BlockingQueue waits until one element is available; what I want is for it to wait until 100 or 200 elements are available.
Alternatively, is there another method I could use to address the problem without polling?
I think the only way is to either extend some implementation of BlockingQueue or create some kind of utility method using take:
public <E> void take(BlockingQueue<E> queue, List<E> to, int max)
throws InterruptedException {
for (int i = 0; i < max; i++)
to.add(queue.take());
}
The drainTo method isn't exactly what you're looking for, but would it serve your purpose?
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection, int)
EDIT
You could implement a slightly more performant batch blocking takemin using a combination of take and drainTo:
public <E> void drainTo(final BlockingQueue<E> queue, final List<E> list, final int min) throws InterruptedException
{
int drained = 0;
do
{
if (queue.size() > 0)
drained += queue.drainTo(list, min - drained);
else
{
list.add(queue.take());
drained++;
}
}
while (drained < min);
}
I am not sure if there's a similar class in the standard library that has take(int n) type method, but you should be able to wrap the default BlockingQueue to add that function without too much hassle, don't you think?
An alternative would be to trigger the flush at the point where you put elements into the collection, once a threshold you set is reached.
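A minimal sketch of that threshold idea, assuming the flush action can be passed in as a callback. ThresholdBuffer and the flusher parameter are hypothetical names, not an existing API. The full batch is swapped out under the lock and handed to the callback outside it, so a slow database write does not block other adders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical class: flushes a batch whenever `threshold` items have accumulated.
class ThresholdBuffer<T> {
    private final int threshold;
    private final Consumer<List<T>> flusher; // e.g. writes the batch to the database
    private List<T> buffer = new ArrayList<>();

    ThresholdBuffer(int threshold, Consumer<List<T>> flusher) {
        this.threshold = threshold;
        this.flusher = flusher;
    }

    public void add(T item) {
        List<T> full = null;
        synchronized (this) {
            buffer.add(item);
            if (buffer.size() >= threshold) {
                // Swap in a fresh list; the full one is now owned by this thread.
                full = buffer;
                buffer = new ArrayList<>();
            }
        }
        if (full != null) {
            flusher.accept(full); // runs outside the lock
        }
    }
}
```

The flush runs on whichever thread added the item that crossed the threshold; if that is not acceptable, the callback could hand the batch to an executor instead.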
So this should be a threadsafe queue that lets you block on taking an arbitrary number of elements. More eyes to verify the threading code is correct would be welcome.
package mybq;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
public class ChunkyBlockingQueue<T> {
protected final LinkedList<T> q = new LinkedList<T>();
protected final Object lock = new Object();
public void add(T t) {
synchronized (lock) {
q.add(t);
lock.notifyAll();
}
}
public List<T> take(int numElements) throws InterruptedException {
synchronized (lock) {
// Let InterruptedException propagate: re-setting the interrupt flag inside
// this loop would make wait() throw again immediately and spin.
while (q.size() < numElements) {
lock.wait();
}
ArrayList<T> l = new ArrayList<T>(numElements);
l.addAll(q.subList(0, numElements));
q.subList(0, numElements).clear();
return l;
}
}
}

Java concurrency - improving a copy-on-read collection

I have a multithreaded application, where a shared list has write-often, read-occasionally behaviour.
Specifically, many threads will dump data into the list, and then - later - another worker will grab a snapshot to persist to a datastore.
This is similar to the discussion over on this question.
There, the following solution is provided:
class CopyOnReadList<T> {
private final List<T> items = new ArrayList<T>();
public void add(T item) {
synchronized (items) {
// Add item while holding the lock.
items.add(item);
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>();
synchronized (items) {
// Make a copy while holding the lock.
for (T t : items) copy.add(t);
}
return copy;
}
}
However, in this scenario, (and, as I've learned from my question here), only one thread can write to the backing list at any given time.
Is there a way to allow high-concurrency writes to the backing list, which are locked only during the makeSnapshot() call?
synchronized (~20 ns) is pretty fast and even though other operations can allow concurrency, they can be slower.
private final Lock lock = new ReentrantLock();
private List<T> items = new ArrayList<T>();
public void add(T item) {
lock.lock();
// trivial lock time.
try {
// Add item while holding the lock.
items.add(item);
} finally {
lock.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(), ret;
lock.lock();
// trivial lock time.
try {
ret = items;
items = copy;
} finally {
lock.unlock();
}
return ret;
}
public static void main(String... args) {
long start = System.nanoTime();
Main<Integer> ints = new Main<>();
for (int j = 0; j < 100 * 1000; j++) {
for (int i = 0; i < 1000; i++)
ints.add(i);
ints.makeSnapshot();
}
long time = System.nanoTime() - start;
System.out.printf("The average time to add was %,d ns%n", time / 100 / 1000 / 1000);
}
prints
The average time to add was 28 ns
This means that if you are creating 30 million entries per second, you will have one thread accessing the list on average. If you are creating 60 million per second, you will have concurrency issues; however, you are likely to have many more resource issues at that point.
Using Lock.lock() and Lock.unlock() can be faster when there is a high contention ratio. However, I suspect your threads will be spending most of the time building the objects to be created rather than waiting to add the objects.
You could use a ConcurrentDoublyLinkedList. There is an excellent implementation here: ConcurrentDoublyLinkedList.
So long as you iterate forward through the list when you make your snapshot all should be well. This implementation preserves the forward chain at all times. The backward chain is sometimes inaccurate.
First of all, you should investigate if this really is too slow. Adds to ArrayLists are O(1) in the happy case, so if the list has an appropriate initial size, CopyOnReadList.add is basically just a bounds check and an assignment to an array slot, which is pretty fast. (And please, do remember that CopyOnReadList was written to be understandable, not performant.)
If you need a non-locking operation, you can have something like this:
class ConcurrentStack<T> {
private final AtomicReference<Node<T>> stack = new AtomicReference<>();
public void add(T value){
Node<T> tail, head;
do {
tail = stack.get();
head = new Node<>(value, tail);
} while (!stack.compareAndSet(tail, head));
}
public Node<T> drain(){
// Get all elements from the stack and reset it
return stack.getAndSet(null);
}
}
class Node<T> {
// getters, setters, constructors omitted
private final T value;
private final Node<T> tail;
}
Note that while adds to this structure should deal pretty well with high contention, it comes with several drawbacks. The output from drain is quite slow to iterate over, it uses quite a lot of memory (like all linked lists), and you also get things in the opposite insertion order. (Also, it's not really tested or verified, and may actually suck in your application. But that's always the risk with using code from some random dude on the intertubes.)
Yes, there is a way. It is similar to the way ConcurrentHashMap was originally built, if you are familiar with it.
Instead of one list for all writing threads, build your data structure from several independent lists. Each of these lists should be guarded by its own lock. The add() method should choose the list to append to based on Thread.currentThread().getId() (for example, just id % listsCount). This will give you good concurrency properties for add(): at best, listsCount threads will be able to write without contention.
In makeSnapshot() you just iterate over all the lists, and for each list you grab its lock and copy its content.
This is just an idea -- there are many places to improve it.
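A rough sketch of the striping idea, under the assumptions stated above (StripedList is a made-up name). Note two consequences: the snapshot is not taken at a single instant across all stripes, and insertion order between different threads is not preserved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: one plain ArrayList per stripe, each guarded by its own monitor.
class StripedList<T> {
    private final List<List<T>> stripes = new ArrayList<>();

    StripedList(int stripeCount) {
        for (int i = 0; i < stripeCount; i++) {
            stripes.add(new ArrayList<>());
        }
    }

    public void add(T item) {
        // Pick a stripe from the thread id; ids are positive, so the index is in range.
        int index = (int) (Thread.currentThread().getId() % stripes.size());
        List<T> stripe = stripes.get(index);
        synchronized (stripe) {
            stripe.add(item);
        }
    }

    public List<T> makeSnapshot() {
        List<T> copy = new ArrayList<>();
        for (List<T> stripe : stripes) {
            // Only one stripe is locked at a time; adds to the others proceed.
            synchronized (stripe) {
                copy.addAll(stripe);
            }
        }
        return copy;
    }
}
```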
You can use a ReadWriteLock to allow multiple threads to perform add operations on the backing list in parallel, but only one thread to make the snapshot. While the snapshot is being prepared, all other add and snapshot requests are put on hold.
A ReadWriteLock maintains a pair of associated locks, one for
read-only operations and one for writing. The read lock may be held
simultaneously by multiple reader threads, so long as there are no
writers. The write lock is exclusive.
class CopyOnReadList<T> {
// free to use any concurrent data structure, ConcurrentLinkedQueue used as an example
private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<T>();
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Lock shared = rwLock.readLock();
private final Lock exclusive = rwLock.writeLock();
public void add(T item) {
shared.lock(); // multiple threads can attain the read lock
// try-finally is overkill if items.add() never throws exceptions
try {
// Add item while holding the lock.
items.add(item);
} finally {
shared.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(); // probably better idea to use a LinkedList or the ArrayList constructor with initial size
exclusive.lock(); // only one thread can attain write lock, all read locks are also blocked
// try-finally is overkill if for loop never throws exceptions
try {
// Make a copy while holding the lock.
for (T t : items) {
copy.add(t);
}
} finally {
exclusive.unlock();
}
return copy;
}
}
Edit:
The read-write lock is so named because it is based on the readers-writers problem, not because of how it is used here. With a read-write lock, multiple threads can hold the read lock at once, but only one thread can hold the write lock, exclusively. In this case the problem is reversed: we want multiple threads to write (add) and only one thread to read (make the snapshot). So the adding threads take the read lock even though they are actually mutating, and the single snapshot thread takes the write lock even though it only reads. Exclusive means that while the snapshot is being made, no other add or snapshot request can be serviced by other threads.
As @PeterLawrey pointed out, the concurrent queue will serialize the writes, although its internal synchronization is held for as short a time as possible. We are free to use any other concurrent data structure, e.g. ConcurrentDoublyLinkedList. The queue is used only as an example. The main idea is the use of read-write locks.

java: concurrent iteration over an immutable Iterable

I have an immutable Iterable<X> with a large number of elements. (it happens to be a List<> but never mind that.)
What I would like to do is start a few parallel / asynchronous tasks to iterate over the Iterable<> with the same iterator, and I'm wondering what interface I should use.
Here's a sample implementation with the to-be-determined interface QuasiIteratorInterface:
public void process(Iterable<X> iterable)
{
QuasiIteratorInterface<X> qit = ParallelIteratorWrapper.iterate(iterable);
for (int i = 0; i < MAX_PARALLEL_COUNT; ++i)
{
SomeWorkerClass worker = new SomeWorkerClass(qit);
worker.start();
}
}
class ParallelIteratorWrapper<T> implements QuasiIteratorInterface<T>
{
final private Iterator<T> iterator;
final private Object lock = new Object();
private ParallelIteratorWrapper(Iterator<T> iterator) {
this.iterator = iterator;
}
static public <T> ParallelIteratorWrapper<T> iterate(Iterable<T> iterable)
{
return new ParallelIteratorWrapper<>(iterable.iterator());
}
private T getNextItem()
{
synchronized(lock)
{
if (this.iterator.hasNext())
return this.iterator.next();
else
return null;
}
}
/* QuasiIteratorInterface methods here */
}
Here's my problem:
it doesn't make sense to use Iterator directly, since hasNext() and next() have a synchronization problem, where hasNext() is useless if someone else calls next() before you do.
I'd love to use Queue, but the only method I need is poll()
I'd love to use ConcurrentLinkedQueue to hold my large number of elements... except I may have to iterate through the elements more than once, so I can't use that.
Any suggestions?
Create your own Producer interface with a poll() method or equivalent (Guava's Supplier, for instance). There are many implementation options, but if you have an immutable random-access list, you can simply maintain a thread-safe monotonic counter (an AtomicInteger, for instance) and call list.get(int), e.g.:
class ListSupplier<T> implements Supplier<T> {
private final AtomicInteger next = new AtomicInteger();
private final List<T> elements; // ctor injected
…
public T get() {
// real impl more complicated due to bounds checks
// and what to do when exhausted
return elements.get(next.getAndIncrement());
}
}
That is thread-safe, but you'd probably want to either return an Option style thing or null when exhausted.
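A sketch of the "null when exhausted" variant, using java.util.function.Supplier from the JDK instead of Guava's (that swap, and the name BoundedListSupplier, are my assumptions). The counter stops advancing once it reaches the list size, so it cannot overflow however often get() is called afterwards. This only works as a signal if the list itself contains no null elements.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical class: hands out each list element exactly once across all threads.
class BoundedListSupplier<T> implements Supplier<T> {
    private final AtomicInteger next = new AtomicInteger();
    private final List<T> elements; // assumed immutable and random-access

    BoundedListSupplier(List<T> elements) {
        this.elements = elements;
    }

    @Override
    public T get() {
        int size = elements.size();
        // Atomically claim the next index, but never move past the end.
        int index = next.getAndUpdate(i -> i < size ? i + 1 : i);
        return index < size ? elements.get(index) : null; // null means exhausted
    }
}
```

To iterate the same list again, create a new supplier over it; the list is never modified.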
Have one dispatcher thread that iterates over Iterable and dispatches elements to multiple worker threads that perform the work on the elements. You can use ThreadPoolExecutor to automate this.
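A minimal sketch of the dispatcher approach using a fixed thread pool. The Dispatcher class, the process signature, and the one-minute timeout are illustrative assumptions. The calling thread is the only one touching the iterator, so no synchronization on it is needed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper class.
class Dispatcher {
    // Returns true if all work finished before the (arbitrary) one-minute timeout.
    static <X> boolean process(Iterable<X> iterable, int workers, Consumer<X> work) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Single-threaded iteration; each element becomes one task.
            for (X item : iterable) {
                pool.execute(() -> work.accept(item));
            }
        } finally {
            pool.shutdown(); // stop accepting tasks, let queued ones run
        }
        try {
            return pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

One thing to watch: newFixedThreadPool uses an unbounded task queue, so with a very large Iterable all elements get queued up front. A ThreadPoolExecutor with a bounded queue would limit that.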
