java: concurrent iteration over an immutable Iterable

I have an immutable Iterable<X> with a large number of elements. (It happens to be a List<X>, but never mind that.)
What I would like to do is start a few parallel / asynchronous tasks to iterate over the Iterable<> with the same iterator, and I'm wondering what interface I should use.
Here's a sample implementation with the to-be-determined interface QuasiIteratorInterface:
public void process(Iterable<X> iterable)
{
    QuasiIteratorInterface<X> qit = ParallelIteratorWrapper.iterate(iterable);
    for (int i = 0; i < MAX_PARALLEL_COUNT; ++i)
    {
        SomeWorkerClass worker = new SomeWorkerClass(qit);
        worker.start();
    }
}
class ParallelIteratorWrapper<T> implements QuasiIteratorInterface<T>
{
    final private Iterator<T> iterator;
    final private Object lock = new Object();

    private ParallelIteratorWrapper(Iterator<T> iterator) {
        this.iterator = iterator;
    }

    static public <T> ParallelIteratorWrapper<T> iterate(Iterable<T> iterable)
    {
        return new ParallelIteratorWrapper<T>(iterable.iterator());
    }

    private T getNextItem()
    {
        synchronized(lock)
        {
            if (this.iterator.hasNext())
                return this.iterator.next();
            else
                return null;
        }
    }

    /* QuasiIteratorInterface methods here */
}
Here's my problem:
It doesn't make sense to use Iterator directly, since hasNext() and next() have a check-then-act race: hasNext() is useless if someone else calls next() before you do.
I'd love to use Queue, but the only method I need is poll().
I'd love to use ConcurrentLinkedQueue to hold my large number of elements... except I may have to iterate through the elements more than once, so I can't use that.
Any suggestions?

Create your own Producer interface with the poll() method or equivalent (Guava's Supplier, for instance). The implementation options are many, but if you have an immutable random-access list you can simply maintain a thread-safe monotonic counter (an AtomicInteger, for instance) and call list.get(int), e.g.:
class ListSupplier<T> implements Supplier<T> {
    private final AtomicInteger next = new AtomicInteger();
    private final List<T> elements; // ctor injected
    …
    public T get() {
        // real impl more complicated due to bounds checks
        // and what to do when exhausted
        return elements.get(next.getAndIncrement());
    }
}
That is thread-safe, but you'd probably want to return either an Optional-style wrapper or null when the list is exhausted.
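For completeness, here is one way the missing bounds check might look. This is only a sketch: it assumes Guava's Supplier, an immutable backing list, and the convention of returning null once the elements run out.

import com.google.common.base.Supplier;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ListSupplier<T> implements Supplier<T> {
    private final AtomicInteger next = new AtomicInteger();
    private final List<T> elements;

    ListSupplier(List<T> elements) {
        this.elements = elements;
    }

    @Override
    public T get() {
        int i = next.getAndIncrement();
        // Workers that ask past the end of the list simply see null.
        return i < elements.size() ? elements.get(i) : null;
    }
}

Each worker then loops on get() until it receives null, which doubles as the termination signal.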

Have one dispatcher thread that iterates over the Iterable and dispatches elements to multiple worker threads that perform the work on them. You can use a ThreadPoolExecutor to automate this, as in the sketch below.
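A minimal sketch of that idea, using an ExecutorService as the thread pool; MAX_PARALLEL_COUNT and processItem() are placeholders for the real worker logic:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class Dispatcher<X> {
    private static final int MAX_PARALLEL_COUNT = 4;

    void process(Iterable<X> iterable) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_PARALLEL_COUNT);
        for (X item : iterable) {
            pool.submit(() -> processItem(item)); // hand each element to a worker
        }
        pool.shutdown();                          // no further tasks will be submitted
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for the workers to drain the backlog
    }

    void processItem(X item) {
        // per-element work goes here
    }
}

Note that submitting every element up front queues them all inside the executor; if memory is a concern, a bounded work queue or a semaphore can be used to throttle the dispatching loop.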

Related

Alternative to ConcurrentLinkedQueue, do I need to use LinkedList with locks?

I am currently using a ConcurrentLinkedQueue so that I can get natural FIFO ordering and use it in a thread-safe application. I have a requirement to log the size of the queue every minute. Given that this collection does not maintain a size count and the cost of calculating the size is O(n), is there any alternative bounded, non-blocking concurrent queue I can use where obtaining the size is not a costly operation, and at the same time the add/remove operations are not expensive either?
If there is no such collection, do I need to use a LinkedList with locks?
If you really (REALLY) need to log a correct, current size of the queue, you need to block. There is simply no other way. You might think that maintaining a separate LongAdder field would help, perhaps by making your own interface as a wrapper around ConcurrentLinkedQueue, something like:
interface KnownSizeQueue<T> {
    T poll();
    long size();
}
And an implementation:
static class ConcurrentKnownSizeQueue<T> implements KnownSizeQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final LongAdder currentSize = new LongAdder();

    @Override
    public T poll() {
        T result = queue.poll();
        if (result != null) {
            currentSize.decrement();
        }
        return result;
    }

    @Override
    public long size() {
        return currentSize.sum();
    }
}
I just encourage you to add one more method, like remove, to the interface and try to reason about the code. You will very quickly realize that such an implementation still gives you a wrong result: the queue and the LongAdder are updated in two separate steps, so a reader can observe them out of sync. So do not do it.
The only reliable way to get the size, if you really need it, is to block for each operation. This comes at a high price, because ConcurrentLinkedQueue is documented as:
This implementation employs an efficient non-blocking...
You will lose those properties, but if a correct size is a hard requirement and you can live with the cost, you could write your own:
static class ParallelKnownSizeQueue<T> implements KnownSizeQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();

    @Override
    public T poll() {
        lock.lock();
        try {
            return queue.poll();
        } finally {
            lock.unlock();
        }
    }

    @Override
    public long size() {
        lock.lock();
        try {
            return queue.size();
        } finally {
            lock.unlock();
        }
    }
}
Or, of course, you can use an already existing structure, like LinkedBlockingDeque or ArrayBlockingQueue, etc., depending on what you need.
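As a side note, the bounded blocking implementations keep an internal element count, so their size() is O(1), though the value is still only a point-in-time snapshot. A small sketch, assuming an arbitrary capacity of 10,000:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedQueueSizeDemo {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        queue.offer("task-1");        // returns false instead of blocking when the queue is full
        String next = queue.poll();   // returns null when the queue is empty
        System.out.println("polled " + next + ", size = " + queue.size()); // size() is O(1), cheap to log every minute
    }
}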

Concurrency help on a custom queue-like data structure

I am trying to implement an insertion-performance-focused, queue-like data structure that must meet the following requirements:
Must be thread-safe
Must be able to add to queue without synchronizing
Must be able to get a "snapshot" view of the queue
My issue is getting the 'snapshot' view in a way that doesn't require synchronization of the insert. Since I can block the removal and elements can only be added to the end, getting the elements shouldn't be an issue. The problem I keep running into is that LinkedList's iterator has an unsuppressable fail-fast concurrent-modification check baked in, and LinkedList.get(int) is O(n).
Below is a pared-down example of where I am with what should be a fairly simple task.
public class SnapshotableQueue<T> {
    private final LinkedList<T> queue = new LinkedList<>();
    private final Object removeLock = new Object();

    public void add(T element) {
        queue.add(element);
    }

    public T remove() {
        synchronized(removeLock) {
            return queue.remove();
        }
    }

    public List<T> getSnapshot() {
        synchronized(removeLock) {
            int length = queue.size();
            List<T> snapshot = new ArrayList<>(length);
            ???
            return snapshot;
        }
    }
}
Unacceptable Solution #1
for(int i = 0; i < length; i++)
    snapshot.add(queue.get(i));
'LinkedList.get(int)' is O(n)
Unacceptable Solution #2
Iterator<T> iterator = queue.iterator();
for(int i = 0; i < length; i++)
    snapshot.add(iterator.next());
Isn't thread-safe (throws ConcurrentModificationException)
Unacceptable Solution #3
Change queue to ArrayList
'ArrayList.remove(0)' is O(n)
Don't reinvent the wheel: use ConcurrentLinkedQueue instead of LinkedList, then use its iterator() to build your snapshot; that iterator is thread-safe by design and never throws ConcurrentModificationException.
Your method getSnapshot will then be simply
public List<T> getSnapshot() {
    return new ArrayList<>(queue);
}
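Put together, the class might look like this. It is only a sketch of the suggestion above; note that poll() returns null on an empty queue where LinkedList.remove() would have thrown.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SnapshotableQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();

    public void add(T element) {
        queue.add(element);            // lock-free insertion at the tail
    }

    public T remove() {
        return queue.poll();           // null if the queue is empty
    }

    public List<T> getSnapshot() {
        return new ArrayList<>(queue); // copies via the weakly consistent iterator
    }
}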
There is one of these in Java already: CopyOnWriteArrayList. It's part of the concurrent package and seems to do exactly what you want.
But mind that it creates a copy of the underlying array every time you insert something, so it should only be used in scenarios where you read a lot more than you write.
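A minimal sketch of that trade-off (the class and method names here are purely for illustration):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class EventLog {
    private final List<String> events = new CopyOnWriteArrayList<>();

    void record(String event) {
        events.add(event);               // copies the backing array on every add
    }

    List<String> snapshot() {
        return new ArrayList<>(events);  // iteration never needs a lock and never throws CME
    }
}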

Synchronized List/Map in Java if only one thread is writing to it

The first thread fills a collection continuously with objects. A second thread needs to iterate over these objects, but it will not change the collection.
Currently I use Collections.synchronizedList to make it thread-safe, but is there a faster way of doing it?
Update
It's simple: the first thread (UI) continuously writes the mouse position to the ArrayList as long as the mouse button is pressed. The second thread (render) draws a line based on the list.
Use java.util.concurrent.ArrayBlockingQueue, an implementation of BlockingQueue. It perfectly suits your needs: it is designed for producer-consumer cases, and that is exactly what you have.
You can also configure the access policy. The Javadoc explains the fairness parameter like this:
fair - if true then queue accesses for threads blocked on insertion or removal are processed in FIFO order; if false the access order is unspecified.
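A sketch of how that might look for the mouse-position case described in the update; the capacity of 1024 and the String encoding of points are arbitrary choices for illustration:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class MouseTrail {
    // bounded, with fair = true so the UI and render threads are served in FIFO order
    private final BlockingQueue<String> points = new ArrayBlockingQueue<>(1024, true);

    // UI thread: called while the mouse button is held down
    void onMouseDragged(int x, int y) throws InterruptedException {
        points.put(x + "," + y);          // blocks if the renderer falls 1024 points behind
    }

    // Render thread: consumes points and extends the line
    void renderLoop() throws InterruptedException {
        while (true) {
            String point = points.take(); // blocks until a point is available
            // draw the next line segment here
        }
    }
}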
Even if you use a synchronized list wrapper, iterating over it is not atomic, so make sure you synchronize on the list while iterating:
synchronized(synchronizedList) {
    for (Object o : synchronizedList) {
        doSomething();
    }
}
Edit:
Here's a very clearly written article on the matter:
http://java67.blogspot.com/2014/12/how-to-synchronize-arraylist-in-java.html
As mentioned in comments, you need explicit synchronization on this list, because iteration is not atomic:
List<?> list = // ...
Thread 1:
synchronized(list) {
    list.add(o);
}
Thread 2:
synchronized(list) {
    for (Object o : list) {
        // do actions on object
    }
}
There are 3 options I can currently think of for handling concurrency with an ArrayList:
1. Using Collections.synchronizedList(list) - this is what you are currently using.
2. CopyOnWriteArrayList - behaves much like ArrayList, except that when the list is modified, instead of modifying the underlying array, a new array is created and the old one is discarded. Writes will be slower than option 1.
3. Creating a custom list class using ReentrantReadWriteLock. You can create a wrapper around ArrayList: take the read lock when reading/iterating/looping and the write lock when adding elements.
For example:
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteList<E> {
    private final List<E> list;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Lock r = lock.readLock();
    private final Lock w = lock.writeLock();

    public ReadWriteList(List<E> list) {
        this.list = list;
    }

    public boolean add(E e) {
        w.lock();
        try {
            return list.add(e);
        } finally {
            w.unlock();
        }
    }
    //Do the same for other modification methods

    public E getElement(int index) {
        r.lock();
        try {
            return list.get(index);
        } finally {
            r.unlock();
        }
    }

    public List<E> getList() {
        r.lock();
        try {
            return list;
        } finally {
            r.unlock();
        }
    }
    //Do the same for other read methods
}
If you're reading far more often than writing, you can use CopyOnWriteArrayList
Rather than a List, would a Set suit your needs?
If so, you can use Collections.newSetFromMap(new ConcurrentHashMap<>()).
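A short sketch of that suggestion (on Java 8+ the shortcut ConcurrentHashMap.newKeySet() is equivalent):

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentSetDemo {
    private final Set<String> positions =
            Collections.newSetFromMap(new ConcurrentHashMap<>());

    void add(String position) {
        positions.add(position);        // safe to call from the writer thread
    }

    void drawAll() {
        for (String p : positions) {    // weakly consistent iteration, never throws CME
            System.out.println(p);
        }
    }
}

Bear in mind that a Set drops duplicates and does not preserve insertion order, which may or may not matter for a mouse trail.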

Implementation of singleton thread-safe list

I'm using the Spring framework. I need a list of objects that gets all its data from the database at once. When the data changes, the list is set to null, and the next get operation should fill it from the database again. Is my code correct for a multi-threaded environment?
@Component
@Scope("singleton")
public class MyObjectHolder {
    private volatile List<MyObject> objectList = null;

    public List<MyObject> getObjectList() {
        if (objectList == null) {
            synchronized (objectList) {
                if (objectList == null) {
                    objectList = getFromDB();
                }
            }
        }
        return objectList;
    }

    synchronized public void clearObjectList() {
        objectList = null;
    }
}
Short answer: no.
public class MyObjectHolder {
    private final List<MyObject> objectList = new ArrayList<>();

    public List<MyObject> getObjectList() {
        return objectList;
    }
}
This is the preferred singleton pattern.
Now you need to figure out how to get the data into the list in a thread-safe way. For this Java already has some pre-made thread-safe lists in the concurrent package, which should be preferred to any synchronized implementation, as they are much faster under heavy threading.
Your problem could be solved like this:
public class MyObjectHolder {
    private final CopyOnWriteArrayList<MyObject> objectList = new CopyOnWriteArrayList<>();

    public List<MyObject> getObjectList() {
        return objectList;
    }

    public boolean isEmpty() {
        return objectList.isEmpty();
    }

    public void readDB() {
        final List<MyObject> dbList = getFromDB();
        // ?? objectList.clear();
        objectList.addAll(dbList);
    }
}
Please note the absence of any synchronized, yet the class is completely thread-safe: CopyOnWriteArrayList guarantees that each individual call on the list is atomic. So I can call isEmpty() while someone else is filling up the list. I will only get a snapshot of a moment in time and can't tell in advance what result I will get, but it will in all cases succeed without error.
The DB call is first written into a temporary list, so no threading issues can happen there. Then addAll() atomically moves the content into the real list; again, all thread-safe.
The worst-case scenario is that thread A is just about done writing the new data while thread B checks whether the list contains any elements. Thread B will see an empty list even though a microsecond later it contains plenty of data. You need to deal with this situation either by polling repeatedly or by using an observer pattern to notify the other threads.
No, your code is not thread-safe. For example, you could assign objectList in one thread at time X but set it to null (via clearObjectList()) at time X+1, because you are synchronizing on two different monitors: the first synchronization is on objectList itself and the second is on the MyObjectHolder instance. You should look into explicit locks for guarding a shared resource instead of relying on synchronized, specifically something like a ReadWriteLock.
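For illustration, a sketch of the holder guarded by a single ReentrantReadWriteLock, so the lazy load and the clear can never race on two different monitors. MyObject is the question's domain class and getFromDB() is a placeholder for the original database call.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MyObjectHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private List<MyObject> objectList;        // guarded by lock

    public List<MyObject> getObjectList() {
        lock.readLock().lock();
        try {
            if (objectList != null) {
                return objectList;            // fast path: already loaded
            }
        } finally {
            lock.readLock().unlock();
        }
        lock.writeLock().lock();              // a read lock cannot be upgraded, so reacquire as writer
        try {
            if (objectList == null) {         // re-check: another thread may have loaded it
                objectList = getFromDB();
            }
            return objectList;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public void clearObjectList() {
        lock.writeLock().lock();
        try {
            objectList = null;
        } finally {
            lock.writeLock().unlock();
        }
    }

    private List<MyObject> getFromDB() {
        return Collections.emptyList();       // placeholder for the real query
    }
}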

Why does the iterator.hasNext not work with BlockingQueue?

I was trying to use the iterator methods on a BlockingQueue and discovered that hasNext() is non-blocking - i.e. it will not wait until more elements are added and will instead return false when there are no elements.
So here are the questions:
1. Is this bad design, or a wrong expectation?
2. Is there a way to use the blocking methods of the BlockingQueue through its parent Collection class methods (e.g. if some method expects a Collection, can I pass a blocking queue and hope that its processing will wait until the queue has more elements)?
Here is a sample code block
public class SomeContainer {
    public static void main(String[] args) {
        BlockingQueue bq = new LinkedBlockingQueue();
        SomeContainer h = new SomeContainer();
        Producer p = new Producer(bq);
        Consumer c = new Consumer(bq);
        p.produce();
        c.consume();
    }

    static class Producer {
        BlockingQueue q;

        public Producer(BlockingQueue q) {
            this.q = q;
        }

        void produce() {
            new Thread() {
                public void run() {
                    for (int i = 0; i < 10; i++) {
                        for (int j = 0; j < 10; j++) {
                            q.add(i + " - " + j);
                        }
                        try {
                            Thread.sleep(30000);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }.start();
        }
    }

    static class Consumer {
        BlockingQueue q;

        public Consumer(BlockingQueue q) {
            this.q = q;
        }

        void consume() {
            new Thread() {
                public void run() {
                    Iterator itr = q.iterator();
                    while (itr.hasNext())
                        System.out.println(itr.next());
                }
            }.start();
        }
    }
}
This code performs the iteration only once at most.
Just don't use iterators with Queues. Use peek() or poll() instead, or take() if it's a BlockingQueue:
void consume() {
    new Thread() {
        @Override
        public void run() {
            Object value;
            // actually, when using a BlockingQueue,
            // take() would be better than poll()
            while ((value = q.poll()) != null)
                System.out.println(value);
        }
    }.start();
}
A Queue is an Iterable because it is a Collection and hence needs to provide an iterator() method, but that shouldn't ever be used, or you shouldn't be using a Queue in the first place.
1) Is this bad design, or wrong expectation?
Wrong expectation, since it would otherwise violate the contract of Iterator, whose next() documentation says: "Throws: NoSuchElementException - if the iteration has no more elements."
If next() blocked, that exception would never be thrown.
2) Is there a way to use the blocking methods
Yes, for instance by extending the class and overriding the next and hasNext methods to use blocking routines instead. Note that hasNext would need to always return true in this case, which again violates the contract. A sketch of such a wrapper follows.
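Here is what such a wrapper might look like; this is only a sketch, not part of any existing API, and the never-false hasNext() is precisely the contract violation mentioned above:

import java.util.Iterator;
import java.util.concurrent.BlockingQueue;

class BlockingQueueIterator<T> implements Iterator<T> {
    private final BlockingQueue<T> queue;

    BlockingQueueIterator(BlockingQueue<T> queue) {
        this.queue = queue;
    }

    @Override
    public boolean hasNext() {
        return true;                 // there is always a "next", eventually
    }

    @Override
    public T next() {
        try {
            return queue.take();     // blocks until an element is available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an element", e);
        }
    }
}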
If an iterator blocked on hasNext(), the iteration would never finish unless you explicitly broke out of it; this would be quite a strange design.
In any case, the LinkedBlockingQueue javadoc has this to say:
Returns an iterator over the elements in this queue in proper sequence.
The returned Iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
I think that it may be reasonable under certain circumstances to have an Iterable whose iterator() will block, although having a separate BlockingIterator would be foolish. The reason for this is that it lets you use an enhanced for loop, which can, in some cases, make your code cleaner. (If it would not accomplish that in your particular circumstance, do not do this at all.)
for(Request request:requests) process(request);
However, the iterator is still not free from a termination condition! The iterator should terminate once the queue has been closed to new items and has run out of elements.
The issue still remains, though, that if the loop is already blocking in the iterator's next() method, the only way to exit when the queue is closed is to throw an exception, which the surrounding code would need to handle correctly. If you choose to do this, make sure you explain very clearly and precisely how your implementation works in the javadoc comments. A possible shape for such an iterable is sketched below.
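One possible shape for that, using a sentinel ("poison pill") element to mark the queue as closed. ClosableRequestQueue, close(), and the POISON object are inventions for this sketch, and it only supports a single consumer:

import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ClosableRequestQueue implements Iterable<String> {
    private static final String POISON = new String("<end-of-queue>");
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    void put(String request) throws InterruptedException {
        queue.put(request);
    }

    void close() throws InterruptedException {
        queue.put(POISON);                     // no more requests after this
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next;               // prefetched element, null if none fetched yet

            @Override
            public boolean hasNext() {
                if (next == null) {
                    try {
                        next = queue.take();   // blocks until an element or the sentinel arrives
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
                return next != POISON;         // the sentinel ends the iteration
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = next;
                next = null;
                return result;
            }
        };
    }
}

With this in place, the enhanced for loop "for (String request : queue) process(request);" blocks while the queue is empty and exits normally once close() has been called and the remaining elements are drained.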
The Iterator for LinkedBlockingQueue has this as its hasNext implementation:
private Node<E> current;

public boolean hasNext() {
    return current != null;
}
so it only reflects whether an element was available at the time of the call. You can wrap the check in a while(true) loop if you want to wait for elements while still using the standard Java Iterator idiom:
while (true) {
    if (itr.hasNext()) {
        System.out.println(itr.next());
    }
}
