I have a situation of a single producer and single consumer working with a queue of objects. There are two situations when the queue might be empty:
1. The consumer handled the objects quicker than the producer was capable of generating new objects (the producer uses I/O before generating objects).
2. The producer is done generating objects.
If the queue is empty, I want the consumer to wait until a new object is available or until the producer signals that it is done.
My research so far got me nowhere, because I still ended up with a loop that checks both the queue and a separate boolean flag (isDone). Given that there's no way of waiting on multiple locks (I thought of waiting on the queue AND the flag), what can be done to solve this?
First of all, the suggestion that using a wrapper is "too much overhead" is a guess, and IMO a very bad one. This assumption should be measured with a performance test with actual requirements. If and only if the test fails, then verify using a profiler that wrapping the queue object is why.
Still, if you do that and wrapping the queue object (in this case a String) really is the cause of unacceptable performance, you can use this technique: create a known, unique String instance to serve as an "end of messages" message.
public static final String NO_MORE_MESSAGES = UUID.randomUUID().toString();
Then, when retrieving Strings from the queue, just check (it can be a reference check, i.e. ==) whether the String is NO_MORE_MESSAGES. If so, you're done processing.
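A minimal sketch of the sentinel technique described above, assuming a LinkedBlockingQueue<String> shared by one producer and one consumer. The class name SentinelDemo and the demo() driver are hypothetical and only there for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SentinelDemo {
    // Unique instance; the consumer compares references, not contents.
    static final String NO_MORE_MESSAGES = UUID.randomUUID().toString();

    // Drains the queue until the sentinel arrives and returns what was consumed.
    static List<String> consume(BlockingQueue<String> queue) throws InterruptedException {
        List<String> seen = new ArrayList<>();
        while (true) {
            String msg = queue.take();        // blocks while the queue is empty
            if (msg == NO_MORE_MESSAGES) {    // reference check is intentional
                return seen;
            }
            seen.add(msg);
        }
    }

    // Hypothetical driver: one producer thread, the caller acts as the consumer.
    static List<String> demo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                queue.put("a");
                queue.put("b");
                queue.put(NO_MORE_MESSAGES);  // always the last element
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            List<String> result = consume(queue);
            producer.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because take() blocks, the consumer needs neither a flag nor a timeout: it simply stops when it sees the sentinel.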
Simple. Define a special object that the producer can send to signal "done".
One option is to wrap your data in a holder object, which can be used to signal the end of processing.
For example:
public class QueueMessage {
public MessageType type;
public Object thingToWorkOn;
}
where MessageType is an enum defining a "work" message or a "shutdown" message.
You could use LinkedBlockingQueue's poll(long timeout, TimeUnit unit) method in the consumer and, if it returns null (the timeout elapsed), check the boolean flag. Another way would be passing some special "EndOfWork" object into the queue as the last one, so the consumer knows that it's the end of the work.
Yet another way would be interrupting the consumer thread from the producer thread, but this would require the producer thread to be aware of the consumer. If both were implemented as nested classes, you could use the parent class to hold a boolean running flag (declared volatile so that both threads see updates), which both could access, and terminate both threads with a single boolean.
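The timed-poll variant from this answer could look roughly like the sketch below. The class name TimedPollDemo, the 50 ms timeout and the demo() driver are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class TimedPollDemo {
    private final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
    // volatile so that the consumer thread sees the producer's write
    private volatile boolean done = false;

    // Consumer side: wake up periodically and check the flag when nothing arrived.
    List<Integer> consumeAll() throws InterruptedException {
        List<Integer> seen = new ArrayList<>();
        while (true) {
            Integer item = queue.poll(50, TimeUnit.MILLISECONDS);
            if (item != null) {
                seen.add(item);
            } else if (done && queue.isEmpty()) {
                // The flag is set only after the last add, so an empty queue
                // observed after the flag means there is nothing left to consume.
                return seen;
            }
        }
    }

    // Hypothetical driver: the producer adds three items, then raises the flag.
    static List<Integer> demo() {
        TimedPollDemo d = new TimedPollDemo();
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                d.queue.add(i);
            }
            d.done = true;
        });
        producer.start();
        try {
            List<Integer> result = d.consumeAll();
            producer.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off is latency: after the producer finishes, the consumer may take up to one timeout period to notice.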
The following option has been raised too (not sure if this should be in an answer to myself but couldn't find a better place to write this):
Create a wrapper for the queue. This wrapper will have a monitor that the consumer waits on when reading, and that the producer notifies whenever either a new object is added or the isDone flag is raised.
When the consumer reads objects from the queue, these objects will be wrapped with something similar to what @yann-ramin suggested above. To reduce overhead, though, the consumer will provide a single, reusable instance of QueueMessage on every read call (it will always be the same instance). The queue wrapper will update the fields accordingly before returning the instance to the consumer.
This avoids any use of timeouts, sleeps, etc.
EDITED
This is a proposed implementation:
import java.lang.reflect.Array;
import java.util.Collection;
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
/**
* This work queue is designed to be used by exactly ONE producer and ONE consumer.
* The work queue has certain added features, such as the ability to signal that
* workload generation is done and nothing more will be added to the queue.
*
* @param <E> the type of the work items held in the queue
*/
public class DefiniteWorkQueue<E> {
private final E[] EMPTY_E_ARRAY;
private LinkedBlockingQueue<E> underlyingQueue = new LinkedBlockingQueue<E>();
// volatile because isDone() may be read without holding changeMonitor
private volatile boolean isDone = false;
// This monitor allows for flagging when a change was done.
private Object changeMonitor = new Object();
public DefiniteWorkQueue(Class<E> clazz) {
// Reuse this instance, makes calling toArray easier
EMPTY_E_ARRAY = (E[]) Array.newInstance(clazz, 0);
}
public boolean isDone() {
return isDone;
}
public void setIsDone() {
synchronized (changeMonitor) {
isDone = true;
changeMonitor.notifyAll();
}
}
public int size() {
return underlyingQueue.size();
}
public boolean isEmpty() {
return underlyingQueue.isEmpty();
}
public boolean contains(E o) {
return underlyingQueue.contains(o);
}
public Iterator<E> iterator() {
return underlyingQueue.iterator();
}
public E[] toArray() {
// The array we create is too small on purpose, the underlying
// queue will extend it as needed under a lock
return underlyingQueue.toArray(EMPTY_E_ARRAY);
}
public boolean add(E o) {
boolean retval;
synchronized (changeMonitor) {
retval = underlyingQueue.add(o);
if (retval)
changeMonitor.notifyAll();
}
return retval;
}
public boolean addAll(Collection<? extends E> c) {
boolean retval;
synchronized (changeMonitor) {
retval = underlyingQueue.addAll(c);
if (retval)
changeMonitor.notifyAll();
}
return retval;
}
public void remove(RemovalResponse<E> responseWrapper) throws InterruptedException {
synchronized (changeMonitor) {
// If there's nothing in the queue but it has not
// ended yet, wait for someone to add something.
// Loop rather than test once: wait() can wake up spuriously.
while (isEmpty() && !isDone())
changeMonitor.wait();
// When we get here, we've been notified or
// the current underlying queue's state is already something
// we can respond about.
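if (!isEmpty()) {
// ResponseType is nested in RemovalResponse, so it must be qualified here
responseWrapper.type = RemovalResponse.ResponseType.ITEM;
responseWrapper.item = underlyingQueue.remove();
} else if (isDone()) {
responseWrapper.type = RemovalResponse.ResponseType.IS_DONE;
responseWrapper.item = null;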
} else {
// This should not happen
throw new IllegalStateException(
"Unexpected state where a notification of change was made but " +
"nothing is in the queue and work is not done.");
}
}
}
public static class RemovalResponse<E> {
public enum ResponseType {
/**
* Used when the response contains the first item of the queue.
*/
ITEM,
/**
* Used when the work load is done and nothing new will arrive.
*/
IS_DONE
};
private ResponseType type;
private E item;
public ResponseType getType() {
return type;
}
public void setType(ResponseType type) {
this.type = type;
}
public E getItem() {
return item;
}
public void setItem(E item) {
this.item = item;
}
}
}
I ran into this problem when I was trying to create a custom event source that contains a queue, which allows my other process to add items to it. I then expect my CEP pattern to print some debug messages when there is a match.
But there is no match, no matter what I add to the queue. Then I noticed that the queue inside mySource.run() is always empty, which means the queue I used to create the mySource instance is not the same as the one inside StreamExecutionEnvironment. If I make the queue static, forcing all instances to share the same queue, everything works as expected.
DummySource.java
public class DummySource implements SourceFunction<String> {
private static final long serialVersionUID = 3978123556403297086L;
// private static Queue<String> queue = new LinkedBlockingQueue<String>();
private Queue<String> queue;
private boolean cancel = false;
public void setQueue(Queue<String> q){
queue = q;
}
@Override
public void run(org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<String> ctx)
throws Exception {
System.out.println("run");
synchronized (queue) {
while (!cancel) {
if (queue.peek() != null) {
String e = queue.poll();
if (e.equals("exit")) {
cancel();
}
System.out.println("collect "+e);
ctx.collectWithTimestamp(e, System.currentTimeMillis());
}
}
}
}
@Override
public void cancel() {
System.out.println("canceled");
cancel = true;
}
}
So I dug into the source code of StreamExecutionEnvironment. Inside the addSource() method there is a clean() method, which looks like it replaces the instance with a new one. Its documentation says:
Returns a "closure-cleaned" version of the given function.
Why is that, and why does it need to be serialized?
I've also tried turning off the closure cleaner using getConfig(). The result is still the same: my queue instance is not the same one that env is using.
How do I solve this problem?
The clean() method applied to functions in Flink is mainly there to ensure that the function (such as a SourceFunction or MapFunction) is serializable. Flink serializes those functions and distributes them to the task nodes, which execute them.
Simple variables in your Flink main code, such as an int, can simply be referenced in your function. For large or non-serializable ones, it is better to use broadcast variables and a rich source function. Please refer to https://cwiki.apache.org/confluence/display/FLINK/Variables+Closures+vs.+Broadcast+Variables
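The effect described in the question can be reproduced without Flink. The sketch below uses plain Java serialization, on the assumption that shipping a function to a task behaves like an ordinary serialize/deserialize round trip. Holder is a hypothetical stand-in for the source function, not a Flink class:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

class SerializedCopyDemo {
    // Hypothetical stand-in for a source function that captures a queue.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        final Queue<String> queue;

        Holder(Queue<String> queue) {
            this.queue = queue;
        }
    }

    // Serializes the holder to bytes and reads it back, yielding a deep copy.
    static Holder roundTrip(Holder original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Holder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns whether the deserialized copy observes an element added
    // to the original queue after the round trip.
    static boolean copySeesLaterAdds() {
        Queue<String> queue = new LinkedBlockingQueue<>();
        Holder copy = roundTrip(new Holder(queue));
        queue.add("event");
        return copy.queue.contains("event");
    }
}
```

The deserialized holder owns its own queue instance, so elements added to the original afterwards never reach it. A static field is not part of the serialized state, which is consistent with the static queue "working" when everything runs in one JVM.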
Is there any way I can have both in one structure?
1. The semantics of a BlockingQueue, i.e. non-blocking peek, blocking poll and blocking put. Multiple producers, one consumer.
2. A RingBuffer, which effectively works as an object pool, so instead of putting a new object into the ring buffer, I want to reuse an existing object there, copying the state. So basically the functionality the LMAX Disruptor has out of the box.
Is there something which works like that already?
I guess I can try to use the Disruptor for that. If I understand correctly, I can already use it as a blocking queue with a blocking put (when the ring buffer is "full"), and it already has the "reusable objects" semantics I need. So the only problem is how to create a client that is able to PULL objects (instead of using callbacks). As I'm not really familiar with the Disruptor's internal structure: can it be done, with all those sequencers, by creating a new EventProcessor or something like that?
And no, the obvious solution of having a blocking queue on the client side and reading from it is not ideal, as it defeats the whole point of using the Disruptor's object pool: you would need a second pool, or you would create new objects in the callback before putting them into that blocking queue, and I don't want any garbage created at all.
So is there a way to achieve this with the Disruptor, or with any other performance-oriented, garbage-free Java library?
We open sourced Conversant Disruptor, which includes DisruptorBlockingQueue, earlier this year. You can find the code on GitHub.
Conversant Disruptor is trivial to include in almost any project because it supports the BlockingQueue API and is published on Maven Central.
For the curious: I haven't been able to get "blocking pull" semantics from the Disruptor itself, but of course it's trivial to add "blocking" functionality on top of the non-blocking pull. "Peek" functionality by itself is possible but not efficient (you need to copy the item again on each peek) and can be replaced by just caching the result of "poll".
So, here is the minimal raw solution, implementing only the methods I need:
public class DisruptorMPSCQueue<T extends ICopyable<T>> {
private final RingBuffer<T> ringBuffer;
private final EventPoller<T> eventPoller;
private T tempPolledEvent;
private EventPoller.Handler<T> pollerHandler = new EventPoller.Handler<T>() {
@Override
public boolean onEvent(final T event, final long sequence, final boolean endOfBatch) throws Exception {
tempPolledEvent.copyFrom(event);
return false;
}
};
public DisruptorMPSCQueue(EventFactory<T> typeConstructor, int size) {
ringBuffer = RingBuffer.createMultiProducer(typeConstructor, size);
eventPoller = ringBuffer.newPoller();
ringBuffer.addGatingSequences(eventPoller.getSequence());
}
/**
* Blocking, can be called from any thread, the event will be copied to the ringBuffer
*/
public void put(final T event) {
long sequence = ringBuffer.next(); // blocked by ringBuffer's gatingSequence
ringBuffer.get(sequence).copyFrom(event);
ringBuffer.publish(sequence);
}
/**
* Not blocking, can be called from any thread, the event will be copied to the ringBuffer
*
* @throws IllegalStateException if the element cannot be added at this time due to capacity restrictions
*/
public void offer(final T event) {
long sequence;
try {
sequence = ringBuffer.tryNext();
} catch (InsufficientCapacityException e) {
throw new IllegalStateException(e); // to mimic blockingQueue
}
ringBuffer.get(sequence).copyFrom(event);
ringBuffer.publish(sequence);
}
/**
* Retrieves and removes the head of the queue. NOT thread-safe: may be called from one thread only.
*
* @param destination the head of the queue will be copied into destination
* @return the destination object, or null if the queue is empty
*/
public T poll(final T destination) {
try {
tempPolledEvent = destination; // yea, the poller usage is a bit dumb
EventPoller.PollState poll = eventPoller.poll(pollerHandler);
if (poll == EventPoller.PollState.PROCESSING) {
return tempPolledEvent;
} else {
return null;
}
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
I wrote a class 'Producer' which continuously parses files from a specific folder. The parsed result is stored in a queue for the Consumer.
public class Producer extends Thread
{
private BlockingQueue<MyObject> queue;
...
public void run()
{
while (true)
{
//Store email attachments into directory
...
//Fill the queue
queue.put(myObject);
sleep(5*60*1000);
}
}
}
My Consumer class continuously checks whether something is available in the queue. If so, it performs some work on the parsed result.
public class Consumer extends Thread
{
private BlockingQueue<MyObject> queue;
...
public void run()
{
while (true)
{
MyObject o = queue.poll();
// Work on MyObject 'o'
...
sleep(5*60*1000);
}
}
}
When I run my program, 'top' shows that the Java process is always at 100%. I guess it's because of the infinite loops.
Is this a good way to implement this, or is there a more resource-efficient way of doing it?
Instead of
MyObject o = queue.poll();
try
MyObject o = queue.take();
The latter will block until there is something available in the queue, whereas the former will always return immediately, whether or not something is available.
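To make the difference concrete, here is a small sketch. The class name TakeVsPollDemo, the 100 ms delay and the "parsed-file" payload are made up for illustration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TakeVsPollDemo {
    // Returns "<result of poll on the empty queue>/<result of take>".
    static String demo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // poll() does not wait: on an empty queue it returns null right away.
        String polled = queue.poll();

        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(100);          // simulate slow parsing
                queue.put("parsed-file");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            // take() parks the calling thread until an element is available,
            // so the consumer uses no CPU while it waits.
            String taken = queue.take();
            producer.join();
            return polled + "/" + taken;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With take(), the consumer loop no longer needs its own sleep, and it never has to handle a null element.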
I've got the following code:
while(!currentBoard.boardIsValid()){
for (QueueLocation location : QueueLocation.values()){
while(!inbox.isEmpty(location)){
Cell c = inbox.dequeue(location);
notifyNeighbours(c.x, c.y, c.getCurrentState(),previousBoard);
}
}
}
I've got a consumer with a few queues (all of their methods are synchronised). One queue for each producer. The consumer loops over all the queues and checks if they've got a task for him to consume.
If the queue he's checking has a task in it, he consumes it. Otherwise, he goes to the check the next queue until he finishes iterating over all the queues.
As of now, if he iterates over all the queues and they're all empty, he keeps on looping rather than waiting for one of them to contain something (as seen by the outer while).
How can I make the consumer wait until one of the queues has something in it?
I'm having an issue with the following scenario: Let's say there are only 2 queues. The consumer checked the first one and it was empty. Just as he's checking the second one (which is also empty), the producer puts something in the first queue. As far as the consumer is concerned, the queues are both empty and so he should wait (even though one of them isn't empty anymore and he should continue looping).
Edit:
One last thing. This is an exercise for me. I'm trying to implement the synchronisation myself. So if any of the java libraries have a solution that implements this I'm not interested in it. I'm trying to understand how I can implement this.
@Abe was close. I would use signal and wait, using the Object class built-ins (wait and notifyAll), as they are the lightest weight.
Object sync = new Object(); // Can use an existing object if there's an appropriate one
// On submit to queue
synchronized ( sync ) {
queue.add(...); // Must be inside to avoid a race condition
sync.notifyAll();
}
// On check for work in queue
synchronized ( sync ) {
item = null;
while ( item == null ) {
// Need to check all of the queues - if there will be a large number, this will be slow,
// and slow critical sections (synchronized blocks) are very bad for performance
item = getNextQueueItem();
if ( item == null ) {
sync.wait();
}
}
}
Note that sync.wait releases the lock on sync until the notify - and the lock on sync is required to successfully call the wait method (it's a reminder to the programmer that some type of critical section is really needed for this to work reliably).
By the way, I would recommend a queue dedicated to the consumer (or group of consumers) rather than a queue dedicated to the producer, if feasible. It will simplify the solution.
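Put together, the fragments above could form something like the following sketch. The names MultiQueueInbox, submit and takeAny are hypothetical; ArrayDeque is used only as plain unsynchronized storage, with all synchronization done by hand on one shared monitor:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

class MultiQueueInbox<T> {
    private final Object sync = new Object();
    private final List<Queue<T>> queues = new ArrayList<>();

    MultiQueueInbox(int producers) {
        for (int i = 0; i < producers; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    // Producer side: add and notify under the same monitor the consumer waits on.
    void submit(int producerIndex, T item) {
        synchronized (sync) {
            queues.get(producerIndex).add(item);
            sync.notifyAll();
        }
    }

    // Consumer side: scan every queue, and wait only if all of them are empty.
    T takeAny() throws InterruptedException {
        synchronized (sync) {
            while (true) {
                for (Queue<T> q : queues) {
                    T item = q.poll();
                    if (item != null) {
                        return item;
                    }
                }
                // The scan and the wait happen under one lock, so a producer
                // cannot slip an item in between them unnoticed.
                sync.wait();
            }
        }
    }

    // Hypothetical driver: two producers, one consumer taking two items.
    static List<String> demo() {
        MultiQueueInbox<String> inbox = new MultiQueueInbox<>(2);
        new Thread(() -> inbox.submit(0, "from-0")).start();
        new Thread(() -> inbox.submit(1, "from-1")).start();
        try {
            List<String> taken = new ArrayList<>();
            taken.add(inbox.takeAny());
            taken.add(inbox.takeAny());
            Collections.sort(taken);   // arrival order is not deterministic
            return taken;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

This addresses the race from the question: the producer cannot add to the first queue while the consumer is between "checked the second queue" and "started waiting", because both steps run while holding sync.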
If you want to block across multiple queues, then one option is to use Java's Lock and Condition objects and then use the signal method.
So whenever the producer has data, it should invoke signalAll.
Lock fileLock = new ReentrantLock();
Condition condition = fileLock.newCondition();
...
// producer has to signal
condition.signalAll();
...
// consumer has to await.
condition.await();
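This way, the consumer will go and check the queues only when a signal has been given. Note that both signalAll() and await() must be called while holding fileLock, and the consumer should re-check the queues in a loop around await().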
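A fuller sketch of the Lock/Condition approach, with the locking that the fragment above leaves out. The class name ConditionDemo, the two ArrayDeque queues and the demo() driver are assumptions for illustration:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class ConditionDemo {
    private final Lock lock = new ReentrantLock();
    private final Condition workAvailable = lock.newCondition();
    private final Queue<String> first = new ArrayDeque<>();
    private final Queue<String> second = new ArrayDeque<>();

    // Producer: signalAll() may only be called while holding the lock.
    void produce(boolean intoFirst, String item) {
        lock.lock();
        try {
            (intoFirst ? first : second).add(item);
            workAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Consumer: await() releases the lock while waiting and reacquires it
    // before returning; the loop guards against spurious wakeups.
    String consume() throws InterruptedException {
        lock.lock();
        try {
            while (first.isEmpty() && second.isEmpty()) {
                workAvailable.await();
            }
            return !first.isEmpty() ? first.remove() : second.remove();
        } finally {
            lock.unlock();
        }
    }

    // Hypothetical driver: a producer thread puts one task into the second queue.
    static String demo() {
        ConditionDemo d = new ConditionDemo();
        Thread producer = new Thread(() -> d.produce(false, "task"));
        producer.start();
        try {
            return d.consume();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling await() or signalAll() on a ReentrantLock condition without holding the lock throws IllegalMonitorStateException, which is why both sides wrap the calls in lock()/unlock().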
I solved a similar situation along the lines of what @Abe suggests, but settled on using a Semaphore in combination with an AtomicBoolean and called it a BinarySemaphore. It does require the producers to be modified so that they signal when there is something to do.
Below is the code for the BinarySemaphore and a general idea of what the consumer work-loop should look like:
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
public class MultipleProdOneConsumer {
BinarySemaphore workAvailable = new BinarySemaphore();
class Consumer {
volatile boolean stop;
void loop() {
while (!stop) {
doWork();
if (!workAvailable.tryAcquire()) {
// waiting for work
try {
workAvailable.acquire();
} catch (InterruptedException e) {
if (!stop) {
// log error
}
}
}
}
}
void doWork() {}
void stopWork() {
stop = true;
workAvailable.release();
}
}
class Producer {
/* Must be called after work is added to the queue/made available. */
void signalSomethingToDo() {
workAvailable.release();
}
}
class BinarySemaphore {
private final AtomicBoolean havePermit = new AtomicBoolean();
private final Semaphore sync;
public BinarySemaphore() {
this(false);
}
public BinarySemaphore(boolean fair) {
sync = new Semaphore(0, fair);
}
public boolean release() {
boolean released = havePermit.compareAndSet(false, true);
if (released) {
sync.release();
}
return released;
}
public boolean tryAcquire() {
boolean acquired = sync.tryAcquire();
if (acquired) {
havePermit.set(false);
}
return acquired;
}
public boolean tryAcquire(long timeout, TimeUnit tunit) throws InterruptedException {
boolean acquired = sync.tryAcquire(timeout, tunit);
if (acquired) {
havePermit.set(false);
}
return acquired;
}
public void acquire() throws InterruptedException {
sync.acquire();
havePermit.set(false);
}
public void acquireUninterruptibly() {
sync.acquireUninterruptibly();
havePermit.set(false);
}
}
}
I have a BlockingQueue<Runnable> (taken from ScheduledThreadPoolExecutor) in a producer-consumer environment. There is one thread adding tasks to the queue, and a thread pool executing them.
I need notifications on two events:
1. First item added to an empty queue
2. Last item removed from the queue
Notification = writing a message to database.
Is there any sensible way to implement that?
A simple and naïve approach would be to decorate your BlockingQueue with an implementation that simply checks the underlying queue and then posts a task to do the notification.
class NotifyingQueue<T> extends ForwardingBlockingQueue<T> implements BlockingQueue<T> {
private final Notifier notifier; // injected not null
…
@Override public void put(T element) throws InterruptedException {
if (getDelegate().isEmpty()) {
notifier.notEmptyAnymore();
}
super.put(element);
}
@Override public T poll() {
final T result = super.poll();
if ((result != null) && getDelegate().isEmpty())
notifier.nowEmpty();
return result;
}
… etc
}
This approach though has a couple of problems. While the empty -> notEmpty is pretty straightforward – particularly for a single producer case, it would be easy for two consumers to run concurrently and both see the queue go from non-empty -> empty.
If though, all you want is to be notified that the queue became empty at some time, then this will be enough as long as your notifier is your state machine, tracking emptiness and non-emptiness and notifying when it changes from one to the other:
class AtomicStateNotifier implements Notifier {
private final AtomicBoolean empty = new AtomicBoolean(true); // assume it starts empty
private final Notifier delegate; // injected not null
public void notEmptyAnymore() {
if (empty.get() && empty.compareAndSet(true, false))
delegate.notEmptyAnymore();
}
public void nowEmpty() {
if (!empty.get() && empty.compareAndSet(false, true))
delegate.nowEmpty();
}
}
This is now a thread-safe guard around an actual Notifier implementation that perhaps posts tasks to an Executor to asynchronously write the events to the database.
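As a rough illustration of how the guard suppresses duplicate notifications, the sketch below wires it to a counting delegate. The Notifier interface is reconstructed from the method names used above, and CountingNotifier and demo() are hypothetical:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class NotifierDemo {
    // Assumed shape of the Notifier interface used in the answer.
    interface Notifier {
        void notEmptyAnymore();
        void nowEmpty();
    }

    // Forwards a notification only when the tracked state actually flips.
    static class AtomicStateNotifier implements Notifier {
        private final AtomicBoolean empty = new AtomicBoolean(true); // starts empty
        private final Notifier delegate;

        AtomicStateNotifier(Notifier delegate) {
            this.delegate = delegate;
        }

        public void notEmptyAnymore() {
            if (empty.compareAndSet(true, false)) {
                delegate.notEmptyAnymore();
            }
        }

        public void nowEmpty() {
            if (empty.compareAndSet(false, true)) {
                delegate.nowEmpty();
            }
        }
    }

    // Hypothetical delegate that only counts how often it was called.
    static class CountingNotifier implements Notifier {
        final AtomicInteger notEmptyCalls = new AtomicInteger();
        final AtomicInteger nowEmptyCalls = new AtomicInteger();

        public void notEmptyAnymore() {
            notEmptyCalls.incrementAndGet();
        }

        public void nowEmpty() {
            nowEmptyCalls.incrementAndGet();
        }
    }

    // Returns {notEmpty notifications, nowEmpty notifications} that got through.
    static int[] demo() {
        CountingNotifier counts = new CountingNotifier();
        Notifier guard = new AtomicStateNotifier(counts);
        guard.notEmptyAnymore();
        guard.notEmptyAnymore(); // duplicate, suppressed
        guard.nowEmpty();
        guard.nowEmpty();        // duplicate, suppressed
        guard.nowEmpty();        // duplicate, suppressed
        return new int[] { counts.notEmptyCalls.get(), counts.nowEmptyCalls.get() };
    }
}
```

Only one of several racing callers can win the compareAndSet for a given transition, so two consumers that both see the queue become empty produce a single notification.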
The design is most likely flawed, but you can do it relatively simply:
1. You have a single thread adding, so you can check before adding, i.e. pool.getQueue().isEmpty(). With one producer, this is safe.
2. "Last item removed" cannot be guaranteed, but you can override beforeExecute and check the queue again, possibly with a small timeout after isEmpty() returns true. The code below would probably be better off executed in afterExecute instead.
protected void beforeExecute(Thread t, Runnable r) {
if (getQueue().isEmpty()){
try{
// Renamed from r: a local variable may not redeclare the parameter r
Runnable next = getQueue().poll(200, TimeUnit.MILLISECONDS);
if (next!=null){
execute(next);
} else{
//last message - or on after execute by Setting a threadLocal and check it there
//alternatively you may need to do so ONLY in after execute, depending on your needs
}
}catch(InterruptedException _ie){
Thread.currentThread().interrupt();
}
}
}
Something like that.
I can explain why doing notifications with the queue itself won't work well: imagine you add a task to be executed by the pool; the task is scheduled immediately, the queue is empty again, and you will need a notification.