Consumer-Producer with Threads and BlockingQueues

I wrote a class, 'Producer', that continuously parses files from a specific folder. Each parsed result is stored in a queue for the Consumer.
public class Producer extends Thread
{
private BlockingQueue<MyObject> queue;
...
public void run()
{
while (true)
{
//Store email attachments into directory
...
//Fill the queue
queue.put(myObject);
sleep(5*60*1000);
}
}
}
My Consumer class continuously checks whether something is available in the queue. If so, it performs some work on the parsed result.
public class Consumer extends Thread
{
private BlockingQueue<MyObject> queue;
...
public void run()
{
while (true)
{
MyObject o = queue.poll();
// Work on MyObject 'o'
...
sleep(5*60*1000);
}
}
}
When I run my program, 'top' shows that the Java process is always at 100% CPU. I guess it's because of the infinite loops.
Is this a good way to implement this, or is there a more resource-saving way to do it?

Instead of
MyObject o = queue.poll();
try
MyObject o = queue.take();
The latter will block until there is something available in the queue, whereas the former will always return immediately, whether or not something is available.
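To make the difference concrete, here is a minimal sketch of a consumer built around take(). The class name TakeDemo, the String payload and the 100 ms delay are placeholders of mine, not part of the question. Because take() parks the thread while the queue is empty, the consumer needs neither a sleep nor a null check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TakeDemo {
    // Consumes exactly `count` items; take() blocks while the queue is empty.
    static List<String> consume(BlockingQueue<String> queue, int count) throws InterruptedException {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            seen.add(queue.take());
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    Thread.sleep(100);          // stand-in for parsing a file
                    queue.put("item-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        System.out.println(consume(queue, 3));  // items arrive in the order they were put
        producer.join();
    }
}
```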

Related

How to execute Runnable/Thread in correct sequence

Imagine a datastream like this
A,A,B,A,C,C,C,A,B,A,A,A,B...
Now let's assume we have a StreamProcessor that will handle the stream. We can process A, B and C in parallel, but the individual As, Bs and Cs have to be processed in sequence.
Example:
Thread 1: Processes all As in sequence
Thread 2: Processes all Bs in sequence
and so on...
So for A,B,C I have a StreamProcessor (SP).
Each of the stream elements has a timestamp and thus can be ordered by time (It actually comes in the correct sequence). The elements have to be processed in time sequence.
So now I split up all my stream elements to their processors (SPA,SPB,SPC).
I have a TreeSet in every SP where I add the elements.
So whenever there is a new element I basically do this:
public synchronized void onNewElementReceived(Element element) {
if (element== null) return;
treeSet.add(element);
if(treeSet.size()>30) logger.warn("There are many elements queueing up for processing");
threadPool.execute(() -> process(treeSet.first()));
}
private synchronized void process(Element element){
//Do the processing
}
This works fine if the stream is slow enough for process to terminate before the next element arrives. But what if not? If more elements keep coming, how can I make sure that the next element received is also the next one to be processed? In the end, doesn't the operating system decide which thread runs when?
Edit: For clarity an example where this will fail:
Assume process() takes 1 second to execute for A elements. If the stream provides As faster than we can process them, our treeSet will fill up with elements of type A (I just realized it does not, because we immediately fetch the first one again; hmm, another problem). Anyway, the main problem remains. If we receive an element every 100 ms, for example, we would request 10 executions of the process method, but the order would no longer be guaranteed, because we do not know which Runnable the system will execute first. We only ADDED them in the correct sequence, but how do we EXECUTE them in the correct sequence?
I could imagine running a looper thread all the time that fetches the first element of the queue and aborts the process if there is none. Is that a good approach?

I would do it like this (PseudoCode-Like):
abstract class StreamProcessor extends Thread{
private ThreadSafeList<Element> elements;
void add(Element e) {
elements.addAtEnd(e);
}
@Override
public void run() {
while(hasNotFinished()) {
//If list has element, return the first element and remove it from the list, otherwise block until one is there and then return the first element and remove it.
Element e = elements.blockingRemoveFirst();
this.workWith(e);
}
}
abstract void workWith(Element e);
}
class StreamProcessorA extends StreamProcessor {
@Override
public void workWith(Element e) {
//Do something
}
}
class StreamProcessorB extends StreamProcessor {
@Override
public void workWith(Element e) {
//Do something
}
}
class StreamProcessorC extends StreamProcessor {
@Override
public void workWith(Element e) {
//Do something
}
}
class ElementReceiver {
private StreamProcessor A;
private StreamProcessor B;
private StreamProcessor C;
public synchronized void onNewElementReceived(Element e) {
if(e.type() /*Whatever*/ == ElementType.A) {
A.add(e);
}else if(e.type() == ElementType.B) {
B.add(e);
}else {
C.add(e);
}
}
}
This design uses four threads.
The first thread receives each Element from some unspecified data source.
When it receives one, it checks which type it is (A, B or C).
Each type has a corresponding StreamProcessor; onNewElementReceived adds the received element to the work list of that StreamProcessor.
Each StreamProcessor thread loops until it is told to finish (or is killed). In each iteration it blocks until an Element is available and then calls workWith, which every subclass has to implement.
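For a runnable variant of the same idea, the sketch below swaps the hand-written StreamProcessor threads for one single-threaded executor per element type. This is a substitution on my part, not the code from the answer: a single-threaded executor runs its tasks one at a time in submission order, which gives the same per-type ordering. The class name OrderedDispatch and the use of a char type plus an int timestamp are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OrderedDispatch {
    // One single-threaded executor per type: types run in parallel,
    // elements of the same type run sequentially in submission order.
    private final Map<Character, ExecutorService> lanes = new ConcurrentHashMap<>();
    // Records the order in which elements were actually processed, per type.
    private final Map<Character, List<Integer>> processed = new ConcurrentHashMap<>();

    void onNewElementReceived(char type, int timestamp) {
        lanes.computeIfAbsent(type, t -> Executors.newSingleThreadExecutor())
             .execute(() -> process(type, timestamp));
    }

    private void process(char type, int timestamp) {
        processed.computeIfAbsent(type, t -> new CopyOnWriteArrayList<>()).add(timestamp);
    }

    // Stops accepting work, waits for queued elements to finish, returns the processing order.
    Map<Character, List<Integer>> shutdownAndGet() throws InterruptedException {
        for (ExecutorService lane : lanes.values()) {
            lane.shutdown();
            lane.awaitTermination(5, TimeUnit.SECONDS);
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        OrderedDispatch dispatch = new OrderedDispatch();
        String stream = "AABACCCABAAAB";
        for (int i = 0; i < stream.length(); i++) {
            dispatch.onNewElementReceived(stream.charAt(i), i);  // index doubles as timestamp
        }
        System.out.println(dispatch.shutdownAndGet());
    }
}
```

The ordering guarantee only holds if onNewElementReceived itself is called in timestamp order, which the question says is the case.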

flink SourceFunction<> is being replaced in StreamExecutionEnvironment.addSource()?

I ran into this problem while trying to create a custom event source. It contains a queue that lets my other process add items to it. I then expect my CEP pattern to print some debug messages when there is a match.
But there is no match, no matter what I add to the queue. Then I noticed that the queue inside mySource.run() is always empty, which means the queue I used to create the mySource instance is not the same as the one inside StreamExecutionEnvironment. If I make the queue static, forcing all instances to share the same queue, everything works as expected.
DummySource.java
public class DummySource implements SourceFunction<String> {
private static final long serialVersionUID = 3978123556403297086L;
// private static Queue<String> queue = new LinkedBlockingQueue<String>();
private Queue<String> queue;
private boolean cancel = false;
public void setQueue(Queue<String> q){
queue = q;
}
@Override
public void run(org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<String> ctx)
throws Exception {
System.out.println("run");
synchronized (queue) {
while (!cancel) {
if (queue.peek() != null) {
String e = queue.poll();
if (e.equals("exit")) {
cancel();
}
System.out.println("collect "+e);
ctx.collectWithTimestamp(e, System.currentTimeMillis());
}
}
}
}
@Override
public void cancel() {
System.out.println("canceled");
cancel = true;
}
}
So I dug into the source code of StreamExecutionEnvironment. Inside the addSource() method there is a clean() method, which looks like it replaces the instance with a new one. Its documentation says:
Returns a "closure-cleaned" version of the given function.
Why is that? And why does it need to be serialized?
I've also tried turning off the closure cleaner using getConfig(). The result is still the same: my queue instance is not the one that env is using.
How do I solve this problem?

The clean() method that Flink applies to functions is mainly there to ensure that the function (such as a SourceFunction or MapFunction) is serialisable. Flink serialises these functions and distributes them to the task nodes, which execute them.
Simple variables in your Flink main code, such as an int, can simply be referenced in your function. For large or non-serialisable ones, it is better to use broadcast variables and a rich source function. Please refer to https://cwiki.apache.org/confluence/display/FLINK/Variables+Closures+vs.+Broadcast+Variables
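The effect on the queue can be reproduced without Flink. The sketch below uses plain Java serialization only; Holder is a hypothetical stand-in for a function object with a queue field, and the exact mechanism Flink uses internally is not shown here. It illustrates why an object that has been serialized and deserialized no longer shares its queue with the code that created it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

class SerializationCopyDemo {
    // Hypothetical stand-in for a source function that holds a queue field.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        final Queue<String> queue;
        Holder(Queue<String> queue) { this.queue = queue; }
    }

    // Writes the object to bytes and reads it back, yielding an independent copy.
    static Holder roundTrip(Holder original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Holder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Queue<String> queue = new LinkedBlockingQueue<>();
        Holder copy = roundTrip(new Holder(queue));
        queue.add("added after serialization");
        System.out.println(copy.queue == queue);    // false: the copy has its own queue
        System.out.println(copy.queue.isEmpty());   // true: it never sees the new element
    }
}
```

A static field is not part of the serialized state, which is consistent with the observation in the question that a static queue "works" as long as everything runs in the same JVM.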

Listening to a jms queue and processing only 10 messages at a time

I have a javax.jms.Queue and a listener listening to it. I get the message (a String) and execute a process, passing the string to it as an input parameter.
I want only 10 instances of that process to run at any one time. Only once those have finished should the next messages be processed.
How can this be achieved? At the moment it reads all the messages at once and starts as many instances of the process as there are messages, which causes the server to hang.
// using javax.jms.MessageListener
message = consumer.receive(5000);
if (message != null) {
try {
handler.onMessage(message); //handler is MessageListener instance
}
}

Try putting this annotation on your MDB listener:
@ActivationConfigProperty(propertyName = "maxSession", propertyValue = "10")

I am assuming that you have a way of accepting hasTerminated messages from your external processes. This controller thread communicates with the JMS listener through a Semaphore. The Semaphore is initialized with 10 permits. Every time an external process calls TerminationController#terminate (or however the external processes communicate with your listener process), a permit is added back to the Semaphore. JMSListener must acquire a permit before it can call messageConsumer.receive(), which ensures that no more than ten processes can be active at a time.
// created in parent class
private final Semaphore semaphore = new Semaphore(10);
@Controller
public class TerminationController {
private final Semaphore semaphore;
public TerminationController(Semaphore semaphore) {
this.semaphore = semaphore;
}
// Called from external processes when they terminate
public void terminate() {
semaphore.release();
}
}
public class JMSListener implements Runnable {
private final MessageConsumer messageConsumer;
private final Semaphore semaphore;
public JMSListener(MessageConsumer messageConsumer, Semaphore semaphore) {
this.messageConsumer = messageConsumer;
this.semaphore = semaphore;
}
public void run() {
while(true) {
semaphore.acquire();
Message message = messageConsumer.receive();
// create process from message
}
}
}
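The same permit accounting can be tried out without JMS. In the sketch below, which is only an illustration, a loop counter stands in for messageConsumer.receive() and a short sleep stands in for the external process; the class name BoundedDispatch and the method run are mine. It returns the highest number of tasks that were ever running at once, so the limit can be checked.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedDispatch {
    // Starts `messages` tasks with at most `limit` in flight; returns the peak concurrency seen.
    static int run(int messages, int limit) throws InterruptedException {
        Semaphore permits = new Semaphore(limit);
        ExecutorService workers = Executors.newCachedThreadPool();
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < messages; i++) {
            permits.acquire();                 // blocks the receive loop while `limit` tasks are running
            workers.execute(() -> {
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(20);          // stand-in for the external process
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    permits.release();         // frees a slot for the next message
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrency: " + run(30, 10));
    }
}
```

Releasing the permit in a finally block matters: if a task fails without releasing, the slot is lost for good.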

I think a simple while check would suffice. Here's some Pseudocode.
While (running processes are less than 10) {
add one to the running processes list
do something with the message
}
and in the code for onMessage:
function declaration of on Message(Parameters) {
do something
subtract 1 from the running processes list
}
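Make sure that the variable you're using to count the running processes is declared as volatile. Note that volatile only guarantees visibility: ++ and -- are not atomic on a volatile int, so if more than one thread updates the counter, an AtomicInteger is the safer choice.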
Example as requested:
public static volatile int numOfProcesses = 0;
while (true) {
if (numOfProcesses < 10) {
// read a message and make a new process, etc
// probably put your receive code here
numOfProcesses++;
}
}
Wherever the code for your processes is written:
// do stuff, do stuff, do more stuff
// finished stuff
numOfProcesses--;
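If the counter is updated from more than one thread, a variant based on AtomicInteger avoids lost updates. The following is a minimal sketch under that assumption; ProcessSlots, tryStart and finished are hypothetical names, not part of the answer above.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ProcessSlots {
    private final int limit;
    private final AtomicInteger running = new AtomicInteger();

    ProcessSlots(int limit) {
        this.limit = limit;
    }

    // Atomically claims a slot; returns false if `limit` processes are already running.
    boolean tryStart() {
        while (true) {
            int current = running.get();
            if (current >= limit) {
                return false;
            }
            if (running.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread changed the counter in between; retry with the fresh value.
        }
    }

    // Called when a process has finished, freeing its slot.
    void finished() {
        running.decrementAndGet();
    }

    public static void main(String[] args) {
        ProcessSlots slots = new ProcessSlots(2);
        System.out.println(slots.tryStart());  // true
        System.out.println(slots.tryStart());  // true
        System.out.println(slots.tryStart());  // false, both slots are taken
        slots.finished();
        System.out.println(slots.tryStart());  // true again
    }
}
```

The receive loop would still spin while tryStart() returns false, so a Semaphore is usually the better fit when the loop should block instead.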

Java - Multiple queue producer consumer

I've got the following code:
while(!currentBoard.boardIsValid()){
for (QueueLocation location : QueueLocation.values()){
while(!inbox.isEmpty(location)){
Cell c = inbox.dequeue(location);
notifyNeighbours(c.x, c.y, c.getCurrentState(),previousBoard);
}
}
}
I've got a consumer with a few queues (all of their methods are synchronised). One queue for each producer. The consumer loops over all the queues and checks if they've got a task for him to consume.
If the queue he's checking has a task in it, he consumes it. Otherwise, he goes on to check the next queue until he has iterated over all the queues.
As of now, if he iterates over all the queues and they're all empty, he keeps on looping rather than waiting for one of them to contain something (as seen by the outer while).
How can I make the consumer wait until one of the queues has something in it?
I'm having an issue with the following scenario. Let's say there are only 2 queues. The consumer checks the first one and it is empty. Just as he's checking the second one (which is also empty), the producer puts something in the first queue. As far as the consumer is concerned, both queues are empty and so he should wait (even though one of them isn't empty anymore and he should continue looping).
Edit:
One last thing. This is an exercise for me. I'm trying to implement the synchronisation myself. So if any of the java libraries have a solution that implements this I'm not interested in it. I'm trying to understand how I can implement this.

@Abe was close. I would use signal and wait - use the Object class built-ins as they are the lightest weight.
Object sync = new Object(); // Can use an existing object if there's an appropriate one
// On submit to queue
synchronized ( sync ) {
queue.add(...); // Must be inside to avoid a race condition
sync.notifyAll();
}
// On check for work in queue
synchronized ( sync ) {
item = null;
while ( item == null ) {
// Need to check all of the queues - if there will be a large number, this will be slow,
// and slow critical sections (synchronized blocks) are very bad for performance
item = getNextQueueItem();
if ( item == null ) {
sync.wait();
}
}
}
Note that sync.wait releases the lock on sync until the notify - and the lock on sync is required to successfully call the wait method (it's a reminder to the programmer that some type of critical section is really needed for this to work reliably).
By the way, I would recommend a queue dedicated to the consumer (or group of consumers) rather than a queue dedicated to the producer, if feasible. It will simplify the solution.
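Put together as a compilable class, the wait/notify approach might look like the sketch below. MultiQueueInbox and the String payload are placeholders of mine. The scenario from the question cannot occur here, because the producer needs the same lock to enqueue: it cannot slip an item into the first queue while the consumer is between scanning and waiting.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class MultiQueueInbox {
    private final Object sync = new Object();
    private final List<Queue<String>> queues = new ArrayList<>();

    MultiQueueInbox(int queueCount) {
        for (int i = 0; i < queueCount; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    // Producer side: enqueue and notify under the lock the consumer waits on.
    void put(int queueIndex, String item) {
        synchronized (sync) {
            queues.get(queueIndex).add(item);
            sync.notifyAll();
        }
    }

    // Consumer side: scan every queue and wait only if all were empty during this locked scan.
    String take() throws InterruptedException {
        synchronized (sync) {
            while (true) {
                for (Queue<String> queue : queues) {
                    String item = queue.poll();
                    if (item != null) {
                        return item;
                    }
                }
                sync.wait();   // releases `sync` while waiting; the loop rescans after every wakeup
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MultiQueueInbox inbox = new MultiQueueInbox(2);
        new Thread(() -> inbox.put(1, "hello")).start();
        System.out.println(inbox.take());
    }
}
```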

If you want to block across multiple queues, one option is to use Java's Lock and Condition objects and their signal methods.
Whenever the producer has data, it should invoke signalAll.
Lock fileLock = new ReentrantLock();
Condition condition = fileLock.newCondition();
...
// producer has to signal
condition.signalAll();
...
// consumer has to await.
condition.await();
This way only when the signal is provided will the consumer go and check the queues.
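One detail the fragment leaves out: both signalAll() and await() must be called while holding the lock, and await() belongs in a loop that rechecks the queues. A minimal sketch with two queues follows; ConditionInbox and its method names are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class ConditionInbox {
    private final Lock lock = new ReentrantLock();
    private final Condition workAvailable = lock.newCondition();
    private final Queue<String> first = new ArrayDeque<>();
    private final Queue<String> second = new ArrayDeque<>();

    // Producer side: add to one of the queues and wake up the consumer.
    void put(boolean intoFirst, String item) {
        lock.lock();
        try {
            (intoFirst ? first : second).add(item);
            workAvailable.signalAll();   // only legal while holding the lock
        } finally {
            lock.unlock();
        }
    }

    // Consumer side: wait until either queue has an item, preferring the first queue.
    String take() throws InterruptedException {
        lock.lock();
        try {
            while (first.isEmpty() && second.isEmpty()) {
                workAvailable.await();   // releases the lock while waiting; loop handles spurious wakeups
            }
            return !first.isEmpty() ? first.poll() : second.poll();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConditionInbox inbox = new ConditionInbox();
        new Thread(() -> inbox.put(false, "hello")).start();
        System.out.println(inbox.take());
    }
}
```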

I solved a similar situation along the lines of what @Abe suggests, but settled on using a Semaphore in combination with an AtomicBoolean and called it a BinarySemaphore. It does require the producers to be modified so that they signal when there is something to do.
Below is the code for the BinarySemaphore and a general idea of what the consumer work loop should look like:
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
public class MultipleProdOneConsumer {
BinarySemaphore workAvailable = new BinarySemaphore();
class Consumer {
volatile boolean stop;
void loop() {
while (!stop) {
doWork();
if (!workAvailable.tryAcquire()) {
// waiting for work
try {
workAvailable.acquire();
} catch (InterruptedException e) {
if (!stop) {
// log error
}
}
}
}
}
void doWork() {}
void stopWork() {
stop = true;
workAvailable.release();
}
}
class Producer {
/* Must be called after work is added to the queue/made available. */
void signalSomethingToDo() {
workAvailable.release();
}
}
class BinarySemaphore {
private final AtomicBoolean havePermit = new AtomicBoolean();
private final Semaphore sync;
public BinarySemaphore() {
this(false);
}
public BinarySemaphore(boolean fair) {
sync = new Semaphore(0, fair);
}
public boolean release() {
boolean released = havePermit.compareAndSet(false, true);
if (released) {
sync.release();
}
return released;
}
public boolean tryAcquire() {
boolean acquired = sync.tryAcquire();
if (acquired) {
havePermit.set(false);
}
return acquired;
}
public boolean tryAcquire(long timeout, TimeUnit tunit) throws InterruptedException {
boolean acquired = sync.tryAcquire(timeout, tunit);
if (acquired) {
havePermit.set(false);
}
return acquired;
}
public void acquire() throws InterruptedException {
sync.acquire();
havePermit.set(false);
}
public void acquireUninterruptibly() {
sync.acquireUninterruptibly();
havePermit.set(false);
}
}
}

Java LinkedBlockingQueue with ability to signal when done?

I have a situation of a single producer and single consumer working with a queue of objects. There are two situations when the queue might be empty:
The consumer handled the objects quicker than the producer was capable of generating new objects (producer uses I/O before generating objects).
The producer is done generating objects.
If the queue is empty, I want the consumer to wait until a new object is available or until the producer signals that it is done.
My research so far got me nowhere, because I still ended up with a loop that checks both the queue and a separate boolean flag (isDone). Given that there's no way of waiting on multiple locks (I thought of waiting on the queue AND the flag), what can be done to solve this?

First of all, the suggestion that using a wrapper is "too much overhead" is a guess, and IMO a very bad one. This assumption should be measured with a performance test with actual requirements. If and only if the test fails, then verify using a profiler that wrapping the queue object is why.
Still if you do that and wrapping the queue object (in this case a String) really is the cause of unacceptable performance, then you can use this technique: create a known, unique string to serve as an "end of messages" message.
public static final String NO_MORE_MESSAGES = UUID.randomUUID().toString();
Then, when retrieving Strings from the queue, just check whether the String is NO_MORE_MESSAGES (a reference check is enough). If so, you're done processing.
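As a rough sketch of how the consumer loop could look with such a sentinel (SentinelDemo and drain are names I made up for the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SentinelDemo {
    static final String NO_MORE_MESSAGES = UUID.randomUUID().toString();

    // Takes messages until the sentinel arrives; blocks while the queue is empty.
    static List<String> drain(BlockingQueue<String> queue) throws InterruptedException {
        List<String> handled = new ArrayList<>();
        while (true) {
            String message = queue.take();
            if (message == NO_MORE_MESSAGES) {   // identity check: only the sentinel instance matches
                break;
            }
            handled.add(message);
        }
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.put("a");
        queue.put("b");
        queue.put(NO_MORE_MESSAGES);             // producer signals that it is done
        System.out.println(drain(queue));        // [a, b]
    }
}
```

The consumer never needs a separate flag or a timeout: it blocks in take() and stops as soon as the sentinel comes through.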

Simple. Define a special object that the producer can send to signal "done".

One option is to wrap your data in a holder object, which can be used to signal the end of processing.
For example:
public class QueueMessage {
public MessageType type;
public Object thingToWorkOn;
}
where MessageType is an enum defining a "work" message or a "shutdown" message.

You could use LinkedBlockingQueue's poll(long timeout, TimeUnit unit) method in the consumer, and if it returns null (the timeout elapsed), check the boolean flag. Another way would be to pass a special "EndOfWork" object into the queue as the last item, so the consumer knows that it has reached the end of the work.
Yet another way would be to interrupt the consumer thread from the producer thread, but this would require the producer thread to be aware of the consumer. If both were implemented as nested classes, you could use the parent class to hold a boolean running value that both can access, and terminate both threads with a single boolean.
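Here is a sketch of the first option, the timed poll combined with a flag. It assumes the producer sets the flag only after its final put; TimedPollDemo, the 100 ms timeout and the AtomicBoolean are choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class TimedPollDemo {
    // Consumes until the queue is empty AND the producer has set `done`.
    static List<Integer> consume(BlockingQueue<Integer> queue, AtomicBoolean done) throws InterruptedException {
        List<Integer> handled = new ArrayList<>();
        while (true) {
            Integer item = queue.poll(100, TimeUnit.MILLISECONDS);
            if (item != null) {
                handled.add(item);
            } else if (done.get()) {
                // An item may have been added between the timeout and the flag check,
                // so pick up anything that is left before stopping.
                queue.drainTo(handled);
                break;
            }
        }
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        AtomicBoolean done = new AtomicBoolean();
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                queue.add(i);
            }
            done.set(true);   // set only after the last item has been queued
        });
        producer.start();
        System.out.println(consume(queue, done));
        producer.join();
    }
}
```

The price of this variant is latency: after the producer finishes, the consumer may take up to one timeout period to notice.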

The following option has been raised too (not sure if this should be in an answer to myself but couldn't find a better place to write this):
Create a wrapper for the queue. This wrapper will have a monitor that will be waited on when reading by the consumer and will be notified by the producer whenever either a new object is added or the flag of isDone is raised.
When the consumer reads objects from the queue, these objects will be wrapped with something similar to what @yann-ramin suggested above. To reduce overhead though, the consumer will provide a single, reusable, instance of QueueMessage upon every read call (it will always be the same instance). The queue wrapper will update the fields accordingly before returning the instance to the consumer.
This avoids any use of timeouts, sleeps, etc.
EDITED
This is a proposed implementation:
/**
* This work queue is designed to be used by ONE producer and ONE consumer
* (no more and no fewer of either). The work queue has certain added features, such
* as the ability to signal that the workload generation is done and nothing will be
* added to the queue.
*
* @param <E> the type of the work items
*/
public class DefiniteWorkQueue<E> {
private final E[] EMPTY_E_ARRAY;
private LinkedBlockingQueue<E> underlyingQueue = new LinkedBlockingQueue<E>();
// volatile because isDone() may be called without holding changeMonitor
private volatile boolean isDone = false;
// This monitor allows for flagging when a change was done.
private Object changeMonitor = new Object();
public DefiniteWorkQueue(Class<E> clazz) {
// Reuse this instance, makes calling toArray easier
EMPTY_E_ARRAY = (E[]) Array.newInstance(clazz, 0);
}
public boolean isDone() {
return isDone;
}
public void setIsDone() {
synchronized (changeMonitor) {
isDone = true;
changeMonitor.notifyAll();
}
}
public int size() {
return underlyingQueue.size();
}
public boolean isEmpty() {
return underlyingQueue.isEmpty();
}
public boolean contains(E o) {
return underlyingQueue.contains(o);
}
public Iterator<E> iterator() {
return underlyingQueue.iterator();
}
public E[] toArray() {
// The array we create is too small on purpose, the underlying
// queue will extend it as needed under a lock
return underlyingQueue.toArray(EMPTY_E_ARRAY);
}
public boolean add(E o) {
boolean retval;
synchronized (changeMonitor) {
retval = underlyingQueue.add(o);
if (retval)
changeMonitor.notifyAll();
}
return retval;
}
public boolean addAll(Collection<? extends E> c) {
boolean retval;
synchronized (changeMonitor) {
retval = underlyingQueue.addAll(c);
if (retval)
changeMonitor.notifyAll();
}
return retval;
}
public void remove(RemovalResponse<E> responseWrapper) throws InterruptedException {
synchronized (changeMonitor) {
// If there's nothing in the queue but it has not
// ended yet, wait for someone to add something.
// A loop (not an if) guards against spurious wakeups.
while (isEmpty() && !isDone())
changeMonitor.wait();
// When we get here, we've been notified or
// the current underlying queue's state is already something
// we can respond about.
if (!isEmpty()) {
responseWrapper.type = RemovalResponse.ResponseType.ITEM;
responseWrapper.item = underlyingQueue.remove();
} else if (isDone()) {
responseWrapper.type = RemovalResponse.ResponseType.IS_DONE;
responseWrapper.item = null;
} else {
// This should not happen
throw new IllegalStateException(
"Unexpected state where a notification of change was made but " +
"nothing is in the queue and work is not done.");
}
}
}
public static class RemovalResponse<E> {
public enum ResponseType {
/**
* Used when the response contains the first item of the queue.
*/
ITEM,
/**
* Used when the work load is done and nothing new will arrive.
*/
IS_DONE
};
private ResponseType type;
private E item;
public ResponseType getType() {
return type;
}
public void setType(ResponseType type) {
this.type = type;
}
public E getItem() {
return item;
}
public void setItem(E item) {
this.item = item;
}
}
}

