Producer Consumer pattern - handling producer failure - java

I have a producer/consumer pattern like the following
A fixed number of producer threads, each writing onto its own BlockingQueue, invoked via an Executor
A single consumer thread, reading from all the producer queues
Each producer is running a database query and writing the results to its queue. The consumer polls all the producer queues. At the moment, if there is a database error the producer thread dies and the consumer then gets stuck forever waiting for more results on that producer's queue.
How should I structure this so that errors are caught and handled correctly?

I once did a similar thing and decided to use a sentinel value that the dying producer thread pushes onto the queue from its catch block. You can push the exception itself (this works in most scenarios) or have a special object for that purpose. In any case, it is very useful to push the exception through to the consumer for debugging purposes.
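For illustration, a minimal sketch of that catch-block push. The element type, the runQuery() call and QueryProducer name are assumptions for the sketch, not the asker's actual code:

import java.util.concurrent.BlockingQueue;

// Hypothetical producer body: on a database error, push the exception itself
// onto the queue as a sentinel so the consumer never blocks forever.
class QueryProducer implements Runnable {
    private final BlockingQueue<Object> queue;   // holds either result rows or a Throwable sentinel

    QueryProducer(BlockingQueue<Object> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            // runQuery() is a placeholder for the real database work
            for (Object row : runQuery()) {
                queue.put(row);
            }
        } catch (Exception e) {
            // sentinel: the consumer checks instanceof Throwable and stops polling this queue
            queue.offer(e);
        }
    }

    private Iterable<Object> runQuery() {
        return java.util.List.of();              // placeholder
    }
}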

Whatever class you actually push onto the queue(s), it should contain success/fail/error members so that the consumer(s) can check for failures.
Peter has already suggested using only one queue - I don't see why avoiding all that polling should be any particular problem: the objects on the queue can have members that identify which producer they came from, and any other metadata, if required.
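A sketch of what such a queue element might look like (the type and field names are purely illustrative):

// Hypothetical status for each queue element.
enum Status { OK, DONE, ERROR }

// Hypothetical envelope type: carries the payload, a status flag for the consumer
// to check, the originating producer's id, and the error (if any) for debugging.
record ProducerMessage<T>(
        String producerId,   // which producer this came from
        T payload,           // null when status is ERROR
        Status status,
        Throwable error) {   // null unless status is ERROR
}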

It appears that the only option you have when a producer dies is to stop the consumer.
To do this you can use a poison pill. This is a special object which the producer adds when it stops and the consumer knows to stop when it receives it. The poison pill can be added into a finally block so it is always added no matter how the producer is killed/dies.
Given you have only one consumer, I would use one queue. This way your consumer will only block when all the producers have died.
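With one shared queue and N producers, the consumer can simply count poison pills and stop once it has seen one per producer. A rough sketch, assuming a shared POISON_PILL constant that is not in the original post:

import java.util.concurrent.BlockingQueue;

class PoisonPillConsumer implements Runnable {
    static final Object POISON_PILL = new Object();   // assumed shared sentinel

    private final BlockingQueue<Object> queue;
    private final int producerCount;

    PoisonPillConsumer(BlockingQueue<Object> queue, int producerCount) {
        this.queue = queue;
        this.producerCount = producerCount;
    }

    @Override
    public void run() {
        int pillsSeen = 0;
        try {
            while (pillsSeen < producerCount) {
                Object item = queue.take();
                if (item == POISON_PILL) {
                    pillsSeen++;               // one producer has finished or died
                } else {
                    process(item);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void process(Object item) { /* placeholder for real work */ }
}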

You might add some timeout to kill the consumer when there are no more elements in the queue(s) for a certain time.
Another approach might be to have each producer maintain an "alive" flag and signal that it is dying by setting the flag to false. If the producers run continuously but might not always get results from the database, the "alive" flag could instead record the time the producer last reported being alive; the consumer then uses a timeout to decide whether a producer has probably died (its last report was too long ago).
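A sketch of the timeout idea using BlockingQueue.poll (the 30-second figure is only an example):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class TimeoutConsumer {
    // Hypothetical consumer loop: give up if nothing arrives for 30 seconds,
    // on the assumption that the producers have all died or stalled.
    static void consume(BlockingQueue<Object> queue) throws InterruptedException {
        while (true) {
            Object item = queue.poll(30, TimeUnit.SECONDS);
            if (item == null) {
                break;             // timed out: treat the producers as dead
            }
            // process(item) would go here
        }
    }
}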

Answering my own question.
I used the following class. It takes a list of Runnables and executes them all in parallel; if one fails, it interrupts all the others. I then have interrupt handling in my producers and consumers so they die gracefully when interrupted.
This works nicely for my case.
Thanks for all the comments/answers as they gave me some ideas.
import java.util.Collection;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// helper class that does the following
//
// if any thread has an exception then interrupt all the others with an eye to cancelling them
// if the thread calling execute() is interrupted then interrupt all the child threads
public class LinkedExecutor
{
    private final Collection<Runnable> runnables;
    private final String name;

    public LinkedExecutor( String name, Collection<Runnable> runnables )
    {
        this.runnables = runnables;
        this.name = name;
    }

    public void execute()
    {
        // ConfigurableThreadFactory is a custom thread-naming factory (not shown here)
        ExecutorService executorService = Executors.newCachedThreadPool( ConfigurableThreadFactory.newWithPrefix( name ) );
        // use a completion service to poll the results
        CompletionService<Object> completionService = new ExecutorCompletionService<Object>( executorService );
        for ( Runnable runnable : runnables )
        {
            completionService.submit( runnable, null );
        }
        try
        {
            for ( int i = 0; i < runnables.size(); i++ )
            {
                Future<?> future = completionService.take();
                future.get();
            }
        }
        catch ( InterruptedException e )
        {
            // on an interruption of this thread interrupt all sub-threads in the executor
            executorService.shutdownNow();
            throw new RuntimeException( "Executor '" + name + "' interrupted", e );
        }
        catch ( ExecutionException e )
        {
            // on a failure of any of the sub-threads interrupt all the threads
            executorService.shutdownNow();
            throw new RuntimeException( "Execution exception in executor '" + name + "'", e );
        }
        finally
        {
            // release the pool's threads once all tasks have completed or been interrupted
            executorService.shutdown();
        }
    }
}
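For completeness, a minimal sketch of the kind of interrupt handling mentioned above; this is an illustration, not the original producer code, and the "row" payload is a placeholder:

import java.util.concurrent.BlockingQueue;

// Hypothetical interrupt-aware producer: checks the interrupt flag in its loop
// and relies on BlockingQueue.put() being interruptible, so shutdownNow() in
// LinkedExecutor makes it die gracefully.
class InterruptibleProducer implements Runnable {
    private final BlockingQueue<String> queue;

    InterruptibleProducer(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String result = "row";          // placeholder for the real database work
                queue.put(result);              // put() itself responds to interruption
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and fall through to exit
        }
    }
}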

Related

ThreadPoolTaskExecutor blocks feeder thread forever

I'm trying to have a single thread loading records (say from a database). This thread feeds records into a thread pool that processes these individual tasks.
I was expecting this code to work, but it prints numbers only up to 60 and then stops.
ThreadPoolTaskExecutor accountLoaderTaskExecutor = new ThreadPoolTaskExecutor();
accountLoaderTaskExecutor.setCorePoolSize(1);
accountLoaderTaskExecutor.setMaxPoolSize(1);
accountLoaderTaskExecutor.initialize();

ThreadPoolTaskExecutor accountDeletionTaskExecutor = new ThreadPoolTaskExecutor();
accountDeletionTaskExecutor.setCorePoolSize(10);
accountDeletionTaskExecutor.setMaxPoolSize(10);
accountDeletionTaskExecutor.setQueueCapacity(50);
accountDeletionTaskExecutor.initialize();

accountLoaderTaskExecutor.submit(() -> {
    List<Integer> customerAccountIds = getCustomerAccountIds(); // return 1000s integers
    customerAccountIds.forEach(id -> {
        accountDeletionTaskExecutor.submit(() -> {
            try {
                System.out.println(id);
                Thread.sleep(500);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
    });
});

Thread.currentThread().join();
I was expecting the accountLoaderTaskExecutor thread to block on accountDeletionTaskExecutor.submit but then continue as records are being processed until it exhausts all customerAccountIds.
If that comment "return 1000s integers" means that you have thousands of ids, then the code you have written will queue thousands of tasks to the threadpool and then proceed to Thread.currentThread().join() which does absolutely nothing, because joining the current thread to itself is meaningless: you can only join different threads.
Then, I presume you exit the application, and the default behavior is probably to terminate all threadpools on application exit. (I am not sure about that, I am speculating.)
The 60 tasks that you observe getting started probably manage to start while the remaining thousands of tasks are being queued.
To verify that this is what is happening, you can try replacing that call to join() with a Thread.sleep( 1000 ) and see if you observe more tasks being started.
If that is the case, then one approach to solve your problem might be to add a proper graceful threadpool shut-down (of the kind that waits for all queued tasks to complete first).
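For illustration, one way that graceful shut-down might look with the executors from the question; the wait settings below are assumptions, adjust them to your workload:

// Hedged sketch: ask Spring's ThreadPoolTaskExecutor to wait for queued tasks on shutdown.
accountDeletionTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
accountDeletionTaskExecutor.setAwaitTerminationSeconds(600);

// ... submit the work exactly as in the question ...

// Later, instead of Thread.currentThread().join():
accountLoaderTaskExecutor.shutdown();
accountDeletionTaskExecutor.shutdown();   // waits up to 600s for the queued tasks to drain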

How to pass a message from TimerTask to main thread?

I have a main client which keeps background timers for each peer. These timers run in a background thread, and in 30s (the timeout period) are scheduled to perform the task of marking the respective peer as offline. The block of code to do this is:
public void startTimer() {
    timer = new Timer();
    timer.schedule(new TimerTask() {
        public void run() {
            status = false;
            System.out.println("Setting " + address.toString() + " status to offline");
            // need to send failure message somehow
            thread.sendMessage();
        }
    }, 5 * 1000);
}
Then, in the main program, I need some way to detect when the above timer task has been run, so that the main client can then send a failure message to all other peers, something like:
while (true)
    if (msgFromThreadReceived)
        notifyPeers();
How would I be able to accomplish this with TimerTask? As I understand, the timer is running in a separate thread, and I want to somehow pass a message to the main thread to notify the main thread that the task has been run.
I would have the class that handles the timers for the peers take a concurrent queue and place a message in the queue when the peer goes offline. Then the "main" thread can poll the queue(s) in an event-driven way, receiving and processing the messages.
Please note that this "main" thread MUST NOT be the event dispatch thread of a GUI framework. If there is something that needs to be updated in the GUI when the main thread receives the message, it can invoke another piece of code on the event dispatch thread upon reception of the message.
Two good choices for the queue would be ConcurrentLinkedQueue if the queue should be unbounded (the timer threads can put any number of messages in the queue before the main thread picks them up), or LinkedBlockingQueue if there should be a limit on the size of the queue, and if it gets too large, the timer threads have to wait before they can put another message on it (this is called backpressure, and can be important in distributed, concurrent systems, but may not be relevant in your case).
The idea here is to implement a version of the Actor Model (q.v.), in which nothing is shared between threads (actors), and any data that needs to be sent (which should be immutable) is passed between them. Each actor has an inbox in which it can receive messages and it acts upon them. Only, your timer threads probably don't need inboxes, if they take all their data as parameters to the constructor and don't need to receive any messages from the main thread after they're started.
public record PeerDownMessage(String peerName, int errorCode) {
}

public class PeerWatcher {
    private final Peer peer;
    private final BlockingQueue<PeerDownMessage> queue;

    public PeerWatcher(Peer peer, BlockingQueue<PeerDownMessage> queue) {
        this.peer = Objects.requireNonNull(peer);
        this.queue = Objects.requireNonNull(queue);
    }

    public void startTimer() {
        // . . .
        // time to send failure message
        queue.put(new PeerDownMessage(peer.getName(), error));
        // . . .
    }
}

public class Main {
    public void eventLoop(List<Peer> peers) throws InterruptedException {
        LinkedBlockingQueue<PeerDownMessage> inbox =
                new LinkedBlockingQueue<>();
        for (Peer peer : peers) {
            PeerWatcher watcher = new PeerWatcher(peer, inbox);
            watcher.startTimer();
        }
        while (true) {
            PeerDownMessage message = inbox.take();
            SwingUtilities.invokeLater(() -> {
                // suppose there is a map of labels for each peer
                JLabel label = labels.get(message.peerName());
                label.setText(message.peerName() +
                        " failed with error " + message.errorCode());
            });
        }
    }
}
Notice that to update the GUI, we cause that action to be performed on yet another thread, the Swing Event Dispatch Thread, which must be different from our main thread.
There are big, complex frameworks you can use to implement the actor model, but the heart of it is this: nothing is shared between threads, so you never need to synchronize or make anything volatile. Anything an actor needs it either receives as a parameter to its constructor or via its inbox (in this example, only the main thread has an inbox, since the worker threads don't need to receive anything once they are started), and it is best to make everything immutable. I used a record instead of a class for the message, but you could use a regular class. Just make the fields final, set them in the constructor, and guarantee they can't be null, as in the PeerWatcher class.
I said the main thread can poll the "queue(s)," implying there could be more than one, but in this case they all send the same type of message, and they identify which peer the message is for in the message body. So I just gave every watcher a reference to the same inbox for the main thread. That's probably best. An actor should just have one inbox; if it needs to do multiple things, it should probably be multiple actors (that's the Erlang way, and that's where I've taken the inspiration for this from).
But if you really needed to have multiple queues, main could poll them like so:
while (true) {
    for (LinkedBlockingQueue<PeerDownMessage> queue : queues) {
        if (queue.peek() != null) {
            PeerDownMessage message = queue.take();
            handleMessageHowever(message);
        }
    }
}
But that's a lot of extra stuff you don't need. Stick to one inbox queue per actor, and then polling the inbox for messages to process is simple.
I initially wrote this to use ConcurrentLinkedQueue, but put and take are methods of BlockingQueue, so I changed it to use LinkedBlockingQueue. If you prefer ConcurrentLinkedQueue you can use add and poll instead, but on further consideration I would really recommend BlockingQueue for the simplicity of its take() method: it lets you block while waiting for the next available item instead of busy waiting.
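To make that last point concrete, the difference looks roughly like this (InterruptedException handling omitted; nonBlockingInbox and handle are illustrative names, not part of the code above):

// BlockingQueue: the consumer simply blocks until a message arrives.
PeerDownMessage msg = inbox.take();
handle(msg);

// ConcurrentLinkedQueue: poll() returns null when empty, so the consumer has
// to spin or sleep in a loop, i.e. some form of busy waiting.
PeerDownMessage polled;
while ((polled = nonBlockingInbox.poll()) == null) {
    Thread.sleep(10);   // crude back-off; trades latency against CPU
}
handle(polled);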

How to reconnect kafka producer once closed?

I have a multi-threaded app which uses a producer class to produce messages. Earlier I was using the code below, where a new KafkaProducer was built for each request:
KafkaProducer<String, byte[]> producer = new KafkaProducer<String, byte[]>(prop);
ProducerRecord<String, byte[]> data = new ProducerRecord<String, byte[]>(topic, objBytes);
producer.send(data, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            isValidMsg[0] = false;
            exception.printStackTrace();
            saveOrUpdateLog(msgBean, producerType, exception);
            logger.error("ERROR: Unable to produce message.", exception);
        }
    }
});
producer.close();
Then I read the Kafka docs on the producer and learned that we should use a single producer instance for good performance.
So I created a single instance of KafkaProducer inside a singleton class.
Now, when and where should we close the producer? Obviously, if we close the producer after the first send request it won't be available to send subsequent messages and will throw:
java.lang.IllegalStateException: Cannot send after the producer is closed.
Alternatively, how can we reconnect to the producer once it has been closed? And what happens if the program crashes or hits an exception?
Generally, calling close() on the KafkaProducer is sufficient to make sure all inflight records have completed:
/**
 * Close this producer. This method blocks until all previously sent requests complete.
 * This method is equivalent to <code>close(Long.MAX_VALUE, TimeUnit.MILLISECONDS)</code>.
 * <p>
 * <strong>If close() is called from {@link Callback}, a warning message will be logged and close(0, TimeUnit.MILLISECONDS)
 * will be called instead. We do this because the sender thread would otherwise try to join itself and
 * block forever.</strong>
 * <p>
 *
 * @throws InterruptException If the thread is interrupted while blocked
 */
If your producer is being used throughout the lifetime of your application, don't close it until you get a termination signal, then call close(). As stated in the documentation, the producer is safe to use in a multi-threaded environment, so you should re-use the same instance.
If you're sharing your KafkaProducer in multiple threads, you have two choices:
Call close() from a shutdown callback registered via Runtime.getRuntime().addShutdownHook from your main execution thread (see the sketch just below)
Have your multi-threaded methods race to close it, and only allow a single one to win.
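A minimal sketch of option 1, assuming a single long-lived producer instance and the prop Properties object from the question:

// Option 1 (sketch): one shared producer, closed once in a JVM shutdown hook.
final KafkaProducer<String, byte[]> producer = new KafkaProducer<>(prop);

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    // blocks until all in-flight records have been sent
    producer.close();
}));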
A rough sketch of 2 would possibly look like this:
object KafkaOwner {
  private var producer: KafkaProducer = ???
  @volatile private var isClosed = false

  def close(): Unit = {
    if (!isClosed) {
      producer.close()
      isClosed = true
    }
  }

  def instance: KafkaProducer = {
    this.synchronized {
      if (!isClosed) producer
      else {
        producer = new KafkaProducer()
        isClosed = false
        producer
      }
    }
  }
}
As described in javadoc for KafkaProducer:
public void close()
Close this producer. This method blocks until all previously sent requests complete.
This method is equivalent to close(Long.MAX_VALUE, TimeUnit.MILLISECONDS).
src: https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#close()
So you don't need to worry that your messages won't be sent, even if you call close immediately after send.
If you plan to use a KafkaProducer more than once, then close it only after you've finished using it. If you still want to have the guarantee that your message is actually sent before your method completes and not waiting in a buffer, then use KafkaProducer#flush() which will block until current buffer is sent. You can also block on Future#get() if you prefer.
There is also one caveat to be aware of if you don't plan to ever close your KafkaProducer (e.g. in short-lived apps, where you just send some data and the app immediately terminates after sending). The KafkaProducer IO thread is a daemon thread, which means the JVM will not wait until this thread finishes to terminate the VM. So, to ensure that your messages are actually sent use KafkaProducer#flush(), no-arg KafkaProducer#close() or block on Future#get().
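For illustration, the three ways mentioned above of making sure a record has actually left the client, in sketch form (record construction and exception handling omitted):

// 1. Block on the send future for this one record.
producer.send(record).get();

// 2. Flush everything currently buffered, then keep using the producer.
producer.send(record);
producer.flush();

// 3. Close the producer when completely done; blocks until all previously
//    sent requests complete.
producer.send(record);
producer.close();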
Kafka producer is supposed to be thread safe and frugal with its thread pool. You might want to use
producer.flush();
instead of
producer.close();
leaving the producer open until program termination, or until you're sure you won't need it any more.
If you still want to close the producer, then recreate it on demand:
producer = new KafkaProducer<String, byte[]>(prop);

Best practice for interrupting threads that take longer than a threshold

I am using the Java ExecutorService framework to submit callable tasks for execution.
These tasks communicate with a web service and a web service timeout of 5 mins is applied.
However, I've seen that in some cases the timeout is ignored and the thread 'hangs' on an API call - hence, I want to cancel any task that takes longer than, say, 5 minutes.
Currently, I have a list of futures and I iterate through them, calling future.get until all tasks are complete. Now, I've seen that the overloaded future.get takes a timeout and throws a TimeoutException when the task doesn't complete in that window. So I thought of an approach where I call future.get() with a timeout and, in case of a TimeoutException, call future.cancel(true) to make sure the task is interrupted.
My main questions
1. Is the get with a timeout the best way to solve this issue?
2. Is there the possibility that I'm waiting with the get call on a task that hasn't yet been placed on the thread pool (isn't an active worker)? In that case I may be terminating a thread that, when it starts, may actually complete within the required time limit?
Any suggestions would be deeply appreciated.
Is the get with a timeout the best way to solve this issue?
This will not suffice. For instance, if your task is not designed to respond to interruption, it will keep on running or simply remain blocked.
Is there the possibility that I'm waiting with the get call on a task that hasn't yet been placed on the thread pool (isn't an active worker)? In that case I may be terminating a thread that, when it starts, may actually complete within the required time limit?
Yes, you might end up cancelling a task which is never scheduled to run if your thread pool is not configured properly.
The following code snippet shows one way to make your task responsive to interruption when it contains non-interruptible blocking. It also avoids cancelling tasks that have not been scheduled to run. The idea is to override the interrupt method and release whatever the task is blocked on, for example by closing sockets, database connections and so on. This code is not perfect and you need to adapt it to your requirements, handle exceptions, etc.
class LongRunningTask extends Thread {
    private Socket socket;
    private final AtomicBoolean atomicBoolean;

    public LongRunningTask() {
        atomicBoolean = new AtomicBoolean(false);
    }

    @Override
    public void interrupt() {
        try {
            // clean up any resources, close connections etc.
            socket.close();
        } catch (Throwable e) {
        } finally {
            atomicBoolean.compareAndSet(true, false);
            // set the interrupt status of the executing thread
            super.interrupt();
        }
    }

    public boolean isRunning() {
        return atomicBoolean.get();
    }

    @Override
    public void run() {
        atomicBoolean.compareAndSet(false, true);
        // any long running task that might hang... for instance
        try {
            socket = new Socket("0.0.0.0", 5000);
            socket.getInputStream().read();
        } catch (UnknownHostException e) {
        } catch (IOException e) {
        } finally {
        }
    }
}

// your task caller thread
// the pool itself; the original snippet did not show how execService was created
ExecutorService execService = Executors.newFixedThreadPool(3);
// map of futures and tasks
Map<Future, LongRunningTask> map = new HashMap<Future, LongRunningTask>();
ArrayList<Future> list = new ArrayList<Future>();
for (int i = 0; i < 6; i++) {
    LongRunningTask task = new LongRunningTask();
    Future f = execService.submit(task);
    map.put(f, task);
    list.add(f);
}
// process each future exactly once, removing it from the list when handled,
// so that already-completed futures are not waited on (or counted) again
while (!list.isEmpty()) {
    for (Iterator<Future> it = list.iterator(); it.hasNext(); ) {
        Future f = it.next();
        LongRunningTask task = map.get(f);
        if (task.isRunning()) {
            /*
             * This ensures that you process only those tasks which have started running
             */
            try {
                f.get(5, TimeUnit.MINUTES);
                it.remove();
            } catch (InterruptedException e) {
            } catch (ExecutionException e) {
                it.remove();
            } catch (TimeoutException e) {
                // this will call the overridden interrupt method
                f.cancel(true);
                it.remove();
            }
        }
    }
}
execService.shutdown();
Is the get with a timeout the best way to solve this issue?
Yes, it is perfectly fine to call get(timeout) on a Future object. If the task that the future points to has already executed, it returns immediately; if the task is yet to be executed or is being executed, it waits until the timeout. This is good practice.
Is there the possibility that I'm waiting with the get call on a task
that hasn't yet been placed on the thread pool (isn't an active worker)
You only get a Future object when you place a task on the thread pool, so it is not possible to call get() on a task without placing it on the pool. Yes, there is a possibility that the task has not yet been picked up by a free worker.
The approach you are talking about is OK. But most importantly, before settling on a timeout threshold you need to know the right thread pool size and timeout for your environment. Do stress testing, which will reveal whether the number of worker threads you configured in the thread pool is adequate; it may even let you reduce the timeout value. This test is the most important step, I feel.
Timeout on get is perfectly fine, but you should also cancel the task if it throws a TimeoutException. And if you do the above test properly and set your thread pool size and timeout to sensible values, you may not even need to cancel tasks externally (but you can keep this as a backup). And yes, sometimes when cancelling a task you may end up cancelling one that has not yet been picked up by the Executor.
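A compact sketch of the get-with-timeout-then-cancel pattern under discussion (the 5-minute value mirrors the question; the class and method names are illustrative):

import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutWaiter {
    // Wait up to 5 minutes for each task; cancel (interrupt) the ones that overrun.
    static void awaitWithTimeout(List<Future<?>> futures) throws InterruptedException {
        for (Future<?> f : futures) {
            try {
                f.get(5, TimeUnit.MINUTES);
            } catch (TimeoutException e) {
                f.cancel(true);              // interrupts the worker if the task has started
            } catch (ExecutionException e) {
                // the task itself failed; log or rethrow as appropriate
            }
        }
    }
}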
You can of course cancel a Task by using
task.cancel(true)
It is perfectly legal. But this will interrupt the thread if it is "RUNNING".
If the thread is waiting to acquire an intrinsic lock then the "interruption" request has no effect other than setting the thread's interrupted status. In this case you cannot do anything to stop it. For the interruption to happen, the thread should come out from the "blocked" state by acquiring the lock it was waiting for (which may take more than 5 mins). This is a limitation of using "intrinsic locking".
However you can use explicit lock classes to solve this problem. You can use "lockInterruptibly" method of the "Lock" interface to achieve this. "lockInterruptibly" will allow the thread to try to acquire a lock while remaining responsive to the interruption. Here is a small example to achieve that:
public void workWithExplicitLock() throws InterruptedException {
    Lock lock = new ReentrantLock();
    lock.lockInterruptibly();
    try {
        // work with shared object state
    } finally {
        lock.unlock();
    }
}

How to handle RejectedExecutionException with ThreadPoolExecutor in java

What is the best way to handle RejectedExecutionException while using a ThreadPoolExecutor in Java?
I want to ensure that the task submitted should not be overlooked and should surely get executed. As of now there are no hard real time requirements to get the task done.
One of the things I thought could be done was waiting in a loop till I know that there is space in the runnable queue, and then go on and add it to the queue.
Would be glad if people can share their experiences.
Adding the possible solution I thought of:
while (executor.getQueue().remainingCapacity() <= 0) {
    // keep looping
    Thread.sleep(100);
}
// if the loop exits, it indicates that we have space in the queue, hence
// go ahead and add to the queue
executor.execute(new ThreadInstance(params));
I would change the behaviour of your queue. e.g.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MyBlockingQueue<E> extends ArrayBlockingQueue<E> {
    private final long timeoutMS;

    public MyBlockingQueue(int capacity, long timeoutMS) {
        super(capacity);
        this.timeoutMS = timeoutMS;
    }

    @Override
    public boolean offer(E e) {
        try {
            return super.offer(e, timeoutMS, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e1) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
This will wait for the queue to drain before giving up.
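For context, a sketch of how such a queue might be plugged into a ThreadPoolExecutor (the pool sizes, capacity and timeout below are illustrative):

import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueWiring {
    public static ThreadPoolExecutor newBlockingPool() {
        // offer() on the queue now waits up to 30s for space before the
        // executor falls back to its RejectedExecutionHandler
        return new ThreadPoolExecutor(
                4, 4,                       // core and max pool size (illustrative)
                0L, TimeUnit.MILLISECONDS,  // keep-alive for threads above core size
                new MyBlockingQueue<Runnable>(100, 30_000));
    }
}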
If you have constrained your thread pool to only allow a certain number of concurrent threads (generally a good thing), then the application needs to somehow push-back on the calling code, so when you receive a RejectedExecutionException from the ThreadPoolExecutor you need to indicate this to the caller and the caller will need to handle the retry.
An analogous situation is a web server under heavy load. A client connects, the web server should return a 503 - Service Unavailable (generally a temporary condition) and the client decides what to do about it.
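A sketch of what that push-back might look like at the call site; the retry policy here is illustrative, not prescriptive:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class RetryingSubmitter {
    // Try to hand the task to the pool; if it is rejected, back off and retry,
    // letting the pressure propagate to the caller instead of dropping the task.
    public static void submitWithRetry(ExecutorService executor, Runnable task)
            throws InterruptedException {
        while (true) {
            try {
                executor.execute(task);
                return;
            } catch (RejectedExecutionException e) {
                TimeUnit.MILLISECONDS.sleep(100); // simple back-off before retrying
            }
        }
    }
}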
