Akka actors are getting stopped - java

I'm using Akka actors to process some HTTP requests in parallel. I've initialized a pool of actors using RoundRobinPool like this:
ActorRef myActorPool = actorSystem.actorOf(new RoundRobinPool(200).props(Props.create(MyActor.class, args)), MyActor.class.getSimpleName());
It works fine, but after the process has been running for some time I get the following error:
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Recipient[Actor[akka://web_server/user/MyActor#-769383443]] had already been terminated. Sender[null] sent the message of type "com.data.model.Request".
So I've overridden the postStop method and added a log statement there.
@Override
public void postStop() {
log.warn("Actor is stopped");
}
Now I can see in the logs that the actors are getting stopped, but I'm not sure which request triggers it. Once all the actors in the pool have terminated (the pool size is 200), I get the AskTimeoutException mentioned above. Is there any way to debug why the actors are getting terminated?
EDIT 1
In the controller, I'm using the created actor pool like
CompletableFuture<Object> s = ask(myActorPool, request, 1000000000).toCompletableFuture();
return s.join();
The actor processes one kind of messages only.
@Override
public AbstractActor.Receive createReceive() {
return receiveBuilder()
.match(Request.class, this::process)
.build();
}
private void process(Request request) {
try {
// code here
} catch (Exception e) {
log.error(e.getMessage(), e);
getSender().tell(new akka.actor.Status.Failure(e), getSelf());
}
}

From your description of the problem, it seems the processing done inside the ask call takes longer than the ask timeout, and that is why you get the error.
You can either increase the ask timeout or do less processing inside the ask call.
You should not do CPU-bound operations inside the ask call, since they can slow down your system. It is recommended to do I/O-bound operations inside the ask call instead; that way you can leverage the actors.
For example:
val x=x+1 // CPU-bound operation: should not be done inside the ask call
A DB call, on the other hand, is something you can do inside the ask call; that is preferable.

Related

Do while loop behaving unexpectedly, for some inexplicable reason

I've been all over the internet and the Java docs regarding this one; I can't seem to figure out what it is about do while loops I'm not understanding. Here's the background: I have some message handler code that takes some JSON formatted data from a REST endpoint, parses it into a runnable task, then adds this task to a linked blocking queue for processing by the worker thread. Meanwhile, on the worker thread, I have this do while loop to process the message tasks:
do {
PublicTask currentTask = pubMsgQ.poll();
currentTask.run();
} while(pubMsgQ.size() > 0);
pubMsgQ is a LinkedBlockingQueue<PublicTask> (PublicTask implements the Runnable interface). I can't see any problems with this loop (obviously, or else I wouldn't be here), but this is how it behaves during execution: Upon entering the do block, pubMsgQ is polled and returns the runnable task as expected. The task is then run successfully with expected results, but then we get to the while statement. Now, according to the Java docs, poll() should return and remove the head of the queue, so I should expect that pubMsgQ.size() will return 0, right? Wrong I guess, because somehow the while statement passes and the program enters the do block again; of course this time pubMsgQ.poll() returns null (as I would have expected it should) and the program crashes with NullPointerException. What? Please explain like I'm five...
EDIT:
I decided to leave my original post as is above; because I think I actually explain the undesired behavior of that specific piece of the code quite succinctly (the loop is being executed twice while I'm fairly certain there is no way the loop should be executing twice). However, I realize that probably doesn't give enough context for that loop's existence and purpose in the first place, so here is the complete breakdown for what I am actually trying to accomplish with this code as I am sure there is a better way to implement this altogether anyways.
What this loop is actually a part of is a message handler class which implements the MessageHandler interface belonging to my Client Endpoint class [correction from my previous post; I had said the messages coming in were JSON formatted strings from a REST endpoint. This is technically not true: they are JSON formatted strings being received through a web socket connection. Note that while I am using the Spring framework, this is not a STOMP client; I am only using the built-in javax WebSocketContainer as this is more lightweight and easier for me to implement]. When a new message comes in onMessage() is called, which passes the JSON string to the MessageHandler; so here is the code for the entire MessageHandler class:
public class MessageHandler implements com.innotech.gofish.AutoBrokerClient.MessageHandler {
private LinkedBlockingQueue<PublicTask> pubMsgQ = new LinkedBlockingQueue<PublicTask>();
private LinkedBlockingQueue<AuthenticatedTask> authMsgQ = new LinkedBlockingQueue<AuthenticatedTask>();
private MessageLooper workerThread;
private CyclicBarrier latch = new CyclicBarrier(2);
private boolean running = false;
private final boolean authenticated;
public MessageHandler(boolean authenticated) {
this.authenticated = authenticated;
}
@Override
public void handleMessage(String msg) {
try {
//Create new Task and submit it to the message queue:
if(authenticated) {
AuthenticatedTask msgTsk = new AuthenticatedTask(msg);
authMsgQ.put(msgTsk);
} else {
PublicTask msgTsk = new PublicTask(msg);
pubMsgQ.put(msgTsk);
}
//Check status of worker thread:
if(!running) {
workerThread = new MessageLooper();
running = true;
workerThread.start();
} else if(running && !workerThread.active) {
latch.await();
latch.reset();
}
} catch(InterruptedException | BrokenBarrierException e) {
e.printStackTrace();
}
}
private class MessageLooper extends Thread {
boolean active = false;
public MessageLooper() {
}
@Override
public synchronized void run() {
while(running) {
active = true;
if(authenticated) {
do {
AuthenticatedTask currentTask = authMsgQ.poll();
currentTask.run();
if(GoFishApplication.halt) {
GoFishApplication.reset();
}
} while(authMsgQ.size() > 0);
} else {
do {
PublicTask currentTask = pubMsgQ.poll();
currentTask.run();
} while(pubMsgQ.size() > 0);
}
try {
active = false;
latch.await();
} catch (InterruptedException | BrokenBarrierException e) {
e.printStackTrace();
}
}
}
}
}
You can probably see where I'm going with this... what this jury-rigged code is trying to do is act as a facsimile for the Looper class provided by the Android Development Kit. The actual desired behavior is as messages are received, the handleMessage() method adds the messages to the queue for processing and the messages are processed on the worker thread separately as long as there are messages to process. If there are no more messages to process, the worker thread waits until it is notified by the handler that more messages have been received; at which point it resumes processing those messages until the queue is once again empty. Rinse and repeat until the user stops the program.
Of course, the closest thing the JDK provides to this is the ThreadPoolExecutor (which I know is probably the actual proper way to implement this); but for the life of me I couldn't figure out how to use it for this exact case. Finally, as a quick aside so I can be sure to explain everything fully, the reason why there are two queues (and a public and authenticated handler) is because there are two web socket connections. One is an authenticated channel for sending/receiving private messages; the other is un-authenticated and used only to send/receive public messages. There should be no interference, however, given that the authenticated status is final and set at construction; and each Client Endpoint is passed its own Handler which is instantiated at the time of server connection.
You appear to have a number of concurrency / threading bugs in your code.
Assumptions:
It looks like there could be multiple MessageHandler objects, each with its own pair of queues and (supposedly) at most one MessageLooper thread. It also looks as if a given MessageHandler could be used by multiple request worker threads.
If that is the case, then one problem is that MessageHandler is not thread-safe. Specifically, the handleMessage is accessing and updating fields of the MessageHandler instance without doing any synchronization.
Some of the fields are initialized during object creation and then never changed. They are probably OK. (But you should declare them as final to be sure!) But some of the variables are supposed to change during operation, and they must be handled correctly.
One section that rings particular alarm bells is this:
if (!running) {
workerThread = new MessageLooper();
running = true;
workerThread.start();
} else if (running && !workerThread.active) {
latch.await();
latch.reset();
}
Since this is not synchronized, and the variables are not volatile:
There are race conditions if two threads call this code simultaneously; e.g. between testing running and assigning true to it.
If one thread sets running to true, there are no guarantees that a second thread will see the new value.
The net result is that you could potentially get two or more MessageLooper threads for a given set of queues. That breaks your assumptions in the MessageLooper code.
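One way to close the check-then-act race on running is to make the test and the assignment a single atomic step. The sketch below is only an illustration under assumed names (SingleStartDemo, startWorkerOnce and race are hypothetical, not from the question): several threads race to start a worker, and AtomicBoolean.compareAndSet lets exactly one of them win.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative, not from the question.
class SingleStartDemo {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final AtomicInteger workersStarted = new AtomicInteger();

    // Test-and-set in one atomic step: only the caller that flips
    // false -> true gets to start the worker.
    void startWorkerOnce() {
        if (running.compareAndSet(false, true)) {
            workersStarted.incrementAndGet(); // stands in for new MessageLooper().start()
        }
    }

    // Races `callers` threads against startWorkerOnce() and
    // reports how many workers were started in total.
    static int race(int callers) {
        SingleStartDemo demo = new SingleStartDemo();
        CountDownLatch go = new CountDownLatch(1);
        Thread[] threads = new Thread[callers];
        for (int i = 0; i < callers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    go.await(); // line all callers up before releasing them together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                demo.startWorkerOnce();
            });
            threads[i].start();
        }
        go.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return demo.workersStarted.get();
    }

    public static void main(String[] args) {
        System.out.println("workers started: " + race(8));
    }
}
```

The same effect could be had with a synchronized block around the whole if/else, provided every reader and writer of the flag uses the same lock.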
Looking at the MessageLooper code, I see that you have declared the run method as synchronized. Unfortunately, that doesn't help. The problem is that the run method will be synchronizing on this ... which is the specific instance of MessageLooper. And it will acquire the lock once and release it once. In short, the synchronized is wrong.
(For Java synchronized methods and synchronized blocks to work properly, 1) the threads involved need to synchronize on the same object (i.e. the same primitive lock), and 2) all read and write operations on the state guarded by the lock need to be done while holding the lock. This applies to use of Lock objects as well.)
So ...
There is no synchronization between a MessageLooper thread and any other threads that are adding to or removing from the queues.
There are no guarantees that the MessageLooper thread will notice changes to the running flag.
As I previously noted, you could have two or more MessageLooper polling the same pair of queues.
In short, there are lots of possible explanations for strange behavior in the code in the Question. This includes the specific problem you noticed with the queue size.
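Independently of the threading issues, the shape of the loop matters too: a do ... while always calls poll() once before it looks at the queue size, so it dereferences null whenever it is entered with an empty queue. A minimal sketch of a null-safe drain loop, assuming tasks are plain Runnables (DrainDemo and drain are made-up names):

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; names are illustrative, not from the question.
class DrainDemo {
    // Runs every queued task and returns how many were run.
    // poll() returns null on an empty queue, so the null check itself
    // is the loop condition and no separate size() test is needed.
    static int drain(LinkedBlockingQueue<Runnable> queue) {
        int ran = 0;
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        queue.add(() -> System.out.println("task 1"));
        queue.add(() -> System.out.println("task 2"));
        System.out.println("ran " + drain(queue) + " tasks");
    }
}
```

This only removes the NullPointerException; it does not fix the visibility and race problems described above.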
Writing correct multi-threaded code is difficult. This is why you should be using an ExecutorService rather than attempting to roll your own code.
But if you do need to roll your own concurrency code, I recommend buying and reading "Java Concurrency in Practice" by Brian Goetz et al. It is still the only good textbook on this topic ...
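As a rough sketch of the ExecutorService suggestion, assuming the tasks are plain Runnables (QueueWorkerDemo and its methods are made-up names, and the lambda stands in for running a PublicTask): a single-thread executor already owns a queue and a worker thread, so the hand-written loop, the flags and the barrier all disappear.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names are illustrative, not from the question.
class QueueWorkerDemo {
    // One worker thread; submitted tasks run one at a time, in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> processed = Collections.synchronizedList(new ArrayList<>());

    // Called from the message callback; it only enqueues and returns.
    void handleMessage(String msg) {
        worker.execute(() -> processed.add(msg)); // stands in for new PublicTask(msg).run()
    }

    // Stops accepting new tasks, waits for the queued ones, returns what was processed.
    List<String> shutdownAndGet() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(processed);
    }

    // Feeds the given messages through a fresh handler and returns the processing order.
    static List<String> demo(String... messages) {
        QueueWorkerDemo handler = new QueueWorkerDemo();
        for (String m : messages) {
            handler.handleMessage(m);
        }
        return handler.shutdownAndGet();
    }

    public static void main(String[] args) {
        System.out.println(demo("a", "b", "c"));
    }
}
```

When no messages are pending, the executor's worker thread simply blocks inside the executor's own queue, which is the "wait until notified" behavior the hand-rolled latch was trying to provide.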

Polling for Pod's ready state

I am using the fabric8 library for Java to deploy applications on a Kubernetes cluster.
I want to poll the status of pods to know when they are ready. I started writing my own until I read about the Watcher.
I implemented something like this
deployment =
kubeClient.extensions().deployments().inNamespace(namespaceName).create(deployment);
kubeClient.pods().inNamespace(namespaceName).watch(new Watcher<Pod>() {
@Override
public void eventReceived(io.fabric8.kubernetes.client.Watcher.Action action,
Pod resource) {
logger.info("Pod event {} {}", action, resource);
logger.info("Pod status {} , Reason {} ", resource.getStatus().getPhase(),
resource.getStatus().getReason());
}
@Override
// What causes the watcher to close?
public void onClose(KubernetesClientException cause) {
if (cause != null) {
// throw?
logger.error("Pod event {} ", cause);
}
}
});
I'm not sure I understand the Watcher functionality correctly. Does it time out? Or do I still write my poller inside the eventReceived() method? What is the use case for a watcher?
// What causes the watcher to close?
Since watches are implemented using websockets, a connection is subject to closure at any time for any reason or no reason.
What is the use case for a watcher?
I would imagine it's two-fold: not paying the TCP/IP + SSL connection setup cost, making it quicker, and having your system be event-driven rather than simple polling, which will make every participant use less resources (the server and your client).
But yes, the answer to your question is that you need to have retry logic to reestablish the watcher if you have not yet reached the Pod state you were expecting.
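The retry logic could look roughly like the sketch below. It deliberately does not use fabric8: WatchSource and Listener are hypothetical stand-ins for the client's watch call and Watcher callbacks, and the fake source delivers its events synchronously, which a real client would not do. What it shows is only the control flow: remember whether the desired state was seen, and open a new watch from the close callback if it was not.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for a watch API; none of these types come from fabric8.
class RewatchDemo {
    interface Listener {
        void onEvent(String phase);
        void onClose();
    }

    interface WatchSource {
        void watch(Listener listener); // may call onClose at any time
    }

    // Re-opens the watch whenever it closes before "Running" was observed.
    static final class Rewatcher implements Listener {
        private final WatchSource source;
        private final AtomicBoolean ready = new AtomicBoolean(false);
        private final AtomicInteger opened = new AtomicInteger();

        Rewatcher(WatchSource source) {
            this.source = source;
        }

        void start() {
            opened.incrementAndGet();
            source.watch(this);
        }

        @Override
        public void onEvent(String phase) {
            if ("Running".equals(phase)) {
                ready.set(true);
            }
        }

        @Override
        public void onClose() {
            if (!ready.get()) {
                start(); // connection dropped too early: watch again
            }
        }
    }

    // Simulates a source whose first watch closes after a "Pending" event.
    // Returns how many watches had to be opened before "Running" was seen.
    static int demo() {
        AtomicInteger calls = new AtomicInteger();
        WatchSource flaky = listener -> {
            listener.onEvent(calls.incrementAndGet() == 1 ? "Pending" : "Running");
            listener.onClose();
        };
        Rewatcher rewatcher = new Rewatcher(flaky);
        rewatcher.start();
        return rewatcher.opened.get();
    }

    public static void main(String[] args) {
        System.out.println("watches opened: " + demo());
    }
}
```

A production version would also want a cap on the number of attempts and some delay between them, both omitted here.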

Akka and Ask Pattern. When Actor is abruptly stopped can i return Future?

I currently have code which dispatches a request using the Ask Pattern. The dispatched request will generate an Akka Actor which sends a HTTP request and then returns the response. I'm using Akka's circuit breaker API to manage issues with the upstream web services i call.
If the circuit breaker is in an open state, all subsequent requests fail fast, which is the desired effect. However, when the actor fails fast it just throws a CircuitBreakerOpenException and stops the actor, and control does not return to the code which made the initial request until an AskTimeoutException is generated.
This is the code which dispatches the request
Timeout timeout = new Timeout(Duration.create(10, SECONDS));
Future<Object> future = Patterns.ask(myActor, argMessage, timeout);
Response res = (Response) Await.result(future, timeout.duration());
This is the circuitbreaker
getSender().tell(breaker.callWithSyncCircuitBreaker(new Callable<Obj>()
{
@Override
public Obj call() throws Exception {
return fetch(message);
}
}), getSelf()
);
getContext().stop(getSelf());
When this block of code is executed and the circuit is open, it fails fast by throwing an exception. However, I want to return control to the code which handles the future without having to wait for a timeout.
Is this possible?
When an actor fails out and is restarted, if it was processing a message, no response will be automatically sent to that sender. If you want to send that sender a message on that particular failure then catch that exception explicitly and respond back to that sender with a failed result, making sure to capture the sender first before you go into any future callbacks to avoid closing over this mutable state. You could also try to do this in the preRestart, but that's not very safe as by that time the sender might have changed if you are using futures inside the actor.
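The idea can be illustrated without Akka, using a plain CompletableFuture as a stand-in for the ask future. Everything below is a hypothetical analogue, not Akka API: a worker that catches its exception and completes the future exceptionally releases the caller right away, while a worker that fails silently leaves the caller waiting until its own timeout fires.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical analogue of the ask pattern; not Akka code.
class FailFastDemo {
    // Worker that reports its failure to the caller, analogous to
    // catching the exception and replying with a failure message.
    static CompletableFuture<String> replyingWorker() {
        CompletableFuture<String> reply = new CompletableFuture<>();
        new Thread(() -> {
            try {
                throw new IllegalStateException("circuit open");
            } catch (RuntimeException e) {
                reply.completeExceptionally(e); // caller is released immediately
            }
        }).start();
        return reply;
    }

    // Worker that stops without telling anyone: the future is never completed.
    static CompletableFuture<String> silentWorker() {
        CompletableFuture<String> reply = new CompletableFuture<>();
        new Thread(() -> { /* fails and stops; reply is never completed */ }).start();
        return reply;
    }

    // Waits for the reply and describes how the wait ended.
    static String outcome(CompletableFuture<String> reply, long timeoutMillis) {
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(replyingWorker(), 2000));
        System.out.println(outcome(silentWorker(), 200));
    }
}
```

In the question's actor, the equivalent step would be wrapping the circuit breaker call in a try/catch and sending a failure back to the captured sender before stopping.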

How do I stop a Camel route when JVM gets to a certain heap size?

I am using Apache Camel to connect to various endpoints, including JMS topics, and write to a database. Sometimes the database connection fails (for whatever reason, database issue, network blip, etc) and the messages from the topic subscriber start backing up. At a certain point, there are so many messages backed up waiting to be written to the database that the application throws an out of memory error. So far I understand all that.
The problem I have is the following: When the application is frantically trying to garbage collect before eventually giving up and accepting that it is out of memory, the application stops working, but is still alive. This means that the topic subscriber is still seen as active by the JMS provider, but not reading anything off the topic, so the provider starts queueing up the messages. Eventually the provider falls over also when the maximum depth runs out.
How can I configure my application to either disconnect when reaching a certain heap usage, or kill itself completely much much faster when running out of memory? I believe there are some JVM parameters that allow the application to kill itself much quicker when running out of memory, but I am wondering if that is the best solution or whether there is another way?
First of all I think you should use a JDBC connection pool that is capable of refreshing failed connections. So you do not run into the described scenario in the first place. At least not if the DB/network issue is short lived.
Next I'd protect the message broker by applying producer flow control (at least that's what it is called in ActiveMQ), i.e. prevent message producers from submitting more messages once a certain memory threshold has been breached. If the thresholds are set correctly, that will prevent your message broker from falling over.
As for your original question: I'd use JMX to monitor the VM. If some metric, e.g. memory, breaches a threshold then you can suspend or shut down the route or the whole Camel context via the MBeans Camel exposes.
You can control (start/stop and suspend/resume) Camel routes using the Camel context methods .stop(), .start(), .suspend() and .resume().
You can spin a separate thread that monitors the current VM memory and stops the required route when a certain condition is met.
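Thread monitor = new Thread() {
    @Override
    public void run() {
        try {
            while (true) {
                long free = Runtime.getRuntime().freeMemory();
                boolean routeRunning = camelContext.isRouteStarted("yourRoute");
                if (free < threshold && routeRunning) {
                    camelContext.stopRoute("yourRoute");
                } else if (free > threshold && !routeRunning) {
                    camelContext.startRoute("yourRoute");
                }
                // Check every 10 seconds
                Thread.sleep(10000);
            }
        } catch (Exception e) {
            // Interrupted, or stopping/starting the route failed: end the monitor
            e.printStackTrace();
        }
    }
};
// Daemon thread so the monitor does not keep the JVM alive on shutdown
monitor.setDaemon(true);
monitor.start();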
As commented in the other answer, relying on this is not particularly robust, but it is at least a little more robust than getting an OutOfMemoryError. Note that you need to .stop() the route; .suspend() does not deallocate resources, which means the connection with the queue provider stays open and the service still looks like it is open for business.
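One caveat about the measurement itself: Runtime.freeMemory() reports free space inside the heap the JVM has currently committed, so it can be small even though the heap is still allowed to grow. A hedged sketch of a headroom calculation that takes the maximum heap into account (HeapHeadroom and its method names are made up for illustration):

```java
// Hypothetical helper; names are illustrative.
class HeapHeadroom {
    // Approximate bytes still available before the heap limit is reached:
    // max - (total - free). If the JVM has no configured limit,
    // maxMemory() returns Long.MAX_VALUE and the result is correspondingly huge.
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    // True when the remaining headroom has dropped below the given threshold.
    static boolean belowThreshold(long thresholdBytes) {
        return availableBytes() < thresholdBytes;
    }

    public static void main(String[] args) {
        System.out.println("available bytes: " + availableBytes());
        System.out.println("below 64 MiB: " + belowThreshold(64L * 1024 * 1024));
    }
}
```

The value is only a snapshot taken between garbage collections, so a threshold check based on it should leave a generous margin.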
You can also stop the route as part of the error handling of the route itself (this is possibly more robust but would require manual intervention to restart the route once the error is cleared, or a scheduled route that periodically checks if the error condition still exists and restart the route if it is gone). The thing to keep in mind is that you cannot stop a route from the same thread that is servicing the route at the time so you need to spin a separate thread that does the stopping. For example:
route("sample").from("jms://myqueue")
// Handle SQL Exceptions by shutting down the route
.onException(SQLException.class)
.process(new Processor() {
// This processor spawns a new thread that stops the current route
Thread stop;
@Override
public void process(final Exchange exchange) throws Exception {
if (stop == null) {
stop = new Thread() {
@Override
public void run() {
try {
// Stop the current route
exchange.getContext().stopRoute("sample");
} catch (Exception e) {}
}
};
}
// start the thread in background
stop.start();
}
})
.end()
// Standard route processors go here
.to(...);

Async NIO: Same client sending multiple messages to Server

Regarding Java NIO2.
Suppose we have the following to listen to client requests...
asyncServerSocketChannel.accept(null, new CompletionHandler <AsynchronousSocketChannel, Object>() {
@Override
public void completed(final AsynchronousSocketChannel asyncSocketChannel, Object attachment) {
// Put the execution of the completion handler on another thread so that
// we don't block another channel being accepted.
executer.submit(new Runnable() {
public void run() {
handle(asyncSocketChannel);
}
});
// call another.
asyncServerSocketChannel.accept(null, this);
}
@Override
public void failed(Throwable exc, Object attachment) {
// TODO Auto-generated method stub
}
});
This code will accept a client connection process it and then accept another.
To communicate with the server the client opens up an AsyncSocketChannel and fires the message.
The Completion handler completed() method is then invoked.
However, this means if the client wants to send another message on the same AsyncSocket instance it can't.
It has to create another AsyncSocket instance, which I believe means another TCP connection, which is a performance hit.
Any ideas how to get around this?
Or to put the question another way, any ideas how to make the same asyncSocketChannel receive multiple CompletionHandler completed() events?
edit:
My handling code is like this...
public void handle(AsynchronousSocketChannel asyncSocketChannel) {
ByteBuffer readBuffer = ByteBuffer.allocate(100);
try {
// read a message from the client, timeout after 10 seconds
Future<Integer> futureReadResult = asyncSocketChannel.read(readBuffer);
futureReadResult.get(10, TimeUnit.SECONDS);
String receivedMessage = new String(readBuffer.array());
// some logic based on the message here...
// after the logic is a return message to client
ByteBuffer returnMessage = ByteBuffer.wrap((RESPONSE_FINISHED_REQUEST + " " + client
+ ", " + RESPONSE_COUNTER_EQUALS + value).getBytes());
Future<Integer> futureWriteResult = asyncSocketChannel.write(returnMessage);
futureWriteResult.get(10, TimeUnit.SECONDS);
} ...
So that's it: my server reads a message from the async channel and returns an answer.
The client blocks until it gets the answer. But this is ok. I don't care if client blocks.
When this is finished, the client tries to send another message on the same async channel and it doesn't work.
There are 2 phases of connection and 2 different kind of completion handlers.
First phase is to handle a connection request, this is what you have programmed (BTW as Jonas said, no need to use another executor). Second phase (which can be repeated multiple times) is to issue an I/O request and to handle request completion. For this, you have to supply a memory buffer holding data to read or write, and you did not show any code for this. When you do the second phase, you'll see that there is no such problem as you wrote: "if the client wants to send another message on the same AsyncSocket instance it can't".
One problem with NIO2 is that, on the one hand, the programmer has to avoid issuing multiple async operations of the same kind (accept, read, or write) on the same channel (or else an error occurs), and on the other hand, the programmer has to avoid blocking waits in handlers. This problem is solved in the df4j-nio2 subproject of the df4j actor framework, where both AsyncServerSocketChannel and AsyncSocketChannel are represented as actors. (df4j is developed by me.)
First, you should not use an executor the way you do in the completed method. The completed method is already invoked on a new worker thread.
In your completed method for .accept(...), you should call asyncSocketChannel.read(...) to read the data. The client can just send another message on the same socket. This message will be handled with a new call to the completed method, perhaps by another worker thread on your server.
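The two phases described in these answers can be sketched as a small echo server. This is an illustration under assumptions, not the asker's code: MultiMessageEchoDemo and its method names are invented, messages are echoed back unchanged, and the test client relies on Java 11's InputStream.readNBytes. The accept handler starts a read on each new channel, and the read handler issues the next read on the same channel once the reply has been written, so one connection can carry any number of messages.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; requires Java 11+ for InputStream.readNBytes.
class MultiMessageEchoDemo {
    // Phase 1: accept a connection, then accept the next one and start reading.
    static void serve(AsynchronousServerSocketChannel server) {
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel channel, Void att) {
                server.accept(null, this); // keep accepting other clients
                readNext(channel);
            }
            @Override
            public void failed(Throwable exc, Void att) {
                // Accept failed or the server was closed; ignored in this sketch.
            }
        });
    }

    // Phase 2 (repeatable): read, echo, then read again on the SAME channel.
    static void readNext(AsynchronousSocketChannel channel) {
        ByteBuffer buffer = ByteBuffer.allocate(100);
        channel.read(buffer, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void att) {
                if (bytesRead < 0) { // client closed the connection
                    closeQuietly(channel);
                    return;
                }
                buffer.flip();
                channel.write(buffer, null, new CompletionHandler<Integer, Void>() {
                    @Override
                    public void completed(Integer written, Void a) {
                        if (buffer.hasRemaining()) {
                            channel.write(buffer, null, this); // finish a partial write
                        } else {
                            readNext(channel); // wait for the next message
                        }
                    }
                    @Override
                    public void failed(Throwable exc, Void a) {
                        closeQuietly(channel);
                    }
                });
            }
            @Override
            public void failed(Throwable exc, Void att) {
                closeQuietly(channel);
            }
        });
    }

    static void closeQuietly(AsynchronousSocketChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do while closing
        }
    }

    // Blocking client helper: sends msg and reads back the same number of bytes.
    static String roundTrip(Socket client, String msg) throws IOException {
        byte[] out = msg.getBytes(StandardCharsets.UTF_8);
        client.getOutputStream().write(out);
        client.getOutputStream().flush();
        return new String(client.getInputStream().readNBytes(out.length), StandardCharsets.UTF_8);
    }

    // Sends two messages over ONE connection and returns both echoes.
    static String demo() {
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0))) {
            serve(server);
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            try (Socket client = new Socket("127.0.0.1", port)) {
                return roundTrip(client, "first") + "," + roundTrip(client, "second");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because TCP is a byte stream, a single read may deliver part of a message or several messages at once; an echo server does not care, but a real protocol would need some framing (a length prefix or delimiter) on top of this loop.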
