In my application, I've noticed that HornetQ 2.4.1 has been piling up message journal files (sometimes into the thousands). I'm using HornetQ via JMS queues on WildFly 8.2. Normally, when the server instance starts, HornetQ has 3 message journal files and a lock file.
The pile-up of message journal files has caused issues when restarting the server. We see a log message that states:
HQ221014: 54% loaded
When removing the files, the server loads just fine. I've experimented some, and it appears as though messages in these files have already been processed, but I'm not sure why they continue to pile up over time.
Edit 1: I've found this link that indicates we're not acknowledging messages. However, we create the session in auto-acknowledge mode, like so: connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
I'll continue looking for a solution.
I've since found that this is caused by failed calls to the afterDelivery() method (for one reason or another; I currently believe it has something to do with server load or network hangs). I'm working around it by not hitting that queue so often. It's not elegant, but it serves my purpose.
See the following HornetQ messages I found in the logs:
HQ152006: Unable to call after delivery
javax.transaction.RollbackException: ARJUNA016053: Could not commit transaction.
at org.jboss.as.ejb3.inflow.MessageEndpointInvocationHandler.afterDelivery(MessageEndpointInvocationHandler.java:87)
HQ222144: Queue could not finish waiting executors. Try increasing the thread pool size
HQ222172: Queue jms.queue.myQueue was busy for more than 10,000 milliseconds. There are possibly consumers hanging on a network operation
Related
I have created a Spring Boot web application and deployed its WAR to a Tomcat container.
The application connects to MongoDB using async connections. I am using the mongodb-driver-async library for that.
At startup everything works fine. But as soon as the load increases, the following exception appears:
org.springframework.web.context.request.async.AsyncRequestTimeoutException: null
at org.springframework.web.context.request.async.TimeoutDeferredResultProcessingInterceptor.handleTimeout(TimeoutDeferredResultProcessingInterceptor.java:42)
at org.springframework.web.context.request.async.DeferredResultInterceptorChain.triggerAfterTimeout(DeferredResultInterceptorChain.java:75)
at org.springframework.web.context.request.async.WebAsyncManager$5.run(WebAsyncManager.java:392)
at org.springframework.web.context.request.async.StandardServletAsyncWebRequest.onTimeout(StandardServletAsyncWebRequest.java:143)
at org.apache.catalina.core.AsyncListenerWrapper.fireOnTimeout(AsyncListenerWrapper.java:44)
at org.apache.catalina.core.AsyncContextImpl.timeout(AsyncContextImpl.java:131)
at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:157)
I am using the following software versions:
Spring boot -> 1.5.4.RELEASE
Tomcat (installed as standalone binary) -> apache-tomcat-8.5.37
Mongo DB version: v3.4.10
mongodb-driver-async: 3.4.2
As soon as I restart the Tomcat service, everything starts working fine again.
Please help: what could be the root cause of this issue?
P.S.: I am using DeferredResult and CompletableFuture to create the async REST API.
I have also tried setting spring.mvc.async.request-timeout in the application and configuring asyncTimeout in Tomcat, but I still get the same error.
It's probably obvious that Spring is timing out your requests and throwing AsyncRequestTimeoutException, which returns a 503 back to your client.
Now the question is, why is this happening? There are two possibilities.
These are legitimate timeouts. You mentioned that you only see the exceptions when the load on your server increases. So possibly your server just can't handle that load and its performance has degraded to the point where some requests can't complete before Spring times them out.
The timeouts are caused by your server failing to send a response to an asynchronous request due to a programming error, leaving the request open until Spring eventually times it out. It's easy for this to happen if your server doesn't handle exceptions well. If your server is synchronous, it's okay to be a little sloppy with exception handling because unhandled exceptions will propagate up to the server framework, which will send a response back to the client. But if you fail to handle an exception in some asynchronous code, that exception will be caught elsewhere (probably in some thread pool management code), and there's no way for that code to know that there's an asynchronous request waiting on the result of the operation that threw the exception.
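The second possibility can be illustrated with a minimal sketch that uses only the JDK. PendingResponse here is a hypothetical stand-in for a pending asynchronous response such as a DeferredResult; it is not a Spring type, and the method names are assumptions chosen for illustration. The first handler wires up only the success path, so a failed query leaves the response open forever; the second uses whenComplete so that both outcomes complete the response.

```java
import java.util.concurrent.CompletableFuture;

class AsyncErrorHandlingSketch {

    // Hypothetical stand-in for a pending async response (e.g. a DeferredResult).
    static final class PendingResponse {
        private final CompletableFuture<String> body = new CompletableFuture<>();

        void setResult(String value) { body.complete(value); }

        void setErrorResult(Throwable error) { body.complete("500: " + error.getMessage()); }

        boolean isDone() { return body.isDone(); }

        // Returns the response body, or null if nothing has answered yet.
        String bodyOrNull() { return body.getNow(null); }
    }

    // Buggy: only the success path completes the response.
    // If the query fails, nothing ever answers and the request waits for the timeout.
    static PendingResponse handleWithoutErrorPath(CompletableFuture<String> query) {
        PendingResponse response = new PendingResponse();
        query.thenAccept(response::setResult);
        return response;
    }

    // Safer: whenComplete sees both outcomes, so the response is always completed.
    static PendingResponse handleWithErrorPath(CompletableFuture<String> query) {
        PendingResponse response = new PendingResponse();
        query.whenComplete((value, error) -> {
            if (error != null) {
                response.setErrorResult(error);
            } else {
                response.setResult(value);
            }
        });
        return response;
    }

    public static void main(String[] args) {
        CompletableFuture<String> failingQuery = new CompletableFuture<>();
        failingQuery.completeExceptionally(new IllegalStateException("db unavailable"));

        System.out.println("without error path, answered: "
                + handleWithoutErrorPath(failingQuery).isDone());
        System.out.println("with error path, body: "
                + handleWithErrorPath(failingQuery).bodyOrNull());
    }
}
```

In a real controller the same idea would apply to whatever callback or future the database driver hands back: every path, including the failure path, has to end by completing the response.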
It's hard to figure out what might be happening without knowing more about your application. But there are some things you could investigate.
First, try looking for resource exhaustion.
Is the garbage collector running all the time?
Are all CPUs pegged at 100%?
Is the OS swapping heavily?
If the database server is on a separate machine, is that machine showing signs of resource exhaustion?
How many connections are open to the database? If there is a connection pool, is it maxed out?
How many threads are running? If there are thread pools in the server, are they maxed out?
If something's at its limit then possibly it is the bottleneck that is causing your requests to time out.
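Some of the checks above can be made from inside the JVM. The following is a rough sketch, assuming you can call it from a diagnostic endpoint or a scheduled log task; the class and method names are made up for illustration. It uses only the standard java.lang.management beans and does not cover database connections or OS swapping, which need external tools.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class ResourceSnapshot {

    // Builds a one-shot text summary of heap, GC, thread and load figures.
    static String describe() {
        StringBuilder out = new StringBuilder();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        out.append("heap used/max (bytes): ")
           .append(heap.getUsed()).append('/').append(heap.getMax()).append('\n');

        // Cumulative GC counts and times; compare two snapshots to see GC pressure.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append("gc ").append(gc.getName())
               .append(": count=").append(gc.getCollectionCount())
               .append(" timeMs=").append(gc.getCollectionTime()).append('\n');
        }

        out.append("live threads: ")
           .append(ManagementFactory.getThreadMXBean().getThreadCount()).append('\n');

        // The load average is negative on platforms where it is not available.
        out.append("system load average: ")
           .append(ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage())
           .append(" on ").append(Runtime.getRuntime().availableProcessors())
           .append(" CPUs\n");

        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Taking a snapshot when the server is healthy and another when the timeouts start makes it easier to see which figure has moved.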
Try setting spring.mvc.async.request-timeout to -1 and see what happens. Do you now get responses for every request, only slowly, or do some requests seem to hang forever? If it's the latter, that strongly suggests that there's a bug in your server that's causing it to lose track of requests and fail to send responses. (If setting spring.mvc.async.request-timeout appears to have no effect, then the next thing you should investigate is whether the mechanism you're using for setting the configuration actually works.)
A strategy that I've found useful in these cases is to generate a unique ID for each request and write the ID along with some contextual information every time the server either makes an asynchronous call or receives a response from an asynchronous call, and at various checkpoints within asynchronous handlers. If requests go missing, you can use the log information to figure out the request IDs and what the server was last doing with that request.
A similar strategy is to save each request ID into a map in which the value is an object that tracks when the request was started and what your server last did with that request. (In this case your server is updating this map at each checkpoint rather than, or in addition to, writing to the log.) You can set up a filter to generate the request IDs and maintain the map. If your filter sees the server send a 5xx response, you can log the last action for that request from the map.
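A minimal sketch of such a request map, using only the JDK, might look like the following. The class name, method names, and the rule "report the last action on any 5xx status" are assumptions for illustration; wiring it into a servlet filter is left out.

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class RequestTracker {

    // What we remember about one in-flight request.
    static final class Entry {
        final Instant startedAt;
        volatile String lastAction;

        Entry(Instant startedAt, String lastAction) {
            this.startedAt = startedAt;
            this.lastAction = lastAction;
        }
    }

    private final Map<String, Entry> inFlight = new ConcurrentHashMap<>();

    // Called when a request arrives; returns the generated request ID.
    String start(String description) {
        String id = UUID.randomUUID().toString();
        inFlight.put(id, new Entry(Instant.now(), "started " + description));
        return id;
    }

    // Called at each checkpoint, e.g. before and after an asynchronous call.
    void checkpoint(String id, String action) {
        Entry entry = inFlight.get(id);
        if (entry != null) {
            entry.lastAction = action;
        }
    }

    // Called when the response is sent. Returns the last recorded action
    // for 5xx responses so the caller can log it, otherwise null.
    String finish(String id, int status) {
        Entry entry = inFlight.remove(id);
        if (entry == null || status < 500) {
            return null;
        }
        return entry.lastAction;
    }

    int inFlightCount() {
        return inFlight.size();
    }

    public static void main(String[] args) {
        RequestTracker tracker = new RequestTracker();
        String id = tracker.start("GET /orders");
        tracker.checkpoint(id, "waiting for database query");
        System.out.println("last action before 503: " + tracker.finish(id, 503));
    }
}
```

Requests that never reach finish stay in the map, so dumping the map periodically also shows which requests have gone missing and where they stopped.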
Hope this helps!
Asynchronous tasks are placed in a queue (pool) that is processed in parallel, depending on the number of threads allocated. Not all asynchronous tasks are executed at the same time; some of them are queued. In such a system, getting AsyncRequestTimeoutException under load is normal behaviour.
If you are filling up the queues with asynchronous tasks that cannot execute under pressure, increasing the timeout will only delay the problem. You should focus instead on the underlying problem:
Reduce the execution time of the asynchronous tasks (through various optimizations). This will relieve the backlog of pooled async tasks. It obviously requires coding.
Increase the number of CPUs allocated so that the parallel tasks can run more efficiently.
Increase the number of threads servicing the executor of the driver.
The Mongo async driver uses AsynchronousSocketChannel, or Netty if Netty is found on the classpath. To increase the number of worker threads servicing the async communication, you should use:
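The queueing behaviour described above can be seen with a plain JDK executor. This is only an illustration of how a thread pool queues work it cannot run yet; it says nothing about the MongoDB driver's internals, and the class and method names are hypothetical. One worker thread is kept busy, so every further task waits in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueuedTasksSketch {

    // Submits `tasks` blocking tasks to a single-threaded pool and reports
    // how many of them are sitting in the queue instead of running.
    static int queuedBehindOneWorker(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // simulates a slow operation
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The first task occupies the only worker; the rest are queued.
            return pool.getQueue().size();
        } finally {
            release.countDown(); // let the tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued tasks: " + queuedBehindOneWorker(5));
    }
}
```

A request whose task spends longer in that queue than the async request timeout will time out even though nothing is broken, which is why shorter tasks or more worker threads help and a longer timeout only postpones the problem.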
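// nThreads is the number of Netty worker threads you want
EventLoopGroup eventLoopGroup = new NioEventLoopGroup(nThreads);
MongoClientSettings settings = MongoClientSettings.builder()
        .streamFactoryFactory(new NettyStreamFactoryFactory(eventLoopGroup, allocator))
        .build();
where eventLoopGroup is an io.netty.channel.EventLoopGroup (here an io.netty.channel.nio.NioEventLoopGroup) and allocator is an io.netty.buffer.ByteBufAllocator.
The int argument of the NioEventLoopGroup constructor sets the number of threads servicing your async communication.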
Read more about Netty configuration here https://mongodb.github.io/mongo-java-driver/3.2/driver-async/reference/connecting/connection-settings/
I've been browsing the forums for the last few days and tried almost everything I could find, but without any luck.
The situation is: inside our Java web application we have ActiveMQ 5.7 (I know it's very old, and eventually we will upgrade to a newer version, but for some reasons that's not possible right now). We have only one broker and multiple consumers.
When I start the servers (I have tried with 2, 3, 4 and more servers) everything is OK. The servers are communicating with each other, and queue messages are consumed instantly. But when I leave the servers idle (for example to finally catch some sleep ;) ) this is no longer the case. Messages are stuck in the database and are not being consumed. The only way to have them delivered is to restart the server.
Part of my configuration (we keep it in properties file, it's the actual state, however I have tried many different combinations):
BrokerServiceURI=broker:(tcp://0.0.0.0:{0})/{1}?persistent=true&useJmx=false&populateJMSXUserID=false&useShutdownHook=false&deleteAllMessagesOnStartup=false&enableStatistics=true
ConnectionFactoryURI=failover://({0})?initialReconnectDelay=100&timeout=6000
ConnectionFactoryServerURI=tcp://{0}:{1}?keepAlive=true&soTimeout=100&wireFormat.cacheEnabled=false&wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDuration=0
BrokerService.startAsync=true
BrokerService.networkConnectorStartAsync=true
BrokerService.keepDurableSubsActive=false
Do you have a clue?
I can't tell you the reason from the description above, but I can list a few checks that are fresh in my mind. Please confirm whether each of the following applies to you.
Can you check the consumer connections?
Are the consumer sessions still active?
If all the consumer connections are up, then check a thread dump to see whether the active consumer threads (I'm assuming you created consumer threads, correct me if I'm wrong) are in the RUNNING or WAITING state. This happened to me: all the consumers were active, but another thread in the server was holding a lock on the Logger while posting a message to Slack, and the consumers were stuck in the WAITING state because of it.
Check the Dispatch queue size for each consumer. Check the prefetch of each consumer and then compare Dispatch Queue size with Prefetch, refer
Is there a JMSXGroupID you are allotting to each message?
Can you tell a little more about your consumer/producer/broker configurations?
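As a lightweight alternative to reading a full thread dump, the thread-state check above can be approximated from inside the JVM. This is a sketch under the assumption that your consumer threads share a recognisable name fragment; "consumer" below is a made-up example, not something taken from ActiveMQ. Note that the JVM reports running threads as RUNNABLE, alongside WAITING, TIMED_WAITING and BLOCKED.

```java
import java.util.Map;
import java.util.TreeMap;

class ThreadStateSummary {

    // Counts live threads whose name contains `nameFragment`, grouped by state.
    // Pass an empty string to count every live thread.
    static Map<Thread.State, Integer> countByState(String nameFragment) {
        Map<Thread.State, Integer> counts = new TreeMap<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread.getName().contains(nameFragment)) {
                counts.merge(thread.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("all threads: " + countByState(""));
        System.out.println("consumer threads: " + countByState("consumer"));
    }
}
```

If every consumer thread shows up as WAITING or BLOCKED while messages are pending, a full thread dump will then show which lock or monitor they are waiting on.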
We have a Weblogic server running several apps. Some of those apps use an ActiveMQ instance which is configured to use the Weblogic XA transaction manager.
About 3 minutes after startup, the JVM throws an OutOfMemoryError. A heap dump shows that about 85% of all memory is occupied by a LinkedList that contains org.apache.activemq.command.XATransactionId instances. The list is a root object and we are not sure what needs it.
What could cause this?
We had exactly the same issue on Weblogic 12c and activemq-ra. XATransactionId object instances were created continuously causing server overload.
After more than 2 weeks of debugging, we found that the problem was caused by the WebLogic Transaction Manager trying to recover some pending ActiveMQ transactions by calling the recover() method, which returns the IDs of transactions that appear not to have completed and have to be recovered. Each call to this method by WebLogic returned the same non-zero number n of IDs, and each call caused n XATransactionId instances to be created.
After some investigation, we found that WebLogic stores its transaction logs (TLOGs) in the file system by default, and that this can be changed so they are persisted in a database. We suspected a problem with the TLOGs being in the file system, so we switched them to the database, and it worked! Our server has now been running for more than 2 weeks without any restart, and memory is stable because no XATransactionId instances are created apart from the necessary ones ;)
I hope this will help you and keep us informed if it worked for you.
Good luck !
To be honest it sounds like you're getting a ton of JMS messages and either not consuming them or, if you are, your consumer is not acknowledging the messages if they are not in auto acknowledge mode.
Check your JMS queue backlog. There may be a queue with a large backlog that the server is trying to read. These messages may have been corrupted by a crash.
The best option is to delete the backlog in the JMS queue, or to back it up to some other queue.
I posted this on the AWS support forums but haven't received a response, so hoping you guys have an idea...
We have an auto-scaling group which boots up or terminates an instance based on current load. What I'd like to be able to do is detect, on my current EC2 instance, that it's about to be shut down, so that it can finish its work.
To describe the situation in more detail. We have an auto-scaling group, and each instance reads content from a single SQS. Each instance will be running multiple threads, each thread is reading from the same SQS queue and processing the data as needed.
I need to know when this instance will be about to shut down, so I can stop new threads from reading data, and block the shutdown until the remaining data has finished processing.
I'm not sure how I can do this in the Java SDK, and I'm worried my instances will be terminated without my data being processed correctly.
Thanks
Lee
When it wants to scale down, AWS Auto Scaling will terminate your EC2 instances without warning.
There's no way to allow any queue workers to drain before terminating.
If your workers are processing messages transactionally, and you're not deleting messages from SQS until after they have been successfully processed, then this shouldn't be a problem. The processing will stop when the instance is terminated, and the transaction will not commit. The message won't be deleted from the SQS queue, and can be picked up and processed by another worker later on.
The only kind of draining behavior it supports is HTTP connection draining from an ELB: "If connection draining is enabled for your load balancer, Auto Scaling waits for the in-flight requests to complete or for the maximum timeout to expire, whichever comes first, before terminating instances".
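The delete-after-processing approach described in this answer can be sketched without the AWS SDK. FakeQueue below is a hypothetical in-memory stand-in that mimics only one property of SQS: receiving a message does not remove it, and only an explicit delete does. Visibility timeouts and the real SDK calls are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

class DeleteAfterProcessingSketch {

    // Hypothetical stand-in for a queue where messages survive until deleted.
    static final class FakeQueue {
        private final Deque<String> messages = new ArrayDeque<>();

        void send(String body) { messages.addLast(body); }

        // Returns the next message without removing it, or null if empty.
        String receive() { return messages.peekFirst(); }

        void delete(String body) { messages.remove(body); }

        int size() { return messages.size(); }
    }

    // Processes one message and deletes it only if the handler succeeds.
    // If the handler throws (or the instance dies mid-way), the message
    // stays in the queue and can be picked up again later.
    static boolean processOne(FakeQueue queue, Consumer<String> handler) {
        String message = queue.receive();
        if (message == null) {
            return false;
        }
        handler.accept(message);
        queue.delete(message); // reached only after successful processing
        return true;
    }

    public static void main(String[] args) {
        FakeQueue queue = new FakeQueue();
        queue.send("job-1");
        try {
            processOne(queue, m -> { throw new IllegalStateException("instance terminated"); });
        } catch (IllegalStateException e) {
            System.out.println("processing failed, messages left: " + queue.size());
        }
        processOne(queue, m -> System.out.println("processed " + m));
        System.out.println("messages left: " + queue.size());
    }
}
```

The trade-off is that a message may be processed more than once, so the handler should be safe to repeat.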
My Java EE application sends JMS messages to a queue continuously, but sometimes the JMS consumer application stops receiving them. This causes the JMS queue to grow very large, or even fill up, which brings down the server.
My server is JBoss or WebSphere. Do these application servers provide a strategy for removing "timed-out" JMS messages?
What is the strategy for handling a large JMS queue? Thanks!
With any asynchronous messaging you must deal with the "fast producer/slow consumer" problem. There are a number of ways to deal with this.
Add consumers. With WebSphere MQ you can trigger a queue based on depth. Some shops use this to add new consumer instances as queue depth grows. Then as queue depth begins to decline, the extra consumers die off. In this way, consumers can be made to automatically scale to accommodate changing loads. Other brokers generally have similar functionality.
Make the queue and underlying file system really large. This method attempts to absorb peaks in workload entirely in the queue. This is after all what queuing was designed to do in the first place. Problem is, it doesn't scale well and you must allocate disk that 99% of the time will be almost empty.
Expire old messages. If the messages have an expiry set then you can cause them to be cleaned up. Some JMS brokers will do this automatically while on others you may need to browse the queue in order to cause the expired messages to be deleted. Problem with this is that not all messages lose their business value and become eligible for expiry. Most fire-and-forget messages (audit logs, etc.) fall into this category.
Throttle back the producer. When the queue fills, nothing can put new messages to it. In WebSphere MQ the producing application then receives a return code indicating that the queue is full. If the application distinguishes between fatal and transient errors, it can stop and retry.
The key to successfully implementing any of these is that your system be allowed to provide "soft" errors that the application will respond to. For example, many shops will raise the MAXDEPTH parameter of a queue the first time they get a QFULL condition. If the queue depth exceeds the size of the underlying file system the result is that instead of a "soft" error that impacts a single queue the file system fills and the entire node is affected. You are MUCH better off tuning the system so that the queue hits MAXDEPTH well before the file system fills but then also instrumenting the app or other processes to react to the full queue in some way.
But no matter what else you do, option #4 above is mandatory. No matter how much disk you allocate or how many consumer instances you deploy or how quickly you expire messages, there is always a possibility that your consumer(s) won't keep up with message production. When this happens your producer app should throttle back, or raise an alarm and stop, or do anything other than hang or die. Asynchronous messaging is only asynchronous up to the point that you run out of space to queue messages. After that your apps are synchronous and must gracefully handle that situation, even if that means to (gracefully) shut down.
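The throttling behaviour in option #4 can be sketched with a bounded JDK queue standing in for the broker queue. This is an illustration only: a real producer would react to its provider's queue-full error (for WebSphere MQ, the return code mentioned above) rather than to BlockingQueue.offer, and the method name and retry parameters are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class ThrottledProducerSketch {

    // Treats "queue full" as a transient error: waits up to waitMillis for
    // space on each attempt and gives up after maxAttempts. A false return
    // tells the caller to raise an alarm or stop producing, not to hang.
    static boolean sendWithRetry(BlockingQueue<String> queue, String message,
                                 int maxAttempts, long waitMillis) {
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                if (queue.offer(message, waitMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return false;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1); // tiny MAXDEPTH
        System.out.println("first send accepted: " + sendWithRetry(queue, "m1", 3, 50));
        System.out.println("second send accepted: " + sendWithRetry(queue, "m2", 3, 50));
    }
}
```

The important part is the false branch: the producer gets a soft error it can act on while the queue limit, not the file system, absorbs the overload.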
Sure!
http://download.oracle.com/docs/cd/E17802_01/products/products/jms/javadoc-102a/index.html
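Message expiration (the JMSExpiration header, see Message#setJMSExpiration(long)) does exactly what you want. Note that the provider sets JMSExpiration itself when the message is sent, so in practice you configure it through MessageProducer#setTimeToLive(long) rather than by calling the setter on the message.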