Tomcat server not responding - java

I run a web server on Tomcat 7 with Java 8. The server performs a lot of I/O operations, mostly DB and HTTP calls; each transaction consumes a generous amount of memory, and it serves around 100 concurrent requests at any given time.
After some time, around 10,000 requests though not at any particular count, the server starts to hang, stops responding, or responds with empty 500 responses.
I see some errors in the logs, which I am currently trying to resolve, but what bugs me is that I can't figure out what ultimately causes the hang. The catalina log file does not show a heap space exception, and the memory dumps I took suggest there is always room to grow and garbage to collect, so I decided it is not a memory problem. Then I took thread dumps and always saw dozens of threads in WAITING, TIMED_WAITING, PARKING, etc. From what I have read, these threads are available to handle incoming work.
It's worth mentioning that all the work is done asynchronously, with no blocking operations, and all the thread pools seem to be available. What's more, when I stop the traffic to the server and let it rest for some time, the issue still doesn't go away. So I figured it's also not a thread problem.
So my question is:
Could it be a memory issue after all? Could it be a thread/CPU issue? Could it be anything else?

Related

How to run very long request that uses high memory in tomcat?

I have a tomcat server.
In the Tomcat server, I handle a RESTful request that calls a server with very high memory usage; the call can last 15 minutes and can eventually crash Tomcat.
How can I run this request:
1. without crashing Tomcat?
2. without exceeding the 3-minute limit on RESTful requests?
Thank you.
Try another architectural approach.
REST is designed to be stateless, so you have to introduce state yourself.
I suggest you implement:
1. the long-running task as a batch job in the background (as @kamran-ghiasvand suggests);
2. a submit request that starts the batch and returns a unique ID;
3. a status request that reports the status of the task (auto-refresh the screen every 5 s, for example). You can do that on an HTML page basis or with AJAX.
To give you an idea of what you might need on the backend, I quote our PaymentService interface below.
    public interface PaymentService {
        PnExecution createPaymentExecution(List<Period> periods, Date calculationDate) throws PnValidationException;
        Long createPaymentExecutionAsync(List<Period> periods, Date calculationDate);
        PnExecution simulatePaymentExecution(Period period, Date calculationDate) throws PnValidationException;
        Void deletePaymentExecution(Long pnExecutionId, AsyncTaskListener<?, ?> listener);
        Long deletePaymentExecutionAsync(Long pnExecutionId);
        void removePaymentNotificationFromPaymentExecution(Long pnExecutionId, Pn paymentNotification);
    }
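The submit/status split suggested above can be sketched roughly as follows. This is only an illustration under assumptions: JobRegistry, its Status enum, and the pool size are made-up names and values, the jobs live in memory only, and the REST layer that would expose submit and status is left out.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory job registry (illustration only, not the PaymentService above).
class JobRegistry {
    enum Status { RUNNING, DONE, FAILED, UNKNOWN }

    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<Long, Future<String>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // "Submit" request: start the task in the background and return its ID at once.
    long submit(Callable<String> task) {
        long id = nextId.incrementAndGet();
        jobs.put(id, pool.submit(task));
        return id;
    }

    // "Status" request: cheap to call, never waits for the task itself.
    Status status(long id) {
        Future<String> future = jobs.get(id);
        if (future == null) return Status.UNKNOWN;
        if (!future.isDone()) return Status.RUNNING;
        try {
            future.get(); // returns immediately because the task has finished
            return Status.DONE;
        } catch (Exception e) {
            return Status.FAILED; // the task threw or was cancelled
        }
    }

    void shutdown() { pool.shutdown(); }

    public static void main(String[] args) throws InterruptedException {
        JobRegistry registry = new JobRegistry();
        long id = registry.submit(() -> {
            Thread.sleep(500); // stands in for the long-running task
            return "report ready";
        });
        // The client polls instead of holding one HTTP request open.
        while (registry.status(id) == Status.RUNNING) {
            System.out.println("job " + id + " still running");
            Thread.sleep(100);
        }
        System.out.println("job " + id + " finished with status " + registry.status(id));
        registry.shutdown();
    }
}
```

A real service would also need to expire finished entries and survive restarts, which this sketch does not attempt.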
About performance:
Try to find the memory consumers, and try to serialize the problem by cutting it into steps. Make sure you have not created memory leaks by keeping unused objects referenced. A last resort would be concurrency (for independent tasks) or parallelism (for processing similar tasks). But most of these problems are the result of an overly straightforward architectural approach.
Crashing the Tomcat server has nothing to do with request processing time; it might instead be caused by the JVM running out of heap memory (or by any of a thousand other reasons). You should establish the reason for the crash by investigating the Tomcat logs carefully. If the reason is lack of memory, you can allocate more memory to the JVM when starting Tomcat using the '-Xmx' flag. For example, you can add the following line to your setenv.sh to allocate 2 GB of RAM to Tomcat:
CATALINA_OPTS="-Xmx2048m"
As for the request timeout, many factors play a role here as well: for example, the connectionTimeout of your HTTP connector (see server.xml), and network, browser, or web client limitations.
Generally speaking, it's very bad practice to make such a long request synchronously via a RESTful request. I suggest you consider other approaches, such as WebSocket or push notifications, to tell the user that the time-consuming request has completed on the server side.
Basically what you are asking boils down to this:
For some task running on Tomcat, that I have not told you anything about, how do I make it run faster, use less memory and not crash.
In the general case, you need to analyze your code to work out why it is taking so long and using so much memory. Then you need to modify or rewrite it as required to reduce memory utilization and improve its efficiency.
I don't think we can offer sensible advice on how to make the request faster, etc., without more details. For example, the advice that some people have been offering, to split the request into smaller requests or to perform the large request asynchronously, won't necessarily help. You should not try these ideas without first understanding what the real problem is.
It is also possible that your task is taking too long and crashing Tomcat for a specific reason:
It is possible that the request's use of (too much) memory is actually causing the requests to take too long. If a JVM is running out of heap memory, it will spend more and more time running the GC. Ultimately it will fail with an OutOfMemoryError.
The excessive memory use could be related to the size of the task that the request is performing.
The excessive memory use could be caused by a bug (memory leak) in your code or some 3rd party library that you are using.
Depending on the above, the problem could be solved by:
increasing Tomcat's heapsize,
fixing the memory leak, or
limiting the size of the "problem" that the request is trying to solve.
It is also possible that you just have a bug in your code; e.g. an infinite loop.
In summary, you have not provided enough information to allow a proper diagnosis. The best we can do is suggest possible causes. Guesses, really.
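One way to check the "spending more and more time running the GC" hypothesis from inside the JVM is to read the collector statistics exposed through java.lang.management. The sketch below is an assumption-laden illustration: GcOverhead is a made-up class name, and the figure it prints is only an approximation, because collectors report their time differently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverhead {
    // Total milliseconds the collectors report having spent so far in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) total += millis;
        }
        return total;
    }

    // Approximate fraction of JVM uptime that was spent in garbage collection.
    static double gcFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms, about %.1f%% of uptime%n",
                totalGcMillis(), 100 * gcFraction());
    }
}
```

If this fraction keeps climbing while requests get slower, heap pressure is a plausible suspect; if it stays small, the cause is more likely elsewhere.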

Weird Memory Usage by Tomcat

It's a vague question, so please feel free to ask for any specific data.
We have a Tomcat server running two web services. One is built using Spring; we use MySQL for 90% of tasks and Mongo for caching JSON (10%). The other web service is written using Grails. Both services are medium-sized codebases (about 35k lines of code each).
The computation only happens when there is an HTTP request (no batch processing), with about 2000 database hits per request (I know it's humongous; we are working on it). The request rate is about 30 req/min. One particular request involves image processing, which is quite memory-expensive. There is no JNI anywhere.
We have found a weird behavior. I can confirm that there were no requests to the server for about 12 hours last night. But when I look at the memory consumption, it is very confusing:
Without any requests, the memory keeps jumping from 500 MB to 1.2 GB (a 700 MB jump is worrisome). There is no computation on the server side, as mentioned. I am not sure whether it's a memory leak:
The memory usage comes down again. (Things would have been way easier if the memory didn't come down.)
This behavior is reproducible with caches based on SoftReference or similar, together with full GCs. But I am not using them anywhere (not sure whether something else is).
What else could be the reason? Is it a cause for worry?
PS: We have had Out of Memory crashes (not errors, but JVM crashes) quite frequently very recently.
This is actually normal behavior. You're just seeing garbage collection occur.
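A minimal sketch of why used heap moves up and down even when nothing leaks: allocations accumulate until the collector runs, and then the figure drops again. The class name HeapSawtooth and the allocation sizes are made up for the illustration, and the numbers printed depend on the JVM, its heap settings, and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

class HeapSawtooth {
    // Heap currently in use, as reported by the JVM's memory MXBean.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long allocated = 0;
        for (int round = 0; round < 10; round++) {
            for (int i = 0; i < 1_000; i++) {
                byte[] garbage = new byte[64 * 1024]; // unreachable after this iteration
                allocated += garbage.length;
            }
            // Used heap rises between collections and falls after one has run.
            System.out.printf("round %2d: used heap = %d MB%n",
                    round, usedHeapBytes() / (1024 * 1024));
        }
        System.out.println("allocated in total: " + allocated / (1024 * 1024) + " MB");
    }
}
```

What distinguishes a leak from this pattern is the level of the troughs: if the low points after each collection keep rising over time, something is holding on to memory.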

ColdFusion JVM: strange memory behaviour

Since last month, we have had a problem on our company's server (Win2008ServerStd + IIS7 + CF enterprise 9.0.1 (hotfix2)).
I used jConsole to monitor the ColdFusion JVM (1.6.0_24) activity and here's what I see:
Notice that strange "curve" between 14:10 and 14:15! What is that?
Obviously it's not standard behaviour: when it happens, my applications hang for 30 to 70 seconds!
Do you know what can cause that memory issue? It seems like GC does not run correctly, or hangs itself.
I don't expect a quick answer; I realize there can be a lot of root problems causing this, but where can I start investigating?
Using cfstat, perfmon, FusionReactor, or the CF performance monitor, take a look at running and queued requests during your problem. What you will likely see is running requests climbing past the simultaneous requests setting (in the CF admin). Then the requests will start to queue. Eventually the queue will clear out (if your server is recovering on its own).
This sort of thing can be caused by a number of things: for example, your DB server slowing down or having an issue, a network problem, network ports resyncing, disk I/O problems, etc.
My guess is that you will drive yourself batty trying to figure this out by monitoring your heap. See if you can watch one of the monitors for some specific scripts that might be the culprit.
The other comment (about some indexing agents) is also a possibility. A flurry of indexing can definitely cause this behavior. If that's the case, you might take a look at the simultaneous request settings. If it is set at the default, you might have enough headroom to increase it.
It could have been a spider creating lots and lots of sessions as it crawled the site which would eat up memory for a period of time. Once the spider stopped crawling those sessions would time out and be garbage collected.
I would compare your HTTP server logs w/ the JVM logs. Compare that time frame and see if there are a lot of requests from a search engine spider (Googlebot, msnbot, etc).
Fabio,
I had the same kind of issue a couple of months ago: I was getting spikes at regular intervals, with the server eating up around 50% of the CPU. I wrote up the full story at the URL below,
http://www.isummation.com/blog/strange-coldfusion-issue-jrun-eating-up-to-50-of-cpu/
which may help you (sorry it is so long).
I found that storing client variables in the registry was causing the issue. I was able to catch it with the help of VisualVM: I first found the thread causing the issue and then looked into its stack trace to find the exact cause.
The only thing that's really odd IMO is the sudden spike to having so many threads. Capture a thread dump on a regular basis (jstack and similar tools are your friends) and then correlate those thread dumps with your monitoring where it shows the spike.
The root problem will become more obvious once you understand what all the extra threads are doing. Perhaps it's more threads handling transactions, but it might be something else entirely.
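If running jstack on a schedule is awkward, a rough equivalent can be taken from inside the JVM with ThreadMXBean. The sketch below only counts threads per state at each sample; ThreadStateSnapshot is a made-up name, and the one-second interval is an arbitrary choice for the demo.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

class ThreadStateSnapshot {
    // Counts the live threads of this JVM by their current state.
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip lock details, which keeps the dump cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) continue; // defensive: a thread may have just exited
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Timestamped samples can later be lined up with the monitoring graphs.
        for (int i = 0; i < 3; i++) {
            System.out.println(System.currentTimeMillis() + " " + countByState());
            Thread.sleep(1000);
        }
    }
}
```

Once a spike shows up in the counts, a full dump with stack traces (from jstack or from ThreadInfo.getStackTrace()) tells you what those threads were actually doing.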

sun.rmi.transport.tcp.TCPTransport uses 100% CPU

I'm developing a communication library based on NIO's non-blocking SocketChannels so I can use select to keep my thread low on CPU usage (and get faster reaction times to other events).
SocketChannels are created outside my thread and added to the list it handles; the thread marks them as non-blocking and registers them with a Selector for READ operations (and WRITE when needed, but that does not happen in my problem).
I have a little Swing application for tests, running locally, that can be either a client or a server: the client one connects to the server one and they can send each other messages. Pretty simple and works fine, except for the CPU, which hits 100% (50% for each JVM) as soon as the connection is established between client and server.
Running jvisualvm shows me that sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run() uses 98% of the application time, counting only 3 method calls!
A forced stack trace shows it's blocking on the read operation of a FilterInputStream, on a Socket.
I'm a little puzzled as I don't use RMI (though I can understand NIO and RMI can share the "transport" code part). I have seen a few similar questions, but each was specifically using RMI, which I'm not. The answers I've seen say that this ConnectionHandler.run() method is responsible for marshalling/unmarshalling things, whereas here I get 100% CPU without any network traffic. I can only infer an active wait on the sockets, but that sounds odd, especially with non-blocking SocketChannels...
Any idea would be greatly appreciated!
I tracked the CPU use down to select(long timeout), which returns 0 immediately, regardless of the timeout value. My understanding of this function was that it would block until a selected operation becomes ready or the timeout is reached (as the Javadoc says).
However, I found another StackOverflow post showing the same problem: the OP_CONNECT operation has to be cancelled once the connection is established.
Many thanks to @Alexander, and to @EJP for the clarification about the OP_WRITE/OP_CONNECT similarities.
Regarding the RMI part, it was probably due to Eclipse run configurations.
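For anyone hitting the same thing, here is a minimal sketch of the fix as I understand it: once finishConnect() succeeds, replace the key's OP_CONNECT interest with OP_READ so that select blocks again. The class and method names (ConnectThenRead, selectMillisAfterConnect) are made up, and the throwaway loopback server exists only so the example is self-contained; this is not the code of the library discussed above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class ConnectThenRead {
    // Connects to a throwaway local server, switches the key from OP_CONNECT to
    // OP_READ, and returns how many milliseconds the next select(timeoutMs) blocked.
    static long selectMillisAfterConnect(long timeoutMs) {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel client = SocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            client.configureBlocking(false);
            SelectionKey key = client.register(selector, SelectionKey.OP_CONNECT);

            // A non-blocking connect may complete at once or later via the selector.
            boolean connected = client.connect(server.getLocalAddress());
            while (!connected) {
                selector.select(1000);
                if (key.isConnectable()) {
                    connected = client.finishConnect();
                }
                selector.selectedKeys().clear();
            }
            // The important step: stop asking for OP_CONNECT once connected.
            key.interestOps(SelectionKey.OP_READ);

            long start = System.nanoTime();
            selector.select(timeoutMs); // nothing to read, so this should wait
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("select blocked for about "
                + selectMillisAfterConnect(300) + " ms");
    }
}
```

With the interestOps call removed, the select at the end is the one that may come back immediately and spin the loop, which matches the symptom described above.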

Java Memory Usage / Thread Pool Performance Problem

These things obviously require close inspection and availability of code to thoroughly analyze and give good suggestions. Nevertheless, that is not always possible and I hope it may be possible to provide me with good tips based on the information I provide below.
I have a server application that uses a listener thread to listen for incoming data. The incoming data is interpreted into application specific messages and these messages then give rise to events.
Up to that point I don't really have any control over how things are done.
Because this is a legacy application, these events were previously taken care of by that same listener thread (largely a single-threaded application). The events are sent to a blackbox and out comes a result that should be written to disk.
To improve throughput, I wanted to employ a threadpool to take care of the events. The idea being that the listener thread could just spawn new tasks every time an event is created and the threads would take care of the blackbox invocation. Finally, I have a background thread performing the writing to disk.
With just the previous setup and the background writer, everything works OK and the throughput is ~1.6 times more than previously.
When I add the thread pool, however, performance degrades. At the start, everything seems to run smoothly, but after a while everything becomes very slow and finally I get OutOfMemoryErrors. The weird thing is that when I print the number of active threads each time a task is added to the pool (along with info on how many tasks are queued and so on), it looks as if the thread pool has no problem keeping up with the producer (the listener thread).
Using top -H to check for CPU usage, it's quite evenly spread out at the outset, but at the end the worker threads are barely ever active and only the listener thread is active. Yet it doesn't seem to be submitting more tasks...
Can anyone hypothesize a reason for these symptoms? Do you think it's more likely that there's something in the legacy code (that I have no control over) that just goes bad when multiple threads are added? The out of memory issue should be because some queue somewhere grows too large but since the threadpool almost never contains queued tasks it can't be that.
Any ideas are welcome, especially ideas on how to diagnose a situation like this more efficiently. How can I get a better profile of what my threads are doing?
Thanks.
Slowing down then out of memory implies a memory leak.
So I would start by using some Java memory analyzer tools to identify if there is a leak and what is being leaked. Sometimes you get lucky and the leaked object is well-known and it becomes pretty clear who is hanging on to things that they should not.
Thank you for the answers. I read up on Java VisualVM and used that as a tool. The results and conclusions are detailed below. Hopefully the pictures will work long enough.
I first ran the program and created some heap dumps thinking I could just analyze the dumps and see what was taking up all the memory. This would probably have worked except the dump file got so large and my workstation was of limited use in trying to access it. After waiting two hours for one operation, I realized I couldn't do this.
So my next option was something I, stupidly enough, hadn't thought about. I could just reduce the number of messages sent to the application, and the trend of increasing memory usage should still be there. Also, the dump file will be smaller and faster to analyze.
It turns out that when sending messages at a slower rate, no out of memory issue occurred! A graph of the memory usage can be seen below.
The peaks are results of cumulative memory allocations and the troughs that follow are after the garbage collector has run. Although the amount of memory usage certainly is quite alarming and there are probably issues there, no long term trend of memory leakage can be observed.
I started to incrementally increase the rate of messages sent per second to see where the application hits the wall. The image below shows a very different scenario than the previous one...
Because this happens when the rate of messages sent is increased, my guess is that freeing up the listener thread lets it accept a lot of messages very quickly, and this causes more and more allocations. The garbage collector doesn't run and the memory usage hits a wall.
There's of course more to this issue but given what I have found out today I have a fairly good idea of where to go from here. Of course, any additional suggestions/comments are welcome.
This question should probably be recategorized as dealing with memory usage rather than thread pools... The thread pool wasn't the problem at all.
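If the extra allocations really do come from the listener running ahead of the rest of the pipeline, one possible mitigation (not something tried in this post, just an assumption worth testing) is to bound the hand-off so the producer is slowed down when the workers fall behind. The sketch below uses a bounded queue with CallerRunsPolicy; BoundedPool, QUEUE_CAPACITY, and blackBox are made-up names, and the sleep merely stands in for the real work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPool {
    static final int QUEUE_CAPACITY = 16; // illustrative value, tune for the real load

    // Fixed-size pool whose queue cannot grow beyond QUEUE_CAPACITY.
    static ThreadPoolExecutor create(int threads) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(QUEUE_CAPACITY),
                // When the queue is full, the submitting thread runs the task itself,
                // which throttles the producer instead of piling up work.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Stand-in for the black box invocation.
    static void blackBox() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits `events` tasks and returns {tasks completed, largest queue size seen}.
    static int[] run(int events) {
        ThreadPoolExecutor pool = create(4);
        AtomicInteger completed = new AtomicInteger();
        int maxQueued = 0;
        for (int i = 0; i < events; i++) {
            pool.execute(() -> {
                blackBox();
                completed.incrementAndGet();
            });
            maxQueued = Math.max(maxQueued, pool.getQueue().size());
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { completed.get(), maxQueued };
    }

    public static void main(String[] args) {
        int[] result = run(1_000);
        System.out.println("completed=" + result[0] + ", max queued=" + result[1]);
    }
}
```

Whether slowing the listener like this is acceptable depends on what happens upstream when it stops reading for a moment, so treat it as an experiment rather than a fix.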
I agree with @djna.
The thread pool of the Java concurrency package works. It does not create threads it does not need, and you can see that the number of threads is as expected. This probably means that something in your legacy code is not ready for multithreading. For example, some code fragment may not be synchronized, so that some element is never removed from a collection, or additional elements are stored in a collection. As a result, memory usage grows.
BTW, I did not understand exactly which part of the application uses the thread pool now. Did you have one thread that processed events, and now you have several threads doing this? Have you perhaps changed the inter-thread communication mechanism? Added queues? This may be yet another direction for your investigation.
Good luck!
As mentioned by djna, it's likely some type of memory leak. My guess would be that you're keeping a reference to the request around somewhere:
In the dispatcher thread that's queuing the requests
In the threads that deal with the requests
In the black box that's handling the requests
In the writer thread that writes to disk.
Since you said everything works fine before you add the thread pool into the mix, my guess would be that the threads in the pool are keeping a reference to the request somewhere. The idea being that, without the thread pool, you aren't reusing threads, so the information goes away.
As recommended by djna, you can use a Java memory analyzer to help figure out where the data is stacking up.
