I have multithreaded java application with my ~5 threads (and also many threads from jetty web server), some of them are reading/writing mongodb from time to time. Some of writes are intensive, where I read 200K mongodb objects, but they don't happen continiously, they happen once in few minutes. For few hours application works perfectly, but later I see this situation:
Mongo is not doing any work, as far I understand it:
Here is my jstack output:
https://gist.github.com/stiv-yakovenko/06b0d235fd2c32d839788edf56aaa6cd
You can see that all threads are waiting for one thread, which, in turn is waiting for mongo, while mongo is doing nothing. Before problem begings, healthy situation is that no threads are waiting for anyone else, because load is not that high to block everything. Before mongo I was using mapdb to store same data and I never had issues like that.
I've seen same situation with multiple threads waiting for mongo, so I decided to put all mongodb invocations under the same ReentrantLock(true). I hoped that rootcause was too many threads wanted to access mongo, but it doesn't help. I don't know what to do, tried to reproduce the problem with simple code, but I can't. Any ideas?
UPD: here is jstat output as one of commenters requested:
Well, finally it turned out that it was a garbage collection. I've ended up using G1 garbage collector. But it was not enough, because it couldn't deliver required latency (though it was close to it). I had to split application into two parts, one for doing intensive garbage-producing calculations, another for low-latency web responses.
Related
I'm a bit of a novice when it comes to JAVA applications, but have been involved in developing a fairly complex JAVA(8) app that requires multi-threading. Myself and another developer have kept running into a problem where the app keeps running out of memory after running for a while.
At first we gave the application 64GB of memory, but after a few hours it'd run out of memory, crash and restart. Only to keep doing it over and over. Context; The application takes messages from a messaging system (ActiveMQ) and from the message's meta has to build an XML file by calling various data sources for values. There could be literally millions of messages that need to be processed, so we developed a multi-threading system, each thread deal with a message - and gave the application 40 threads.
However, as it keeps taking messages the overall memory consumption goes up and up over time. I feel like the garbage collector isn't being utilized by us correctly?
So at the moment we have one parent thread:
(new Thread(new ReportMessageConsumer(config, ""))).start();
Then within the ReportMessageConsumer we have X number of threads setup, so this would be 40 in our current setup. So this would be all under this one group. Once the XML has been built and the threads done with how do we effectively kill the thread and enforce the Garbage collector to free that memory, so that we can then create a new clean thread to pick up another message?
I feel like the garbage collector isn't being utilized by us correctly?
That is not the problem. The best thing you can do is to let the GC do its thing without any interference. Don't try to force the GC to run. It is rarely helpful, and often bad for performance.
The real problem is that you have a memory leak. It may be happening because you are getting more and more threads ... or it may be something else.
I would recommend the following:
Rewrite you code so that it uses a ExecutorService to manage a bounded pool of threads, and a queue of tasks to be run on those threads. Look at the javadocs for a simple example.
Using a thread pool is likely to improve your application's overall performance. Creating a thread (i.e. Thread.start()) is rather expensive in Java.
(And don't shut down the pool as a way to ensure that a batch of work has completed. That is bad for performance. The simple way to do that is to submit the batch using invokeAll; see ExecutorService, how to wait for all tasks to finish.)
If that doesn't cure your leak, then use a memory profiling tool to find out how / why your application is leaking memory. There are lots of StackOverflow Q&A's on how to do this. For example:
How to find a Java Memory Leak
How to find memory leak in java using JProfiler?
How to find memory leaks using visualvm
I am providing a RESTful service that is being served by a servlet (running inside Tomcat 7.0.X and Ubuntu Linux). I'm already getting about 20 thousand queries per hour and it will grow much higher. The servlet receives the requests, prepares the response, inserts a records in a MySQL database table and delivers the response. The log in the database is absolutely mandatory. Untily recently, all this happened in a syncronous way. I mean, before the Tomcat thread delivered the response, it had to create the records in the database table. The problem is that this log used to take more than 90% of the total time, and even worse: when the database got slower then the service took about 10-15 seconds instead of just 20 miliseconds.
I recetly made an improvement: Each Tomcat thread creates an extra thread doing a "(new Thread(new certain Object)).start();" that takes care of the SQL insertion in an asyncronous way, so the response gets to the clients faster. But these threads take too much RAM when MySQL runs slower and threads multiply, and with a few thousands of them the JVM Tomcat runs of the memory.
What I need is to be able to accept as much HTTP requests as possible, to log every one of them as fast as possible (not syncronously), and to make everything fast and with a very low usage of RAM when MySQL gets slow and inserts need to queue. I think I need some kind of queue to buffer the entries when the speed of http request is higher than the speed of insertions in the database log.
I'm thinking about these ideas:
1- Creating some kind of FIFO queue myself, maybe using some of those Apache commons collections, and the some kind of thread that polls the collection and creates the database records. But what collection should I use? And how should I program the thread that polls it, so it won't monopolize the CPU? I think that a "Do while (true)...." would eat the CPU cycles. And that about making it thread safe? How to do it? I think doing it myself is too much effort and most likely I will reinvent the wheel.
2- log4J? I have never used it directly, but it seems that this framework is algo designed to creat "appenders" that talk to the database. Would that be the way to do it?
3- Using some kind of any other framework that specializes in this?
What would you suggest?
Thanks in advance!
What comes to mind right away is a queue like you said. You can use things like ActiveMQ http://activemq.apache.org/ or RabbitMQ http://www.rabbitmq.com/.
The idea is to just fire and forget. There should be almost no overhead to send the messages.
Then you can connect some "offline" to pick up messages off the queues and write them to the database at the speed you need.
I feel like I plug this all day on Stack Overflow, but we use Mule (http://www.mulesoft.org/) at work to do this. One of the great things about Mule is that you can explicitly set the number of threads that read from the queue and the number of threads that write to the database. It allows you fine grain control over throttling messages.
Definitely take a look at using a ThreadPoolExecutor. You can provide the thread pool size, and it will handle all the concurrency and queuing for you. Only possible issue is that if your JVM crashes for any reason, you'll lose any queued items in your pool.
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html
I would also definitely look into optimizing the MySQL database as much as possible. 20k entries per hour can get hairy pretty quickly. The better optimized your hardware, os, and indexes the quicker your inserts and smaller your queue will be.
First of all: Thanks a lot for your valuable suggestions!
So far I have found a partial solution to my need, and I already implemented it succesfully:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/LinkedBlockingQueue.html
Now I'm thinking about using also a queue provider if that gets full, as a failover solution. So far I have thought about Amazon's queue service, but it costs money. I will also check the queue solutions that Ryan suggested.
I've developed a web application using the following tech stack:
Java
Mysql
Scala
Play Framework
DavMail integration (for calender and exchange server)
Javamail
Akka actors
On the first days, the application runs smoothly and without lags. But after 5 days or so, the application gets really slow! And now I have no clue how to profile this, since I have huge dependencies and it's hard to reproduce this kind of thing. I have looked into the memory and it seems that everything its okay.
Any pointers on the matter?
Try using VisualVM - you can monitor gc behaviour, memory usage, heap, threads, cpu usage etc. You can use it to connect to a remote VM.
`visualvm˙ is also a great tool for such purposes, you can connect to a remote JVM as well and see what's inside.
I suggest you doing this:
take a snapshot of the application running since few hours and since 5 days
compare thread counts
compare object counts, search for increasing numbers
see if your program spends more time in particular methods on the 5th day than on the 1str one
check for disk space, maybe you are running out of it
jconsole comes with the JDK and is an easy tool to spot bottlenecks. Connect it to your server, look into memory usage, GC times, take a look at how many threads are alive because it could be that the server creates many threads and they never exit.
I agree with tulskiy. On top of that you could also use JMeter if the investigations you will have made with jconsole are unconclusive.
The probable causes of the performances degradation are threads (that are created but never exit) and also memory leaks: if you allocate more and more memory, before having the OutOfMemoryError, you may encounter some performances degradation (happened to me a few weeks ago).
To eliminate your database you can monitor slow queries (and/or queries that are not using an index) using the slow query log
see: http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
I would hazard a guess that you have a missing index, and it has only become apparent as your data volumes have increased.
Yet another profiler is Yourkit.
It is commercial, but with trial period (two weeks).
Actually, I've firstly tried VisualVM as #axel22 suggested, but our remote server was ssh'ed and we had problems with connecting via VisualVM (not saying that it is impossible, I've just surrendered after a few hours).
You might just want to try the 'play status' command, which will list web app state (threads, jobs, etc). This might give you a hint on what's going on.
So guys, in this specific case, I was running play in Developer mode, which makes the compiler works every now and then.
After changing to production mode, everything was lightning fast and no more problems anymore. But thanks for all the help.
These things obviously require close inspection and availability of code to thoroughly analyze and give good suggestions. Nevertheless, that is not always possible and I hope it may be possible to provide me with good tips based on the information I provide below.
I have a server application that uses a listener thread to listen for incoming data. The incoming data is interpreted into application specific messages and these messages then give rise to events.
Up to that point I don't really have any control over how things are done.
Because this is a legacy application, these events were previously taken care of by that same listener thread (largely a single-threaded application). The events are sent to a blackbox and out comes a result that should be written to disk.
To improve throughput, I wanted to employ a threadpool to take care of the events. The idea being that the listener thread could just spawn new tasks every time an event is created and the threads would take care of the blackbox invocation. Finally, I have a background thread performing the writing to disk.
With just the previous setup and the background writer, everything works OK and the throughput is ~1.6 times more than previously.
When I add the thread pool however performance degrades. At the start, everything seems to run smoothly but then after awhile everything is very slow and finally I get OutOfMemoryExceptions. The weird thing is that when I print the number of active threads each time a task is added to the pool (along with info on how many tasks are queued and so on) it looks as if the thread pool has no problem keeping up with the producer (the listener thread).
Using top -H to check for CPU usage, it's quite evenly spread out at the outset, but at the end the worker threads are barely ever active and only the listener thread is active. Yet it doesn't seem to be submitting more tasks...
Can anyone hypothesize a reason for these symptoms? Do you think it's more likely that there's something in the legacy code (that I have no control over) that just goes bad when multiple threads are added? The out of memory issue should be because some queue somewhere grows too large but since the threadpool almost never contains queued tasks it can't be that.
Any ideas are welcome. Especially ideas of how to more efficiently diagnose a situation like this. How can I get a better profile on what my threads are doing etc.
Thanks.
Slowing down then out of memory implies a memory leak.
So I would start by using some Java memory analyzer tools to identify if there is a leak and what is being leaked. Sometimes you get lucky and the leaked object is well-known and it becomes pretty clear who is hanging on to things that they should not.
Thank you for the answers. I read up on Java VisualVM and used that as a tool. The results and conclusions are detailed below. Hopefully the pictures will work long enough.
I first ran the program and created some heap dumps thinking I could just analyze the dumps and see what was taking up all the memory. This would probably have worked except the dump file got so large and my workstation was of limited use in trying to access it. After waiting two hours for one operation, I realized I couldn't do this.
So my next option was something I, stupidly enough, hadn't thought about. I could just reduce the number of messages sent to the application, and the trend of increasing memory usage should still be there. Also, the dump file will be smaller and faster to analyze.
It turns out that when sending messages at a slower rate, no out of memory issue occured! A graph of the memory usage can be seen below.
The peaks are results of cumulative memory allocations and the troughs that follow are after the garbage collector has run. Although the amount of memory usage certainly is quite alarming and there are probably issues there, no long term trend of memory leakage can be observed.
I started to incrementally increase the rate of messages sent per second to see where the application hits the wall. The image below shows a very different scenario then the previous one...
Because this happens when the rate of messages sent are increased, my guess is that my freeing up the listener thread results in it being able to accept a lot of messages very quickly and this causes more and more allocations. The garbage collector doesn't run and the memory usage hits a wall.
There's of course more to this issue but given what I have found out today I have a fairly good idea of where to go from here. Of course, any additional suggestions/comments are welcome.
This questions should probably be recategorized as dealing with memory usage rather than threadpools... The threadpool wasn't the problem at all.
I agree with #djna.
Thread Pool of java concurrency package works. It does not create threads if it does not need them. You see that number of threads is as expected. This means that probably something in your legacy code is not ready for multithreading. For example some code fragment is not synchronized. As a result some element is not removed from collection. Or some additional elements are stored in collection. So, the memory usage is growing.
BTW I did not understand exactly which part of the application uses threadpool now. Did you have one thread that processes events and now you have several threads that do this? Have you probably changed the inter-thread communication mechanism? Added queues? This may be yet another direction of your investigation.
Good luck!
As mentioned by djna, it's likely some type of memory leak. My guess would be that you're keeping a reference to the request around somewhere:
In the dispatcher thread that's queuing the requests
In the threads that deal with the requests
In the black box that's handling the requests
In the writer thread that writes to disk.
Since you said everything works find before you add the thread pool into the mix, my guess would be that the threads in the pool are keeping a reference to the request somewhere. Th idea being that, without the threadpool, you aren't reusing threads so the information goes away.
As recommended by djna, you can use a Java memory analyzer to help figure out where the data is stacking up.
I have a program that starts up and creates an in-memory data model and then creates a (command-line-specified) number of threads to run several string checking algorithms against an input set and that data model. The work is divided amongst the threads along the input set of strings, and then each thread iterates the same in-memory data model instance (which is never updated again, so there are no synchronization issues).
I'm running this on a Windows 2003 64-bit server with 2 quadcore processors, and from looking at Windows task Manager they aren't being maxed-out, (nor are they looking like they are being particularly taxed) when I run with 10 threads. Is this normal behaviour?
It appears that 7 threads all complete a similar amount of work in a similar amount of time, so would you recommend running with 7 threads instead?
Should I run it with more threads?...Although I assume this could be detrimental as the JVM will do more context switching between the threads.
Alternatively, should I run it with fewer threads?
Alternatively, what would be the best tool I could use to measure this?...Would a profiling tool help me out here - indeed, is one of the several profilers better at detecting bottlenecks (assuming I have one here) than the rest?
Note, the server is also running SQL Server 2005 (this may or may not be relevant), but nothing much is happening on that database when I am running my program.
Note also, the threads are only doing string matching, they aren't doing any I/O or database work or anything else they may need to wait on.
My guess would be that your app is bottlenecked on memory access, i.e. your CPU cores spend most of the time waiting for data to be read from main memory. I'm not sure how well profilers can diagnose this kind of problem (the profiler itself could influence the behaviour considerably). You could verify the guess by having your code repeat the operations it does many times on a very small data set.
If this guess is correct, the only thing you can do (other than getting a server with more memory bandwidth) is to try and increase the locality of your memory access to make better use of caches; but depending on the details of the application that may not be possible. Using more threads may in fact lead to worse performance because of cores sharing cache memory.
Without seeing the actual code, it's hard to give proper advice. But do make sure that the threads aren't locking on shared resources, since that would naturally prevent them all from working as efficiently as possible. Also, when you say they aren't doing any io, are they not reading an input or writing an output either? this could also be a bottleneck.
With regards to cpu intensive threads, it is normally not beneficial to run more threads than you have actual cores, but in an uncontrolled environment like this with other big apps running at the same time, you are probably better off simply testing your way to the optimal number of threads.