I have an application that performs a very sequential set of discrete tasks.
My problem is that one of the first tasks consumes a large amount of memory, and despite eliminating object references and invoking the garbage collector, only about half of that memory is actually freed. This impacts later tasks. I also want to temporarily grant the JVM a large heap so it can handle the first task efficiently, but I don't want that memory to stick around until the GC decides it's worthwhile to free the rest.
I had the idea of executing the memory-intensive task inside a thread; the new child thread uses the parent's JVM (no surprise there), but there appears to be no change in the memory management.
How does Java handle Thread memory? Is there a simple way to create a child heap for the subthread that can be dumped after the thread has finished?
As an addendum, here's what I actually want to do:
Set up a Neo4j graph database (I'm creating several million nodes, properties and relationships, along with numerous indexes) [memory intensive]
Perform queries on the graph database
No, the heap is shared between threads and there isn't a way to reserve memory for a given thread or to let a thread exceed the limits. Threads are not processes (even though some JVMs implement them that way).
You could run this task in a separate process (a different JVM) and pass data to it via files or sockets. That would solve the memory problem, but it could hurt performance; it depends on how much data you need to pass.
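As a rough sketch of that approach (the BulkImport class name, the 8g heap and the path argument below are placeholders, not anything from your project), the parent JVM can launch a second JVM with a big heap just for the import and reclaim all of that memory the moment the child process exits:

import java.io.File;

public class SpawnImport {
    public static void main(String[] args) throws Exception {
        String javaBin = System.getProperty("java.home") + File.separator + "bin"
                + File.separator + "java";
        Process importer = new ProcessBuilder(
                javaBin,
                "-Xmx8g",                           // large heap only for the child process
                "-cp", System.getProperty("java.class.path"),
                "com.example.BulkImport",           // hypothetical main class that builds the graph
                "/tmp/graph.db")                    // pass work via arguments, files or sockets
                .inheritIO()                        // show the child's output in this console
                .start();
        int exitCode = importer.waitFor();          // block until the import has finished
        if (exitCode != 0) {
            throw new IllegalStateException("Import JVM exited with code " + exitCode);
        }
        // From here on, the parent JVM can run its queries with a modest heap.
    }
}

When the child process terminates, the operating system reclaims its whole heap, which is exactly the behaviour you can't get from a thread.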
Use a memory profiler to find out which GC root is keeping the objects alive that you expected to be garbage-collected.
I expect, however, that Neo4j is keeping those objects alive, and there might be little you can do about it. After all, your graph and its indexes do need to be there for you to be able to perform queries on them.
You might be able to find some Neo4j API call to tell it to clean out some caches or something similar.
Related
I have a Java/Spring data transfer service that reads records in from a .csv file, parses them, collates them in memory, and then loads them into a database.
Each run parses a file that contains ~800k records.
The service is deployed to a Kubernetes container. Neither the production environment nor the application design are ideal for this application, but I have to operate within some restrictions.
After two or three runs, the service crashes due to a memory error. We are seeing long garbage collection times, and assume that garbage collection is not keeping pace.
We are using the G1 garbage collector, and I want to tune the collection to prioritize memory over speed. I don't care how efficient or fast the service is, it only has to perform this data transfer a few times.
What settings will accomplish this?
We are seeing long garbage collection times, and assume that garbage collection is not keeping pace.
Long GC times are a symptom of the problem rather than the root cause of the problem. If the GC is simply not keeping up, that should not cause OOMEs.
(It is possible that heavy use of finalizers, Reference objects or similar makes it harder for the GC to keep up, but that is still a symptom. It seems unlikely that this is relevant in your use-case.)
My theory is that the real cause of the long collection times is that your heap is too small. When your heap is nearly full, the GC has to run more and more often and is able to reclaim less and less space. That leads to long collection times. Then finally, you get an OOME because either you run out of heap space entirely, or because you hit the GC overhead threshold.
Another possibility is that your heap is too big for the available RAM ... and you are getting virtual memory thrashing.
In either case, simply tweaking the GC settings is not going to help. You need to identify the root cause of the problem before you can fix it.
My take is that either you have a memory leak, or not enough RAM, or there is a problem with your application's design.
On the design side, rather than reading / parsing the entire file as an in-memory data structure, use a streaming / event-based parser. Read records one at a time, process them and then discard them ... keeping as little information about them in memory as you can get away with. In other words, make the application less memory hungry.
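For illustration, a minimal sketch of that streaming approach, assuming plain line-by-line reading, a naive comma split and an arbitrary batch size; writeBatch is a placeholder for whatever your service uses to write to the database:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class StreamingCsvLoader {
    private static final int BATCH_SIZE = 1_000;        // assumed batch size

    public void load(String csvPath) throws Exception {
        List<String[]> batch = new ArrayList<>(BATCH_SIZE);
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(csvPath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line.split(","));              // naive parse; use a real CSV parser if fields can contain commas
                if (batch.size() == BATCH_SIZE) {
                    writeBatch(batch);                   // flush this batch to the database
                    batch.clear();                       // nothing from the batch stays reachable afterwards
                }
            }
        }
        if (!batch.isEmpty()) {
            writeBatch(batch);                           // final partial batch
        }
    }

    private void writeBatch(List<String[]> batch) {
        // placeholder: JDBC batch insert or repository call in your service
    }
}

With this shape, the memory high-water mark is roughly one batch plus one line, regardless of whether the file has 800k records or 80 million.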
I need help tuning one of our Microservices.
We are running a Spring-based microservice (Spring Integration, Spring Data JPA) on a Jetty server in an OpenJDK 8 container. We are also using Mesosphere as our container orchestration platform.
The application consumes messages from IBM MQ, does some processing and then stores the processed output in an Oracle DB.
We noticed that at some point on the 2nd of May the queue processing from our application stopped. Our MQ team could still see open connections against the queue, but the application was just not reading any more. It did not die completely, as the healthCheck API that DCOS hits still shows it as healthy.
We use AppD for performance monitoring, and what we could see is that on the same date a garbage collection ran, and from that point on the application never picked up messages from the queue. The graph above shows the amount of time spent doing GC on the different dates.
As part of the Java opts we use to run the application we set
-Xmx1024m
The Mesosphere resource reservation for that microservice is shown below.
Can someone please point me in the right direction for configuring the right garbage collection settings for my application?
Also, if you think that the GC is just a symptom, thanks for sharing your views on potential flaws I should be looking for.
Cheers
Kris
You should check your code.
A GC operation triggers a STW (Stop-The-World) pause, which blocks all the threads created in your code. However, the STW pause itself doesn't change your code's state.
GC can, however, affect your code's logic if you use something like System.currentTimeMillis() to control its flow.
A GC operation also affects non-strong references: if you use WeakReference, SoftReference or WeakHashMap, these components may change their behaviour after a full GC.
If a full GC completes and the freed memory still doesn't allow your code to allocate new objects, an OutOfMemoryError is thrown, which interrupts your code's execution.
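To illustrate the point about non-strong references, here is a small example; note that System.gc() is only a request, so the output is typical rather than guaranteed:

import java.lang.ref.WeakReference;

public class WeakRefDemo {
    public static void main(String[] args) {
        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload);

        payload = null;          // drop the only strong reference
        System.gc();             // request a collection; usually enough for this demo

        // Once the weakly referenced object has been collected, get() returns null,
        // so any logic that assumed the value was still present behaves differently.
        System.out.println("still reachable? " + (ref.get() != null));
    }
}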
I think the things you should do now are:
First, check the 'GC Cause' to determine whether the full GC happened because of a System.gc() call or because an allocation failed.
Then, if the GC cause is System.gc(), you should check the non-strong references used in your code.
Finally, if the GC cause is an allocation failure, you should check your logs to determine whether an OutOfMemoryError occurred in your code; if it did, you should allocate more memory to avoid it.
As a suggestion, you SHOULD NOT keep your MQ messages in your microservice application's memory. Mostly, the source of a GC problem is bad practice in your code.
I don't think that garbage collection is at fault here, or that you should be attempting to fix this by tweaking GC parameters.
I think it is one of two things:
A coincidence. A correlation (for a single data point) that doesn't imply causation.
Something about garbage collection, or the event that triggered the garbage collection has caused something to break in your application.
For the latter, there are any number of possibilities. But one that springs to mind is that something (e.g. a request) caused an application thread to allocate a really large object. That triggered a full GC in an attempt to find space. The GC failed; i.e. there still wasn't enough space after the GC did its best. That then turned into an OOME which killed the thread.
If the (hypothetical) thread that was killed by the OOME was critical to the operation of the application, AND the rest of the application didn't "notice" it had died, then the application as a whole would break.
One clue to look for would be an OOME logged when the thread died. But it is also possible (if the application is not written / configured appropriately) for the OOME not to appear in the logs.
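One way to make sure an OOME that kills a thread at least leaves a trace (a sketch, assuming you can add code to the application's startup path) is to install a default uncaught exception handler:

// Register this early during startup so that any throwable (including
// OutOfMemoryError) that kills a thread is logged before the thread dies.
Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
    System.err.println("Thread " + thread.getName() + " died with: " + throwable);
    throwable.printStackTrace(System.err);
});

In a real service you would route this to your logging framework instead of System.err, but the principle is the same.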
Regarding the AppD chart: is that time in seconds? How many full GCs do you have? Perhaps you should enable logging for the garbage collector.
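For OpenJDK 8 / HotSpot, flags along these lines will write a GC log you can inspect (the log path is just an example):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/app/gc.log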
Thanks for your contributions, guys. We will be attempting to increase the CPU allocation from 0.5 CPU to 1.25 CPU and execute another round of NFT tests.
We tried running the command below
jmap -dump:format=b,file=$FILENAME.bin $PID
to get a heap dump, but the utility is not present on the default OpenJDK8 container.
I have just seen your comments about CPU
increase the CPU allocation from 0.5 CPU to 1.25 CPU
Please keep in mind that in order to run the parallel GC effectively you need at least two cores. I think with your configuration you are using the serial collector, and there is no reason to use a serial garbage collector nowadays when you can leverage multiple cores. Have you considered trying at least two cores? I often use four as a minimum for my application servers in production and performance environments.
You can see more information here:
On a machine with N hardware threads where N is greater than 8, the parallel collector uses a fixed fraction of N as the number of garbage collector threads. The fraction is approximately 5/8 for large values of N. At values of N below 8, the number used is N. On selected platforms, the fraction drops to 5/16. The specific number of garbage collector threads can be adjusted with a command-line option (which is described later). On a host with one processor, the parallel collector will likely not perform as well as the serial collector because of the overhead required for parallel execution (for example, synchronization). However, when running applications with medium-sized to large-sized heaps, it generally outperforms the serial collector by a modest amount on machines with two processors, and usually performs significantly better than the serial collector when more than two processors are available.
Source: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/parallel.html
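The command-line option the quoted passage refers to is -XX:ParallelGCThreads, for example:

-XX:ParallelGCThreads=4

but with fewer than two cores reserved for the container there is little for it to work with.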
Raúl
I'm a bit of a novice when it comes to Java applications, but I have been involved in developing a fairly complex Java 8 app that requires multi-threading. Another developer and I keep running into a problem where the app runs out of memory after running for a while.
At first we gave the application 64 GB of memory, but after a few hours it would run out of memory, crash and restart, only to keep doing so over and over. Context: the application takes messages from a messaging system (ActiveMQ) and, from each message's metadata, has to build an XML file by calling various data sources for values. There could be literally millions of messages that need to be processed, so we developed a multi-threaded design in which each thread deals with one message, and gave the application 40 threads.
However, as it keeps taking messages the overall memory consumption goes up and up over time. I feel like the garbage collector isn't being utilized by us correctly?
So at the moment we have one parent thread:
(new Thread(new ReportMessageConsumer(config, ""))).start();
Then within the ReportMessageConsumer we have X number of threads set up, which would be 40 in our current configuration, so they would all be under this one group. Once the XML has been built and a thread is done with its message, how do we effectively kill the thread and get the garbage collector to free that memory, so that we can then create a new, clean thread to pick up another message?
I feel like the garbage collector isn't being utilized by us correctly?
That is not the problem. The best thing you can do is to let the GC do its thing without any interference. Don't try to force the GC to run. It is rarely helpful, and often bad for performance.
The real problem is that you have a memory leak. It may be happening because you are getting more and more threads ... or it may be something else.
I would recommend the following:
Rewrite your code so that it uses an ExecutorService to manage a bounded pool of threads and a queue of tasks to be run on those threads. Look at the javadocs for a simple example, or see the sketch after the links below.
Using a thread pool is likely to improve your application's overall performance. Creating a thread (i.e. Thread.start()) is rather expensive in Java.
(And don't shut down the pool as a way to ensure that a batch of work has completed. That is bad for performance. The simple way to do that is to submit the batch using invokeAll; see ExecutorService, how to wait for all tasks to finish.)
If that doesn't cure your leak, then use a memory profiling tool to find out how / why your application is leaking memory. There are lots of StackOverflow Q&A's on how to do this. For example:
How to find a Java Memory Leak
How to find memory leak in java using JProfiler?
How to find memory leaks using visualvm
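As a minimal sketch of the ExecutorService approach (the pool size of 40, the String message type and the buildXmlFor placeholder are assumptions based on your description, not your actual code):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReportWorkerPool {
    private final ExecutorService pool = Executors.newFixedThreadPool(40); // bounded pool of reused threads

    /** Processes one batch of messages and blocks until the whole batch is done. */
    public void processBatch(List<String> messages) throws InterruptedException {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String message : messages) {
            tasks.add(() -> {
                buildXmlFor(message);    // placeholder for the per-message work
                return null;             // nothing from the task stays reachable after it returns
            });
        }
        pool.invokeAll(tasks);           // waits for every task; no need to shut the pool down
    }

    private void buildXmlFor(String message) {
        // placeholder: call the data sources and build the XML for this message
    }
}

Because the pool's threads are reused, there is nothing to kill between messages; as long as each task drops its references when it finishes, the objects it created become eligible for collection.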
I have code that reads a set of binary files which essentially consist of a lot of serialized Java objects. I'm trying to parallelize the code by running the reading of the files in a thread pool (Executors.newFixedThreadPool).
What I'm seeing is that, when threaded, the reading actually runs slower than in a single thread -- from 1.5 to 10 times slower, depending on the number of threads.
In my test case I'm actually reading the same file (35 MB) from multiple threads, so I'm not bound by I/O in any way. I do not run more threads than CPUs, and I do not have any synchronization between pools -- i.e. I'm just processing a bunch of files independently.
Does anyone have an idea what could be a possible reason for this slow performance when threaded? What should I look for? Or what's the best way to dissect the problem? I already looked for static variables in the classes, which could be shared between threads, and I don't see any.
Can one of the java.* classes run significantly slower when instantiated in a thread (e.g. java.zip.deflate, which I'm using)?
Thanks for any hints.
Update: another interesting hint is that when a single thread is running, the execution time of the function that does the reading is constant to high precision, but when running multiple threads, I see significant variation in timings.
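A skeleton of the kind of harness I am describing, with the file name and the readFile body as placeholders rather than the real code:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReadTest {
    public static void main(String[] args) throws Exception {
        int threads = Integer.parseInt(args[0]);            // e.g. 1, 2, 4 ...
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        List<Callable<Long>> tasks = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            tasks.add(() -> {
                long start = System.nanoTime();
                readFile("test.bin");                       // placeholder for the deserialization code
                return System.nanoTime() - start;           // wall-clock time for this task
            });
        }
        for (Future<Long> result : pool.invokeAll(tasks)) { // run all tasks concurrently and wait
            System.out.println("task took " + result.get() / 1_000_000 + " ms");
        }
        pool.shutdown();
    }

    private static void readFile(String path) {
        // placeholder: open the file and read the serialized objects
    }
}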
Sounds to me like you are expecting a java.zip.deflate read of 35 MB to run faster when you add multiple threads doing the same job. It won't. In fact, although you may not be I/O bound, you are still incurring kernel overhead with each thread that you add -- buffer copies, etc. Even if you are reading entirely out of kernel buffer space, you incur CPU and processing overhead.
That said, I am surprised that you see a 1.5 to 10 times slowdown. If each of your processing threads is also writing output, then obviously that won't be cached.
However, I suspect that you may be incurring memory contention. If you are handling a Java serialized object stream, you need to watch your memory consumption unless you are resetting it often. Serialization keeps a lot of references to objects around, so large contiguous streams can generate a tremendous amount of GC load.
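If you control the side that writes these streams (an assumption; it isn't visible from the question), one common mitigation is to reset the stream periodically so the back-reference table doesn't pin every object ever written:

import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.util.List;

public class StreamWriter {
    public void writeAll(List<Object> records, String path) throws Exception {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            int written = 0;
            for (Object record : records) {
                out.writeObject(record);
                if (++written % 10_000 == 0) {
                    out.reset();    // clears the handle table; the reader resets its table at the same point
                }
            }
        }
    }
}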
I'd connect to your program using jconsole and watch the memory tab closely. As the survivor and old-gen spaces fill you will see non-linear CPU implications.
Just because all the worker threads are reading from the same file does not mean for sure that the job is not I/O bound. It might be; it might not be. To be sure, set up your test case so that all the worker threads read from a file in memory rather than off disk.
You mentioned above that you believe the OS has cached the file, but do you know for sure if the file is being opened in read-only/shared mode? If not, then the OS could still be locking the file to ensure that only one thread has access at a time.
Potentially related links:
Reading a single file with Multiple Thread: should speed up?
Java multi-thread application that reads a single file
The problem was caused by the java.util.zip.Inflater class, which has a lot of synchronized methods (because several of them use native code), so when multiple threads are run, the synchronized methods compete with each other and make the code very close to sequential.
The solution was to replace the java.util.zip classes with the Java-only versions from GNU Classpath (e.g. from http://git.savannah.gnu.org/cgit/classpath.git/tree/java/util/zip).
Issues like this obviously require close inspection, and access to the code, to analyze thoroughly and give good suggestions. Nevertheless, that is not always possible, and I hope you can give me good tips based on the information I provide below.
I have a server application that uses a listener thread to listen for incoming data. The incoming data is interpreted into application specific messages and these messages then give rise to events.
Up to that point I don't really have any control over how things are done.
Because this is a legacy application, these events were previously taken care of by that same listener thread (largely a single-threaded application). The events are sent to a blackbox and out comes a result that should be written to disk.
To improve throughput, I wanted to employ a threadpool to take care of the events. The idea being that the listener thread could just spawn new tasks every time an event is created and the threads would take care of the blackbox invocation. Finally, I have a background thread performing the writing to disk.
With just the previous setup and the background writer, everything works OK and the throughput is ~1.6 times more than previously.
When I add the thread pool, however, performance degrades. At the start everything seems to run smoothly, but after a while everything becomes very slow and finally I get OutOfMemoryErrors. The weird thing is that when I print the number of active threads each time a task is added to the pool (along with info on how many tasks are queued and so on), it looks as if the thread pool has no problem keeping up with the producer (the listener thread).
Using top -H to check for CPU usage, it's quite evenly spread out at the outset, but at the end the worker threads are barely ever active and only the listener thread is active. Yet it doesn't seem to be submitting more tasks...
Can anyone hypothesize a reason for these symptoms? Do you think it's more likely that there's something in the legacy code (that I have no control over) that just goes bad when multiple threads are added? The out of memory issue should be because some queue somewhere grows too large but since the threadpool almost never contains queued tasks it can't be that.
Any ideas are welcome, especially ideas on how to diagnose a situation like this more efficiently. How can I get a better profile of what my threads are doing, etc.?
Thanks.
Slowing down then out of memory implies a memory leak.
So I would start by using some Java memory analyzer tools to identify if there is a leak and what is being leaked. Sometimes you get lucky and the leaked object is well-known and it becomes pretty clear who is hanging on to things that they should not.
Thank you for the answers. I read up on Java VisualVM and used that as a tool. The results and conclusions are detailed below. Hopefully the pictures will work long enough.
I first ran the program and created some heap dumps, thinking I could just analyze the dumps and see what was taking up all the memory. This would probably have worked, except that the dump file got so large that my workstation struggled to open it. After waiting two hours for one operation, I realized I couldn't do this.
So my next option was something I, stupidly enough, hadn't thought about. I could just reduce the number of messages sent to the application, and the trend of increasing memory usage should still be there. Also, the dump file will be smaller and faster to analyze.
It turns out that when sending messages at a slower rate, no out of memory issue occurred! A graph of the memory usage can be seen below.
The peaks are results of cumulative memory allocations and the troughs that follow are after the garbage collector has run. Although the amount of memory usage certainly is quite alarming and there are probably issues there, no long term trend of memory leakage can be observed.
I started to incrementally increase the rate of messages sent per second to see where the application hits the wall. The image below shows a very different scenario than the previous one...
Because this happens when the rate of messages sent is increased, my guess is that freeing up the listener thread results in it being able to accept a lot of messages very quickly, and this causes more and more allocations. The garbage collector doesn't keep up and the memory usage hits a wall.
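One common way to put back-pressure on a fast producer in a setup like this (a sketch only; the pool and queue sizes below are arbitrary) is to give the pool a bounded queue and a rejection policy that makes the submitting thread run the task itself, so the listener slows down instead of letting allocations pile up:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BackPressurePool {
    // When both the workers and the queue are full, CallerRunsPolicy makes the listener
    // thread execute the task itself, which throttles how fast it can accept new messages.
    public static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                8, 8,                               // assumed core and maximum pool size
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),      // assumed queue capacity
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}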
There's of course more to this issue but given what I have found out today I have a fairly good idea of where to go from here. Of course, any additional suggestions/comments are welcome.
This question should probably be recategorized as dealing with memory usage rather than thread pools... The thread pool wasn't the problem at all.
I agree with djna.
The thread pool from the java.util.concurrent package works; it does not create threads if it does not need them, and you can see that the number of threads is as expected. This means that probably something in your legacy code is not ready for multithreading. For example, some code fragment is not synchronized; as a result, some element is not removed from a collection, or some additional elements are stored in a collection, so the memory usage keeps growing.
By the way, I did not understand exactly which part of the application uses the thread pool now. Did you previously have one thread that processes events, and now you have several threads that do this? Have you perhaps changed the inter-thread communication mechanism? Added queues? This may be yet another direction for your investigation.
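To make that concrete (purely illustrative; the map and its use are assumptions, not taken from your code), any collection shared between the listener and the worker threads needs to be thread-safe, otherwise removals can be lost under concurrent access and the collection grows forever:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InFlightRequests {
    // Shared between the listener and the worker threads, so it must be thread-safe;
    // a plain HashMap here can lose removals under concurrent access.
    private final Map<Long, Object> inFlight = new ConcurrentHashMap<>();

    public void started(long id, Object request) {
        inFlight.put(id, request);
    }

    public void finished(long id) {
        inFlight.remove(id);    // if this removal is lost, the request object leaks
    }
}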
Good luck!
As mentioned by djna, it's likely some type of memory leak. My guess would be that you're keeping a reference to the request around somewhere:
In the dispatcher thread that's queuing the requests
In the threads that deal with the requests
In the black box that's handling the requests
In the writer thread that writes to disk.
Since you said everything works fine before you add the thread pool into the mix, my guess would be that the threads in the pool are keeping a reference to the request somewhere. The idea being that, without the thread pool, you aren't reusing threads, so the information goes away.
As recommended by djna, you can use a Java memory analyzer to help figure out where the data is stacking up.
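One concrete way pooled threads end up holding on to requests (an assumption, but a common pattern) is a ThreadLocal that is set but never cleared; because the pool reuses threads, the value outlives the task:

public class RequestContext {
    private static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    public static void handle(Object request) {
        CURRENT.set(request);       // bound to the worker thread for the duration of the task
        try {
            // ... invoke the black box, hand the result to the writer thread ...
        } finally {
            CURRENT.remove();       // without this, the pooled thread keeps the last request alive
        }
    }
}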