I have a repeating process that:
gets some data from the database
builds some objects in memory, adding to a Collection
writes the data from the Collection to a file
All of the objects/Collections go out of scope or are set to null after each iteration. (The Collection is reused for each iteration.)
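In pseudocode-ish Java, the shape of the process is something like this (the Row type and the helper methods are simplified stand-ins for my real code):

    import java.util.ArrayList;
    import java.util.List;

    public class ExportLoop {
        static final class Row {                       // simplified data holder
            final int id; final String value;
            Row(int id, String value) { this.id = id; this.value = value; }
        }

        public static void main(String[] args) {
            List<Row> batch = new ArrayList<>();       // reused for each iteration
            for (int i = 0; i < 100; i++) {
                batch.clear();                         // previous objects become unreachable
                loadFromDatabase(batch);               // build objects in memory
                writeToFile(batch);                    // write the Collection to a file
            }
        }

        static void loadFromDatabase(List<Row> out) {  // stub standing in for the JDBC calls
            for (int i = 0; i < 10_000; i++) out.add(new Row(i, "row-" + i));
        }

        static void writeToFile(List<Row> batch) {     // stub standing in for the file I/O
        }
    }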
Using Java VisualVM, I see a graph that looks like this, which seems very odd considering that it's a repeating process. Yes, the data coming back from the database is different, but it's generally the same amount.
Why does the heap size decrease at first?
Why does the used heap get so close to the heap size in the middle?
(the ~30-second blip at 1:43 was just when VisualVM froze momentarily)
I'm not as big an expert on GC as some are, but the general idea is that when you start the program you give it the initial heap size, max heap size and other relevant parameters, and then it's go time.
However, the GC has plenty of intelligence, and different algorithms are optimized for different kinds of tasks. A naive implementation would just keep the heap size static and then collect the garbage when it's full. That's known as a "stop the world" collection, because the collector needs to stop everything so it can perform a little (or big) cleanup.
Modern GCs don't just inflict long pauses on a running program whenever they need to clean up; instead there's always a little cleanup going on, as seen in the sawtooth. But when you start a program the GC has no idea what the program is going to do and how it will use memory. Therefore it has to observe what's happening, analyze memory usage, and then decide how much memory it needs to keep available for immediate use and whether it needs to grow, or can shrink, the current heap size.
Depending on the behaviour of your program and the GC algorithm being used you can see a lot of different patterns. As long as you're not experiencing linear growth that ends up in an OutOfMemoryError, you should be relatively safe. Unless of course you want to optimize what's happening to increase throughput, responsiveness etc., but that's a more advanced subject and is more relevant when you've gotten your code working the way you want it.
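If you want to watch this adaptive resizing yourself, here is a minimal sketch using only the standard Runtime API (the allocation workload is made up; totalMemory() is the current heap size and maxMemory() the -Xmx ceiling):

    public class HeapWatcher {
        public static void main(String[] args) throws InterruptedException {
            Runtime rt = Runtime.getRuntime();
            while (true) {
                byte[] churn = new byte[1_000_000];   // give the GC something to observe
                long total = rt.totalMemory();        // current heap size
                long used  = total - rt.freeMemory();
                System.out.printf("heap=%dMB used=%dMB max=%dMB%n",
                        total >> 20, used >> 20, rt.maxMemory() >> 20);
                Thread.sleep(500);
            }
        }
    }

Run it with a small -Xms and a larger -Xmx and you can see the reported heap size itself move around as the collector adapts, much like in your VisualVM graph.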
Related
I'm having trouble with Java memory consumption.
I'd like to say to Java something like this: "you have 8GB of memory, please use it, and only it. Only if you really can't put all your resources in this memory pool, then fail with OOM".
I know there are parameters like -Xmx that limit only the heap, and plenty of other parameters besides. The problems with these parameters are:
They aren't relevant. I don't want to limit the heap size to 6GB (and trust that native memory won't take more than 2GB). I want to limit all the memory (heap, native, whatever), and do that effectively, not just by saying -Xmx1GB to be safe.
There are too many different parameters related to memory, and I don't know how to configure all of them to achieve this goal.
So, I don't want to go there and care about heap, perm and whatever types of memory. My high-level expectation is: since there is only 8GB, and some static memory is needed - take the static memory from the 8GB, and carefully split the remaining memory between other dynamic memory entities.
Also, ulimit and similar things don't work: I don't want to kill the Java process once it consumes more memory than expected. I want Java to do its best not to reach the limit in the first place, and only if it really, really can't stay under it should the process be killed.
And I'm OK with defining even 100 Java parameters, why not. :) But then I need help with the full list of needed parameters (for, say, Java 8).
Have you tried -XX:MetaspaceSize?
Is this what you need?
Please, read this article: http://karunsubramanian.com/websphere/one-important-change-in-memory-management-in-java-8/
Keep in mind that this is only valid for Java 8.
AFAIK, there is no java command line parameter or set of parameters that will do that.
Your best bet (IMO) is to set the max heap size and the max metaspace size and hope that other things are going to be pretty static / predictable for your application. (It won't cover the size of the JVM binary and it probably won't cover native libraries, memory mapped files, stacks and so on.)
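For what it's worth, a typical starting point looks something like this (the sizes are placeholders for illustration, not recommendations):

    java -Xmx6g -XX:MaxMetaspaceSize=512m -jar yourapp.jar

That still leaves thread stacks (-Xss, per thread), direct buffers (-XX:MaxDirectMemorySize) and the other native consumers mentioned above capped only individually, or not at all.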
In a comment you said:
So I'm forced to have a significant amount of memory unused to be safe.
I think you are worrying about the wrong thing here. Assuming that you are not constrained by address space or swap space limitations, memory that is never used doesn't matter.
If a page of your address space is not used, the OS will (in the long term) swap it out, and give the physical RAM page to something else.
Pages in the heap won't be in that situation in a typical Java application. (Address space pages will cycle between in-use and free as the GC moves objects within and between "spaces".)
However, the flip-side is that a GC needs the total heap size to be significantly larger than the sum of the live objects. If too much of the heap is occupied with reachable objects, the interval between garbage collection runs decreases, and your GC ergonomics suffer. In the worst case, a JVM can grind to a halt as the time spent in the GC tends to 100%. Ugly. The GC overhead limit mechanism prevents this, but that just means that your JVM gets an OOME sooner.
So, in the normal heap case, a better way to think about it is that you need to keep a portion of memory "unused" so that the GC can operate efficiently.
This is a GC diagram from VisualVM for a simple application that listens for an incoming stream of data through a websocket... At the start it creates a lot of garbage, but as you can see it gets better over time... Is this the JIT somehow figuring out how to avoid creating objects?
There are some very specific cases where the JIT can remove allocations and therefore reduce the pressure on the GC, mainly through escape analysis. Basically, if an object lives only within one method and never leaves it, it can be allocated on the stack instead of the heap, reducing the work of the garbage collector.
If you want to know for sure, you can disable escape analysis: use the command line argument -XX:-DoEscapeAnalysis and see if the graph changes.
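For example, here is a minimal sketch (class and names made up) of the kind of allocation that escape analysis can eliminate; run it with and without -XX:-DoEscapeAnalysis and compare the allocation rate in VisualVM:

    public class EscapeDemo {
        static final class Point {
            final int x, y;
            Point(int x, int y) { this.x = x; this.y = y; }
        }

        static long distanceSquared(int x, int y) {
            Point p = new Point(x, y);              // never leaves this method
            return (long) p.x * p.x + (long) p.y * p.y;
        }

        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 100_000_000; i++) {
                sum += distanceSquared(i, i + 1);   // hot loop: once JIT-compiled,
            }                                       // the Point allocation can disappear
            System.out.println(sum);
        }
    }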
However, there are many other self-tuning mechanisms at work too. For example, the runtime may notice that you don't need as much memory and therefore start to reduce the heap size. Your graph would match that: since most of the memory can always be freed, the memory system reduces the heap size, with more frequent but smaller GCs.
When viewing my remote application in JVisualVM over JMX, I see a saw-tooth of memory usage while idle:
Taking a heap dump and analysing it with JVisualVM, I see that a large chunk of memory is in a few big int[] arrays which have no references, and by comparing heap dumps I can see that these are what is taking the memory and being reclaimed by a GC periodically.
I am curious to track these down since it piqued my interest that my own code never knowingly allocates any int[] arrays.
I do use a lot of libs like netty so the culprit could be elsewhere. I do have other servers with much the same mix of frameworks but don't see this sawtooth there.
How can I discover who is allocating them?
Take a heap dump and find out what objects are holding them. Once you know what objects are holding the arrays, you should have an easy time figuring out what is allocating them.
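If it helps, the usual command-line way to take such a dump (where <pid> is your JVM's process id) is:

    jmap -dump:live,format=b,file=heap.hprof <pid>

The live option forces a full GC first, so only reachable objects end up in the dump.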
It doesn't answer your question, but my question is:
Why do you care?
You've told the JVM garbage collector (GC) it can use up to 1GB of memory. Java is using less than 250MB.
The GC tries to be smart about when it collects and how hard it works at collection. In your graph, there is no demand for memory: the JVM isn't anywhere near that 1GB limit you set. I see no reason the GC should try very hard at all, and I'm not sure why you would care either.
It's a good thing for the garbage collector to be lazy. The less the GC works, the more resources are available for your application.
Have you tried triggering GC via the JVisualVM "Perform GC" button? That button should trigger a "stop the world" garbage collection operation. Try it when the graph is in the middle of one of those saw tooth ramp ups - I predict that the usage will drop to the base of the saw tooth or below. If it does, that proves that the memory saw tooth is just garbage accumulation and GC is doing the right thing.
Here is a screenshot of memory usage for a Java Swing application I use:
Notice the sawtooth pattern.
You said you are worried about int[]. When I start the memory profiler and have it profile everything, I can see the allocations of int[].
Basically all allocations come from an ObjectOutputStream$HandleTable.growEntries method. It looks like the thread the allocations were made on was spun up to handle a network message.
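To illustrate, here is a sketch of the kind of code path that produces those allocations: serializing a large object graph makes ObjectOutputStream grow its internal handle table, which is backed by int[] arrays (the payload here is made up):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.util.ArrayList;
    import java.util.List;

    public class SerializationChurn {
        public static void main(String[] args) throws IOException {
            List<String> payload = new ArrayList<>();
            for (int i = 0; i < 100_000; i++) payload.add("item-" + i);

            ObjectOutputStream out =
                    new ObjectOutputStream(new ByteArrayOutputStream());
            out.writeObject(payload); // each distinct object gets a handle; the int[]
            out.close();              // arrays backing the table grow as needed
        }
    }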
I suspect it's caused by JMX itself, possibly by RMI (do you use RMI?), or by the debugger (do you have a debugger connected?).
I just thought I'd add to this question that the sawtooth pattern is very much normal and has nothing necessarily to do with your int[] arrays. It happens because new allocations go into the Eden gen, and an ephemeral collection is only triggered once it has filled up, leaving the old gen alone. So as long as your program does any allocations at all, the Eden gen will fill up and then empty repeatedly. In particular, when you have a regular amount of allocations per unit of time, you'll see a very regular sawtooth pattern.
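For example, even a trivial steady allocator shows the pattern; watch something like this in VisualVM and you'll see the heap climb as Eden fills and drop at each young collection (the allocation size and sleep are arbitrary):

    public class SawtoothDemo {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                byte[] garbage = new byte[64 * 1024]; // short-lived, dies in Eden
                Thread.sleep(1);                      // steady, regular allocation rate
            }
        }
    }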
There are tons of articles on the web detailing how Hotspot's GC works, so there's no need for me to expand on that here. If you don't know at all how ephemeral collection works, you may want to check out Wikipedia's article on the subject (see the "Generational GC" section; "generational" and "ephemeral" are synonymous in this context).
As for the int[] arrays, however, they are a bit mysterious. I'm seeing those as well, and there's another question here on SO on the subject of them without any real answer. It's not actually normal for objects with no references to show up in a heap dump, because a heap dump normally only contains live objects (because Hotspot always performs a stop-the-world collection before actually dumping the heap). My personal guess is that they are allocated as part of some kind of internal JVM data-structure (and therefore only have references from the C++ part of Hotspot rather than from the Java heap), but that's really just a pure guess.
I have a Java client which consumes a large amount of data from a server. If the client does not keep up with the data stream at a fast enough rate, the server disconnects the socket connection. My client gets disconnected a few times per day. I ran jconsole to see the memory usage, and the heap space graph looks like a fairly well defined sawtooth pattern, oscillating between about 0.5GB and 1.8GB (2GB of heap space is allocated). But every time I get disconnected is during a full GC (but not on every full GC). I see the full GC takes a bit over 1 second on average. Depending on the time of day, full GC happens as often as every 5 minutes when busy, or up to 30 minutes can go by in between full GCs during the slow periods.
I suspect if I can reduce the full GC time, the client will be able to better keep up with the incoming data, but I do not have much experience with GC tuning. Does anyone have some insight on if this might be a good idea, and how to do it? Or is there an alternative idea which may work as well?
** UPDATE **
I used -XX:+UseConcMarkSweepGC and it improved, but I still got disconnected during the very busy moments. So I increased the heap allocation to 3GB to help weather through the busy moments and it seems to be chugging along pretty well now, but it's only been 1 day without a disconnection. Maybe if I get some time I will go through and try to reduce the amount of garbage created which I'm confident will help as well. Thanks for all the suggestions.
A full GC can take very long to complete, and it is not that easy to tune.
One way to (easily) tune it is to increase the heap space; generally speaking, doubling the heap space can double the interval between two GCs, but it will also roughly double the time consumed by each GC. If the program you are running has very clear usage patterns, maybe you can consider increasing the heap space to make the interval so large that you can guarantee some idle time in which the system can perform a GC. On the other hand, following this logic, if the heap is small a full garbage collection will finish in an instant, but that seems like inviting more trouble than it helps.
Also, -XX:+UseConcMarkSweepGC might help, since it tries to perform the GC operations concurrently, without stopping your program (see here).
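Assuming a setup like the one described (a 2-3GB heap plus CMS), the flags would look something like this (the class name is a placeholder):

    java -Xms3g -Xmx3g -XX:+UseConcMarkSweepGC MarketDataClient

Setting -Xms equal to -Xmx avoids resize pauses, and adding -verbose:gc or -XX:+PrintGCDetails will show whether the long pauses really are full collections.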
Here's a very nice talk by Gil Tene (CTO of Azul Systems, maker of a high-performance JVM, and author of several GC algorithms) about GC in the JVM in general.
It is not easy to tune away the full GC. A much better approach is to produce less garbage. Producing less garbage reduces the pressure to promote objects into the tenured space, where they are more expensive to collect.
I suggest you use a memory profiler to
reduce the amount of garbage produced. In many applications this can be reduced by a factor of 2-10x relatively easily.
reduce the size of the objects you are creating, e.g. use primitives and smaller datatypes such as double instead of BigDecimal (see the sketch after this list).
recycle mutable objects instead of discarding them.
retain less data on the client if you can.
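As a concrete sketch of the second and third points (all names are illustrative, and the shared buffer assumes single-threaded use):

    import java.math.BigDecimal;

    public class LessGarbage {
        // Allocates BigDecimal objects on every call; all become garbage.
        static BigDecimal priceWithTaxBoxed(BigDecimal price) {
            return price.multiply(new BigDecimal("1.19"));
        }

        // Allocation-free: a primitive double stays on the stack.
        static double priceWithTax(double price) {
            return price * 1.19;
        }

        // Recycling: one StringBuilder reused across calls instead of fresh
        // String concatenation (and its garbage) each time.
        static final StringBuilder line = new StringBuilder(256);

        static String formatLine(int id, double total) {
            line.setLength(0);                 // reset, don't reallocate
            return line.append(id).append(',').append(total).toString();
        }

        public static void main(String[] args) {
            System.out.println(priceWithTaxBoxed(new BigDecimal("100")));
            System.out.println(priceWithTax(100.0));
            System.out.println(formatLine(1, 119.0));
        }
    }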
By reducing the amount of garbage you create, objects are more likely to die in the Eden or survivor spaces, meaning you have far fewer full collections, and the ones you do have can be shorter as well.
Don't take it for granted that you have to live with lots of collections; in extreme cases you can avoid them almost completely: http://vanillajava.blogspot.ro/2011/06/how-to-avoid-garbage-collection.html
Take out calls to Runtime.getRuntime().gc() - When garbage collection is triggered manually it either does nothing or it does a full stop-the-world garbage collection. You want incremental GC to happen.
Have you tried using the server JVM from a JDK install? It changes a bunch of the default configuration settings (including garbage collection) and is easy to try; just add -server to your java command:
java -server
What is all the garbage that gets created? Can you generate less of it? Where possible, try to use the valueOf methods. By using less memory you'll save yourself time in GC AND in memory allocation.
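As a small illustration of the valueOf advice (pure standard library, nothing specific to your code): valueOf can hand back a cached instance, while the constructor always allocates a fresh object.

    public class ValueOfDemo {
        public static void main(String[] args) {
            Integer a = Integer.valueOf(42); // small values come from a shared cache
            Integer b = Integer.valueOf(42); // same cached instance, no new allocation
            System.out.println(a == b);      // true: both refer to the cached object
        }
    }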
This is related to my question Java Excel POI stops after multiple execution by quartz.
My program stops unexpectedly after a few iterations. I tried profiling and found that I was consuming a lot of heap memory per iteration (and there's a memory leak somewhere; haven't found the bugger yet). So, as a temporary solution, I tried inserting System.gc(); at the end of each complete execution of the program (kindly read the linked question for a brief description of the program). I was not expecting much, maybe a little more heap space available after each iteration. But it appears that the program uses less heap memory when I insert System.gc();.
The top graph shows the program running with System.gc(); while the bottom graph is the one without. As you can see, the top graph shows that I'm using less than 100MB after 4 iterations of the program, while the bottom graph shows over 100MB in use for the same number of iterations. Can anyone clarify how and why System.gc(); causes this effect on my heap? Are there any disadvantages if I were to use this in my program? Or am I completely hopeless at programming and should take up photography instead?
Note that I inserted the GC call at the end of each program iteration. So I assume that heap usage must be the same as without it until execution reaches the System.gc(); command.
Thanks!
Can anyone clarify how and why System.gc(); causes this effect in my heap?
System.gc() is a request for the garbage collector to run; note that I said request and not trigger. Based on the heap state, the GC might or might not carry out a collection.
If there are any disadvantages if I were to use this in my program?
From experience, GC works best when left alone. In your example you shouldn't worry, or use System.gc(), because the GC will run when it is best to run, and manually requesting it might reduce performance. Even though the difference is small, you can observe that "time spent on GC" is better in the lower graph than in the first one.
As for memory, both graphs are OK. It seems your max heap is a bit high, which is why the GC did not run in the second graph. If it had really been required, it would have run.
As per the Java specs, calling gc() does not guarantee that it will run; you are only hinting to the JVM that you need it to run, so the result is unreliable (you should avoid calling gc(), no matter what). But in your case, since the heap is incrementally approaching a critical limit, that's perhaps why your hints are being acted on.
GC usually runs based on specific algorithms to prevent the heap from being exhausted. When it fails to reclaim the much-needed space and there is no more heap for your app to survive on, you'll face an OutOfMemoryError.
While the GC is running, your application will experience some pauses as a result of its activities, so you won't really want it to run more often!
Your best bet is to solve the leak and practice better memory management for a healthy runtime experience.
Using System.gc() shouldn't impact the heap size allocated to the JVM; heap size depends only on the startup arguments we give the JVM. I recommend running the same program 3-4 times and taking average values, both with System.gc() and without.
Coming back to the problem of finding the memory leak: I recommend using JProfiler or another tool that will show you the exact memory footprint and the different objects in the heap.
Last but not least: you are a reasonable programmer. No need to go for a photo shoot. :)