I have a Java app running in a standalone JVM. The app listens for data on one or more sockets, queues the data, and has scheduled threads pulling the data off the queue and persisting it. The data is wide, over 700 data elements per record, though all of the data elements are small Strings, Integers, or Longs.
The app runs smoothly for periods of time, sometimes 30 minutes to an hour, but then we experience one or more long garbage collection pauses. The majority of the pause time is spent in the Object Copy time. The sys time is also high relative to the other collections.
Here are the JVM details:
java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b04)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)
Here are the JVM options:
-XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC -XX:-UseGCOverheadLimit
The process is taskset to 4 cores (all on the same socket), but is barely using 2 of them. All of the processes on this box are pinned to their own cores (with 0 and 1 unused). The machine has plenty of free memory (20+G) and top shows the process using 2.5G of RES memory.
Here is some of the gc log output...
[Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 1608.1 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 2190.1 2153.3 1604.1 1930.8 1892.6 1651.9
[Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: 1062M(3072M)->47M(3072M)]
[Times: user=2.24 sys=7.22, real=2.49 secs]
Any ideas on why the Object Copy time and sys time are so high and how to rectify it? There are numerous garbage collections in the log with nearly identical Eden/Survivors/Heap sizes that are only taking 10 or 20 ms.
3gb is not a large heap and the survivor size is also small. Anything else running on those cores? How much garbage are you generating and how often is it collecting? You may want to try it without G1GC as well.
Do you need 3 gigabytes of heap? Garbage-collection pauses get longer as a heap gets bigger, since (though less frequent) there is more work to do when GC is finally needed.
It appears you're locking the heap at 3 gig by setting a minimum. If the app doesn't need 3 gig, this is going to force it to use that much anyway, and result in a humongous pause when that 3G finally does need to be collected.
I spent quite some time tuning Eclipse IDE for responsiveness, and found very early on that compact heap-sizes had better 'low pause' characteristics than large.
Apart from JVM heap settings, you can make sure your code 'nulls out' data elements & collection items as these are discarded.
This is standard practice in the java.util Collections package, but perhaps you have code that could benefit from it. The 700-wide records could especially be a candidate for this practice, which helps to simplify things for the GC and enable better cleanup from the 'minor GC' sweeps.
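As a rough illustration (the Record class and its fields below are made up for this sketch, not taken from your app), clearing the element slots once a record has been persisted makes the referenced objects unreachable sooner, so they can die in the young generation instead of being copied around:

import java.util.Arrays;

class Record {
    // ~700 small Strings/Integers/Longs per record
    private final Object[] elements = new Object[700];

    Object get(int i) { return elements[i]; }
    void set(int i, Object value) { elements[i] = value; }

    // Call after the record has been persisted so the GC can reclaim the elements.
    void clear() {
        Arrays.fill(elements, null);
    }
}

This is the same idea ArrayList uses internally when an element is removed: the vacated slot is nulled so the collector is free to reclaim the object.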
Related
I would like to know how much I can increase the heap size in jmeter.bat and in Java for my 64-bit Windows 8 OS with 16 GB RAM, to avoid GC overhead limit exceeded.
Also, can the heap size in jmeter.bat and in Java be the same?
As per the Excessive GC Time and OutOfMemoryError chapter of the Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning article:
The concurrent collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.
The policy is the same as that in the parallel collector, except that time spent performing concurrent collections is not counted toward the 98% time limit. In other words, only collections performed while the application is stopped count toward excessive GC time. Such collections are typically due to a concurrent mode failure or an explicit collection request (e.g., a call to System.gc()).
It usually indicates a problem with your Java application, in your case with the JMeter test.
First of all, make sure you're following the recommendations from the 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure guide to get the most out of your JMeter instance.
Try switching to the Concurrent Mark Sweep garbage collector. In order to use it, add the following line to ARGS in the JMeter startup script:
-XX:+UseConcMarkSweepGC
If it doesn't help and you're totally sure that your JMeter script and JVM parameters are fine, you can turn off this behaviour via the aforementioned -XX:-UseGCOverheadLimit setting.
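For example, assuming your JMeter version's startup script honours the JVM_ARGS environment variable (test.jmx and results.jtl are placeholder file names), a non-GUI run on Windows could look like:

set JVM_ARGS=-XX:+UseConcMarkSweepGC
jmeter.bat -n -t test.jmx -l results.jtl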
After running a few days the CPU load of my JVM is about 100% with about 10% of GC (screenshot).
The memory consumption is near to max (about 6 GB).
The tomcat is extremely slow at that state.
Since it's too much for a comment, I'll write it up as an answer:
Looking at your charts, it seems to be using CPU for non-GC tasks; peak "GC activity" seems to stay within 10%.
So on first impression it would seem that your task is simply CPU-bound, so if that's unexpected maybe you should do some CPU-profiling on your java application to see if something pops out.
Apart from that, based on comments I suspect that physical memory filling up might evict file caches and memory-mapped things, leading to increased page faults which forces the CPU to wait for IO.
Freeing up 500 MB on a manual GC out of a 4 GB heap does not seem all that much. Most GCs try to keep pause times low as their primary goal, keep the total time spent in GC within some bound as a secondary goal, and only when those goals are met do they try to reduce memory footprint as a tertiary goal.
Before recommending further steps you should gather more statistics/provide more information since it's hard to even discern what your actual problem is from your description.
monitor page faults
figure out which GC algorithm is used in your setup and how they're tuned (-XX:+PrintFlagsFinal; see the example after this list)
log GC activity - I suspect it's pretty busy with minor GCs and thus eating up its pause time or CPU load goals
perform allocation profiling of your application (anything creating excessive garbage?)
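As an example, one way to dump the effective GC-related flags of a JVM installation (the grep filter is just illustrative):

java -XX:+PrintFlagsFinal -version | grep -i -E 'gc|newsize|heapsize'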
You also have to be careful to distinguish problems caused by the Java heap reaching its sizing limit vs. problems caused by the OS exhausting its physical memory.
TL;DR: Unclear problem, more information required.
Or if you're lazy/can afford it just plug in more RAM / remove other services from the machine and see if the problem goes away.
I learned to check this on GC problems:
Give the JVM enough memory e.g. -Xmx2G
If memory is not sufficient and no more RAM is available on the host, analyze the HEAP dump (e.g. by jvisualvm).
Turn on Concurrent Mark and Sweep:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Check the garbage collection log: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
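Putting those flags together, a launch command might look like this (myapp.jar is just a placeholder):

java -Xmx2G -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -jar myapp.jar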
My Solution:
I finally solved that problem by tuning the cache sizes.
The cache sizes were too big, so memory got scarce.
If you want to keep the memory of your server free, you can simply try the VM parameter
-Xmx2G //or any different value
This ensures your program never takes more than 2 gigabytes of RAM. But be aware that in case of high workload the server may get an OutOfMemoryError.
Since an old-generation (full) GC may block your whole server from working for some seconds, Java will try to avoid a full garbage collection.
The RAM limitation may trigger a full GC more easily (or even allow more objects to be collected by the young-generation GC).
In my opinion (more guessing than actually knowing): I don't think another algorithm will help much here.
I am running an application server on Linux 64bit with 8 core CPUs and 6 GB memory.
The server must be highly responsive.
After some inspection I found that the application running on the server creates a rather huge amount of short-lived objects, and has only about 200~400 MB of long-lived objects (as long as there is no memory leak).
After reading http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
I use these JVM options
-server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC
Result: the minor GC takes 0.01 ~ 0.02 sec, the major GC takes 1 ~ 3 sec
the minor GC happens constantly.
How can I further improve or tune the JVM?
larger heap size? but will it take more time for GC?
larger NewSize and MaxNewSize (for young generation)?
other collector? parallel GC?
is it a good idea to let major GC take place more often? and how?
Result: the minor GC takes 0.01 ~ 0.02 sec, the major GC takes 1 ~ 3 sec the minor GC happens constantly.
Unless you are reporting pauses, I would say that the CMS collector is doing what you have asked it to do. By definition, CMS will use a larger percentage of the CPU than the Serial and Parallel collectors. This is the penalty you pay for low pause times.
If you are seeing 1 to 3 second pause times, I'd say that you need to do some tuning. I'm no expert, but it looks like you should start by reducing the value of CMSInitiatingOccupancyFraction from the default value of 92.
Increasing the heap size will improve the "throughput" of the GC. But if your problem is long pauses, increasing the heap size is likely to make the problem worse.
Careful .... GC can be a hairy subject if you are not cautious. Within any runtime (JVM for Java / CLR for .Net) there are several processes that take place. Generally there is an early stage optimization of memory (Young Generational Garbage Collection / Young Gen GC & Old Generational Garbage Collection / Old Gen GC). The young gen gc happens on a regular basis and is commonly attributed to your smaller pauses / hiccups. The old gen gc is normally what is going on when you see the long "stop the world" pauses.
Why, you might ask? The reason you get pauses with your runtime / JVM is that when the runtime does its cleanup of the heap it has to go through what is called a phase change. It stops the threads running your application in order to mark and swap pointers to optimize your available memory. Young gen is faster as it is mainly releasing objects that are only temporary. Old gen, however, evaluates all the objects on the heap, and when you run out of memory it will kick off to free up much-needed memory.
Why the caution? Old gen gets exponentially worse in pause time the more heap you use. At 2-4 GB in total heap size you should be fine on modern runtimes like Java 6 (JDK 1.6+). Once you go beyond that threshold you will see exponential increases in pause times. I have run into some clients that have to restart servers, as in some rare cases where a heap is large, GC pause times can take longer than a full restart.
There are some new tools out there that are pretty cool and can give you a leading edge on evaluating whether GC is your pain. JHiccup is one, and it is free from the Azul Systems website. At this time I think it is only for Linux though. They also have a JVM with a rebuilt GC algorithm that runs pauseless ... but if you are on a single server deployment with a non-critical app it may not be cost effective (that one is not free).
To sum up - if your runtime / JVM / CLR heap is less than 2 GB adding more memory will help. Be sure to give yourself some overhead. You never want to hit 100% Heap size / memory size if possible. That is when the long pauses are the longest. Give yourself an extra 20%+ memory over what you think you will need. That way you have room for the GC algorithms to move objects around for optimization. If you ever plan to go large ... there is one tool that fixes the circa 1990 JVM technology (Azul Systems Zing JVM), but it is not free. They do offer an open source tool to diagnose GC issues. The JVM (as I have tried it) also has a really cool thread level visibility tool that lets you report on any leaks, bugs, or locks in production without overhead (some trick with offloading data the JVM already deals with and time stamping). That has saved tons of dev test time ... but again, not for small apps.
Stay below 4 GB. Give extra headroom. And if you want you can turn on these flags to monitor GC for Java / JVM:
java -verbose:gc myProgram
java -Xloggc:D:/log/myLogFile.log -XX:+PrintGCDetails myProgram
You may try some of the other collectors Hotspot uses. There are more than one.
If you are on Linux go ahead and try the JHiccup tool as well. It is free.
You may be interested in trying the low-pause Garbage-First collector instead of concurrent mark-sweep (although it's not necessarily more performant for all collections, it's supposed to have a better worst case). It's enabled by -XX:+UseG1GC and is supposed to be really awesomesauce, but you may want to give it a thorough evaluation before using it in production. It has probably improved since, but it seems to have been a bit buggy a year ago, as seen in Experience with JDK 1.6.x G1 (“Garbage First”)
It is perfectly fine for the garbage collection to run in parallel with your program, if you have plenty of cpu, which you do.
What you want, is to make absolutely certain that you will not run into a scenario where the garbage collection PAUSES your main program.
Have you tried simply not stating any flags except saying you want the server VM (for the Sun JVM), and then put your server under heavy load to see how it behaves? Only then can you see, if you get any improvements from tinkering with options.
This actually sounds like a throughput app and should probably use the throughput collector. I would balance the size of the new gen making it big enough to not GC too often and small enough to prevent long pauses. 20ms sounds like a long minor GC to me. I also suspect your survivor space is set too large and is just being wasted. If you don't have much surviving to old gen, you shouldn't have that much surviving your minor GCs.
In the end, you should use jvmstat and VisualGC to really get a feel for how your app is using memory.
For a highly responsive server application, I think you want to see the major GC happen less frequently. Here is a list of parameters that would help.
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSWaitDuration=300000
-XX:GCTimeRatio=40
Larger heap size may not help on low pause, as long as your app doesn't run out of memory.
Larger NewSize and MaxNewSize would help on throughput, but may not help on low pause. If you choose to take this approach, you may consider giving GC threads more execution time by setting -XX:GCTimeRatio lower (the throughput goal is to spend no more than 1/(1+GCTimeRatio) of total time in GC, so a lower value allows more GC time). The point is to remember to take a holistic view when tuning the JVM.
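As a sketch only, combining these with the heap settings from the question might look like the following (MyServer is a placeholder for the actual main class):

java -server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSWaitDuration=300000 -XX:GCTimeRatio=40 MyServer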
I think previous posters have missed something very obvious: the Perm Generation size is too low. If the system uses 200 to 400 MB as permanent generation, then it may be best to set Max Perm Gen to 400 MB. The PermGen size should also be set to the same value. You will then never run out of Permanent Generation space.
Currently, it looks like the JVM has to spend a lot of time moving objects in and out of the Permanent Generation. This can take time. The JVM tries to allocate contiguous memory areas to Java objects; this speeds memory access due to hardware-level features. In order to do that, it is very helpful to have plenty of buffer in memory. If the Permanent Generation is almost full, newly discovered permanent objects must be split or existing objects must be shuffled. This is what triggers a full GC, as well as causing long full GC pauses.
The question states that the Permanent Generation size has already been measured; if this has not been done, it should be measured using a tool. These tools process logs generated by the JVM with the verboseGC option turned on.
All the mark-and-sweep options listed above may not be needed with this basic improvement.
People throw GC options as solutions without evaluating how mature they have proven to be in actual use.
Does the Sun JVM slow down when more memory is available and used via -Xmx? (Assumption: The machine has enough physical memory so that virtual memory swapping is not a problem.)
I ask because my production servers are to receive a memory upgrade. I'd like to bump up the -Xmx value to something decadent. The idea is to prevent any heap space exhaustion failures due to my own programming errors that occur from time to time. Rare events, but they could be avoided with my rapidly evolving webapp if I had an obscene -Xmx value, like 2048mb or higher. The application is heavily monitored, so unusual spikes in JVM memory consumption would be noticed and any flaws fixed.
Possible important details:
Java 6 (running in 64-bit mode)
4-core Xeon
RHEL4 64-bit
Spring, Hibernate
High disk and network IO
EDIT: I tried to avoid posting the configuration of my JVM, but clearly that makes the question ridiculously open ended. So, here we go with relevant configuration parameters:
-Xms256m
-Xmx1024m
-XX:+UseConcMarkSweepGC
-XX:+AlwaysActAsServerClassMachine
-XX:MaxGCPauseMillis=1000
-XX:MaxGCMinorPauseMillis=1000
-XX:+PrintGCTimeStamps
-XX:+HeapDumpOnOutOfMemoryError
By adding more memory, it will take longer for the heap to fill up. Consequently, it will reduce the frequency of garbage collections. However, depending on how mortal your objects are, you may find that how long it takes to do any single GC increases.
The primary factor for how long a GC takes is how many live objects there are. Thus, if virtually all of your objects die young and once you get established, none of them escape the young heap, you may not notice much of a change in how long it takes to do a GC. However, whenever you have to cycle the tenured heap, you may find everything halting for an unreasonable amount of time since most of these objects will still be around. Tune the sizes accordingly.
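A toy sketch of the two lifetime patterns (class name and sizes are purely illustrative): temporaries that die young are cheap to collect, while anything kept reachable gets promoted and must be traced on every tenured-generation collection.

import java.util.ArrayList;
import java.util.List;

class LifetimeDemo {
    // Survives into the tenured generation and lengthens major GC pauses.
    private static final List<byte[]> LONG_LIVED = new ArrayList<byte[]>();

    public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++) {
            byte[] temp = new byte[1024];       // dies young; almost free to collect
            temp[0] = 1;
            if (i % 1000 == 0) {
                LONG_LIVED.add(new byte[1024]); // promoted; adds to tenured GC work
            }
        }
        System.out.println("long-lived objects retained: " + LONG_LIVED.size());
    }
}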
If you just throw more memory at the problem, you will have better throughput in your application, but your responsiveness can go down if you're not on a multi-core system using the CMS garbage collector. This is because fewer GCs will occur, but they will have more work to do. The upside is that you will get more memory freed up with your GCs, so allocation will continue to be very cheap, hence the higher throughput.
You seem to be confusing -Xmx and -Xms, by the way. -Xms just sets the initial heap size, whereas -Xmx is your max heap size.
More memory usually gives you better performance in garbage collected environments, at least as long as this does not lead to virtual memory usage / swapping.
The GC only tracks references, not memory per se. In the end, the VM will allocate the same number of (mostly short-lived, temporary) objects, but the garbage collector needs to be invoked less often - the total amount of garbage collector work will therefore not be more - even less, since this can also help with caching mechanisms which use weak references.
I'm not sure if there is still a server and a client VM for 64 bit (there is for 32 bit), so you may want to investigate that also.
According to my experience, it does not slow down BUT the JVM tries to cut back to Xms all the time and stay at the lower boundary or close to it. So if you can afford it, bump Xms as well. Sun recommends setting both to the same size. Add some -XX:NewSize=512m (just a made-up number) to avoid the costly pile-up of old data in the old generation, which leads to longer/heavier GCs along the way. We are running our web app with 700 MB NewSize because most data is short-lived.
So, bottom line: I do not expect a slowdown, but put more of your memory to work. Set a larger new-size area and set Xms to Xmx to lower the stress on the GC, because it does not need to try to cut back to the Xms limits.
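With the values from the question, that would mean something along these lines (the 512m new size is still just a made-up number):

-Xms1024m -Xmx1024m -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseConcMarkSweepGC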
It typically will not help your performance/throughput if you increase -Xmx.
Theoretically there could be longer "stop the world" phases but in practice with the CMS that's not a real problem.
Of course you should not set -Xmx to some insane value like 300Gbyte unless you really need it :)
I'm running a memory intensive app on a machine with 16Gb of RAM, and an 8-core processor, and Java 1.6 all running on CentOS release 5.2 (Final). Exact JVM details are:
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
I'm launching the app with the following command line options:
java -XX:+UseConcMarkSweepGC -verbose:gc -server -Xmx10g -Xms10g ...
My application exposes a JSON-RPC API, and my goal is to respond to requests within 25ms. Unfortunately, I'm seeing delays up to and exceeding 1 second and it appears to be caused by garbage collection. Here are some of the longer examples:
[GC 4592788K->4462162K(10468736K), 1.3606660 secs]
[GC 5881547K->5768559K(10468736K), 1.2559860 secs]
[GC 6045823K->5914115K(10468736K), 1.3250050 secs]
Each of these garbage collection events was accompanied by a delayed API response of very similar duration to the length of the garbage collection shown (to within a few ms).
Here are some typical examples (these were all produced within a few seconds):
[GC 3373764K->3336654K(10468736K), 0.6677560 secs]
[GC 3472974K->3427592K(10468736K), 0.5059650 secs]
[GC 3563912K->3517273K(10468736K), 0.6844440 secs]
[GC 3622292K->3589011K(10468736K), 0.4528480 secs]
The thing is that I thought the UseConcMarkSweepGC would avoid this, or at least make it extremely rare. On the contrary, delays exceeding 100ms are occurring almost once a minute or more (although delays of over 1 second are considerably rarer, perhaps once every 10 or 15 minutes).
The other thing is that I thought only a FULL GC would cause threads to be paused, yet these don't appear to be full GCs.
It may be relevant to note that most of the memory is occupied by a LRU memory cache that makes use of soft references.
Any assistance or advice would be greatly appreciated.
First, check out the Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning documentation, if you haven't already done so. This documentation says:
the concurrent collector does most of its tracing and sweeping work with the application threads still running, so only brief pauses are seen by the application threads. However, if the concurrent collector is unable to finish reclaiming the unreachable objects before the tenured generation fills up, or if an allocation cannot be satisfied with the available free space blocks in the tenured generation, then the application is paused and the collection is completed with all the application threads stopped. The inability to complete a collection concurrently is referred to as concurrent mode failure and indicates the need to adjust the concurrent collector parameters.
and a little bit later on...
The concurrent collector pauses an application twice during a concurrent collection cycle.
I notice that those GCs don't seem to be freeing very much memory. Perhaps many of your objects are long lived? You may wish to tune the generation sizes and other GC parameters. 10 Gig is a huge heap by many standards, and I would naively expect GC to take longer with such a huge heap. Still, 1 second is a very long pause time and indicates either something is wrong (your program is generating a large number of unneeded objects or is generating difficult-to-reclaim objects, or something else) or you just need to tune the GC.
Usually, I would tell someone that if they have to tune GC then they have other problems they need to fix first. But with an application of this size, I think you fall into the territory of "needing to understand GC much more than the average programmer."
As others have said, you need to profile your application to see where the bottleneck is. Is your PermGen too large for the space allocated to it? Are you creating unnecessary objects? jconsole works to at least show a minimum of information about the VM. It's a starting point. As others have indicated however, you very likely need more advanced tools than this.
Good luck.
Since you mention your desire to cache, I'm guessing that most of your huge heap is occupied by that cache. You might want to limit the size of the cache so that you are sure it never attempts to grow large enough to fill the tenured generation. Don't rely on SoftReference alone to limit the size. As the old generation fills with soft references, older references will be cleared and become garbage. New references (perhaps to the same information) will be created, but cleared quickly because free space is in short supply. Eventually, the tenured space is full of garbage and needs to be cleaned.
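A minimal sketch of such a hard limit (the class name and capacity are illustrative), using LinkedHashMap's access-order eviction instead of relying on SoftReference clearing to bound the cache:

import java.util.LinkedHashMap;
import java.util.Map;

class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true);        // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;    // evict eagerly instead of waiting for the GC
    }
}

Used, for example, as new BoundedLruCache<String, Object>(100000), the cache stops growing at a known entry count, so its footprint stays predictable regardless of how the collector treats soft references.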
Consider adjusting the -XX:NewRatio setting too. The default is 2 (a 1:2 new-to-old ratio), meaning that one-third of the heap is allocated to the new generation. For a large heap, this is almost always too much. You might want to try something like 9, which would keep 9 Gb of your 10 Gb heap for the old generation.
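In flag form, combined with the original launch options, that would be roughly:

java -XX:+UseConcMarkSweepGC -verbose:gc -server -Xmx10g -Xms10g -XX:NewRatio=9 ...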
Turns out that part of the heap was getting swapped out to disk, so that garbage collection had to pull a bunch of data off the disk back into memory.
I resolved this by setting Linux's "swappiness" parameter to 0 (so that it wouldn't swap data out to disk).
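For reference, one way to apply that setting immediately and persist it across reboots (assuming you can edit /etc/sysctl.conf) is:

sysctl -w vm.swappiness=0
echo "vm.swappiness = 0" >> /etc/sysctl.conf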
Here are some things I have found which could be significant.
JSON-RPC can generate a lot of objects. Not as much as XML-RPC, but still something to watch for. In any case you do appear to be generating as much as 100 MB of objects per second, which means your GC is running a high percentage of the time and is likely to be adding to your random latency. Even though the GC is concurrent, your hardware/OS is very likely to exhibit non-ideal random latency under load.
Have a look at your memory bank architecture. On Linux the command is numactl --hardware. If your VM is being split across more than one memory bank, this will increase your GC times significantly. (It will also slow down your application, as these accesses can be significantly less efficient.) The harder you work the memory subsystem, the more likely the OS will have to shift memory around (often in large amounts), and you get dramatic pauses as a result (100 ms is not surprising). Don't forget your OS does more than just run your app.
Consider compacting/reducing the memory consumption of your cache. If you are using multiple GB of cache it is worth looking at ways to cut memory consumption further than you have already.
I suggest you profile your app with memory allocation tracing AND cpu sampling on at the same time. This can yield very different results and often points to the cause of these sort of problems.
Using these approaches, the latency of an RPC call can be reduced to below 200 microseconds and the GC times reduced to 1-3 ms, affecting fewer than 1 in 300 calls.
I'd also suggest GCViewer and a profiler.
Some places to start looking:
https://visualvm.dev.java.net/
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstat.html
http://www.javaperformancetuning.com/tools/gcviewer/index.shtml
Also I'd run the code through a profiler. I like the one in NetBeans, but there are other ones as well. You can view the GC behaviour in real time. VisualVM does that as well... but I haven't run it yet (been looking for a reason to... but haven't had the time or the need yet).
I haven't personally used such a huge heap but I've experienced very low latency in general using following switches for Oracle/Sun Java 1.6.x:
-Xincgc -XX:+UseConcMarkSweepGC -XX:CMSIncrementalSafetyFactor=50
-XX:+UseParNewGC
-XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=2 -XX:ParallelGCThreads=2
-XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=5
-XX:GCTimeRatio=90 -XX:MaxGCPauseMillis=20 -XX:GCPauseIntervalMillis=1000
The important parts are, in my opinion, the use of CMS for the tenured generation and ParNewGC for the young generation. In addition, this adds a pretty big safety factor for CMS (50% instead of the default 10%) and requests short pause times. As you're targeting a 25 ms response time, I'd try setting -XX:MaxGCPauseMillis to an even smaller value. You could even try to use more than two cores for concurrent GC, but I would guess that is not worth the CPU usage.
You should probably also check the HotSpot JVM GC cheat sheet.