Hello, I have a program with a 150GB heap that uses an in-memory data grid. I have a crazy requirement from the operations department to run it on a single machine. Now, we all know what happens if the parallel garbage collector is used on 150GB: a full GC will probably mean tens of minutes of garbage collection.
My hope was that Java 9 would bring the Shenandoah low-pause GC. Unfortunately, from what I can see it is not listed for delivery in Java 9. Does anyone know anything about that?
Nevertheless, I am wondering how the G1 GC will perform with this amount of heap memory.
And one last question. I have a non-interactive batch application that is supposed to complete in, let's say, 2 hours. The main goal here is to ensure that a full GC never kicks in. If I make sure there is plenty of memory, say the heap can reach at most 150GB and I allocate 250GB, can I say with good confidence that a full GC will never kick in? Usually a full GC is triggered when the young generation plus the old generation reach the maximum heap. Can it be triggered in a different way?
This question has been flagged as a duplicate, so I will try to explain why it is not one. First, we are talking about a 150GB heap, which adds a completely different dimension to the question. Second, I don't use RMI as the other question does. Third, between the lines I am also asking about the G1 garbage collector. Also, once we go beyond the 32GB heap barrier we are in 64-bit address space territory; you cannot convince me that a question about a <32GB heap is the same as a question about a >32GB heap. Not to mention that things have changed a bit since Java 7, for instance the PermGen space no longer exists.
The rule of thumb for a compacting GC is that it should be able to process 1 GB of live objects per core per second.
Example on a Haswell i7 (4 cores/8 threads) with a 20GB heap and the parallel collector:
[24.757s][info][gc,heap ] GC(109) PSYoungGen: 129280K->0K(917504K)
[24.757s][info][gc,heap ] GC(109) ParOldGen: 19471666K->7812244K(19922944K)
[24.757s][info][gc ] GC(109) Pause Full (Ergonomics) 19141M->7629M(20352M) (23.791s, 24.757s) 966.174ms
[24.757s][info][gc,cpu ] GC(109) User=6.41s Sys=0.02s Real=0.97s
The live set after compacting is 7.6GB. The collection takes 6.4 seconds of CPU time; thanks to parallelism this translates to a pause of less than 1s.
In principle the parallel collector should be able to handle a 150GB heap with full GC times < ~2 minutes on a multi-core system, even when most of the heap consists of live objects.
Of course this is just a rule of thumb. Some things that can affect it negatively:
paging
thermal CPU throttling
workloads consisting of very large, reference-heavy objects
non-local memory traffic in NUMA configurations
other processes competing for CPU time
heavy use of weak/soft references
In some cases tuning may be necessary to achieve this throughput.
If the parallel collector does not work despite all that, then CMS and G1 can be viable alternatives, but only if there is enough spare heap capacity and there are enough CPU cores available to the JVM. They need significant breathing room to do their concurrent work without risking a full GC.
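As a rough illustration only (not a tested recommendation; the thread count, main class name and unified logging syntax below are assumptions for a JDK 9+ JVM), a starting command line for such a run with the parallel collector might look like:
java -Xms150g -Xmx150g -XX:+UseParallelGC -XX:ParallelGCThreads=16 -Xlog:gc*:file=gc.log YourBatchJob
Pinning -Xms to -Xmx avoids heap resizing during the run, and the GC log lets you check the actual full GC pause times against the rule of thumb above.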
It is correct that I said non-interactive, but I still have strict license agreements. I need to finish the whole processing within an hour, so I cannot afford a 30-minute stop-the-world event.
Basically, you don't really need low pause times in the sense that CMS, G1, Shenandoah or Zing aim for (they aim for <100ms or even <10ms even on large heaps).
All you need is that STW pauses are not so catastrophically bad that they eat a significant portion of your compute time.
This should be feasible with most of the available collectors, ignoring the serial one.
In practice there are some pathological edge cases where they may fall down, but to get to that point you need to set up a system with your actual workload and do some test runs. If you experience some real problems, you can then ask a question with more details.
Related
I use Kafka 2.1.0.
We have a Kafka cluster with 5 brokers (r5.xlarge machines). We often observe that GC times increase far too much without any change in the rate of incoming messages, severely impacting the performance of the cluster. Now, I don't understand what could be causing such a sudden increase in GC time.
I have tried a few things with little improvement but I don't really understand the reason behind them.
export KAFKA_HEAP_OPTS="-Xmx10G -Xms1G"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80"
I would like to understand the most important parameters when tuning GC in a Kafka broker.
Seeing the configuration above, where am I going wrong? What can be done to rectify this?
All the producers and consumers are working fine, and the rate of incoming messages remains fairly constant. Till now we have not been able to figure out any pattern behind the sudden increases in GC time; it seems random.
UPDATE
After some further analysis, it turns out there was indeed some increase in the amount of data per second. One of the topics had increased message input from around 10 KBps to 200 KBps. But I believed that Kafka could easily handle this much data.
Is there something I am missing??
Grafana Snapshot
I would start by checking whether the problem is something other than a GC tuning issue. Here are a couple of possibilities:
A hard memory leak will cause GC times to increase. The work done by a GC is dominated by tracing and copying of reachable objects. If you have a leak, then more and more objects will be (incorrectly) reachable.
A cache that is keeping too many objects reachable will also increase GC times.
Excessive use of Reference types, finalizers, etc. may increase GC times.
I would enable GC logging and look for patterns in the memory and space utilization reported by the GC. If you suspect a memory leak because memory utilization is trending higher in the long term, go to the next step and use a memory profiler to track down the leak.
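For example, on a JDK 8 broker the standard HotSpot logging flags would be something like the following (these are generic HotSpot options, not Kafka-specific settings; the log path is just an example, and where you inject the flags depends on your startup scripts):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/kafka/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M
On JDK 9+ the equivalent is the unified -Xlog:gc* syntax.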
Either way, it is important to understand what is causing the problem before trying to fix it.
After some further analysis, it turns out there was indeed some increase in the amount of data per second. One of the topics had increased message input from around 10 KBps to 200 KBps. But I believed that Kafka could easily handle this much data.
It most likely can. However, a 20x increase in throughput will inevitably lead to more objects being created and discarded ... and the GC will need to run more often to deal with this.
How come just 200 KBps of data divided among 5 brokers was able to break the GC?
What makes you think that you have "broken" the GC? 15% time in GC doesn't mean it is broken.
Now, I can imagine that the GC may have difficulty meeting your 20ms max pause time goal, and may be triggering occasional full GCs as a result. Your pause time goal is "ambitious", especially if the heap may grow to 10GB. I would suggest reducing the heap size, increasing the pause time goal, and/or increasing the number of physical cores available to the JVM(s).
By breaking I mean an increased delay in committing offsets and other producer and consumer offsets.
So ... you are just concerned that a 20x increase in load has resulted in the GC using up to 15% of available CPU. Well, that's NOT broken. That is (IMO) expected. The garbage collector is not magic. It needs to use CPU time to do its work. The more work it has to do, the more CPU it needs to use to do it. If your application's workload involves a lot of object allocation, then the GC has to deal with that.
In addition to the tuning ideas above, I suspect that you should set G1HeapRegionSize a lot smaller. According to "Garbage First Garbage Collector Tuning" by Monica Beckwith, the default is to have 2048 regions based on the minimum heap size. But your setting will give 1G / 16M == 64 initial regions.
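As an illustration only (the numbers are starting points to iterate on, not a verified configuration), a revised set of options along those lines might look like:
export KAFKA_HEAP_OPTS="-Xms6G -Xmx6G"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=35"
That is: a fixed, somewhat smaller heap, a less ambitious pause goal, and no explicit G1HeapRegionSize so that G1 sizes its ~2048 regions itself.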
Finally, if your overall goal is to reduce the CPU utilization of the GC, then you should be using the Throughput GC, not G1GC. This will minimize GC overheads. The downside is that GC pause minimization is no longer a goal, so occasional lengthy pauses are to be expected.
And if you plan to stay with G1GC, it is advisable to use the latest version of Java; i.e. Java 11. (See "G1 Garbage Collector is mature in Java 9, finally")
Kafka 2.1 uses G1GC by default, so I guess you can omit that argument. I'm assuming you're not using JDK 11. Compared to previous versions, JDK 11 brings significant improvements to G1GC. Instead of running a single-threaded full GC cycle, it can now do the full GC in parallel. That shouldn't improve the best-case scenarios by a big margin, but worst-case scenarios should see significant improvement. If possible, please share your results after migrating to JDK 11.
Note: I doubt that's the root cause, but let's see.
I was doing some reading and found out that when GC occurs with a "stop and copy" strategy (or any other), program execution is rightfully stopped so that the GC can work effectively. This brings me to the question: how can the performance of a Java application be ensured when there is no control over when garbage collection occurs, given that the GC stops program execution while it runs?
Say we have several threads running, one of which is working on a very performance-critical activity that requires millisecond precision. Can I rely on my Java application at that moment? What will happen if the GC is called at such a critical time? Will it not affect performance? Are there ways I can make sure the GC is not called at certain points in the code?
You have a number of options:
Use a small young generation size. This can be collected in a few milliseconds, sometimes in less than 1 millisecond.
Use a large Eden space and create less garbage. If you create less than 300 KB/s of garbage on average and you have an Eden space of 24 GB, you can run for a whole day without even a minor collection (see the worked example after this list).
While not all libraries are designed to be low-GC, you can still use them provided they are not called very often. For critical-path libraries you want them to be low-garbage. This not only reduces pause times, but also speeds up the code between pauses, sometimes by as much as 2-5x.
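To make the Eden-sizing arithmetic in the second option concrete (the heap sizes and the main class name below are purely illustrative; -Xmn fixes the young generation size, most of which is Eden):
300 KB/s * 86,400 s/day = 25,920,000 KB, i.e. roughly 25 GB of allocation per day
java -Xms32g -Xmx32g -Xmn24g MyLowLatencyApp
At that allocation rate a 24 GB young generation would need a minor collection only about once a day.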
I have a Java client which consumes a large amount of data from a server. If the client does not keep up with the data stream at a fast enough rate, the server disconnects the socket connection. My client gets disconnected a few times per day. I ran jconsole to see the memory usage, and the heap space graph looks like a fairly well defined sawtooth pattern, oscillating between about 0.5GB and 1.8GB (2GB of heap space is allocated). But every time I get disconnected is during a full GC (but not on every full GC). I see the full GC takes a bit over 1 second on average. Depending on the time of day, full GC happens as often as every 5 minutes when busy, or up to 30 minutes can go by in between full GCs during the slow periods.
I suspect if I can reduce the full GC time, the client will be able to better keep up with the incoming data, but I do not have much experience with GC tuning. Does anyone have some insight on if this might be a good idea, and how to do it? Or is there an alternative idea which may work as well?
** UPDATE **
I used -XX:+UseConcMarkSweepGC and it improved, but I still got disconnected during the very busy moments. So I increased the heap allocation to 3GB to help weather through the busy moments and it seems to be chugging along pretty well now, but it's only been 1 day without a disconnection. Maybe if I get some time I will go through and try to reduce the amount of garbage created which I'm confident will help as well. Thanks for all the suggestions.
A full GC can take very long to complete, and it is not that easy to tune.
One way to (easily) tune it is to increase the heap space: generally speaking, doubling the heap space can double the interval between two GCs, but it will also double the time consumed by a GC. If the program you are running has very clear usage patterns, you can consider increasing the heap space to make the interval so large that you are guaranteed some idle time in which the system can perform a GC. On the other hand, following this logic, if the heap is small a full garbage collection will finish in an instant, but that seems like it invites more trouble than it helps.
Also, -XX:+UseConcMarkSweepGC might help since it will try to perform the GC operations concurrently (not stopping your program; see here).
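For instance (the sizes and the class name are placeholders loosely based on the numbers in the question, not a tested configuration):
java -Xms3g -Xmx3g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly FeedClient
Starting the concurrent cycle early enough (via CMSInitiatingOccupancyFraction) helps keep the old generation from filling up during a busy period, which is what otherwise forces a long stop-the-world full GC.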
Here's a very nice talk by Gil Tene (CTO of Azul Systems, maker of a high-performance JVM, and author of several published GC algorithms) about GC in the JVM in general.
It is not easy to tune away the full GC. A much better approach is to produce less garbage. Producing less garbage reduces the pressure that pushes objects into the tenured space, where they are more expensive to collect.
I suggest you use a memory profiler to
reduce the amount of garbage produced. In many applications this can be reduced by a factor of 2-10x relatively easily.
reduce the size of the objects you are creating, e.g. use primitives and smaller datatypes such as double instead of BigDecimal.
recycle mutable objects instead of discarding them (see the sketch at the end of this answer).
retain less data on the client if you can.
By reducing the amount of garbage you create, objects are more likely to die in the Eden or survivor spaces, meaning you have far fewer full collections, and those can be shorter as well.
Don't take it for granted that you have to live with lots of collections; in extreme cases you can avoid them almost completely: http://vanillajava.blogspot.ro/2011/06/how-to-avoid-garbage-collection.html
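A minimal sketch of the "smaller datatypes" and "recycle mutable objects" points (the class and its fields are made up for illustration, not taken from any real client):
// A reusable accumulator built on primitives instead of BigDecimal.
// One long-lived instance replaces many short-lived objects on the hot path.
final class PriceAccumulator {
    private double total;     // primitive double instead of BigDecimal
    private long count;

    void add(double price) {  // no allocation per call
        total += price;
        count++;
    }

    double average() {
        return count == 0 ? 0.0 : total / count;
    }

    void reset() {            // recycle the same instance between batches
        total = 0.0;
        count = 0;
    }
}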
Take out calls to Runtime.getRuntime().gc() - When garbage collection is triggered manually it either does nothing or it does a full stop-the-world garbage collection. You want incremental GC to happen.
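If you cannot track down every explicit gc() call (e.g. some come from third-party code), the standard -XX:+DisableExplicitGC flag turns them into no-ops; the jar name below is just a placeholder:
java -XX:+DisableExplicitGC -jar client.jar
Whether silently ignoring those calls is appropriate depends on why they were added in the first place.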
Have you tried using the server jvm from a jdk install? It changes a bunch of the default configuration settings (including garbage collection) and is easy to try - just add -server to your java command.
java -server
What is all the garbage that gets created? Can you generate less of it? Where possible, try to use the valueOf methods. By using less memory you'll save yourself time in GC AND in memory allocation.
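For example (a generic illustration of the valueOf point, not code from this application):
Integer boxedNew    = new Integer(42);      // always allocates a new object
Integer boxedCached = Integer.valueOf(42);  // reuses a cached instance for small values
String  s = String.valueOf(42);             // avoids an intermediate concatenation like "" + 42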
I am currently running an application which requires a maximum heap size of 16GB.
Currently I use the following flags to handle garbage collection.
-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=50, -XX:+DisableExplicitGC, -XX:+PrintGCDateStamps, -XX:+PrintGCDetails, -Xloggc:/home/user/logs/gc.log
However, I have noticed that during some garbage collections, the application locks up for a few seconds and then carries on - This is completely unacceptable as it's a game server.
An excerpt from my garbage collection logs can be found here.
Any advice on what I should change in order to reduce these long pauses would be greatly appreciated.
Any advice on what I should change in order to reduce these long pauses would be greatly appreciated.
The chances are that the CMS GC cannot keep up with the amount of garbage your system is generating. But the work that the GC has to perform is actually more closely related to the amount of NON-garbage that your system is retaining.
So ...
Try to reduce the actual memory usage of your application; e.g. by not caching so much stuff, or reducing the size of your "world".
Try to reduce the rate at which your application generates garbage.
Upgrade to a machine with more cores so that there are more cores available to run the parallel GC threads when necessary.
To Mysticial:
Yes, in hindsight it might have been better to implement the server in C++. However, we don't know anything about "the game". If it involves a complicated world model with complicated heterogeneous data structures, then implementing it in C++ could mean that you replace the "GC pause" problem with the problem of the server crashing all the time due to problems with the way it manages its data structures.
Looking at your logs, I don't see any long pauses, but young GC is very frequent. The promotion rate is very low, though (most garbage is cleared by the young GC, as it should be). At the same time, your old space utilization is low.
BTW, are we talking about a Minecraft server?
To reduce the frequency of young GC you should increase its size. I would suggest starting with -XX:NewSize=8G -XX:MaxNewSize=8G
For such a large young space, you should also reduce the survivor space size: -XX:SurvivorRatio=512
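Put together with the flags already in use, a first iteration might look like this (illustrative only; it assumes you can pin the minimum heap to the 16GB maximum, and should be adjusted based on what the next GC logs show):
-Xms16G -Xmx16G -XX:NewSize=8G -XX:MaxNewSize=8G -XX:SurvivorRatio=512 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:/home/user/logs/gc.log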
GC tuning is a path of trial and error, so you may need some more iterations and tweaking.
You can find a couple of useful articles on my blog:
HotSpot JVM GC options cheatsheet
Understanding young GC pauses in HotSpot JVM
I'm not an expert on Java garbage collection, but it looks like you're doing the right thing by using the concurrent collector (the UseConcMarkSweepGC flag), assuming the server has multiple processors. Follow the suggestions for troubleshooting at http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#cms. If you already have, let us know what happened when you tried them.
Which version of Java are you using? http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html
For better performance, try to minimize the use of instance variables in a class. It is better to work with local variables than instance variables; this helps performance and keeps you safe from synchronization problems. At the end of an operation, before the program exits, reset the instance variables you used and set them again when they are needed; this helps enhance performance further. Also, newer Java versions implement better garbage collection policies, so it would be better to move to a newer version if that is feasible.
You can also monitor garbage collector pause times via VisualVM to get a better idea of when the JVM is performing more garbage collection.
Does Java 6 consume more memory than you expect for largish applications?
I have an application I have been developing for years, which has, until now, taken about 30-40 MB in my particular test configuration; now with Java 6u10 and 6u11 it takes several hundred while active. It bounces around a lot, anywhere between 50M and 200M, and when it idles, it does a GC and drops the memory right down. In addition it generates millions of page faults. All of this is observed via Windows Task Manager.
So, I ran it up under my profiler (jProfiler) and using jVisualVM, and both of them indicate the usual moderate heap and perm-gen usages of around 30M combined, even when fully active doing my load-test cycle.
So I am mystified! And it's not just requesting more memory from the Windows virtual memory pool - this is showing up as 200M of "Mem Usage".
CLARIFICATION: I want to be perfectly clear on this - observed over an 18 hour period with Java VisualVM, the class heap and perm gen heap have been perfectly stable. The allocated volatile heap (eden and tenured) sits unmoved at 16MB (which it reaches in the first few minutes), and the use of this memory fluctuates in a perfect pattern of growing evenly from 8MB to 16MB, at which point GC kicks in and drops it back to 8MB. Over this 18 hour period, the system was under constant maximum load since I was running a stress test. This behavior is perfectly and consistently reproducible, seen over numerous runs. The only anomaly is that while this is going on the memory taken from Windows, observed via Task Manager, fluctuates all over the place from 64MB up to 900+MB.
UPDATE 2008-12-18: I have run the program with -Xms16M -Xmx16M without any apparent adverse effect - performance is fine and total run time is about the same. But memory use in a short run still peaked at about 180M.
Update 2009-01-21: It seems the answer may be in the number of threads - see my answer below.
EDIT: And I mean millions of page faults literally - in the region of 30M+.
EDIT: I have a 4G machine, so the 200M is not significant in that regard.
In response to a discussion in the comments to Ran's answer, here's a test case that proves that the JVM will release memory back to the OS under certain circumstances:
public class FreeTest
{
    public static void main(String[] args) throws Exception
    {
        // Allocate 60 blocks of 1MB each, then release them one by one.
        byte[][] blob = new byte[60][1024*1024];
        for(int i=0; i<blob.length; i++)
        {
            Thread.sleep(500);
            System.out.println("freeing block "+i);
            blob[i] = null;   // drop the reference to one 1MB block
            System.gc();      // ask the JVM to collect, giving it a chance to shrink the heap
        }
    }
}
I see the JVM process' size decrease when the count reaches around 40, on both Java 1.4 and Java 6 JVMs (from Sun).
You can even tune the exact behaviour with the -XX:MaxHeapFreeRatio and -XX:MinHeapFreeRatio options -- some of the options on that page may also help with answering the original question.
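For example, to make the test above hand free heap back to the OS more eagerly (the percentages are illustrative values, not recommendations):
java -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=30 FreeTest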
I don't know about the page faults, but regarding the huge amount of memory allocated for Java:
1. Sun's JVM deallocates memory only after a specific ratio between internal memory needs and allocated memory drops beneath a (tunable) value. The JVM starts with the amount specified in -Xms and can extend up to the amount specified in -Xmx. I'm not sure what the defaults are. Whenever the JVM needs more memory (new objects / primitives / arrays) it allocates an entire chunk from the OS. However, when the need subsides (a momentary need, see 2 as well) it doesn't deallocate the memory back to the OS immediately, but keeps it for itself until that ratio has been reached. I was once told that JRockit behaves better, but I can't verify it.
2. Sun's JVM runs a full GC based on several triggers. One of them is the amount of available memory - when it falls too low the JVM tries to perform a full GC to free some more. So, when more memory is allocated from the OS (momentary need) the chance of a full GC is lowered. This means that while you may see 30MB of "live" objects, there might be a lot more "dead" objects (not reachable), just waiting for a GC to happen. I know YourKit has a great view called "dead objects" where you can see these left-overs.
3. In "-server" mode, Sun's JVM runs GC in parallel mode (as opposed to the older serial "stop the world" GC). This means that while there may be garbage to collect, it might not be collected immediately because other threads are taking all available CPU time. It will be collected before running out of memory (well, kinda; see http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html), and if more memory can be allocated from the OS, it might be before the GC runs.
Combined, a large initial memory configuration and short bursts creating a lot of short-lived objects might create a scenario as described.
edit: changed "never deallocates" to "only after ratio reached".
Excessive thread creation explains your problem perfectly:
Each Thread gets its own stack, which is separate from heap memory and therefore not registered by profilers
The default thread stack size is quite large, IIRC 256KB (at least it was for Java 1.3)
Thread stack memory is probably not reused, so if you create and destroy lots of threads, you'll get lots of page faults
If you ever really need to have hundreds of threads around, the thread stack size can be configured via the -Xss command line parameter.
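For example (the value is illustrative and the main class name is a placeholder; the minimum usable stack size depends on the platform and JVM version):
java -Xss256k MyThreadHeavyServer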
Garbage collection is a rather arcane science. As the state of the art develops, un-tuned behaviour will change in response.
Java 6 has different default GC behaviour and different "ergonomics" to earlier JVM versions. If you tell it that it can use more memory (either explicitly on the command line, or implicitly by failing to specify anything more explicit), it will use more memory if it believes that this is likely to improve performance.
In this case, Java 6 appears to believe that reserving the extra space which the heap could grow into will give it better performance - presumably because it believes that this will cause more objects to die in Eden space, and limit the number of objects promoted to the tenured generation space. And from the specifications of your hardware, the JVM doesn't think that this extra reserved heap space will cause any problems. Note that many (though not all) of the assumptions the JVM makes in reaching its conclusion are based on "typical" applications, rather than your specific application. It also makes assumptions based on your hardware and OS profile.
If the JVM has made the wrong assumptions, you can influence its behaviour through the command line, though it is easy to get things wrong...
Information about performance changes in java 6 can be found here.
There is a discussion about memory management and performance implications in the Memory Management White Paper.
Over the last few weeks I had cause to investigate and correct a problem with a thread pooling object (a pre-Java 6 multi-threaded execution pool), where it was launching far more threads than required. In the jobs in question there could be up to 200 unnecessary threads. And the threads were continually dying and new ones replacing them.
Having corrected that problem, I thought to run a test again, and now it seems the memory consumption is stable (though 20 or so MB higher than with older JVMs).
So my conclusion is that the spikes in memory were related to the number of threads running (several hundred). Unfortunately I don't have time to experiment.
If someone would like to experiment and answer this with their conclusions, I will accept that answer; otherwise I will accept this one (after the 2 day waiting period).
Also, the page fault rate is way down (by a factor of 10).
Also, the fixes to the thread pool corrected some contention issues.
Lots of memory allocated outside Java's heap after upgrading to Java 6u10? Can only be one thing:
Java6 u10 Release Notes: "New Direct3D Accelerated Rendering Pipeline (...) Enabled by Default"
Sun enabled Direct 3D accelerations by default in Java 6u10. This option creates lots of (temporary?) native memory buffers, which are allocated outside the Java Heap. Add the following vm argument to disable it again:
-Dsun.java2d.d3d=false
Note that this will NOT disable 2D hardware acceleration, just some features that can make use of 3D hardware acceleration. You will see that your Java heap usage will increase by up to 7MB, but that's a good trade-off because you'll save ~100MB(+) of this temporary volatile memory.
I did a fair amount of testing within 2 Swing desktop applications, on two platforms:
a high-end Intel i7 with an nVidia GTX 260 graphics card,
a 3-year-old laptop with Intel graphics.
On both hardware platforms the option made practically zero subjective difference. (Tests included: scrolling tables, zooming graphical flowsheets, charts, etc.). On the few tests where something was subtly different, disabling d3d counter-intuitively increased performance. I suspect that memory management/bandwidth problems counteracted whatever benefits the d3d accelerated functions were supposed to achieve. (Your mileage may vary!)
If you need to do some performance tuning, here's an excellent reference (e.g. "Troubleshooting Java 2D")
Are you using the ConcMarkSweep collector? It can increase the amount of memory required for your application due to increased memory fragmentation, and "floating garbage" - objects that become unreachable only after the collector has examined them, and therefore are not collected until the next pass.