I have a Java client which consumes a large amount of data from a server. If the client does not keep up with the data stream, the server closes the socket connection. My client gets disconnected a few times per day. I ran jconsole to see the memory usage, and the heap graph shows a fairly well-defined sawtooth pattern, oscillating between about 0.5GB and 1.8GB (2GB of heap space is allocated). Every disconnection happens during a full GC (though not every full GC causes one). A full GC takes a bit over 1 second on average. Depending on the time of day, a full GC happens as often as every 5 minutes when busy, while up to 30 minutes can pass between full GCs during slow periods.
I suspect if I can reduce the full GC time, the client will be able to better keep up with the incoming data, but I do not have much experience with GC tuning. Does anyone have some insight on if this might be a good idea, and how to do it? Or is there an alternative idea which may work as well?
** UPDATE **
I used -XX:+UseConcMarkSweepGC and it improved, but I still got disconnected during the very busy moments. So I increased the heap allocation to 3GB to help weather through the busy moments and it seems to be chugging along pretty well now, but it's only been 1 day without a disconnection. Maybe if I get some time I will go through and try to reduce the amount of garbage created which I'm confident will help as well. Thanks for all the suggestions.
Full GC could take very long to complete, and is not that easy to tune.
One (easy) way to tune it is to increase the heap space. Generally speaking, doubling the heap space can double the interval between two GCs, but it can also double the time consumed by a GC. If the program you are running has very clear usage patterns, you could consider increasing the heap space enough to make the interval so long that you are guaranteed some idle time in which to make the system perform a GC. Following the same logic in the other direction, a full garbage collection of a small heap finishes in an instant, but that seems to invite more trouble than it solves.
Also, -XX:+UseConcMarkSweepGC might help, since it tries to perform most of the GC work concurrently (without stopping your program).
Here's a very nice talk by Gil Tene (CTO of Azul Systems, maker of a high-performance JVM, who has published several GC algorithms) about GC in the JVM in general.
It is not easy to tune away the full GC. A much better approach is to produce less garbage. Producing less garbage reduces the pressure on the collector to promote objects into the tenured space, where they are more expensive to collect.
I suggest you use a memory profiler to
reduce the amount of garbage produced. In many applications this can be reduced by a factor of 2 - 10x relatively easily.
reduce the size of the objects you are creating, e.g. use primitives and smaller data types like double instead of BigDecimal.
recycle mutable objects instead of discarding them.
retain less data on the client if you can.
By reducing the amount of garbage you create, objects are more likely to die in the eden or survivor spaces, meaning you have far fewer full collections, which can be shorter as well.
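As a minimal sketch of the "recycle mutable objects" and "use primitives" points above: the class below reuses one StringBuilder for every message and takes a primitive double rather than a BigDecimal. The class name, the format method and the symbol/price message shape are all made up for illustration; they are not from the original client.

```java
class ReuseBufferSketch {
    // One builder kept for the lifetime of the formatter and reused per message.
    // Not thread-safe: use one instance per thread.
    private final StringBuilder scratch = new StringBuilder(256);

    /** Formats a hypothetical price tick, reusing the same StringBuilder. */
    String format(String symbol, double price) {
        scratch.setLength(0); // reset the existing buffer instead of allocating a new one
        scratch.append(symbol).append('=').append(price);
        return scratch.toString(); // the resulting String is still a new object
    }

    public static void main(String[] args) {
        ReuseBufferSketch formatter = new ReuseBufferSketch();
        for (int i = 0; i < 3; i++) {
            System.out.println(formatter.format("ABC", 1.5 + i));
        }
    }
}
```

This only removes the per-message builder allocation; a memory profiler is still the way to find out which allocations actually dominate in your client.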
Don't take it for granted that you have to live with lots of collections; in extreme cases you can avoid them almost completely: http://vanillajava.blogspot.ro/2011/06/how-to-avoid-garbage-collection.html
Take out calls to Runtime.getRuntime().gc() - When garbage collection is triggered manually it either does nothing or it does a full stop-the-world garbage collection. You want incremental GC to happen.
Have you tried using the server jvm from a jdk install? It changes a bunch of the default configuration settings (including garbage collection) and is easy to try - just add -server to your java command.
java -server
What is all the garbage that gets created? Can you generate less of it? Where possible, try to use the valueOf methods. By using less memory you'll save yourself time in gc AND in memory allocation.
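To illustrate the valueOf suggestion, here is a small sketch (the class and method names are mine). Integer.valueOf is specified to return cached instances for values from -128 to 127, and Boolean.valueOf always returns one of the two shared constants, so neither allocates a new object in those cases.

```java
class ValueOfSketch {
    /** True when two valueOf calls for the same int return the very same object. */
    static boolean sameCachedInstance(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    public static void main(String[] args) {
        // Small values come from the Integer cache, so no new object is created.
        System.out.println(sameCachedInstance(100));
        // Boolean.valueOf returns the shared Boolean.TRUE / Boolean.FALSE constants.
        System.out.println(Boolean.valueOf("true") == Boolean.TRUE);
    }
}
```

Outside the cached range valueOf may still allocate, so this helps most with small, frequently repeated values.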
I use Kafka 2.1.0.
We have a Kafka cluster with 5 brokers (r5.xlarge machines). We often observe that GC times increase sharply without any change in the rate of incoming messages, severely impacting the performance of the cluster. I don't understand what could be causing such a sudden increase in GC time.
I have tried a few things with little improvement but I don't really understand the reason behind them.
export KAFKA_HEAP_OPTS="-Xmx10G -Xms1G"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80"
I would like to understand the most important parameters when tuning GC in a Kafka broker.
Seeing the configuration above, where am I going wrong? What can be done to rectify this?
All the producers and consumers are working fine, and the rate of incoming messages remains fairly constant. So far, we have not been able to figure out any pattern behind the sudden increase in GC times; it seems random.
UPDATE
After some further analysis, it turns out there was indeed some increase in the amount of data per second. One of the topics had increased message input from around 10 KBps to 200 KBps. But I believed that Kafka could easily handle this much data.
Is there something I am missing??
Grafana Snapshot
I would start by looking to see if the problem is something other than a GC tuning issue. Here are a couple of possibilities:
A hard memory leak will cause GC times to increase. The work done by a GC is dominated by tracing and copying of reachable objects. If you have a leak, then more and more objects will be (incorrectly) reachable.
A cache that is keeping too many objects reachable will also increase GC times.
Excessive use of Reference types, finalizers, etc may increase GC times.
I would enable GC logging and look for patterns in the memory and space utilization reported by the GC. If you suspect a memory leak because memory utilization is trending higher in the long term, go to the next step and use a memory profiler to track down the leak.
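Alongside the GC logs, the same counters can be sampled from inside a JVM through the standard java.lang.management API. This is only a sketch of that idea (the class name and the totalGcMillis helper are mine); for a broker you would more likely read the same MXBeans remotely over JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStatsSketch {
    /** Sums the accumulated collection time (ms) reported by all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time so far: " + totalGcMillis() + " ms");
    }
}
```

Sampling these values periodically and comparing the deltas shows whether GC time is rising with load or rising on its own, which is the pattern to look for before tuning anything.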
Either way, it is important to understand what is causing the problem before trying to fix it.
After some further analysis, it turns out there was indeed some increase in the amount of data per sec. One of the topics had increased message input from around 10 KBps to 200 KBps. But I believed that Kafka could easily handle this much of data.
It most likely can. However, a 20x increase in throughput will inevitably lead to more objects being created and discarded ... and the GC will need to run more often to deal with this.
How come just 200 Kbps of data divided among 5 brokers was able to break GC.
What makes you think that you have "broken" the GC? 15% time in GC doesn't mean it is broken.
Now, I can imagine that the GC may have difficulty meeting your 20ms max pause time goal, and may be triggering occasional full GCs as a result. Your pause time goal is "ambitious", especially if the heap may grow to 10GB. I would suggest reducing the heap size, increasing the pause time goal, and/or increasing the number of physical cores available to the JVM(s).
By breaking I mean an increased delay in committing offsets and other producer and consumer offsets.
So ... you are just concerned that a 20 x increase in load has resulted in the GC using up to 15% of available CPU. Well that's NOT broken. That is (IMO) expected. The garbage collector is not magic. It needs to use CPU time to do its work. The more work it has to do, the more CPU it needs to use to do it. If your application's workload involves a lot of object allocation, then the GC has to deal with that.
In addition to the tuning ideas above, I suspect that you should set G1HeapRegionSize a lot smaller. According to "Garbage First Garbage Collector Tuning" by Monica Beckwith, the default is to have 2048 regions based on the minimum heap size. But your setting will give 1G / 16M == 64 initial regions.
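To make that arithmetic concrete, here is a trivial sketch (class and method names are mine) that divides heap size by region size for the -Xms1G and -Xmx10G values from the question.

```java
class G1RegionMathSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Number of G1 regions that fit in a heap of the given size. */
    static long regionCount(long heapBytes, long regionBytes) {
        return heapBytes / regionBytes;
    }

    public static void main(String[] args) {
        long regionSize = 16 * MB; // -XX:G1HeapRegionSize=16M from the question
        System.out.println("regions at -Xms1G:  " + regionCount(1 * GB, regionSize));
        System.out.println("regions at -Xmx10G: " + regionCount(10 * GB, regionSize));
    }
}
```

Even at the full 10G heap this gives 640 regions, well short of the roughly 2048 that the cited article describes as the default target.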
Finally, if your overall goal is to reduce the CPU utilization of the GC, then you should be using the Throughput GC, not G1GC. This will minimize GC overheads. The downside is that GC pause minimization is no longer a goal, so occasional lengthy pauses are to be expected.
And if you plan to stay with G1GC, it is advisable to use the latest version of Java; i.e. Java 11. (See "G1 Garbage Collector is mature in Java 9, finally")
Kafka 2.1 uses G1GC by default, so I guess you can omit that argument. I'm assuming you're not using JDK 11. Compared to previous versions, JDK 11 brings significant improvements to G1GC. Instead of running a single-threaded full GC cycle, it can now do the work in parallel. That shouldn't improve the best-case scenarios by a big margin, but worst-case scenarios should see significant improvement. If possible, please share your results after migrating to JDK 11.
Note: I doubt that's the root cause, but let's see.
Hello, I have a program with a 150GB heap that uses an in-memory data grid. I have a crazy requirement from the operations department to use a single machine. We all know what happens if the parallel garbage collector is used on 150GB: a full GC will probably take tens of minutes if it is invoked.
My hope was that Java 9 would bring the Shenandoah low-pause GC. Unfortunately, from what I can see, it is not listed for delivery in Java 9. Does anyone know anything about that?
Nevertheless, I am wondering how G1 GC will perform with this amount of heap memory.
And one last question. I have a non-interactive batch application that is supposed to complete in, let's say, 2 hours. The main goal here is to ensure that a full GC never kicks in. If I ensure that there is plenty of memory (say heap usage can peak at 150GB and I allocate 250GB), can I say with good confidence that a full GC will never kick in? Usually a full GC is triggered when the new generation plus the old generation reaches the maximum heap. Can it be triggered in a different way?
This question was flagged as a duplicate, so I will try to explain why it is not one. First, we are talking about a 150GB heap, which adds a completely different dimension to the question. Second, I don't use RMI, which the other question mentions. Third, I am also asking about the G1 garbage collector between the lines. Also, once we go beyond the 32GB heap barrier we are entering the 64-bit address space; you cannot convince me that a question about a <32GB heap is the same as a question about a >32GB heap. Not to mention that things have changed a bit since Java 7; for instance, PermGen no longer exists.
The rule of thumb for a compacting GC is that it should be able to process 1 GB of live objects per core per second.
Example on a Haswell i7 (4 cores/8 threads) and 20GB heap with the parallel collector:
[24.757s][info][gc,heap ] GC(109) PSYoungGen: 129280K->0K(917504K)
[24.757s][info][gc,heap ] GC(109) ParOldGen: 19471666K->7812244K(19922944K)
[24.757s][info][gc ] GC(109) Pause Full (Ergonomics) 19141M->7629M(20352M) (23.791s, 24.757s) 966.174ms
[24.757s][info][gc,cpu ] GC(109) User=6.41s Sys=0.02s Real=0.97s
The live set after compacting is 7.6GB. It takes 6.4 seconds' worth of CPU time; thanks to parallelism, this translates to a <1s pause.
In principle the parallel collector should be able to handle a 150GB heap with full GC times < ~2 minutes on a multi-core system, even when most of the heap consists of live objects.
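As a back-of-the-envelope sketch of that rule of thumb (the class and method names are mine, and the 1 GB per core per second rate is just the rule above, not a measured value):

```java
class FullGcEstimateSketch {
    /**
     * Estimated full GC pause in seconds for a compacting collector,
     * assuming the work parallelizes evenly across the given cores.
     */
    static double estimateSeconds(double liveGb, int cores, double gbPerCorePerSecond) {
        return liveGb / (cores * gbPerCorePerSecond);
    }

    public static void main(String[] args) {
        // The 20GB example above: 7.6 GB live, 8 hardware threads.
        System.out.println(estimateSeconds(7.6, 8, 1.0) + " s");
        // Worst case for the question: all 150 GB live, on 4 and on 8 cores.
        System.out.println(estimateSeconds(150, 4, 1.0) + " s");
        System.out.println(estimateSeconds(150, 8, 1.0) + " s");
    }
}
```

For the example log this comes out at roughly 0.95 s against a measured 0.97 s, and for 150 GB of live data on 4 cores it gives 37.5 s, which is where the "under about 2 minutes" figure has plenty of margin.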
Of course this is just a rule of thumb. Some things that can affect it negatively:
paging
thermal CPU throttling
workloads consisting of very large, reference-heavy objects
non-local memory traffic in NUMA configurations
other processes competing for CPU time
heavy use of weak/soft references
In some cases tuning may be necessary to achieve this throughput.
If the Parallel collector does not work despite all that then CMS and G1 can be viable alternatives but only if there is enough spare heap capacity and CPU cores available to the JVM. They need significant breathing room to do their concurrent work without risking a full GC.
It is correct that I said non-interactive, but I still have strict license agreements. I need to finish the whole processing in an hour, so I cannot afford a 30-minute stop-the-world event.
Basically, you don't really need low pause times in the sense that CMS, G1, Shenandoah or Zing aim for (they aim for <100ms or even <10ms even on large heaps).
All you need is that STW pauses are not so catastrophically bad that they eat a significant portion of your compute time.
This should be feasible with most of the available collectors, ignoring the serial one.
In practice there are some pathological edge cases where they may fall down, but to get to that point you need to set up a system with your actual workload and do some test runs. If you experience real problems, then you can ask a question with more details.
So, the gist of it is that a version of an application at my company has been having some memory issues lately, and I'm not sure of the best way to fix it other than just "allocate more memory", so I wanted to get some guidance.
For the application, it looks like the eden heap is getting full pretty quickly when it has concurrent users, so objects that won't be alive very long end up in the old heap. After running for a while, the old heap simply gets full and never seems to clean up automatically, but manually running the garbage collection in VisualVM will clear it out (so I assume this means the old heap is full of dead objects).
Is there any suggested setting I could add so that garbage collection runs on the old heap once it reaches a certain threshold? And are there any pitfalls in changing the old/eden ratio from the stock 2:1 to 1:1? For the application, the majority of objects created are what I would consider short-lived (from milliseconds to a few minutes).
It looks like the eden heap is getting full pretty quickly when it has concurrent users, so objects that won't be alive very long end up in the old heap.
This is called "premature promotion"
After running for a while, the old heap simply gets full,
When it fills, the GC triggers a major or even a full collection.
never seems to automatically clean up
In which case, it is either used or it is not completely full. It might appear to be almost full, but the GC will be performed when it is actually full.
but manually running the garbage collection in VisualVM will clear it out
So the old gen was almost, but not actually, full.
I could add so garbage collection gets run on the old heap once it gets to a certain threshold?
You can call System.gc(), but this means more work for your application and will slow it down. You don't want to be doing this.
If you use the CMS collector you can change the threshold at which it kicks in but unless you need low latency you might be better off leaving your settings as they are.
And are there any pitfalls in changing the old/eden ratio from the stock 2:1 to 1:1?
You reduce the old gen, but you may halve the number of GCs you perform and double the amount of time an object can live without ending up in the old gen.
I work in the low latency space and usually set the young space to 24 GB and the old gen to 2 GB. I also use a lot of off heap data so I don't need much old gen. This is not an average use case, but it can work depending on your requirements.
If you are using < 32 GB, just adding a few more GB may be the simplest answer. Also, you can use something like -Xmn4g -Xmx6g to set the young space and maximum heap and not worry about ratios.
For the application, the majority of objects created are what I would consider short lived (From milliseconds to a few minutes)
In that case, ideally you want your eden space large enough so you have a minor collection every few minutes. This way most of your objects will die in the eden space, and not be copied around.
Note: in extreme cases it is possible to have an application produce less than one GB per hour of garbage and run all day with a 24 GB Eden space without even a minor collection.
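A rough way to size eden from this is to divide its capacity by your allocation rate. The sketch below does only that (names are mine) and assumes that every allocation lands in eden and that nothing else triggers a collection, which is a simplification.

```java
class EdenIntervalSketch {
    /** Approximate hours between minor collections for a given eden size and allocation rate. */
    static double hoursBetweenMinorGcs(double edenGb, double allocatedGbPerHour) {
        return edenGb / allocatedGbPerHour;
    }

    public static void main(String[] args) {
        // The extreme case from the note above: 24 GB eden, 1 GB of garbage per hour.
        System.out.println(hoursBetweenMinorGcs(24, 1) + " hours");
        // A hypothetical busier application: 4 GB eden, 60 GB allocated per hour.
        System.out.println(hoursBetweenMinorGcs(4, 60) * 60 + " minutes");
    }
}
```

With 24 GB of eden and at most 1 GB per hour of garbage, a minor collection is at least 24 hours away, which is how a full trading day can pass without one.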
I know several things to look for in GC logs, such as:
you should check how frequently "Full GC" runs; if it runs frequently, that is a sign of a problem
you should also check how much memory "Full GC" is able to reclaim when it finishes; if it is not much, that is again a sign of a problem, as it will force "Full GC" to run again
you should revisit the heap space allocated to the Java process if "Full GC" runs frequently.
These are some of the points I am working on. I would be interested to know what else I should watch for when I look at GC logs.
FYI, I have already gone through following threads....
What does "GC--" in gc.log mean?
What does "GC--" mean in a java garbage collection log?
How to analyse and monitor gc.log garbage collector log files from the JVM
Is gc.log writing asynchronous? safe to put gc.log on NFS mount?
First you need to know what can go wrong with GC in your program. Depending on the collectors that you use for the young and old generations, the contents of GC logs may vary. But all in all, the baseline inferences that we need to derive from GC logs mostly come down to the following:
How long are the minor collections taking?
How long are the major collections taking?
What is the frequency of minor collections?
What is the frequency of major collections?
How much does a minor collection reclaim?
How much does a major collection reclaim?
Combinations of the above
Most programs have very frequent minor collections that reclaim about 90-95% of the young generation's contents and pass the rest to the survivor spaces. Subsequent collections clean up about 80% of the survivors, so in essence just 2% to 4% of what a minor collection processes makes it to the old gen, and this cycle keeps going no matter which collector you use.
Now the pain areas are when you have hundreds of small minor collections per application request or thread, and together they add up to a sizable amount of time, often double-digit seconds. Individual minor collections in modern collectors are usually short, so sometimes this is bearable. With the old gen, problems come when the collector runs but doesn't reclaim much. For example, a collector normally runs when the old gen is about 80-85% full. This may be a stop-the-world episode, since new data cannot be saved on the heap until space is freed, which is probably the case here. So app threads are paused to let the GC threads clean up the space first. But once the collector finishes, the heap fill ratio hasn't come down as much as it should. With good sizing, a collection should reduce your heap usage by more than 40% in a single go. If it doesn't, you need more heap to hold your long-lived objects.
So in essence GC analysis is not something you do by following a set of predefined steps. It's more of a hit-and-trial analysis: an experiment where you set the initial sizes and settings, then monitor the GC activity and record your findings. Then, after say 8-10 runs, you compare notes and see what works for your app and what doesn't. It's really interesting, hard work.
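Checking "how much does a collection reclaim" can be automated. The sketch below (class, method and pattern are mine) pulls the before/after/capacity sizes and the pause time out of one log line and computes the reclaimed fraction. It assumes the JDK 9+ unified logging format with sizes in M; older -XX:+PrintGCDetails output looks different and would need a different pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcPauseLineSketch {
    // before->after(capacity) in MB, followed later on the line by the pause in ms
    private static final Pattern PAUSE =
            Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\).*?([0-9.]+)ms");

    /** Returns {beforeMb, afterMb, capacityMb, pauseMs}, or null if the line does not match. */
    static double[] parse(String line) {
        Matcher m = PAUSE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3)),
            Double.parseDouble(m.group(4))
        };
    }

    /** Fraction of the occupied heap that the collection freed (0.0 to 1.0). */
    static double reclaimedFraction(String line) {
        double[] p = parse(line);
        if (p == null) {
            throw new IllegalArgumentException("not a GC pause line: " + line);
        }
        return (p[0] - p[1]) / p[0];
    }

    public static void main(String[] args) {
        String sample = "[24.757s][info][gc ] GC(109) Pause Full (Ergonomics) "
                + "19141M->7629M(20352M) (23.791s, 24.757s) 966.174ms";
        System.out.printf("reclaimed %.0f%% in %.0f ms%n",
                reclaimedFraction(sample) * 100, parse(sample)[3]);
    }
}
```

The sample line frees about 60% of the occupied heap, which is comfortably above the 40% rule of thumb; a run of lines well below it would point to a heap that is too small for the long-lived data.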
I am currently running an application which requires a maximum heap size of 16GB.
Currently I use the following flags to handle garbage collection.
-XX\:+UseParNewGC, -XX\:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=50, -XX\:+DisableExplicitGC, -XX\:+PrintGCDateStamps, -XX\:+PrintGCDetails, -Xloggc\:/home/user/logs/gc.log
However, I have noticed that during some garbage collections, the application locks up for a few seconds and then carries on - This is completely unacceptable as it's a game server.
An excerpt from my garbage collection logs can be found here.
Any advice on what I should change in order to reduce these long pauses would be greatly appreciated.
Any advice on what I should change in order to reduce these long pauses would be greatly appreciated.
The chances are that the CMS GC cannot keep up with the amount of garbage your system is generating. But the work that the GC has to perform is actually more closely related to the amount of NON-garbage that your system is retaining.
So ...
Try to reduce the actual memory usage of your application; e.g. by not caching so much stuff, or reducing the size of your "world".
Try to reduce the rate at which your application generates garbage.
Upgrade to a machine with more cores so that there are more cores available to run the parallel GC threads when necessary.
To Mysticial:
Yes, in hindsight it might have been better to implement the server in C++. However, we don't know anything about "the game". If it involves a complicated world model with complicated heterogeneous data structures, then implementing it in C++ could mean that you replace the "GC pause" problem with the problem that the server crashes all the time due to problems with the way it manages its data structures.
Looking at your logs, I don't see any long pauses. But young GC is very frequent. The promotion rate is very low, though (most garbage is cleared by young GC, as it should be). At the same time, your old space utilization is low.
BTW, are we talking about a Minecraft server?
To reduce the frequency of young GC, you should increase the size of the young space. I would suggest starting with -XX:NewSize=8G -XX:MaxNewSize=8G
For such a large young space, you should also reduce the survivor space size: -XX:SurvivorRatio=512
GC tuning is a path of trial and error, so you may need some more iterations and tweaking.
You can find a couple of useful articles on my blog:
HotSpot JVM GC options cheatsheet
Understanding young GC pauses in HotSpot JVM
I'm not an expert on Java garbage collection, but it looks like you're doing the right thing by using the concurrent collector (the UseConcMarkSweepGC flag), assuming the server has multiple processors. Follow the suggestions for troubleshooting at http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#cms. If you already have, let us know what happened when you tried them.
Which version of Java are you using? http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html
For better performance, try to minimize the use of instance variables in a class. It is better to work on local variables than on instance variables; this helps performance and avoids synchronization problems. If you do use instance variables, always reset them at the end of an operation, before the program exits, and set them again when they are required. That helps performance further. Besides, newer versions of Java implement better garbage collection policies, so it would be better to move to a newer version if that is feasible.
Also, you can monitor garbage collector pause times via VisualVM to get a better idea of when more garbage collection is being performed.