I have a web server running several applications on Tomcat, with the JVM's maximum heap set to 4 GB, and performance is very good.
But I see that, as time passes, the PS Old Gen pool is gradually filling up. The increase is approximately 1% per day:
[Screenshots: PS Old Gen usage on 23/01 and 24/01]
Is this alarming?
I've been following up for 15 days and that's how it progresses.
At the system event level there are no alerts or errors from the GC. According to the vendor of the product we run, errors should not occur (since no events show up in the OS, they don't consider it important), but should I worry?
According to all the documentation I have read, once the pool reaches 100% a GC will run on it and the application will keep operating, but since that hasn't happened so far, I'd like to ask experts on the matter.
What do you think? What kind of logs should I check apart from system events and Tomcat logs?
Thanks!,
Regards.
PS Old Gen Filling Analysis
Well, you can analyze a heap dump of your application to see what is stored in memory. The GC log could also be useful here. In general, a full GC can free old gen memory.
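Before taking a heap dump, it can help to compare how full each heap pool is right now with how full it was right after the last collection. Below is a minimal sketch using the standard `java.lang.management` API, assuming it runs inside the JVM you want to inspect (for a remote Tomcat you would read the same MXBeans over JMX instead). The class name `OldGenWatch` and the helper `percentUsed` are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenWatch {
    /** Used bytes as a percentage of max, or -1 if the usage or its max is undefined. */
    static double percentUsed(MemoryUsage usage) {
        if (usage == null || usage.getMax() <= 0) {
            return -1.0;
        }
        return 100.0 * usage.getUsed() / usage.getMax();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, etc.
            }
            MemoryUsage now = pool.getUsage();
            // Usage measured right after the most recent GC of this pool;
            // null if the JVM does not support it for this pool.
            MemoryUsage afterGc = pool.getCollectionUsage();
            System.out.printf("%-25s now: %6.1f%%   after last GC: %6.1f%%%n",
                    pool.getName(), percentUsed(now), percentUsed(afterGc));
        }
    }
}
```

If the "after last GC" figure for the old generation keeps climbing across full collections, that points towards a leak. If it drops back each time, the slow fill is probably just promoted objects waiting for the next full GC.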
ZGC does not run often enough. The GC logs show that it runs once every 2-3 minutes for my application, so memory usage climbs high between GC cycles (as high as 90%). After a GC, it drops to as low as 20%.
How can I make the GC run more often?
-XX:ZCollectionInterval=N - set maximum gap between collections to N seconds.
-XX:ZUncommitDelay=M - set the delay until unused memory is returned to the OS to M seconds.
Before tuning the GC, I would recommend investigating why this is happening. There might be an issue or bug in your application.
[Some notes about GC]
-XX:ZUncommitDelay=M (check if it is supported by your Linux kernel)
-XX:+ZProactive: Enables proactive GC cycles when using ZGC. By default, this option is enabled. ZGC will start a proactive GC cycle if doing so is expected to have minimal impact on the running application. This is useful if the application is mostly idle or allocates very few objects, but you still want to keep the heap size down and allow reference processing to happen even when there is a lot of free space on the heap.
More details about ZGC config. options can be found:
ZGC Home Page.
Oracle Documentation
Presently (as of JDK 17), ZGC's primary strategy is to wait until the last possible moment before the heap fills up and then do a collection. Its goals are:
Avoid unnecessary CPU load by running GC only when it's necessary.
Start the GC early enough so that it will finish before the heap actually fills up (since the heap filling up would be bad, leading to a temporary application stall).
It does this by measuring how fast your app is allocating memory, how long the GC takes to run, and predicting at what point it should start the GC. You can find the exact algorithm in the source code.
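To make the idea concrete, here is a deliberately simplified sketch of that kind of prediction. It is not ZGC's actual algorithm (that lives in the JDK source), only the basic arithmetic: estimate how long until the heap is full and start collecting once that is no longer comfortably more than one GC cycle. The class, method and the `safetyFactor` parameter are all hypothetical.

```java
public class GcStartHeuristic {
    /**
     * Returns true when a collection should start now so that it can finish
     * before the heap fills up. Simplified illustration, not ZGC's real logic.
     *
     * @param freeBytes        heap space still available
     * @param allocBytesPerSec measured allocation rate
     * @param gcDurationSec    how long a GC cycle is expected to take
     * @param safetyFactor     margin multiplier on the GC duration (assumed knob)
     */
    static boolean shouldStartGc(long freeBytes, double allocBytesPerSec,
                                 double gcDurationSec, double safetyFactor) {
        if (allocBytesPerSec <= 0) {
            return false; // nothing is allocating, so the heap will not fill up
        }
        double secondsUntilFull = freeBytes / allocBytesPerSec;
        return secondsUntilFull <= gcDurationSec * safetyFactor;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 800 MB free at 100 MB/s: 8 s until full, GC needs 2 s * 1.5 = 3 s
        System.out.println(shouldStartGc(800 * mb, 100.0 * mb, 2.0, 1.5)); // false
        // 250 MB free at 100 MB/s: 2.5 s until full, which is within the 3 s margin
        System.out.println(shouldStartGc(250 * mb, 100.0 * mb, 2.0, 1.5)); // true
    }
}
```

This also shows why usage can sit at 90% without anything being wrong: with a low allocation rate, the predicted time until the heap is full stays long, so there is no reason to start a cycle earlier.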
ZGC also exposes some knobs for running GC more often (i.e., proactively), but honestly I don't find them terribly effective. You can find more info in my other answer. G1 does a better job of being proactive, but whether that's good or not depends on your use case. (It sounds like you care more about throughput than memory usage, so I think you should prefer ZGC's behavior.)
However, if you find that ZGC is making mistakes in predicting when the heap will fill up and that your application really is hitting stalls, please share that info here or on the ZGC mailing list.
I have an instance of zookeeper that has been running for some time... (Java 1.7.0_131, ZK 3.5.1-1), with -Xmx10G -XX:+UseParallelGC.
Recently there was a leadership change, and the memory usage on most instances in the quorum went from ~200MB to 2GB+. I took a jmap dump, and the interesting thing I found was a lot of byte[] serialization data (>1GB) that had no GC root but hadn't been collected.
(This is ByteArrayOutputStream, DataOutputStream, org.apache.jute.BinaryOutputArchive, or HeapByteBuffer, BinaryOutputArchive).
Looking at the GC log, shortly before the leadership change the full GC was running every 4-5 minutes. After the election, the tenuring threshold increased from 1 to 15 (the maximum) and the full GC ran less and less often; eventually it didn't run at all on some days.
After several days, suddenly and mysteriously to me, something changed: the memory plummeted back to ~200MB, with full GC running every 4-5 minutes again.
What confuses me is how so much memory can have no GC root and still not get collected by a full GC. I even tried triggering GC.run from jcmd a few times.
I wondered if something in ZK native land was holding onto this memory, or leaking this memory... which could explain it.
I'm looking for any debugging suggestions. I'm planning on upgrading to Java 1.8, and maybe ZK 3.5.4, but would really like to find the root cause before moving on.
So far I've used visualvm, GCviewer and Eclipse MAT.
(Solid vertical black lines are full GC. Yellow is young generation).
I am not an expert on ZK. However, I have been tuning JVMs on WebLogic for a while, and based on this information I suspect that your configuration (-Xmx10G -XX:+UseParallelGC) is causing the heap to expand and shrink. You could therefore try setting both -Xms10G and -Xmx10G to avoid this resizing. According to the Oracle guide quoted below, each heap resize triggers a full GC, so avoiding resizing is a good way to reduce the number of full garbage collections.
Please read this:
"When a Hotspot JVM starts, the heap, the young generation and the perm generation space are allocated to their initial sizes determined by the -Xms, -XX:NewSize, and -XX:PermSize parameters respectively, and increment as-needed to the maximum reserved size, which are -Xmx, -XX:MaxNewSize, and -XX:MaxPermSize. The JVM may also shrink the real size at runtime if the memory is not needed as much as originally specified. However, each resizing activity triggers a Full Garbage Collection (GC), and therefore impacts performance. As a best practice, we recommend that you make the initial and maximum sizes identical"
Source: http://www.oracle.com/us/products/applications/aia-11g-performance-tuning-1915233.pdf
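If you want to confirm that the heap really is being resized before changing flags, one option is to sample the committed heap size and log every change. A minimal sketch, assuming it runs inside the JVM under test (for ZooKeeper you would read the same `MemoryMXBean` over JMX). The class name, the helper `committedDelta` and the one-second interval are arbitrary choices for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapResizeWatch {
    /** Signed change in committed heap between two samples, in bytes. */
    static long committedDelta(long previousBytes, long currentBytes) {
        return currentBytes - previousBytes;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long previous = memory.getHeapMemoryUsage().getCommitted();
        System.out.println("committed heap at start: " + previous + " bytes");

        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000); // hypothetical sampling interval
            long current = memory.getHeapMemoryUsage().getCommitted();
            long delta = committedDelta(previous, current);
            if (delta != 0) {
                // A change in committed size means the JVM grew or shrank the heap.
                System.out.println((delta > 0 ? "heap grew by " : "heap shrank by ")
                        + Math.abs(delta) + " bytes");
            }
            previous = current;
        }
    }
}
```

If the committed size never changes while the problem occurs, resizing is unlikely to be the cause.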
If you could provide your gc.log, it would be useful to analyse this case thoroughly.
Best regards,
RCC
I'm having trouble figuring out a way to monitor the JVM GC for memory exhaustion issues.
With the serial GC, we could just look at the full GC pause times and have a pretty good notion if the JVM was in trouble (if it took more than a few seconds, for example).
CMS seems to behave differently.
When querying lastGcInfo from the java.lang:type=GarbageCollector,name=ConcurrentMarkSweep MXBean (via JMX), the reported duration is the sum of all GC steps and is usually several seconds long. This does not indicate an issue with GC. On the contrary, I've found that GC times that are too short are usually more of an indicator of trouble (which happens, for example, if the JVM goes into a CMS-concurrent-mark-start -> concurrent mode failure loop).
I've tried jstat as well, which gives the cumulative time spent garbage collecting (I'm unsure whether it's for old gen or new gen GC). This can be graphed, but it's not trivial to use for monitoring purposes. For example, I could parse jstat -gccause output, calculate differences over time, and track and monitor that (e.g. the amount of time spent GC'ing over the last X minutes).
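The same difference-over-time calculation can be sketched in-process with `GarbageCollectorMXBean`, which exposes a cumulative collection time per collector. This is only an illustration, assuming the code runs inside the monitored JVM; `GcTimeDelta`, `snapshot` and `gcFraction` are invented names, and the 5-second window is arbitrary. It avoids the diamond operator so that it also compiles on JDK 6.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class GcTimeDelta {
    /** Fraction of a wall-clock interval spent in GC, clamped to [0, 1]. */
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long intervalMillis) {
        if (intervalMillis <= 0) {
            return 0.0;
        }
        long delta = Math.max(0L, gcMillisAfter - gcMillisBefore);
        return Math.min(1.0, (double) delta / intervalMillis);
    }

    /** Cumulative GC time in milliseconds, keyed by collector name. */
    static Map<String, Long> snapshot() {
        Map<String, Long> times = new HashMap<String, Long>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when the value is undefined
            times.put(gc.getName(), Math.max(0L, gc.getCollectionTime()));
        }
        return times;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 5000; // hypothetical monitoring window
        Map<String, Long> before = snapshot();
        Thread.sleep(intervalMillis);
        Map<String, Long> after = snapshot();

        for (Map.Entry<String, Long> entry : after.entrySet()) {
            Long start = before.get(entry.getKey());
            double fraction = gcFraction(start == null ? 0L : start, entry.getValue(), intervalMillis);
            System.out.printf("%s: %.1f%% of the last %d ms%n",
                    entry.getKey(), 100.0 * fraction, intervalMillis);
        }
    }
}
```

The per-collector percentage is machine-readable and can be shipped to a monitoring platform as-is. Which threshold counts as "in trouble" is application-specific.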
I'm using the following JVM arguments for GC logging:
-Xloggc:/xxx/gc.log
-XX:+PrintGCDetails
-verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintReferenceGC
-XX:+PrintPromotionFailure
Parsing gc.log is also an option if nothing else is available, but the optimal solution would be to have a java-native way to get at the relevant information.
The information must be machine-readable (to send to monitoring platforms) so visual tools are not an option. I'm running a production environment with a mix of JDK 6/7/8 instances, so version-agnostic solutions are better.
Is there a simple(r) way to monitor CMS garbage collection? What indicators should I be looking at?
Fundamentally, one wants two things from the CMS concurrent collector:
the throughput of the concurrent cycle to keep up with the promotion rate, i.e. the objects surviving into the old gen per unit of time
enough room in the old generation for objects promoted during a concurrent cycle
So let's say the IHOP (initiating heap occupancy percent) is fixed at 70%; then you are probably approaching a problem when old gen occupancy reaches >90% at some point. Maybe even earlier if you do some large allocations that don't fit into the young generation or outlive it (that's entirely application-specific).
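One java-native way to watch for that kind of occupancy is the usage threshold on `MemoryPoolMXBean`. The sketch below sets a 90% threshold on every heap pool that supports one and reports whether it is currently exceeded. As far as I know, on HotSpot the old generation pool typically supports usage thresholds while eden and survivor pools do not, but treat that as an assumption and check the pool names it prints. `OldGenThreshold` and `thresholdBytes` are illustrative names, and the 90% figure simply mirrors the example above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenThreshold {
    /** Threshold in bytes for a given whole-number percentage of the pool maximum. */
    static long thresholdBytes(long maxBytes, int percent) {
        return maxBytes / 100 * percent;
    }

    public static void main(String[] args) {
        int percent = 90; // alert level taken from the example above
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // no defined maximum, so a percentage makes no sense
            }
            pool.setUsageThreshold(thresholdBytes(usage.getMax(), percent));
            System.out.println(pool.getName()
                    + " above " + percent + "%: " + pool.isUsageThresholdExceeded()
                    + " (crossings so far: " + pool.getUsageThresholdCount() + ")");
        }
    }
}
```

The crossing count is cumulative, so a monitoring agent can poll it and alert on increases rather than sampling occupancy at exactly the right moment.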
Additionally, you usually want it to spend more time outside the concurrent cycle than in it, although that depends on how tightly you tune the collector. In principle you could have the concurrent cycle running almost all the time, but then you have very little throughput margin and burn a lot of CPU time on concurrent collections.
If you really want to avoid even the occasional full GC, you'll need even more safety margin due to fragmentation (CMS is non-compacting). I think this can't be monitored via MXBeans; you'll have to enable some CMS-specific GC logging to get fragmentation info.
For viewing GC logs:
If you have already enabled GC logging, I suggest GCViewer - this is an open source tool that can be used to view GC logs and look at parameters like throughput, pause times etc.
For profiling:
I don't see a JDK version mentioned in the question. For JDK 6, I would recommend VisualVM to profile an application. For JDK 7/8, I would suggest Mission Control. You can find these in the JDK\lib folder. These tools can be used to see how the application performs over a period of time and during GC (you can trigger a GC via the VisualVM UI).
After running a few days the CPU load of my JVM is about 100% with about 10% of GC (screenshot).
The memory consumption is near to max (about 6 GB).
The tomcat is extremely slow at that state.
Since it's too much for a comment, I'll write it up as an answer:
Looking at your charts, it seems to be using CPU for non-GC tasks; peak "GC activity" seems to stay within 10%.
So on first impression it would seem that your task is simply CPU-bound. If that's unexpected, you should do some CPU profiling on your Java application to see if something pops out.
Apart from that, based on the comments I suspect that physical memory filling up might evict file caches and memory-mapped data, leading to increased page faults, which force the CPU to wait for IO.
Freeing up 500MB on a manual GC out of a 4GB heap does not seem like much. Most GCs try to keep pause times low as their primary goal and keep the total time spent in GC within some bound as a secondary goal; only when those goals are met do they try to reduce memory footprint as a tertiary goal.
Before I can recommend further steps, you should gather more statistics and provide more information, since it's hard to even discern what your actual problem is from your description:
monitor page faults
figure out which GC algorithm is used in your setup and how they're tuned (-XX:+PrintFlagsFinal)
log GC activity - I suspect it's pretty busy with minor GCs and thus eating up its pause time or CPU load goals
perform allocation profiling of your application (anything creating excessive garbage?)
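For the "which GC algorithm is used" step, a quick alternative to reading the full -XX:+PrintFlagsFinal output is to ask the running JVM which collectors it registered and which flags it was started with. A minimal sketch using only `java.lang.management`; the class name `GcInventory` and the helper `collectorNames` are made up for illustration. The collector names are JVM-specific strings (for example, HotSpot's parallel collector reports "PS Scavenge" and "PS MarkSweep").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GcInventory {
    /** Names of the collectors registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " pools=" + Arrays.toString(gc.getMemoryPoolNames())
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        // Only flags passed explicitly on the command line show up here;
        // defaults chosen by the JVM do not.
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-X")) {
                System.out.println("flag: " + arg);
            }
        }
    }
}
```

A high collection count on the young collector with little accumulated time on the old one would support the "busy with minor GCs" suspicion.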
You also have to be careful to distinguish problems caused by the Java heap reaching its size limit from problems caused by the OS exhausting its physical memory.
TL;DR: Unclear problem, more information required.
Or, if you're lazy or can afford it, just plug in more RAM or remove other services from the machine and see if the problem goes away.
I learned to check the following for GC problems:
Give the JVM enough memory, e.g. -Xmx2G
If memory is not sufficient and no more RAM is available on the host, analyze a heap dump (e.g. with jvisualvm).
Turn on Concurrent Mark Sweep:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Check the garbage collection log: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
My solution:
In the end I solved the problem by tuning the cache sizes.
The cache sizes were too big, so memory got scarce.
If you want to keep memory on your server free, you can simply try the VM parameter
-Xmx2G //or any other value
This ensures your program never uses more than 2 GB of Java heap (total process memory can be somewhat higher). But be aware that under high workload the server may get an OutOfMemoryError.
Since an old generation (full) GC may block your whole server for several seconds, Java will try to avoid a full garbage collection.
Limiting the heap may cause a full GC to be triggered more readily (or even allow more objects to be collected by young generation GCs).
In my opinion (more guessing than actually knowing), I don't think another algorithm would help much here.
We're running a fairly complex app as a portlet on Websphere Portal Server 5.1 on AIX using IBM JDK 1.4.2. On our production system I can see a strange behaviour in the verbose GC logs. After a period of normal behaviour the system can start rapidly allocating larger and larger blocks. The system starts to spend > 1000 ms to complete each GC, but blocks are being allocated so quickly that there is only a 30 ms gap between allocation failures.
Each allocation failure is slightly larger than the last, by some integer multiple of 1024 bytes. E.g. you might have 5 MB, and then a short while later 5 MB + 17 * 1024.
This can go on for up to 10 minutes.
The blocks tend to grow up to 8 to 14 MB in size before it stops.
It's a quad-core system, and I assume that it's now spending >95% of its time doing GC, with three cores waiting for the other core to complete GC. For 10 minutes. Ouch.
Obviously system performance dies at this point.
We have JSF, hibernate & JDBC, web services calls, log4j output and not much else.
I interpret this as likely to be something infrastructural rather than our application code. If it was a bad string concatenation inside a loop we would expect more irregular growth than blocks of 1024. If it was StringBuffer or ArrayList growth we would see the block sizes doubling. The growth is making me think of log buffering or something else. I can't think of anything in our app that allocates even 1 MB, let alone 14. Today I looked for logging backing up in memory before being flushed to disk, but the volume of logging statements over this period of GC thrashing was nowhere near the MB range.
Clearly the problem is with the excessive memory allocation rather than with the garbage collection, which is just doing its best to keep up. Something is allocating a large block and trying to grow it inefficiently in increments that are far too small.
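To illustrate why small fixed increments are so much worse than doubling, here is a rough sketch that counts how many bytes get copied while a buffer grows from 5 MB to 14 MB, assuming each growth step allocates a new block and copies the old contents. The 5 MB, 14 MB and 17 * 1024 figures come from the logs described above; the class `GrowthCost` and both methods are hypothetical, and this models the suspected pattern, not what Portal Server or the driver actually does.

```java
public class GrowthCost {
    /** Total bytes copied when growing from start to target by a fixed step. */
    static long copiedFixedStep(long start, long target, long step) {
        if (start <= 0 || step <= 0) {
            throw new IllegalArgumentException("start and step must be positive");
        }
        long copied = 0;
        for (long size = start; size < target; size += step) {
            copied += size; // each reallocation copies the current contents
        }
        return copied;
    }

    /** Total bytes copied when growing from start to target by doubling. */
    static long copiedDoubling(long start, long target) {
        if (start <= 0) {
            throw new IllegalArgumentException("start must be positive");
        }
        long copied = 0;
        for (long size = start; size < target; size *= 2) {
            copied += size;
        }
        return copied;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fixed = copiedFixedStep(5 * mb, 14 * mb, 17 * 1024L);
        long doubling = copiedDoubling(5 * mb, 14 * mb);
        System.out.println("fixed 17 KB steps: " + fixed / mb + " MB copied");
        System.out.println("doubling:          " + doubling / mb + " MB copied");
    }
}
```

With these numbers the fixed-step variant needs hundreds of multi-megabyte allocations where doubling needs two, which fits a log full of allocation failures only milliseconds apart.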
Any ideas what might be causing all this when the system is under load? Anybody seen anything similar with Portal Server?
Note: for anybody who's interested it's starting to look like the cause is an occasional but enormous database query. It seems the culprit is either Hibernate or the JDBC driver.
Not sure what could cause the problem, but here is an idea on how to investigate more:
The IBM JDK is great because it can be configured to do a heap dump when it receives a SIGQUIT signal.
In a previous project, it was not our JDK, but we would use it whenever we had memory issues to investigate.
Here's how to enable the heapdump:
http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/index.jsp?topic=/com.ibm.java.doc.diagnostics.142j9/html/enabling_a_heapdump.html
Then there's a tool called heaproot that will allow you to see what's in these dumps.
Finding the type of objects should lead you to the culprit.
Depending on the exact version of the IBM JDK you are using, there are various options for tracking "large allocations". The differences are mainly in the implementation, and the result is that a Java stack trace is logged when an allocation over a certain size is made (which should help you track down the culprit).
"Sovereign" 1.4.2 SR4+:
http://www-01.ibm.com/support/docview.wss?uid=swg21236523
"J9" 1.4.2 (if Java is running under -Xj9 option):
You need to get hold of a JVMPI / JVMTI agent for the same purpose, I can't find a link for this one right now.
Only a hint... we once had a project that suffered major GC problems (WebSphere and IBM JDK) due to heap fragmentation. In the end, we added a JDK switch to force heap compaction.
The Sun JDK does not tend to have a fragmented heap, but the IBM JDK does, due to its different memory/GC handling.
Just give it a try... I cannot remember the magic switch.