Suppose my environment is Java 1.8 and my application is a batch application with no latency requirement. Should I choose Parallel GC or G1 GC?
I understand that Parallel GC is optimized for throughput and so should suit batch applications like mine, but every Java application around me uses the G1 garbage collector. So I am not sure whether G1 makes Parallel GC unnecessary, or whether Parallel GC is still the better choice when throughput is the goal.
I went through the first chapter of the Java® Performance Companion book and there is a passage in it that describes this:
As of this writing, G1 primarily targets the use case of large Java heaps with reasonably low pauses, and also those applications that are using CMS GC. There are plans to use G1 to also target the throughput use case, but for applications looking for high throughput that can tolerate longer GC pauses, Parallel GC is currently the better choice.
This exactly answers my question: if a project exists purely to run scheduled tasks or to consume from a message queue, it usually has no pause-time requirement, so Parallel GC is the more appropriate choice.
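If you want to check this on your own workload rather than take the book's word for it, one rough approach is to run the same batch job once with -XX:+UseParallelGC and once with -XX:+UseG1GC and compare how much time the JVM reports spending in GC. The sketch below is only an illustration: the class name GcOverhead and the allocation loop standing in for the real job are made up, and getCollectionTime() is an approximate figure, so treat the result as an indication rather than a precise measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    /** Total milliseconds the JVM reports as spent in garbage collection so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Reported GC time divided by JVM uptime; a rough measure of GC overhead. */
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real batch job: many short-lived allocations.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            checksum += new byte[256].length;
        }
        System.out.printf("checksum=%d gcMillis=%d gcFraction=%.4f%n",
                checksum, totalGcMillis(), gcFraction());
    }
}
```

For a batch job, total wall-clock time of the run is still the number that matters most; the GC fraction mainly helps explain a difference between the two runs.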
Whenever there is a concurrent mode failure or a promotion failure with CMS, it does a full GC using a single thread. Why can't it do the full GC with the parallel collector to reduce the full GC penalty?
There is no particular reason other than that it hasn't been implemented that way and engineering effort is focused on G1. Most users of CMS just try to tune it so that a full GC never happens, "never" meaning at an interval greater than whatever requires JVM restarts anyway. The parallel old collector can't be reused by simply calling its code, since the internal data structures of the two collectors differ, so it would involve non-trivial implementation effort.
Google devs have proposed a patch to contribute parallel full GC to CMS, but I wouldn't count on it becoming available in any OpenJDK builds anytime soon.
Many monitoring tools, like the otherwise fantastic JavaMelody, just monitor the current memory usage. If you want to check for memory leaks or impending out-of-memory situations, this is not particularly helpful when the application generates loads of garbage that gets collected immediately. Not perfect, but IMHO much more interesting, would be to monitor the memory usage immediately after a major garbage collection. If that's high, a crash is looming.
So: can you find out the memory usage immediately after the last major garbage collection - either from Java code or via JMX? I know there are some tools like VisualVM which do this (which is no option for production use), and it can be written in the garbage collection log, but I'm looking for a more straightforward solution than parsing the garbage collection logfile. :-) To be clear: I'm looking for something that can easily be used in any application in production, not any expensive tool for debugging.
In case that matters: JDK 7 with -XX:+UseConcMarkSweepGC, but I am interested in general answers, too.
Information about the memory in use right after a GC (young or old) is available via JMX.
The garbage collector MBean has an attribute, LastGcInfo, which is a composite data object that includes the memory pool sizes before and after the GC.
In addition, starting with Java 7, you can subscribe to JMX notifications to receive GC events without polling.
You can find an example of code working with the GC MBean here.
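As a minimal sketch of the polling approach described above, assuming a HotSpot-based JVM where the collector beans implement com.sun.management.GarbageCollectorMXBean: the class name LastGcUsage and its method are made up for illustration. It sums the heap pools in LastGcInfo for each collector. Which entry corresponds to the "major" collector depends on the GC in use (for example, the old-generation bean is typically named ConcurrentMarkSweep under CMS), so check the names your JVM reports.

```java
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LastGcUsage {
    /** Heap bytes in use right after each collector's most recent run, keyed by collector name. */
    static Map<String, Long> heapUsedAfterLastGc() {
        // Collect the names of heap pools so non-heap pools can be skipped.
        Set<String> heapPools = new HashSet<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                heapPools.add(pool.getName());
            }
        }
        Map<String, Long> result = new LinkedHashMap<String, Long>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // LastGcInfo lives on the com.sun.management subinterface, not the standard one.
            if (!(gc instanceof com.sun.management.GarbageCollectorMXBean)) {
                continue;
            }
            GcInfo info = ((com.sun.management.GarbageCollectorMXBean) gc).getLastGcInfo();
            if (info == null) {
                continue; // this collector has not run yet
            }
            long used = 0;
            for (Map.Entry<String, MemoryUsage> e : info.getMemoryUsageAfterGc().entrySet()) {
                if (heapPools.contains(e.getKey())) {
                    used += e.getValue().getUsed();
                }
            }
            result.put(gc.getName(), used);
        }
        return result;
    }

    public static void main(String[] args) {
        System.gc(); // only a request; used so the demo usually has something to report
        for (Map.Entry<String, Long> e : heapUsedAfterLastGc().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " heap bytes after last GC");
        }
    }
}
```

The same attribute can be read remotely over JMX as composite data; the in-process cast above is just the shortest way to show it.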
Probably 'Dynatrace' is an option... it's a very powerful monitoring tool (not only for memory).
http://www.dynatrace.com/en/index.html
A very crude way would be to monitor the minima of Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory() for some time. At least that would not require you to know intimate details about memory pools, as monitoring LastGcInfo in Alexey Ragozin's answer does. This might require you to get notifications about garbage collections.
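The crude approach could look roughly like the sketch below. UsedHeapFloor and its methods are hypothetical names. It samples used memory from a timer rather than from GC notifications, so the minimum it sees is only an approximation of the post-GC level and gets better the more often you sample.

```java
public class UsedHeapFloor {
    private long min = Long.MAX_VALUE;

    /** Current used heap as seen by Runtime: total minus free. */
    static long usedNow() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Records one sample and returns the lowest value seen in the current window. */
    synchronized long sample() {
        min = Math.min(min, usedNow());
        return min;
    }

    /** Returns the minimum of the current window and starts a new one. */
    synchronized long resetWindow() {
        long windowMin = min;
        min = Long.MAX_VALUE;
        return windowMin;
    }

    public static void main(String[] args) throws InterruptedException {
        UsedHeapFloor floor = new UsedHeapFloor();
        // Sample for about a second; a real monitor would use a scheduled task.
        for (int i = 0; i < 50; i++) {
            floor.sample();
            Thread.sleep(20);
        }
        System.out.println("lowest used heap in window: " + floor.resetWindow() + " bytes");
    }
}
```

If the window minimum keeps rising from one window to the next, that is the signal worth alerting on.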
I have refactored and tuned my java application.
I now want to compare the performance of the newer and older versions of the application in terms of their CPU and heap memory usage.
I am using VisualVM and JDK 1.7. I run them individually and monitor them using VisualVM. At the end, all I have is two sets of graphs, which makes it difficult to decide which one is better.
Is there a metric that VisualVM provides that would make it easier to decide which version performs better?
Something like average CPU used or average heap used (if that is an accurate way of measuring performance). Thanks!
The Platform MBeans provide access to the CPU, memory and garbage collection data, so it's possible to collect data from there and run statistical analysis against it:
CPU and Memory: http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html
Garbage Collection:
http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/GarbageCollectorMXBean.html
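A rough sketch of collecting such numbers in-process, assuming a JVM that exposes com.sun.management.OperatingSystemMXBean (JDK 7 or later): the class name RunSampler and its methods are invented for this example. It keeps running averages of process CPU load and used heap, plus the heap peak, so each version of the application yields a handful of comparable figures instead of a graph.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class RunSampler {
    private final com.sun.management.OperatingSystemMXBean os; // null if not available
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    private double cpuSum;
    private long cpuSamples;
    private double heapSum;
    private long heapSamples;
    private long heapPeak;

    RunSampler() {
        java.lang.management.OperatingSystemMXBean bean =
                ManagementFactory.getOperatingSystemMXBean();
        os = bean instanceof com.sun.management.OperatingSystemMXBean
                ? (com.sun.management.OperatingSystemMXBean) bean : null;
    }

    /** Takes one sample of process CPU load and used heap. */
    synchronized void sample() {
        if (os != null) {
            double load = os.getProcessCpuLoad(); // negative when no value is available yet
            if (load >= 0) {
                cpuSum += load;
                cpuSamples++;
            }
        }
        long used = memory.getHeapMemoryUsage().getUsed();
        heapSum += used;
        heapSamples++;
        heapPeak = Math.max(heapPeak, used);
    }

    synchronized double averageCpuLoad() {
        return cpuSamples == 0 ? Double.NaN : cpuSum / cpuSamples;
    }

    synchronized double averageHeapBytes() {
        return heapSamples == 0 ? Double.NaN : heapSum / heapSamples;
    }

    synchronized long peakHeapBytes() {
        return heapPeak;
    }

    public static void main(String[] args) throws InterruptedException {
        RunSampler sampler = new RunSampler();
        for (int i = 0; i < 20; i++) { // in a real run, sample for the whole workload
            sampler.sample();
            Thread.sleep(50);
        }
        System.out.printf("avg cpu load=%.3f avg heap=%.0f bytes peak heap=%d bytes%n",
                sampler.averageCpuLoad(), sampler.averageHeapBytes(), sampler.peakHeapBytes());
    }
}
```

Averages of used heap are dominated by when the samples happen to fall relative to collections, so comparing the two versions on total run time and total GC time as well is probably more telling than average heap alone.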
How does Java deal with GC and Heap Allocation on multi-processor machines?
In the reading I've done, there doesn't seem to be any difference in the algorithms used between single and multi-processor systems. The art and science of GC tuning in Java seems fairly mature, yet I can't find anything related to this in any of the common JVM implementations.
As a data point, in .Net, the algorithm changes significantly: There's a heap affinitized to each processor, and each processor is responsible for that heap. This is documented in a number of places such as MSDN:
Scalable Collections On a multiprocessor system running the server
version of the execution engine (MSCorSvr.dll), the managed heap is
split into several sections, one per CPU. When a collection is
initiated, the collector has one thread per CPU; all threads collect
their own sections simultaneously. The workstation version of the
execution engine (MSCorWks.dll) doesn't support this feature.
Any insight that can be offered into Java GC tuning specifically for multi processor systems is also of interest to me.
Indeed, in Hotspot JVM, the way that the heap is used does not depend on the heap size or number of cores.
Each thread (not processor) has a Thread Local Allocation Buffer (TLAB), so object allocation is cheap and thread-safe. This memory space is roughly analogous to the heap-processor affinity you are mentioning.
You can also activate Non-Uniform Memory Access (NUMA). The idea behind NUMA is to prefer the RAM that is close to a CPU chip to store objects instead of considering the entire heap as a uniform space.
Finally, the collectors (other than the serial one) are multi-threaded and scale with the number of cores, so they take advantage of your hardware.
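To see how a particular JVM is configured in these respects, something like the following sketch can help. It assumes a HotSpot JVM, where HotSpotDiagnosticMXBean is available; GcFlags is a made-up class name, and UseTLAB, UseNUMA and ParallelGCThreads are the HotSpot flag names as I know them. NUMA support is switched on with -XX:+UseNUMA and, as far as I know, only some collectors honor it.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class GcFlags {
    /** Returns the current value of a HotSpot flag, or null if it cannot be read on this JVM. */
    static String flag(String name) {
        try {
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return hotspot == null ? null : hotspot.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            return null; // unknown flag, or the bean is not supported here
        }
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("UseTLAB: " + flag("UseTLAB"));
        System.out.println("UseNUMA: " + flag("UseNUMA"));
        System.out.println("ParallelGCThreads: " + flag("ParallelGCThreads"));
    }
}
```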
Garbage collection is an implementation specific concept. Different JVMs (IBM, Oracle, OpenJDK, etc.) have different implementations, and different mechanisms are available in different versions too. Often you can select which mechanism you want to use when you start your Java program.
Similar questions here....
These details are often given in the documentation for the commandline options for your JRE:
IBM JDK Here
Oracle JRE Options here
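On HotSpot, for instance, the collector is chosen with startup options such as -XX:+UseParallelGC, -XX:+UseG1GC or -XX:+UseConcMarkSweepGC (availability depends on the JDK version). A small sketch for confirming which collectors a running JVM actually ended up with; ActiveCollectors is a hypothetical class name, and the bean names printed differ between JVMs and collectors:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** Names of the garbage collector beans registered in this JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run with different -XX:+Use...GC options and compare the output.
        for (String name : names()) {
            System.out.println(name);
        }
    }
}
```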
I'm building a program that will live on an AWS EC2 instance and will (probably) be invoked periodically via a cron job. The program will 'crawl'/'poll' specific websites that we've partnered with, index/aggregate their content, and update our database. I'm thinking Java is a perfect fit as a language for this application. Some members of our engineering team are concerned about the performance detriment of Java's garbage collection feature, and are suggesting using C++.
Are these valid concerns? This is an application that will be invoked possibly once every 30 minutes via cron job, and as long as it finishes its task within that time frame the performance is acceptable, I would assume. I'm not sure if garbage collection would be a performance issue, since I would assume the server will have plenty of memory and the actual act of tracking how many objects point to an area of memory and then declaring that memory free when it reaches 0 doesn't seem too detrimental to me.
No, your concerns are most likely unfounded.
GC can be a concern when dealing with large heaps and fragmented memory (which requires a stop-the-world collection), or with medium-lived objects that are promoted to the old generation but then quickly dereferenced (which causes excessive GC, but can be fixed by adjusting the ratio of new to old space).
A web crawler is very unlikely to fit either of the above two profiles - you probably don't need a massive old generation and should have relatively short lived objects (page representation in memory while you parse out data) and this will be efficiently dealt with in the young generation collector.
We have an in-house crawler (Java) that can happily handle 2 million pages per day, including some additional post-processing per page, on commodity hardware (2G RAM). The main constraint is bandwidth; GC is a non-issue.
As others have mentioned, GC is rarely an issue for throughput sensitive applications (such as a crawler) but it can (if one is not careful) be an issue for latency sensitive apps (such as a trading platform).
The typical concern C++ programmers have for GC is one of latency. That is, as you run a program, periodic GCs interrupt the mutator and cause spikes in latency. Back when I used to run Java web applications for a living, I had a couple customers who would see latency spikes in the logs and complain about it — and my job was to tune the GC to minimize the impact of those spikes. There are some relatively complicated advances in GC over the years to make monstrous Java applications run with consistently low latency, and I'm impressed with the work of the engineers at Sun (now Oracle) who made that possible.
However, GC has always been very good at handling tasks with high throughput, where latency is not a concern. This includes cron jobs. Your engineers have unfounded concerns.
Note: A simple experimental GC reduced the cost of memory allocation / freeing to less than two instructions on average, which improved throughput, but this design is fairly esoteric and requires a lot of memory, which you don't have on EC2.
The simplest GCs around offer a tradeoff between large heap (high latency, high throughput) and small heap (lower latency, lower throughput). It takes some profiling to get it right for a particular application and workload, but these simple GCs are very forgiving in a large heap / high throughput / high latency configuration.
Fetching and parsing websites will take far more time than the garbage collector, so its impact will probably be negligible. Moreover, automatic memory management is often more efficient than manual memory management via new/delete when dealing with a lot of small objects (such as strings). Not to mention that garbage-collected memory is easier to use.
I don't have any hard numbers to back this up, but code that does a lot of small string manipulations (lots of small allocations and deallocations in a short period of time) should be much faster in a garbage-collected environment.
The reason is that modern GC's "re-pack" the heap on a regular basis, by moving objects from an "eden" space to survivor spaces and then to a tenured object heap, and modern GC's are heavily optimized for the case where many small objects are allocated and then deallocated quickly.
For example, constructing a new string in Java (on any modern JVM) is as fast as a stack allocation in C++. By contrast, unless you're doing fancy string-pooling stuff in C++, you'll be really taxing your allocator with lots of small and quick allocations.
Plus, there are several other good reasons to consider Java for this sort of app: it has better out-of-the-box support for network protocols, which you'll need for fetching website data, and it is much more robust against the possibility of buffer overflows in the face of malicious content.
Garbage collection (GC) is fundamentally a space-time tradeoff. The more memory you have, the less time your program will need to spend performing garbage collection. As long as you have a lot of memory available relative to the maximum live size (total memory in use), the main performance hit of GC -- whole-heap collections -- should be a rare event. Java's other advantages (notably robustness, security, portability, and an excellent networking library) make this a no-brainer.
For some hard data to share with your colleagues showing that GC performs as well as malloc/free with plenty of available RAM, see:
"Quantifying the Performance of Garbage Collection vs. Explicit Memory Management", Matthew Hertz and Emery D. Berger, OOPSLA 2005.
This paper provides empirical answers to an age-old question: is
garbage collection faster/slower/the same speed as malloc/free? We
introduce oracular memory management, an approach that lets us measure
unaltered Java programs as if they used malloc and free. The result: a
good GC can match the performance of a good allocator, but it takes 5X
more space. If physical memory is tight, however, conventional garbage
collectors suffer an order-of-magnitude performance penalty.
Paper: PDF
Presentation slides: PPT, PDF