Why is memory management so visible in Java VM?

Why is memory management so visible in Java VM? - java

I'm playing around with writing some simple Spring-based web apps and deploying them to Tomcat. Almost immediately, I run into the need to customize the Tomcat's JVM settings with -XX:MaxPermSize (and -Xmx and -Xms); without this, the server easily runs out of PermGen space.
Why is this such an issue for Java VMs compared to other garbage collected languages? Comparing counts of "tune X memory usage" for X in Java, Ruby, Perl and Python, shows that Java has easily an order of magnitude more hits in Google than the other languages combined.
I'd also be interested in references to technical papers/blog-posts/etc explaining design choices behind JVM GC implementations, across different JVMs or compared to other interpreted language VMs (e.g. comparing Sun or IBM JVM to Parrot). Are there technical reasons why JVM users still have to deal with non-auto-tuning heap/permgen sizes?

The title of your question is misleading (not on purpose, I know): PermSize issues (and there are a lot of them, I was one of the first one to diagnose a Tomcat/Sun PermGen issue years ago, when there wasn't any knowledge on the issue yet) are not a Java specifity but a Sun VM specifity.
If you use a VM that doesn't use permanent generation (like, say, an IBM VM if I'm not mistaken) you cannot have permgen issues.
So it's is not a "Java" problem, but a Sun VM implementation problem.

Java gives you a bit more control about memory -- strike one for people wanting to apply that control there, vs Ruby, Perl, and Python, which give you less control on that. Java's typical implementation is also very memory hungry (because it has a more advanced garbage collection approach) wrt the typical implementations of the dynamic languages... but if you look at JRuby or Jython you'll find it's not a language issue (when these different languages use the same underlying VM, memory issues are pretty much equalized). I don't know of a widespread "Perl on JVM" implementation, but if there's one I'm willing to bet it wouldn't be measurably different in terms of footprint from JRuby or Jython!

Python/Perl/Ruby allocate their memory with malloc() or an optimization thereof. The limit to the heap space is determined by the operating system rather than the VM, so there's no need for options like -Xmxn. Also, the garbage collection is simpler, based mostly on reference counting. So there's a lot less to fine-tune.
Furthermore, dynamic languages tend to be implemented with bytecode interpreters rather than JIT compilers, so they aren't used for performance-critical code anyway.

The essence of #WizardOfOdds and #Alex-Martelli's answers appear to be correct: Java has an advanced set of GC options, and sometimes you need to tune them. However, I'm still not entirely clear on why you might design a JVM with or without a permanent generation. I have found a bunch of useful links about garbage collection in Java, though not necessarily in comparison to other languages with GC. Briefly:
The Sun GC evolves very slowly due to the fact that it is deployed everywhere and people may rely on quirks in its implementation.
Sun has detailed white papers on GC design and options, such as Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine.
There is a new GC in the wings, called the G1 GC. Alex Miller has a good summary of relevant blog posts and a link to the technical paper. But it still has a permanent generation (and doesn't necessarily do a great job with it).
Jon Masamitsu has (had?) an interesting blog at Sun various details of garbage collection.
Happy to update this answer with more details if anyone has them.

This is because Tomcat is running in the Java Virtual Machine, while other languages are either compiled or interpreted and run against your actual machine. When you set -Xmx and -Xms you are saying that you want to JVM to run like a computer with am amount of ram somewhere in the set range.
I think the reason so many people run in to this is that the default values are relatively low and people end up hitting the default ceiling pretty quickly (instead of waiting until you run out of actual ram as you would with other languages).

Related

What versions of java are slow for gc logging?

I've been told by my company's support team that some versions of java have a significant performance impact when we turn on -verbose:gc. However I can't figure out if this is the case or not.
Was this logging slow(ish) at some point, and when did it stop?
The reason I ask is that there's some hesitation about applying this to a production environment to investigate potential memory leaks (and whether we can stop doing periodic restarts of the system...).
Specifically I'm talking about Java 1.4.2 which I think introduced the argument, and what service pack it applies up to.

I know you asked about the impact of verbose:gc (Amir is correct), but based on the comments I see you are investigating a memory leak.
Is it possible for you to get a histogram of your environment? verbose GC will only show you that there is a memory leak, not where the memory is sitting.
you mention java 1.4.2, is that your current version? If you are using 1.5 or higher you can use
jmap -histo <pid> > file.txt
This will give you a breakdown of all the objects in memory. You will freeze your JVM for a time dependent on the amount of memory in the system. (2GB can freeze for a minute or so on even good hardware) test this on a development system first. I know you don't want to impact your production environment but this is a necessary evil to find the source of the problem. Do a capture right before the periodic restart to lesson your impact.

I suggest that you do the following:
Write some benchmark that is likely to stress the garbage collection. (Create large linked data structures with weak references, etc, etc).
Install a copy of the same version of the JVM as you are using in production on some test box.
Run the benchmark with various GC logging settings, including the settings that you want to run in production, measuring the performance impact on the benchmark.
If you do this right, it will give you some solid evidence about what the likely performance impact will be for your production server.

.NET runtime vs. Java Hotspot: Is .NET one generation behind?

According to the information I could gather on .NET and Java execution environment, the current state of affairs is follows:
Modern Java VM are capable of performing continuous recompilation, which combined with profiling can yield great performance improvements. Older JVMs employed JIT.
More information in this article:
http://www.ibm.com/developerworks/library/j-jtp12214/ and especially: Java theory and practice: Dynamic compilation and performance measurement
.NET uses JIT or NGEN to generate native code, but once the native code is generated, no further (runtime) optimizations are performed.
Benchmarks aside and with no intention to escalate holy wars, does this mean that Java Hotspot VM is one generation ahead of .Net. Will these technologies employed at Java VM eventually find its way into .NET runtime?

They follow two different strategies. I do not think one is better than the other.
.NET does not interpret bytecode, so it has to JIT everything as is gets executed and therefore cannot optimise heavily due to time constraints. If you need heavy optimizations in some part of the code, you can always NGEN it manually, or do a fast but unsafe implementation. Furthermore, calling native code is easy. The approach here seems to be getting the runtime good enough and manually optimise bottlenecks.
Modern JVMs will usually interpret most of the code, and then do an optimized compilation of the bottlenecks. This usually gets better results than straight JIT'ing, but if you need more, you don't have unsafe in Java, and calling native code is not nice. So the approach here is to do as much automatic optimising as possible, because the other options are not that good.
In reality Java applications tend to perform slightly better in time and worse in space when compared to .NET.

I've never benchmarked the two to compare, and I'm more familiar with the Sun JVM, I can only speak in general terms about JITs.
There are always tradeoffs with optimizations, and not all optimizations work all the time. However, here are some modern JIT techniques. I think this can be the beginning of a good conversation if we stick to the technical stuff:
escape analysis
intrinsics
http://bugs.sun.com/view_bug.do?bug_id=6823354
http://weblog.ikvm.net/CommentView.aspx?guid=0404dd8a-88a8-4d62-9bcb-98324d57a2a9
tail-call optimization
on-stack replacement
lock coarsening
lock elision
multi-threaded garbage collection
low-pause garbage collection
polymorphic method call removal
fast heap allocation
There's also features that are helpful as far as good implementations of a VM go:
being able to pick between GC
implementations customization of each GC
heap allocation parameters (such as growth)
page locking
Based on these features and many more, we can compare VMs, and not just "Java" versus ".NET" but, say, Sun's JVM versus IBM's JVM versus .NET versus Mono.
For example, Sun's JVM doesn't do tail-call optimization, IIRC, but IBM's does.

Apparently someone was working on something similar for Rotor. I don't have access to IEEE so I can't read the abstract.
Dynamic recompilation and profile-guided optimisations for a .NET JIT compiler
Quote from Summary...
An evaluation of the framework using a
set of test programs shows that
performance can improve by a maximum
of 42.3% and by 9% on average. Our
results also show that the overheads
of collecting accurate profile
information through instrumentation to
an extent outweigh the benefits of
profile-guided optimisations in our
implementation, suggesting the need
for implementing techniques that can
reduce such overheads.

You may be interested in SPUR which is a Tracing JIT compiler. The focus is on javascript but it operates on CIL not the language itself. It is a research project based on Bartok not the standard .NET VM. The paper has some performance benchmarks showing 'it consistently performs faster than SPUR-CLR' which is the standard 3.5 CLR. There haven't been any announcements about it's future relating to the current VM however. Traces can cross method boundaries which is not something HotSpot does AFAIK, JVM tracing JITs are mentioned here.
I'd be hesitant to say the .NET VM is a generation behind especially when considering all the sub-systems, in particular generics. How the GC and DLR vs invokedynamic compare I'm unsure but there are lots of details about them at places like channel9.

JVM with no garbage collection

I've read in many threads that it is impossible to turn off garbage collection on Sun's JVM. However, for the purpose of our research project we need this feature. Can anybody recommend a JVM implementation which does not have garbage collection or which allows turning it off? Thank you.

I wanted to find a fast way to keep all objects in memory for a simple initial proof of concept.
The simple way to do this is to run the JVM with a heap that is so large that the GC never needs to run. Set the -Xmx and -Xms options to a large value, and turn on GC logging to confirm that the GC doesn't run for the duration of your test.
This will be quicker and more straightforward than modifying the JVM.
(In hindsight, this may not work. I vaguely recall seeing evidence that implied that the JVM does not always respect the -Xms setting, especially if it was really big. Still, this approach is worth trying before trying some much more difficult approach ... like modifying the JVM.)
Also, this whole thing strikes me as unnecessary (even counter-productive) for what you are actually trying to achieve. The GC won't throw away objects unless they are garbage. And if they are garbage, you won't be able to use them. And the performance of a system with GC disabled / negated is not going to indicative of how a real application will perform.
UPDATE - From Java 11 onwards, you have the much simpler option of using the Epsilon (no-op) garbage collector; see
JEP 318: Epsilon: A No-Op Garbage Collector (Experimental)
You add the following options when you launch the JVM:
-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC
When the heap is filled, no attempt is made to collect garbage. Instead, the Epsilon GC terminates the JVM.

Depending on your needs this could perhaps work:
Using the -Xbootclasspath option you may specify your own implementation of API classes. You could then for instance override the implementation of Object, and add to the constructor, a globalList.add(this) to prevent the objects from being garbage collected. It's a hack for sure, but for simple case-study it's perhaps sufficient.
Another option is to take an open source jvm and comment out the parts that initiate garbage collection. I would guess it is not that complicated.

Sun's JVM has no such option. AFAIK, no other JVM has this option either.
You did not state what it is that you are exactly trying to achieve but you have one of two options: either use a profiler and see exactly what the GC is doing, that way you can take its effects into consideration. The other is to compile one of the JVMs from source, and disable GC from there.

You can only turn off the GC if its not actually needed (otherwise your application would run out of memory) and if you didn't need to GC, it shouldn't run anyway.
The simplest option would be to not discard any objects, this will avoid GC being performed (And set the max memory very high so you don't run out).
You may find that you get GCs on startup and you may consider a no-GC when running acceptable.

the question is old but for those who might be interested, there is a proposal to
Develop a GC that only handles memory allocation, but does not implement any actual memory reclamation mechanism. Once available Java heap is exhausted, perform the orderly JVM shutdown.
JEP draft: Epsilon GC: The Arbitrarily Low Overhead Garbage (Non-)Collector

Maybe you could try making your VM's available memory sufficient for GC never to be run.
My (allbeit limited) experience leads me to suggest that the VM is, by default, extremely lazy and extremely reluctant to run GC.
giving -Xmx 16384M (or some such) and making sure that your research subject stays well below that limit, might give you the environment you wish to obtain, allthough even then it will obviously not be guaranteed.

There actually exists a dirty hack to temporarily pause GC. First create a dummy array in Java. Then, in JNI, use GetPrimitiveArrayCritical function to get hold of the pointer to the array. The Sun JVM will disable GC to ensure that the array is never moved and the pointer stays valid. To re-enable GC, you can call the ReleasePrimitiveArrayCritical function on the pointer. But this is very implementation specific since other VM impl may pin the object instead of disabling GC entirely. (Tested to work on Oracle Jdk 7 & 8)

Take a look at Oracle's JRockit JVM. I've seen very good near-deterministic performance on Intel hardware with this JVM and you can prod and poke the runtime using the Mission Control utility to see how well it's performing.
Though you can't turn GC off completely, I believe that you can use the -Xnoclassgc option to disable the collection of classes. The GC can be tuned to minimize latency at the expense of leaving memory consumption to grow. You may need a license to drop the latency as low as you need if you're going this route.
There is also a Realtime version of the JRockit JVM available but I don't think that there is a free-to-developers version of this available.

Can you get an open source JVM and disable its GC, for example Sun's Hotspot?
If there was no Garbage Collection what would you expect to be the semantics of code like this?
public myClass {
public void aMethod() {
String text = new String("xyz");
}
}
In the absence of GC any item newed and with a stack scoped reference could never be reclaimed. Even if your own classes could decide not to use local variables like this, or to use only primitive types I don't see how you would safely use any standard Java library.
I'd be interested to hear more about your usage scenario.

If I had this problem I would get IBM's Jikes Research Virtual Machine because:
The run-time system is written in Java itself (with special extensions)
The whole thing was designed as a research vehicle and is relatively easy to tweak.
You can't turn off GC forever, because Java programs do allocate and eventually you'll run out of memory, but it's quite possible that you can delay GC for the duration of your experiment by telling the JVM not to start collecting until the heap gets really big. (That trick might work on other JVMs as well, but I wouldn't know where to find the knobs to start twirling.)

Best OS to deploy a low latency Java application?

We have a low latency trading system (feed handlers, analytics, order entry) written in Java. It uses TCP and UDP extensively, it does not use Infiniband or other non-standard networking.
Can anyone comment on the tradeoffs of various OSes or OS configurations to deploy this system? While throughput is obviously important to keep up with modern price feeds, latency is our #1 priority.
Solaris seems like a natural candidate since they created Java; should I use Sparc or x64 processors?
I've heard good things about RHEL and SLERT, are those the right versions of Linux to use in our benchmarking.
Has anyone tested Windows against the above OSes? Or is it assumed to not keep up?
I'd like to leave the Java vs C++ debate for a different thread.

Vendors love this kind of benchmark. You have code, right?
IBM, Sun/Oracle, HP will all love to run your app on their gear to demonstrate their advantages.
Make them do this. If you have code, make the vendors run a demonstration on their gear to show which is best for your needs.
It's easy, painless, free, and factual. The final decision will be easy and obvious. And you will know how to install and tune to maximize performance.
What I hate doing is predicting this kind of thing before the code is written. Too many customers have asked for a H/W and OS recommendation before we've finished identifying all the use cases. Asking for that kind of precognition is simple craziness.
But you have code. You can produce test cases that exercise your code. That's perfect.

For a trading environment, in addition to low latency you are probably concerned about consistency as well as latency so focusing on reducing the impact of GC pauses as much as possible may well give you more benefit than differnt OS choices.
The G1 garbage collector in recent versions of Suns Hotspot VM improves stop the world pauses a lot, in a similar way to the JRockit VM
For real performance guarantees though, Azul Systems version of the Hotspot compiler on their Java Appliance delivers the lowest guaranteed pauses available - also it scales to a massive size - 100s of GB stack and 100s of cores.
I'd discount Java Realtime - although you'd get guarantees of response, you'd sacrifice throughput to get those guarantees
However, if your planning on using your trading system in an environment where every microsecond counts, you're really going to have to live with the lack of consistency you will get from the current generation of VM's - none of them (except realtime) guarantees low microsecond GC pauses. Of course, at this level your going to run into the same issues from OS activity (process pre-emption, interrupt handling, page faults, etc.). In this case one of the real time variants of Linux is going to help you.

I wouldn't rule out Windows from this just because it's Windows. My expirience over the last few years has been that the Windows versions of the Sun JVM was usually the most mature performance wise in contrast to Linux or Soaris x86 on the same hardware. The JVM for Solaris SPARC may be good too, but I guess with Windows on x86 you'll get more power for less money.

I would strongly recommend that you look into an operating system you already have experience with. Solaris is a strange beast if you only know Linux, e.g.
Also I would strongly recommend to use a platform actually supported by Sun, as this will make it much easier to get professional assistance when you REALLY, REALLY need it.
http://java.sun.com/javase/6/webnotes/install/system-configurations.html

I'd probably worry about garbage collection causing latency well before the operating system; have you looked into tuning that at all?
If I were willing to spend the time to trial different OSs, I'd try Solaris 10 and NetBSD, and probably a Linux variant for good measure.
I'd experiment with 32-vs-64 bit architectures; 64 bit will give you a larger heap address space... but will take longer to address each bit of memory.
I'm assuming you've profiled your application and know where the bottlenecks are; by the comment about GC, you've done that. In that case, your application shouldn't be CPU-bound, and chip architecture shouldn't be a primary concern.

I don't think managed code environments and real-time processing go together very well. If you really care about latency, remove the layer imposed by the managed code. This is not a Java vs C++ argument, but a Java/C#/... vs C/C++/FORTRAN/... argument, and I believe that is a valid design discussion to have.
And yes, I do mean FORTRAN, we run a number of near real-time systems with a FORTRAN foundation.

One way to manage latency is to have several JVM's dividing the work with smaller heaps so that a stop the world garbage collection isn't as time consuming when it happens and affects less processes.
Another approach is to load up a cluster of JVM's with enough memory and allocate the processes to ensure there won't be a stop the world garbage collection during the hours you care about latency (if this isn't a 24/7 app), and restart JVMs on off hours.
You should also look at other JVM implementations as a possibility (such as JRocket). Of course if any of them are appropriate depends entirely on your specific application.
If any of the above matters to your approach, it will affect the choice of OS. For example, if you go with another JVM implementation, that might limit OS choices, and if you go with clustering or otherwise running a several JVM's for the application, that might require some better underlying OS tools to manage effectively, further influencing the OS choice.

The choice of operating system or configurable is completely redundant considering the availability of faster network fabrics.
Look at 10GigE with ToE NICs, or the faster solution of 4X QDR (40Gbs) InfiniBand but with IPoIB presenting a standard Ethernet interface and routing.

Why does Java have such a large footprint?

Java - or at least Sun's Hotspot JVM - has long had a reputation for having a very large memory footprint. What exactly is it about the JVM that gives it this reputation? I'd be interested in a detailed breakdown: how much memory goes to the runtime (the JIT? the GC/memory management? the classloader?) anything related to "auxiliary" APIs like JNI/JVMTI? the standard libraries? (which parts get how much?) any other major components?
I realize that this may not be straightforward to answer without a concrete application plus VM configuration, so just to narrow things down at least somewhat: I'm primarily interested in default/typical VM configurations, and in a baseline console "Hello world" app as well as any real-world desktop or server app. (I'm suspecting that a substantial part of the JVM's footprint is largely independent of the app itself, and it is in this part that I'd like to zoom in, ideally.)
I have a couple of other closely related questions:
Other similar technology, such as .NET/mono, don't exhibit nearly the same footprint. Why is this the case?
I've read somewhere on the intarwebs that a large portion of the footprint is due simply to the size of the standard libraries. If this is the case, then why is so much of the standard libraries being loaded up front?
Are there any efforts (JSRs, whatever) to tame the memory footprint? The closest thing I've come across is a project to reduce the on-disk footprint of the JVM.
I'm sure that the footprint has varied over the past decade or so with every new version of Java. Are there any specific numbers/charts chronicling precisely how much the JVM's footprint has changed?

Some initiatives:
Since 1.5 class data sharing can be used;
Java 6 update 14 brought in compressed oops which reduces the footprint of 64-bit JVMs with under 4GB of Heap.

We have some server-side apps which do nothing but bridge multicast traffic (i.e. they have no permanent state). They all run with about 2.3 - 2.5 Mb of Heap on a 32-bit Java6 (linux) JRE.
Is this a big footprint? I could easily have a thousand of these on a typical server-class machine (from a memory perspective), although that would be bit pointless from a threading perspective!
That said, there is the Jigsaw project to modularize the VM (the libraries I believe) which is coming in Java7; this will help those who wish for smaller footprints.
I realize that this doesn't really answer your question but it is relevant nonetheless! What sort of applications are you designing where you are finding that memory footprint is an issue?

At least one thing is Java's long history - it started in 1995 and is now version 6. Keeping backwards compatibility while adding features inevitably inflates its footprint. The image below tells pretty much...

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.