Experience of moving to 64 bit JVM - java

Our company is planning to move to a 64-bit JVM in order to get away from the 2 GB maximum heap size limit. Google gave me very mixed results about 64-bit JVM performance.
Has anyone tried moving to 64-bit Java? If so, please share your experience.

In a nutshell: 64-bit JVMs will consume more memory for object references and a few other types (generally not significant), consume more memory per thread (often significant on high-volume sites), and enable you to have larger heaps (generally only important if you have many long-lived objects).
Longer Answers/Comments:
The comment that Java is 32-bit by design is misleading. Java memory-addressing is either 32- or 64-bit, but the VM spec ensures that most fields (e.g. int, long, double, etc.) are the same size regardless.
Also, the GC tuning comments, while pertinent for the number of objects, may not be relevant: GC can be quick on JVMs with large heaps (I've worked with heaps up to 15 GB, with very quick GC). It depends more on how you play with the generational collector schemes and what your object usage pattern is. While in the past people have spent lots of energy tuning parameters, it's very workload dependent, and modern (Java 5+) JVMs are very good at self-tuning - unless you have lots of data you're more likely to harm yourself than help with aggressive JVM tuning.
As mentioned, on x86 architectures the 64-bit EM64T or x64 processors also include new instructions for doing things like atomic writes, or other options which may also impact high-performance applications.
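If you want to sanity-check which data model a given JVM is actually running with, a minimal sketch like the one below can help. It assumes a HotSpot-style VM: the sun.arch.data.model property is vendor-specific and may be null elsewhere, which is why os.arch is printed as a fallback.

    // Minimal sketch: report whether the current JVM is 32- or 64-bit
    // and how large a heap it is willing to grow to.
    public class JvmBitnessCheck {
        public static void main(String[] args) {
            // Vendor-specific property (Sun/Oracle HotSpot); may be null on other VMs.
            String dataModel = System.getProperty("sun.arch.data.model");
            String osArch = System.getProperty("os.arch");
            long maxHeap = Runtime.getRuntime().maxMemory();
            System.out.println("sun.arch.data.model = " + dataModel);
            System.out.println("os.arch             = " + osArch);
            System.out.println("max heap (MB)       = " + maxHeap / (1024 * 1024));
        }
    }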

If you need the larger heap, then questions of performance are rather moot, aren't they? Or do you have a plan for horizontal scaling?
The main problem that I've heard with 64-bit apps is that a full garbage collection can take a very long time (because its duration depends on the number of live objects). So you want to carefully tune the GC parameters to avoid full collections (I've heard one anecdote about a company that had 64 GB of heap and tuned their GC so that they'd never do a full GC; they'd simply shut down once a week).
Other than that, recognize that Java is 32-bit by design, so you're not likely to see any huge performance increase from moving data 64 bits at a time. And you're still limited to 32-bit array indices.
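The array-index point is easy to demonstrate: array lengths are ints in Java, so even on a 64-bit JVM with a huge heap a single array tops out around Integer.MAX_VALUE elements (a minimal sketch; the exact usable maximum is slightly lower and VM-dependent).

    // Minimal sketch: array indices are int regardless of JVM bitness.
    public class ArrayIndexLimit {
        public static void main(String[] args) {
            System.out.println("Largest expressible array length: " + Integer.MAX_VALUE);
            try {
                // On most VMs this fails with OutOfMemoryError (e.g. "Requested array
                // size exceeds VM limit"), even if -Xmx is far larger than 2^31 bytes.
                long[] huge = new long[Integer.MAX_VALUE];
                System.out.println("Allocated " + huge.length + " elements");
            } catch (OutOfMemoryError e) {
                System.out.println("Could not allocate: " + e);
            }
        }
    }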

Works fine for us. Why don't you simply try setting it up and run your load test suite under a profiler like jvisualvm?
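If you do that, a tiny sketch like this prints the running JVM's identity so you know which local process to attach jvisualvm to (an assumption worth noting: the conventional "pid@hostname" format of RuntimeMXBean.getName() is not guaranteed by the spec).

    import java.lang.management.ManagementFactory;

    // Minimal sketch: print the JVM's runtime name, conventionally "pid@hostname",
    // to identify the process a profiler such as jvisualvm should attach to.
    public class ShowJvmPid {
        public static void main(String[] args) throws InterruptedException {
            System.out.println("Attach the profiler to: "
                    + ManagementFactory.getRuntimeMXBean().getName());
            Thread.sleep(60000); // keep the JVM alive long enough to attach
        }
    }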

We've written directly for 64-bit and I can see no adverse behavior...

Naively taking 32-bit JVM workloads and putting them on a 64-bit JVM produces a performance and space hit in my experience.
However, most of the major JVM vendors have now implemented a good technology that essentially compresses some of the heap - it's called compressed references or compressed oops - for 64-bit JVMs that aren't "big" (i.e. in the 4-30 GB range).
This makes a big difference and should make a 32->64 transition much lower impact.
Reference for the IBM JVM: link text
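On Sun/Oracle HotSpot you can confirm whether compressed oops are actually in effect at runtime; the sketch below is one way to do it, with the caveat that HotSpotDiagnosticMXBean lives in the non-standard com.sun.management package and is not available on other vendors' VMs.

    import java.io.IOException;
    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    // Minimal sketch (HotSpot only): check whether compressed oops are enabled.
    public class CompressedOopsCheck {
        public static void main(String[] args) throws IOException {
            HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            System.out.println("UseCompressedOops = "
                    + diag.getVMOption("UseCompressedOops").getValue());
        }
    }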

Related

Java Heaps and GC on Multi-Processor machines?

How does Java deal with GC and Heap Allocation on multi-processor machines?
In the reading I've done, there doesn't seem to be any difference in the algorithms used between single and multi-processor systems. The art and science of GC tuning in Java seems fairly mature, yet I can't find anything related to this in any of the common JVM implementations.
As a data point, in .Net, the algorithm changes significantly: There's a heap affinitized to each processor, and each processor is responsible for that heap. This is documented in a number of places such as MSDN:
Scalable Collections: On a multiprocessor system running the server version of the execution engine (MSCorSvr.dll), the managed heap is split into several sections, one per CPU. When a collection is initiated, the collector has one thread per CPU; all threads collect their own sections simultaneously. The workstation version of the execution engine (MSCorWks.dll) doesn't support this feature.
Any insight that can be offered into Java GC tuning specifically for multi processor systems is also of interest to me.
Indeed, in the HotSpot JVM, the way that the heap is used does not depend on the heap size or the number of cores.
Each thread (not processor) has a Thread Local Allocation Buffer (TLAB), so that object allocation is cheap and thread-safe. This memory space is loosely equivalent to the heap-processor affinity you are mentioning.
You can also activate Non-Uniform Memory Access (NUMA) support. The idea behind NUMA is to prefer the RAM that is close to a CPU chip for storing objects, instead of considering the entire heap as a uniform space.
Finally, the GCs are multi-threaded and scale with your number of cores, so they take advantage of your hardware.
Garbage collection is an implementation specific concept. Different JVMs (IBM, Oracle, OpenJDK, etc.) have different implementations, and different mechanisms are available in different versions too. Often you can select which mechanism you want to use when you start your Java program.
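As a quick way to see which collectors a particular JVM has actually selected, the standard management API will list them; a minimal sketch follows (the collector names, e.g. "PS Scavenge" or "ConcurrentMarkSweep", vary by vendor and by the flags you start the JVM with).

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Minimal sketch: list the garbage collectors active in this JVM,
    // with how often they have run and how long they have taken so far.
    public class ListCollectors {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName()
                        + ": collections=" + gc.getCollectionCount()
                        + ", time(ms)=" + gc.getCollectionTime());
            }
        }
    }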
Similar questions here....
These details are often given in the documentation for the command-line options of your JRE:
IBM JDK Here
Oracle JRE Options here

64-bit JVM as good as 32-bit for mission critical workloads? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I asked the same question in a different way and the question was closed : https://stackoverflow.com/questions/7231460/java-64-bit-or-32-bit
This is my second try at getting objective answers.
We were contemplating moving our product to 64-bit Java for those customers who are pushing the boundaries of the 32-bit server JVM on Solaris (SPARC) and Linux (RHEL 5.x). Our director asked "some years ago, 64-bit wasn't quite there. How about now?"
For those customers not pushing the 4 GB boundary, will using a 64-bit JVM have adverse effects in terms of performance? If yes, how much? We create a lot of objects. (We don't want to support 32-bit and 64-bit JVMs at the same time; it's an either/or situation, preferably.)
For those pushing the 4 GB boundary, can we expect the JVM to be as stable as the 32-bit one?
Will performance be an issue? If yes, how much? We create a lot of objects.
What GC tuning techniques are new?
Profilers: are they quite there for profiling 64-bit JVM apps?
UPDATE : To those commenters and those who closed my earlier question, I believe I NOW understand your angst. At the same time, I believe some of you made some (untrue) assumptions that I was lazy or a suit who has no clue. I'll post my findings after my investigations. Thanks to all those who gave me real pointers.
Since you can pretty much do the testing yourself, I'm assuming you are expecting "real life" answers as to the experience of using a 32/64 bit JVM.
I'm part of a team which creates financial applications for banks. We use 32-bit JVMs for development on Windows, and almost all our server applications run on RHEL, which runs a 64-bit JVM. Performance was never a problem for us since we use decent machines (our regular server machines are 32-core AMD boxes with at least 32 GiB RAM).
As expected, the heap size goes up due to the difference in pointer sizes, but since the limit is now raised beyond 4 GiB, it again doesn't bother us. Our application also creates a lot of objects. We have no problem debugging our application with either VisualVM or command-line tools. Regarding GC settings, we use the defaults and try to change them only when we measure that it's actually the GC creating problems. Allocating a heap size which is much more than your application would use (8 GiB) is common for us, and you would see a single big GC sweep around once a day, which is pretty good.
I've heard that JDK 7 has a new GC algorithm called the G1 garbage collector, which is more suited for server applications, so you should totally give it a try.
All that being said, your best bet would be to measure as much as possible (using 32-bit vs 64-bit JVMs on 32- and 64-bit machines respectively) and then decide. FYI, our organization is totally pushing forward with the 64-bit JVMs, given that most stock server hardware these days is 64-bit and 4 GiB is pretty restrictive for a modern server application (at least in our domain).
We were contemplating moving our product to 64-bit java for those customers who are pushing the boundaries of the 32- bit server JVM on Solaris(SPARC) and Linux(RHEL 5.x). Our director asked "some years ago, 64-bit wasn't quite there. How about now ?"
Sun has been doing 64-bit for much longer than Windows. Solaris 2.5 (1995, the same year Windows 95 was released) wasn't as reliable in 64-bit as it could have been. Many people still on SunOS (32-bit) didn't see the point, and few machines had enough memory to matter. Solaris 2.6 (1997) saw the first significant migration to the 64-bit platform. I didn't use Java seriously until 1999 (on Solaris), and at that point 64-bit was already established in my mind.
1) For those customers not pushing the 4 GB boundary, will using a 64-bit JVM have adverse effects in terms of performance? If yes, how much?
The 64-bit JVM has registers that are twice the size, and twice as many of them. If you use long a lot you can see a dramatic improvement; however, for typical applications the difference is 5-10% either way.
we create a lot of objects.
IMHO performance hasn't been much of an issue if it isn't already recognised as a problem for you. Use any profiler and there are two reports: CPU and memory usage. Often examining the memory profile makes more of a performance difference. (See below.)
(We preferably don't want to support 32-bit and 64-bit JVMs at the same time. It's an either/or situation.)
Can't say there is much difference. What do you imagine the overhead of supporting each is? The code is exactly the same; from your point of view it might increase testing slightly. It's not much different to supporting two versions of Java 6.
2) For those pushing the 4 GB boundary, can we expect the JVM to be as stable as the 32-bit one?
Having used the 64-bit version since 1999, I can't remember an occasion where using the 32-bit version would have made things better (only worse, due to limited memory).
Will performance be an issue? If yes, how much? We create a lot of objects.
If performance is an issue, discard fewer objects.
What GC tuning techniques are new?
You can set the maximum memory size higher. That's about it. As long as you are below 32 GB there won't be a noticeable increase in memory usage, as the JVM uses 32-bit references.
One thing I do is set the Eden size to 8 GB for a new application and reduce it if it's not needed. This can dramatically reduce how often GC runs (to as low as once per day ;). This wouldn't be an option with a 32-bit JVM.
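If you experiment with a large Eden like that, it is worth verifying what the JVM actually gave you. The sketch below reads the heap pool sizes at runtime; it assumes HotSpot-style pool names (e.g. "PS Eden Space" or "Eden Space"), which differ between collectors.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    // Minimal sketch: print the maximum size of each heap memory pool,
    // so an Eden/new-generation setting can be checked against reality.
    public class ShowHeapPools {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                long max = pool.getUsage().getMax();
                System.out.println(pool.getName() + " max = "
                        + (max < 0 ? "undefined" : (max / (1024 * 1024)) + " MB"));
            }
        }
    }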
Profilers: are they quite there for profiling 64-bit JVM apps?
VisualVM is pure Java and AFAIK works exactly the same. YourKit uses a native library, so you might need to ensure you are using the right version (it normally sets this up for you, but if you mess with your environment you might need to know there are two versions of the agent).
If you are worried about performance, don't create so many objects. You might be surprised how much freely creating objects slows real-world applications. It can slow an application by 2x to 10x or more. When I optimise code, the first thing I do is reduce the discarding of objects, and I expect at least a three-fold performance improvement.
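As a toy illustration of the "discard fewer objects" advice (my own sketch, not code from the answer): reusing a single StringBuilder across loop iterations, instead of allocating and discarding one per iteration, removes most of the allocation and GC pressure from the loop.

    // Toy sketch: reuse one StringBuilder instead of discarding one per iteration.
    public class ReuseVsChurn {
        public static void main(String[] args) {
            StringBuilder reused = new StringBuilder(64);
            long checksum = 0;
            for (int i = 0; i < 1000000; i++) {
                reused.setLength(0);          // reset instead of new StringBuilder()
                reused.append("row-").append(i);
                checksum += reused.length();  // use the result so it isn't optimised away
            }
            System.out.println("checksum = " + checksum);
        }
    }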
By comparison using 64-bit vs 32-bit is likely to be 5%-10% difference. It can be faster or slower and both are just as likely. In terms of bloating your memory, use the latest JVMs and that is unlikely to be noticeable. This is because the 64-bit JVM uses 32-bit references by default when you use less than 32 GB of memory. The header overhead is still slightly higher but objects are not much bigger in 64-bit when -XX:+UseCompressedOops is on (the default for the latest releases).
Java: All about 64-bit programming
Test the size of common objects using 32-bit vs 64-bit JVMs: see Java: Getting the size of an Object
An extreme example of doing the same thing by creating lots of objects and using reflection vs not creating any and using dynamically generated code, with a 1000x performance improvement: see Avoiding Java Serialization to increase performance
Using heap-less memory can massively reduce your GC times: see Collections Library for millions of elements
Using heap-less memory can allow your application to use much more memory instead of passing data to another application: see Should server applications limit themselves to 4 GB?
There are so many aspects of your question that are dependent on your application that "real world" experiences with other peoples' applications are unlikely to tell you anything worthwhile with any degree of certainty.
My advice would be to stop wasting your time (and ours) asking what MIGHT happen, and just see what DOES happen when you use a 64-bit JVM instead of a 32-bit one. You are going to have to do the testing anyway ...

Why is memory management so visible in Java VM?

I'm playing around with writing some simple Spring-based web apps and deploying them to Tomcat. Almost immediately, I run into the need to customize Tomcat's JVM settings with -XX:MaxPermSize (and -Xmx and -Xms); without this, the server easily runs out of PermGen space.
Why is this such an issue for Java VMs compared to other garbage-collected languages? Comparing counts of "tune X memory usage" for X in Java, Ruby, Perl and Python shows that Java easily has an order of magnitude more hits in Google than the other languages combined.
I'd also be interested in references to technical papers/blog-posts/etc explaining design choices behind JVM GC implementations, across different JVMs or compared to other interpreted language VMs (e.g. comparing Sun or IBM JVM to Parrot). Are there technical reasons why JVM users still have to deal with non-auto-tuning heap/permgen sizes?
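For what it's worth, you can watch the pool in question fill up from inside the application; the sketch below is one hedged way to do it, assuming a HotSpot-style VM where the permanent generation shows up as a memory pool whose name contains "Perm" (other VMs may not have such a pool at all).

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    // Minimal sketch: report usage of the permanent-generation pool, if present.
    public class PermGenWatch {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getName().contains("Perm")) {
                    MemoryUsage u = pool.getUsage();
                    System.out.println(pool.getName()
                            + ": used(MB)=" + u.getUsed() / (1024 * 1024)
                            + ", max(MB)=" + u.getMax() / (1024 * 1024));
                }
            }
        }
    }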
The title of your question is misleading (not on purpose, I know): PermSize issues (and there are a lot of them; I was one of the first to diagnose a Tomcat/Sun PermGen issue years ago, when there wasn't any knowledge of the issue yet) are not specific to Java but specific to the Sun VM.
If you use a VM that doesn't use a permanent generation (like, say, an IBM VM, if I'm not mistaken) you cannot have PermGen issues.
So it is not a "Java" problem, but a Sun VM implementation problem.
Java gives you a bit more control over memory - strike one for people wanting to apply that control - versus Ruby, Perl, and Python, which give you less control there. Java's typical implementation is also very memory hungry (because it has a more advanced garbage collection approach) compared with the typical implementations of the dynamic languages... but if you look at JRuby or Jython you'll find it's not a language issue (when these different languages use the same underlying VM, memory issues are pretty much equalized). I don't know of a widespread "Perl on JVM" implementation, but if there's one I'm willing to bet it wouldn't be measurably different in terms of footprint from JRuby or Jython!
Python/Perl/Ruby allocate their memory with malloc() or an optimization thereof. The limit to the heap space is determined by the operating system rather than the VM, so there's no need for options like -Xmxn. Also, the garbage collection is simpler, based mostly on reference counting. So there's a lot less to fine-tune.
Furthermore, dynamic languages tend to be implemented with bytecode interpreters rather than JIT compilers, so they aren't used for performance-critical code anyway.
The essence of @WizardOfOdds's and @Alex-Martelli's answers appears to be correct: Java has an advanced set of GC options, and sometimes you need to tune them. However, I'm still not entirely clear on why you might design a JVM with or without a permanent generation. I have found a bunch of useful links about garbage collection in Java, though not necessarily in comparison to other languages with GC. Briefly:
The Sun GC evolves very slowly due to the fact that it is deployed everywhere and people may rely on quirks in its implementation.
Sun has detailed white papers on GC design and options, such as Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine.
There is a new GC in the wings, called the G1 GC. Alex Miller has a good summary of relevant blog posts and a link to the technical paper. But it still has a permanent generation (and doesn't necessarily do a great job with it).
Jon Masamitsu has (had?) an interesting blog at Sun covering various details of garbage collection.
Happy to update this answer with more details if anyone has them.
This is because Tomcat is running in the Java Virtual Machine, while other languages are either compiled or interpreted and run against your actual machine. When you set -Xmx and -Xms you are saying that you want the JVM to run like a computer with an amount of RAM somewhere in the set range.
I think the reason so many people run into this is that the default values are relatively low, and people end up hitting the default ceiling pretty quickly (instead of waiting until you run out of actual RAM, as you would with other languages).
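The effect of -Xms and -Xmx is easy to observe from inside a program; a minimal sketch follows (run it with different -Xmx values and watch maxMemory() change).

    // Minimal sketch: show the heap limits the JVM is actually running with.
    // Try: java -Xms256m -Xmx1g HeapLimits
    public class HeapLimits {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024 * 1024;
            System.out.println("max heap (MB):          " + rt.maxMemory() / mb);   // roughly -Xmx
            System.out.println("committed heap (MB):    " + rt.totalMemory() / mb); // currently committed
            System.out.println("free of committed (MB): " + rt.freeMemory() / mb);
        }
    }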

Best OS to deploy a low latency Java application?

We have a low latency trading system (feed handlers, analytics, order entry) written in Java. It uses TCP and UDP extensively, it does not use Infiniband or other non-standard networking.
Can anyone comment on the tradeoffs of various OSes or OS configurations to deploy this system? While throughput is obviously important to keep up with modern price feeds, latency is our #1 priority.
Solaris seems like a natural candidate since Sun created Java; should I use SPARC or x64 processors?
I've heard good things about RHEL and SLERT; are those the right versions of Linux to use in our benchmarking?
Has anyone tested Windows against the above OSes? Or is it assumed not to keep up?
I'd like to leave the Java vs C++ debate for a different thread.
Vendors love this kind of benchmark. You have code, right?
IBM, Sun/Oracle, HP will all love to run your app on their gear to demonstrate their advantages.
Make them do this. If you have code, make the vendors run a demonstration on their gear to show which is best for your needs.
It's easy, painless, free, and factual. The final decision will be easy and obvious. And you will know how to install and tune to maximize performance.
What I hate doing is predicting this kind of thing before the code is written. Too many customers have asked for a H/W and OS recommendation before we've finished identifying all the use cases. Asking for that kind of precognition is simple craziness.
But you have code. You can produce test cases that exercise your code. That's perfect.
For a trading environment, in addition to low latency you are probably concerned about consistency, so focusing on reducing the impact of GC pauses as much as possible may well give you more benefit than different OS choices.
The G1 garbage collector in recent versions of Sun's HotSpot VM improves stop-the-world pauses a lot, in a similar way to the JRockit VM.
For real performance guarantees, though, Azul Systems' version of the HotSpot compiler on their Java appliance delivers the lowest guaranteed pauses available; it also scales to a massive size: hundreds of GB of heap and hundreds of cores.
I'd discount Java Realtime: although you'd get guarantees of response, you'd sacrifice throughput to get those guarantees.
However, if you're planning on using your trading system in an environment where every microsecond counts, you're really going to have to live with the lack of consistency you will get from the current generation of VMs: none of them (except Realtime) guarantees low-microsecond GC pauses. Of course, at this level you're going to run into the same issues from OS activity (process pre-emption, interrupt handling, page faults, etc.). In this case one of the real-time variants of Linux is going to help you.
I wouldn't rule out Windows from this just because it's Windows. My experience over the last few years has been that the Windows versions of the Sun JVM were usually the most mature performance-wise, in contrast to Linux or Solaris x86 on the same hardware. The JVM for Solaris SPARC may be good too, but I guess with Windows on x86 you'll get more power for less money.
I would strongly recommend that you look into an operating system you already have experience with. Solaris is a strange beast if you only know Linux, for example.
Also, I would strongly recommend using a platform actually supported by Sun, as this will make it much easier to get professional assistance when you REALLY, REALLY need it.
http://java.sun.com/javase/6/webnotes/install/system-configurations.html
I'd probably worry about garbage collection causing latency well before the operating system; have you looked into tuning that at all?
If I were willing to spend the time to trial different OSs, I'd try Solaris 10 and NetBSD, and probably a Linux variant for good measure.
I'd experiment with 32- vs 64-bit architectures; 64-bit will give you a larger heap address space... but will take longer to address each bit of memory.
I'm assuming you've profiled your application and know where the bottlenecks are; by the comment about GC, you've done that. In that case, your application shouldn't be CPU-bound, and chip architecture shouldn't be a primary concern.
I don't think managed code environments and real-time processing go together very well. If you really care about latency, remove the layer imposed by the managed code. This is not a Java vs C++ argument, but a Java/C#/... vs C/C++/FORTRAN/... argument, and I believe that is a valid design discussion to have.
And yes, I do mean FORTRAN, we run a number of near real-time systems with a FORTRAN foundation.
One way to manage latency is to have several JVMs dividing the work, with smaller heaps, so that a stop-the-world garbage collection isn't as time-consuming when it happens and affects fewer processes.
Another approach is to load up a cluster of JVMs with enough memory and allocate the processes to ensure there won't be a stop-the-world garbage collection during the hours you care about latency (if this isn't a 24/7 app), and restart the JVMs in off hours.
You should also look at other JVM implementations as a possibility (such as JRockit). Of course, whether any of them is appropriate depends entirely on your specific application.
If any of the above matters to your approach, it will affect the choice of OS. For example, if you go with another JVM implementation, that might limit OS choices, and if you go with clustering or otherwise running several JVMs for the application, that might require some better underlying OS tools to manage effectively, further influencing the OS choice.
The choice of operating system or configuration is largely moot considering the availability of faster network fabrics.
Look at 10GigE with TOE NICs, or the faster solution of 4X QDR (40 Gb/s) InfiniBand, but with IPoIB presenting a standard Ethernet interface and routing.

Why does Java have such a large footprint?

Java - or at least Sun's HotSpot JVM - has long had a reputation for having a very large memory footprint. What exactly is it about the JVM that gives it this reputation? I'd be interested in a detailed breakdown: how much memory goes to the runtime (the JIT? the GC/memory management? the classloader?), to anything related to "auxiliary" APIs like JNI/JVMTI, to the standard libraries (which parts get how much?), and to any other major components?
I realize that this may not be straightforward to answer without a concrete application plus VM configuration, so just to narrow things down at least somewhat: I'm primarily interested in default/typical VM configurations, and in a baseline console "Hello world" app as well as any real-world desktop or server app. (I'm suspecting that a substantial part of the JVM's footprint is largely independent of the app itself, and it is in this part that I'd like to zoom in, ideally.)
I have a couple of other closely related questions:
Other similar technologies, such as .NET/Mono, don't exhibit nearly the same footprint. Why is this the case?
I've read somewhere on the intarwebs that a large portion of the footprint is due simply to the size of the standard libraries. If this is the case, then why is so much of the standard libraries being loaded up front?
Are there any efforts (JSRs, whatever) to tame the memory footprint? The closest thing I've come across is a project to reduce the on-disk footprint of the JVM.
I'm sure that the footprint has varied over the past decade or so with every new version of Java. Are there any specific numbers/charts chronicling precisely how much the JVM's footprint has changed?
Some initiatives:
Since Java 1.5, class data sharing can be used;
Java 6 update 14 brought in compressed oops, which reduces the footprint of 64-bit JVMs with under 4 GB of heap.
We have some server-side apps which do nothing but bridge multicast traffic (i.e. they have no permanent state). They all run with about 2.3-2.5 MB of heap on a 32-bit Java 6 (Linux) JRE.
Is this a big footprint? I could easily have a thousand of these on a typical server-class machine (from a memory perspective), although that would be a bit pointless from a threading perspective!
That said, there is the Jigsaw project to modularize the VM (the libraries, I believe), which is coming in Java 7; this will help those who wish for smaller footprints.
I realize that this doesn't really answer your question but it is relevant nonetheless! What sort of applications are you designing where you are finding that memory footprint is an issue?
At least one factor is Java's long history: it started in 1995 and is now at version 6. Keeping backwards compatibility while adding features inevitably inflates its footprint. (The original answer included an image that tells pretty much the whole story.)
