Why is the 64bit JVM faster than the 32bit one?

Recently I've been doing some benchmarking of the write performance of my company's database product, and I've found that simply switching to a 64bit JVM gives a consistent 20-30% performance increase.
I'm not allowed to go into much detail about our product, but basically it's a column-oriented DB, optimised for storing logs. The benchmark involves feeding it a few gigabytes of raw logs and timing how long it takes to analyse them and store them as structured data in the DB. The processing is very heavy on both CPU and I/O, although it's hard to say in what ratio.
A few notes about the setup:
Processor: Xeon E5640 2.66GHz (4 core) x 2
RAM: 24GB
Disk: 7200rpm, no RAID
OS: RHEL 6 64bit
Filesystem: Ext4
JVMs: 1.6.0_21 (32bit), 1.6.0_23 (64bit)
Max heap size (-Xmx): 512 MB (for both 32bit and 64bit JVMs)
Constants for both JVMs:
Same OS (64bit RHEL)
Same hardware (64bit CPU)
Max heap size fixed to 512 MB (so the speed increase is not due to the 64bit JVM using a larger heap)
For simplicity I've turned off all multithreading options in our product, so pretty much all processing is happening in a single-threaded manner. (When I turned on multi-threading, of course the system got faster, but the ratio between 32bit and 64bit performance stayed about the same.)
So, my question is... Why would I see a 20-30% speed improvement when using a 64bit JVM? Has anybody seen similar results before?
My intuition up until now has been as follows:
64bit pointers are bigger, so the L1 and L2 caches overflow more easily, so performance on the 64bit JVM is worse.
The JVM uses some fancy pointer compression tricks to alleviate the above problem as much as possible. Details on the Sun site here.
The JVM is allowed to use more registers when running in 64bit mode, which speeds things up slightly.
Given the above three points, I would expect 64bit performance to be slightly slower, or approximately equal to, the 32bit JVM.
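(One sanity check I can run on point 2: confirm whether compressed oops are actually enabled under each JVM. A minimal sketch -- this assumes a HotSpot JVM, since the com.sun.management diagnostic bean is HotSpot-specific:)

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class CheckCompressedOops {
        public static void main(String[] args) throws Exception {
            // HotSpot-specific diagnostic bean, reached via its well-known ObjectName.
            HotSpotDiagnosticMXBean hotspot = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // Throws IllegalArgumentException on a 32-bit JVM, where the flag doesn't exist.
            System.out.println("UseCompressedOops = "
                    + hotspot.getVMOption("UseCompressedOops").getValue());
        }
    }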
Any ideas? Thanks in advance.
Edit: Clarified some points about the benchmark environment.

From: http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#64bit_performance
"Generally, the benefits of being able to address larger amounts of memory come with a small performance loss in 64-bit VMs versus running the same application on a 32-bit VM. This is due to the fact that every native pointer in the system takes up 8 bytes instead of 4. The loading of this extra data has an impact on memory usage which translates to slightly slower execution depending on how many pointers get loaded during the execution of your Java program. The good news is that with AMD64 and EM64T platforms running in 64-bit mode, the Java VM gets some additional registers which it can use to generate more efficient native instruction sequences. These extra registers increase performance to the point where there is often no performance loss at all when comparing 32 to 64-bit execution speed.
The performance difference comparing an application running on a 64-bit platform versus a 32-bit platform on SPARC is on the order of 10-20% degradation when you move to a 64-bit VM. On AMD64 and EM64T platforms this difference ranges from 0-15% depending on the amount of pointer accessing your application performs."

Without knowing your hardware, I'm just taking some wild stabs:
Your specific CPU may be using microcode to 'emulate' some x86 instructions -- most notably the x87 ISA.
x64 uses SSE math instead of x87 math; I've noticed a 10-20% speedup in some math-heavy C++ apps from this. Math differences could be the real killer if you're using strictfp.
Memory. 64 bits gives you much more address space. Maybe the GC is a little less aggressive in 64-bit mode because you have extra RAM.
Is your OS in 64-bit mode, running a 32-bit JVM via some wrapper utility?

The 64-bit instruction set has 8 more registers, which should make the code faster overall.
But since processors nowadays mostly wait on memory or disk, I suspect that either the memory subsystem or the disk I/O is more efficient in 64-bit mode.

My best guess, based on a quick google for 32- vs 64-bit performance charts,
is that 64-bit I/O is more efficient. I suppose you do a lot of I/O...
If memcpy is involved when moving the data, it's probably more efficient to copy longs than ints.
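If you want to poke at that from Java, here's a crude, hypothetical micro-benchmark sketch that copies the same number of bytes as int[] versus long[]. (Run it with something like -Xmx512m; and note that HotSpot's arraycopy intrinsic may well use wide moves for both element types, so similar timings would itself be informative.)

    public class CopyWidthProbe {
        public static void main(String[] args) {
            final int bytes = 64 * 1024 * 1024; // 64 MB copied per pass
            int[] intSrc = new int[bytes / 4], intDst = new int[bytes / 4];
            long[] longSrc = new long[bytes / 8], longDst = new long[bytes / 8];

            // Warm-up passes so the JIT compiles the copy paths before timing.
            for (int i = 0; i < 5; i++) {
                System.arraycopy(intSrc, 0, intDst, 0, intSrc.length);
                System.arraycopy(longSrc, 0, longDst, 0, longSrc.length);
            }

            long t0 = System.nanoTime();
            for (int i = 0; i < 20; i++)
                System.arraycopy(intSrc, 0, intDst, 0, intSrc.length);
            long intNs = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int i = 0; i < 20; i++)
                System.arraycopy(longSrc, 0, longDst, 0, longSrc.length);
            long longNs = System.nanoTime() - t0;

            System.out.println("int[]  copy: " + intNs / 1000000 + " ms");
            System.out.println("long[] copy: " + longNs / 1000000 + " ms");
        }
    }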

Realize that the 64-bit JVM is not magic pixie dust that makes Java apps
go faster. The 64-bit JVM allows heaps >> 4 GB and, as such, only makes sense
for applications which can take advantage of huge memory on systems which
have it.
Generally there is either a slight improvement (due to certain hardware
optimizations on certain platforms) or a minor degradation (due to the increased
pointer size). Generally speaking there will be fewer GCs -- but
when they do occur, they will likely be longer.
In-memory databases or search engines that can use the increased memory
for caching objects, and thus avoid IPC or disk accesses, will see the biggest
application-level improvements. In addition, a 64-bit JVM will also
allow you to run many, many more threads than a 32-bit one, because
there's more address space for things like thread stacks. The
maximum number of threads for a 32-bit JVM is generally ~1,000, but ~100,000 threads are possible with a 64-bit JVM.
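If you want to measure the limit on your own setup rather than take the ~1,000 figure on faith, a rough probe is to spawn sleeping threads until creation fails (careful: this can make the machine sluggish; vary the stack size with -Xss to see the address-space effect):

    import java.util.concurrent.atomic.AtomicInteger;

    public class ThreadLimitProbe {
        public static void main(String[] args) {
            final AtomicInteger count = new AtomicInteger();
            try {
                while (true) {
                    new Thread(new Runnable() {
                        public void run() {
                            count.incrementAndGet();
                            try {
                                Thread.sleep(Long.MAX_VALUE); // park the thread forever
                            } catch (InterruptedException ignored) { }
                        }
                    }).start();
                }
            } catch (OutOfMemoryError e) {
                // Typically "unable to create new native thread":
                // the process ran out of address space for thread stacks.
                System.out.println("Created ~" + count.get() + " threads before failing");
            }
        }
    }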
Some drawbacks though:
Additional issues with the 64-bit JVM are that certain client-oriented
features like Java Plug-in and Java Web Start
are not supported. Also, any native code needs
to be word-size compatible (e.g. JNI for things like Type II JDBC drivers).
This is a bonus for pure-Java developers, as pure apps should
just run out of the box.
More on this Thread at Java.net

Difference between JVM on Linux and Solaris machines

I'm running two JBoss 5.1 servers, one on a Linux and one on a Solaris machine, with similar JVM (-Xms and -Xmx) configurations. But when I check the memory usage at server start:
Linux machine -- 2.1 GB memory usage (RES)
Solaris machine -- 500 MB memory usage
The memory used by the JBoss process on Linux is above 1 GB from the start (even before any class loading begins). When I take a heap dump on Linux, its size is only around 700 MB.
What could be causing such a difference in memory?
A lot of things could make the difference, and there is not enough information here to know what. For example, are they both 64-bit OS's and 64-bit JVMs? What about the behavior of malloc - that's up to the OS. Just because a process asks for N bytes of memory doesn't mean it immediately gets that much memory - memory allocators can be very clever. Then there is the question of whether it's actually an apples-to-apples measurement in terms of how the OS reports it.
"Memory usage" means a lot of things. Are we talking about the Java heap (if you take heap dumps of both VMs after startup and an identical priming bit of work, are they the same size or different?), or that plus class data, etc.? You also have hotspot in the picture, compiling Java bytecode into native code that will be different between the two OS's (maybe very different sizes if your Solaris box is a Sparc machine)
The most likely thing is 64-bit vs. 32-bit, but it's impossible to say. You might use some native profiling tools on each to see what calls are allocating memory - that would start to clarify things.
Unless it's causing a problem, it's probably not something to worry about - but healthy curiosity is a good thing.

Should I use 32-bit or 64-bit JDK?

Years ago, I tried 64-bit JDK but it was really buggy.
How stable would you say it is now? Would you recommend it? Should I install 64-bit JDK + eclipse or stick to 32-bit? Also, are there any advantages of 64-bit over 32-bit other than bypassing the 4 GB memory limit?
Only begin to bother with that if you want to build an application that will use a lot of memory (namely, a heap larger than 2GB).
Allow me to quote Red Hat:
The real advantage of the 64-bit JVM is that heap sizes much larger than 2GB can be used. Large page memory with the 64-bit JVM give further optimizations. The following graph shows the results running from 4GB heap sizes, in two gigabyte increments, up to 20GB heap.
That, in a pretty graph (not reproduced here): no big difference.
See more (more graphs, yay!) in: Java Virtual Machine Tuning
I think the answer is pretty simple.
Answer the question: "Do I need more than 4 GB of RAM?"
A 64-bit JVM is as stable as a 32-bit JVM; there is no difference there. However, a Java application running on a 64-bit JVM will consume more RAM than on a 32-bit JVM, since all internal data structures need more memory.
My Eclipse runs in a 64-bit JVM.
Are you going to deploy to a 32 or a 64 bit environment? If you're affinitized to an environment in production then your development environment should use the same environment type.
If you're platform agnostic, go with x64 and don't even think about it. The platform is mature and stable. It gives you tremendous room to scale up as you can just add memory and make your heaps bigger.
Nobody wants to tell a CEO, "Sorry, we chose x86 and can't scale up like we hoped. It's a month-long project to retest and replatform everything for x64."
The only differences between 32-bit and 64-bit builds of any program are the size of the machine word, the amount of addressable memory, and the operating-system ABI in use. With Java, the language specification means that the differences in machine word size and OS ABI should not matter at all unless you're using native code as well. (Native code must be built for the same word size as the JVM that will load it; you can't mix 32-bit and 64-bit builds in the same process without very exotic coding indeed, and you shouldn't be doing that with Java anyway.)
The 64-bitter uses 64-bit pointers. If you have 4GB+ RAM, and are running Java programs that keep 4GB+ of data structures in memory, the 64-bitter will accommodate that. The big fat pointers can point to any byte in a 4GB+ memory space.
But if your programs use less memory and you run the 64-bit JVM, pointers will still occupy 64 bits (8 bytes) each. This makes data structures bigger, which eats up memory unnecessarily.
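If you want to see that bloat concretely, the OpenJDK JOL (Java Object Layout) tool prints object headers and reference sizes under the current VM settings. A minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath; run it with -XX:+UseCompressedOops and again with -XX:-UseCompressedOops and watch the reference fields go from 4 to 8 bytes:

    import org.openjdk.jol.info.ClassLayout;
    import org.openjdk.jol.vm.VM;

    public class LayoutDemo {
        // A typical linked-list node: two reference fields.
        static class Node {
            Object value;
            Node next;
        }

        public static void main(String[] args) {
            // Header size, reference size, and alignment of the running VM.
            System.out.println(VM.current().details());
            // Field-by-field layout of Node under the current flags.
            System.out.println(ClassLayout.parseClass(Node.class).toPrintable());
        }
    }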
I just compiled a MQTT client in both the 32-bit JDK (jdk-8u281-windows-i586) and the 64-bit JDK (jdk-8u281-windows-x64). The class files produced had matching MD5 checksums.
FYI, it's perfectly safe to have multiple JDKs on your system. But if the version you use is important, you should be comfortable with setting your system path and JAVA_HOME to ensure the correct version is used.

What is the largest possible heap size with a 64-bit JVM?

The theoretical maximum heap value that can be set with -Xmx in a 32-bit system is of course 2^32 bytes, but typically (see: Understanding max JVM heap size - 32bit vs 64bit) one cannot use all 4GB.
For a 64-bit JVM running in a 64-bit OS on a 64-bit machine, is there any limit besides the theoretical limit of 2^64 bytes or 16 exabytes?
I know that for various reasons (mostly garbage collection), excessively large heaps might not be wise, but in light of reading about servers with terabytes of RAM, I'm wondering what is possible.
If you want to use 32-bit references, your heap is limited to 32 GB.
However, if you are willing to use 64-bit references, the size is likely to be limited by your OS, just as it is with a 32-bit JVM; e.g. on 32-bit Windows this is 1.2 to 1.5 GB.
Note: you will want your JVM heap to fit into main memory, ideally inside one NUMA region. That's about 1 TB on the bigger machines. If your JVM spans NUMA regions, memory access, and the GC in particular, will take much longer. If your JVM heap starts swapping, it might take hours to GC, or even make your machine unusable as it thrashes the swap drive.
Note: you can access large direct-memory and memory-mapped sizes even if you use 32-bit references in your heap, i.e. well above 32 GB.
Compressed oops in the Hotspot JVM
Compressed oops represent managed pointers (in many but not all places in the JVM) as 32-bit values which must be scaled by a factor of 8 and added to a 64-bit base address to find the object they refer to. This allows applications to address up to four billion objects (not bytes), or a heap size of up to about 32Gb. At the same time, data structure compactness is competitive with ILP32 mode.
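The decode the quote describes is just a shift and an add; as a worked example (assuming, for simplicity, a heap base of zero):

    public class CompressedOopDecode {
        public static void main(String[] args) {
            long heapBase = 0L;                  // simplifying assumption
            long compressedOop = 0xFFFFFFFFL;    // largest possible 32-bit value

            // Decode: scale by the 8-byte object alignment, then rebase.
            long address = heapBase + (compressedOop << 3);

            System.out.println("highest addressable object: " + address);
            // 2^32 possible values * 8 bytes of alignment = 32 GB of heap.
            System.out.println("heap limit: " + ((1L << 32) * 8) + " bytes");
        }
    }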
The answer clearly depends on the JVM implementation. Azul claim that their JVM
can scale ... to more than a 1/2 Terabyte of memory
By "can scale" they appear to mean "runs wells", as opposed to "runs at all".
Windows imposes a per-process memory limit; you can see what it is for each version here.
See "User-mode virtual address space for each 64-bit process":
With IMAGE_FILE_LARGE_ADDRESS_AWARE set (the default):
x64: 8 TB
Intel IPF: 7 TB
With IMAGE_FILE_LARGE_ADDRESS_AWARE cleared: 2 GB
I tried it: -Xmx32255M is accepted by the VM args while still allowing compressed oops.
For a 64-bit JVM running in a 64-bit OS on a 64-bit machine, is there any limit besides the theoretical limit of 2^64 bytes or 16 exabytes?
You also have to take hardware limits into account. While pointers may be 64-bit, current CPUs can only address less than 2^64 bytes of virtual memory (x86-64 implementations, for example, expose 48-bit virtual addresses, i.e. 256 TB).
With uncompressed pointers, the HotSpot JVM needs a contiguous chunk of virtual address space for its heap. So the second hurdle, after hardware, is the operating system providing such a large chunk; not all OSes do.
And the third one is practicality. Even if you can have that much virtual memory, it does not mean the CPU supports that much physical memory, and without physical memory you will end up swapping, which will adversely affect the performance of the JVM, because the GCs generally have to touch a large fraction of the heap.
As other answers mention compressed oops: by bumping the object alignment above 8 bytes, the compressed-oops limit can be pushed beyond 32 GB.
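The arithmetic behind that: a compressed reference indexes alignment-sized slots, so the reachable heap is 2^32 times the alignment. A quick sketch (-XX:ObjectAlignmentInBytes is the HotSpot flag that controls this; larger alignment trades heap reach for padding waste):

    public class CompressedOopLimits {
        public static void main(String[] args) {
            // Addressable heap = 2^32 slots * slot size (the object alignment).
            for (int alignment : new int[] { 8, 16, 32 }) {
                long limitBytes = (1L << 32) * alignment;
                System.out.println("-XX:ObjectAlignmentInBytes=" + alignment
                        + " -> " + (limitBytes >> 30) + " GB max heap");
            }
        }
    }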
In theory everything is possible, but in reality the numbers are much lower than you might expect.
I have often tried to address huge heaps on servers, and found that even though a server can have a huge amount of memory, most software can never actually use it in real scenarios, simply because the CPUs are not fast enough to work through that much data.
Why? Timings -- that is the endless downfall of every enormous machine I have worked on.
So I would advise not to go overboard by allocating huge amounts just because you can, but to allocate what you think will actually be used.
Actual usage is often much lower than you would expect.
Of course, none of us really runs HP 9000 systems at home, and most of you will never get near the capacity of your home system anyway.
For instance, most users do not have more than 16 GB of memory in their system. Of course some casual gamers use workstations for a game once a month, but I bet that is a very small percentage.
So coming down to earth: on an 8 GB 64-bit system I would allocate not much more than 512 MB of heap space, or, if you want to go overboard, try 1 GB. I am pretty sure even these numbers are pure overkill.
I have constantly monitored memory usage during gaming to see if the heap setting would make any difference, but did not notice any difference at all between much lower and much larger values. Even on servers/workstations there was no visible change in performance, no matter how large I set the values.
That does not mean some Java users might not be able to make use of more heap space, but so far I have not seen any application need that much.
Of course, I assume there would be a small difference in performance if a Java instance ran out of heap space to work with.
So far I have not found any of that at all; however, a lack of actually installed memory showed instant drops in performance if you set the heap too large.
When you have a 4 GB system, you quickly run out of memory because people allocate more space than is actually free in the system, so the OS starts using drive space to make up the shortage and begins to swap.

Java: How does the VM handle a 64bit `long` on a 32bit processor

How does the JVM handle a primitive "long", which is 64 bits, on a 32-bit processor?
Can it utilise multiple cores in parallel on a multi-core 32-bit machine?
How much slower are 64-bit operations on a 32-bit machine?
It may use multiple cores to run different threads, but it does not use them in parallel for 64-bit calculations. A 64-bit long is basically stored as two 32-bit ints. In order to add them, two additions are needed, keeping track of the carry bit. Multiplication is like multiplying two two-digit numbers, except each digit is in base 2^32 instead of base 10. And so on for the other arithmetic operations.
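To make that concrete, here is a sketch that emulates the 64-bit add with 32-bit int halves, mirroring the ADD/ADC (add-with-carry) instruction pair a 32-bit JIT would emit:

    public class LongAddOn32Bit {
        // Adds two 64-bit values held as (high, low) 32-bit halves:
        // ADD on the low words, then ADC on the high words.
        static long add64(int aHi, int aLo, int bHi, int bLo) {
            int lo = aLo + bLo;
            // Carry out of the low word: the unsigned 32-bit sum wrapped around.
            int carry = ((lo & 0xFFFFFFFFL) < (aLo & 0xFFFFFFFFL)) ? 1 : 0;
            int hi = aHi + bHi + carry;
            return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
        }

        public static void main(String[] args) {
            long a = 0x00000001FFFFFFFFL; // low word all ones: forces a carry
            long b = 1L;
            long emulated = add64((int) (a >>> 32), (int) a,
                                  (int) (b >>> 32), (int) b);
            System.out.println(emulated == a + b); // true: 0x0000000200000000
        }
    }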
Edit about speed: I can only guess about the speed difference. An addition requires two adds instead of one, and a multiplication would (I think) require four multiplies instead of one. However, I suspect that if everything can be kept in registers then the actual time for the computation would be dwarfed by the time required to go to memory twice for the read and twice for the write, so my guess is about twice as long for most operations. I imagine that it would depend on the processor, the particular JVM implementation, the phase of the moon, etc. Unless you are doing heavy number crunching, I wouldn't worry about it. Most programs spend most of their time waiting for IO to/from the disk or network.
From TalkingTree, and the Java HotSpot FAQ:
Generally, the benefits of being able to address larger amounts of memory come with a small performance loss in 64-bit VMs versus running the same application on a 32-bit VM. This is due to the fact that every native pointer in the system takes up 8 bytes instead of 4. The loading of this extra data has an impact on memory usage which translates to slightly slower execution depending on how many pointers get loaded during the execution of your Java program.
The good news is that with AMD64 and EM64T platforms running in 64-bit mode, the Java VM gets some additional registers which it can use to generate more efficient native instruction sequences. These extra registers increase performance to the point where there is often no performance loss at all when comparing 32 to 64-bit execution speed.
The performance difference comparing an application running on a 64-bit platform versus a 32-bit platform on SPARC is on the order of 10-20% degradation when you move to a 64-bit VM. On AMD64 and EM64T platforms this difference ranges from 0-15% depending on the amount of pointer accessing your application performs.
