Java - Heap vs Direct memory access


I recently came across the sun.misc.Unsafe class, which lets the user allocate, deallocate, and generally access memory much as in C. I read a couple of blog posts that tackle this issue, e.g.:
1) Which is faster - heap or direct memory - test results claim heap
2) Off-heap memory vs DirectByteBuffer vs Heap - Off-heap seems to be fastest
3) Memory mapped files for time series data - MappedByteBuffer faster than heap objects
Article 1 seems to contradict the other two, and I fail to comprehend why. DirectByteBuffer uses sun.misc.Unsafe under the hood (so does MappedByteBuffer), so both should also suffer from the JNI calls described in article 1. Also, in article 2, the off-heap memory accesses resemble the ones in article 1, yet they give completely opposite results.
Could someone comment in general on how to approach off-heap memory, i.e. when to use it, whether there is a significant benefit to it, and most importantly, why such similar tests give such different results in the articles above? Thanks.

1). Working with Native memory from Java has its usages such as when you
need to work with large amounts of data (> 2 gigabytes) or when you
want to escape from the garbage collector. However in terms of
latency, direct memory access from the JVM is not faster than
accessing the heap as demonstrated above. The results actually make
sense since crossing the JVM barrier must have a cost. That’s the
same dilemma between using a direct or a heap ByteBuffer. The speed
advantage of the direct ByteBuffer is not access speed but the
ability to talk directly with the operating system’s native I/O
operations. Another great example discussed by Peter Lawrey is the
use of memory-mapped files when working with time-series.
Source: http://mentablog.soliveirajr.com/2012/11/which-one-is-faster-java-heap-or-native-memory/
2). Off heap via Unsafe is blazing fast with 330/11200 Million/Sec.
Performance for all other types of allocation is either good for read or write, none of the allocation is good for both.
Special note about ByteBuffer: it is pathetic; I am sure you will not use it after seeing such numbers. DirectByteBuffer sucks in read speed, and I am not sure why it is so slow. So if memory read/write is becoming a bottleneck in your system, then off-heap is definitely the way to go. Remember, it is a highway, so drive with care.
Source: http://www.javacodegeeks.com/2013/08/which-memory-is-faster-heap-or-bytebuffer-or-direct.html
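To make the comparison in the two quotes concrete, here is a minimal sketch that pushes the same workload through a heap ByteBuffer and a direct ByteBuffer. The class name BufferKinds and the buffer size are assumptions for illustration, and a single timed pass like this is not a trustworthy benchmark (there is no warm-up and the JIT may treat the two paths differently), which may be part of why the articles disagree.

```java
import java.nio.ByteBuffer;

class BufferKinds {
    // Writes `count` ints into the buffer, then reads them back and sums them.
    static long fillAndSum(ByteBuffer buf, int count) {
        buf.clear();
        for (int i = 0; i < count; i++) {
            buf.putInt(i);
        }
        buf.flip();
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.getInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 1_000_000; // illustrative size, not taken from the articles
        ByteBuffer heap = ByteBuffer.allocate(count * Integer.BYTES);         // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(count * Integer.BYTES); // native memory outside the heap

        for (ByteBuffer buf : new ByteBuffer[] {heap, direct}) {
            long start = System.nanoTime();
            long sum = fillAndSum(buf, count);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println((buf.isDirect() ? "direct" : "heap") + ": sum=" + sum + ", " + micros + " us");
        }
    }
}
```

For numbers worth comparing, the same method body would need to run under a harness such as JMH rather than a hand-rolled timer.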

Related

Java collections faster than c++ containers?

I was reading the comments on this answer and I saw this quote.
Object instantiation and object-oriented features are blazing fast to use (faster than C++ in many cases) because they're designed in from the beginning. and Collections are fast. Standard Java beats standard C/C++ in this area, even for most optimized C code.
One user (with really high rep I might add) boldly defended this claim, stating that
heap allocation in java is better than C++'s
and added this statement defending the collections in java
And Java collections are fast compared to C++ collections due largely to the different memory subsystem.
So my question is can any of this really be true, and if so why is java's heap allocation so much faster.

This sort of statement is ridiculous; people making it are
either incredibly uninformed, or incredibly dishonest. In
particular:
The speed of dynamic memory allocation in the two cases will
depend on the pattern of dynamic memory use, as well as the
implementation. It is trivial for someone familiar with the
algorithms used in both cases to write a benchmark proving which
ever one he wanted to be faster. (Thus, for example, programs
using large, complex graphs that are built, then torn down and
rebuilt, will typically run faster under garbage collection. As
will programs that never use enough dynamic memory to trigger
the collector. Programs using few, large, long lived
allocations will often run faster with manual memory
management.)
When comparing the collections, you have to consider what is
in the collections. If you're comparing large vectors of
double, for example, the difference between Java and C++ will
likely be slight, and could go either way. If you're comparing
large vectors of Point, where Point is a value class containing
two doubles, C++ will probably blow Java out of the water,
because it uses pure value semantics (with no additional dynamic
allocation), whereas Java needs to dynamically allocate each
Point (and no dynamic allocation is always faster than even
the fastest dynamic allocation). If the Point class in Java
is correctly designed to act as a value (and thus immutable,
like java.lang.String), then doing a translation on the
Point in a vector will require a new allocation for every
Point; in C++, you could just assign.
Much depends on the optimizer. In Java, the optimizer works
with perfect knowledge of the actual use cases, in this
particular run of the program, and perfect knowledge of the
actual processor it is running on, in this run. In C++, the
optimizer must work with data from a profiling run, which will
never correspond exactly to any one run of the program, and the
optimizer must (usually) generate code that will run (and run
quickly) on a wide variety of processor versions. On the other
hand, the C++ optimizer may take significantly more time
analysing the different paths (and effective optimization can
require a lot of CPU); the Java optimizer has to be fairly
quick.
Finally, although not relevant to all applications, C++ can be
single threaded. In which case, no locking is needed in the
allocator, which is never the case in Java.
With regards to the two numbered points: C++ can use more or
less the same algorithms as Java in its heap allocator. I've
used C++ programs where the ::operator delete() function was
empty, and the memory was garbage collected. (If your
application allocates lots of short lived, small objects, such
an allocator will probably speed things up.) And as for the
second: the really big advantage C++ has is that its memory
model doesn't require everything to be dynamically allocated.
Even if allocation in Java takes only a tenth of the time it
would take in C++ (which could be the case, if you only count
the allocation, and not the time needed for the collector
sweeps), with large vectors of Point, as above, you're
comparing two or three allocations in C++ with millions of
allocations in Java.
And finally: "why is Java's heap allocation so much faster?" It
isn't, necessarily, if you amortise the time for the
collection phases. The time for the allocation itself can be
very cheap, because Java (or at least most Java implementations)
use a relocating collector, which results in all of the free
memory being in a single contiguous block. This is at least
partially offset by the time needed in the collector: to get
that contiguity, you've got to move data, which means a lot of
copying. In most implementations, it also means an additional
indirection in the pointers, and a lot of special logic to avoid
issues when one thread has the address in a register, or such.
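The point about vectors of Point can be illustrated from the Java side. The sketch below is an assumption-laden illustration, not the answerer's code: PointLayouts, translateAll and translateAllFlat are hypothetical names. It contrasts an immutable Point class, where translating every element allocates a new object, with a flat double[] that stores the coordinates inline and is updated in place, which is roughly the layout C++ gets for free with value semantics.

```java
class PointLayouts {
    // Immutable value-like class: every translation creates a new object.
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        Point translate(double dx, double dy) { return new Point(x + dx, y + dy); }
    }

    // Array of references: one new Point per element, plus the new array itself.
    static Point[] translateAll(Point[] points, double dx, double dy) {
        Point[] out = new Point[points.length];
        for (int i = 0; i < points.length; i++) {
            out[i] = points[i].translate(dx, dy);
        }
        return out;
    }

    // Flat layout: x at index 2*i, y at index 2*i + 1; updated in place, no allocation.
    static void translateAllFlat(double[] xy, double dx, double dy) {
        for (int i = 0; i < xy.length; i += 2) {
            xy[i] += dx;
            xy[i + 1] += dy;
        }
    }

    public static void main(String[] args) {
        Point[] points = { new Point(1, 2), new Point(3, 4) };
        Point[] moved = translateAll(points, 10, 20);
        System.out.println(moved[1].x + ", " + moved[1].y);

        double[] flat = { 1, 2, 3, 4 };
        translateAllFlat(flat, 10, 20);
        System.out.println(flat[2] + ", " + flat[3]);
    }
}
```

The flat version gives up the type safety of a Point class, which is the trade-off the Java memory model forces here.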

Your questions don't have concrete answers. For example, C++ does not define memory management at all. It leaves allocation details up to the library implementation. Therefore, within the bounds of C++, a given platform may have a very slow heap allocation scheme, and Java would certainly be faster if it bypasses that. On another platform, memory allocations may be blazing fast, outperforming Java. As James Kanze pointed out, Java also places very little constraints on memory management (e.g. even the GC algorithm is entirely up to the JVM implementor). Because Java and C++ do not place constraints on memory management, there is no concrete answer to that question. C++ is purposefully open about underlying hardware and kernel functions, and Java is purposefully open about JVM memory management. So the question becomes very fuzzy.
You may find that some operations are faster in Java, and some not. You never know until you try, however:
In practice, the real differences lie in your higher level algorithms and implementations. For all but the most absolutely performance critical applications, the differences in performance of identical data structures in different languages is completely negligible compared to the performance characteristics of the algorithm itself. Concentrate on optimizing your higher level implementations. Only after you have done so, and after you have determined that your performance requirements are not being met, and after you have benchmarked and found (unlikely) that your bottleneck is in container implementations, should you start to think of things like this.
In general, as soon as you find yourself thinking or reading about C++ vs. Java issues, stop and refocus on something productive.

Java heap is faster because (simplified) all you need to do to allocate is to increase heap top pointer (just like on stack). It is possible because heap is periodically compacted. So your price for speed is:
Periodic GC pauses for heap compacting
Increased memory usage
There is no free cheese... So while collection operations may be fast, it is amortized by overall slowing down during GC work.
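The "increase the heap top pointer" idea can be sketched as a toy bump allocator. This is only an illustration of the principle over a plain byte[]; BumpArena and its methods are hypothetical names, and a real JVM allocator (thread-local buffers, object headers, alignment) is considerably more involved.

```java
class BumpArena {
    private final byte[] memory; // returned offsets index into this array
    private int top = 0;         // everything below top is in use, everything above is free

    BumpArena(int capacity) {
        memory = new byte[capacity];
    }

    // Returns the offset of the new block, or -1 when the arena is full.
    // A real collector would start a GC cycle at that point instead of failing.
    int allocate(int size) {
        if (size < 0 || size > memory.length - top) {
            return -1;
        }
        int offset = top;
        top += size; // the whole cost of an allocation: one comparison and one addition
        return offset;
    }

    // Stand-in for the effect of compaction: free space is one contiguous block again.
    void reset() {
        top = 0;
    }

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(64);
        System.out.println(arena.allocate(16));
        System.out.println(arena.allocate(16));
        System.out.println(arena.allocate(64));
    }
}
```

Allocation stays this cheap only because something else (here reset(), in the JVM the compacting collector) periodically restores a single contiguous free region.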

While I am a fan of Java, it is worth noting that C++ supports allocation of objects on the stack which is faster than heap allocation.
If you use C++ efficiently, with all its various ways of doing the same thing, it will be faster than Java (even if it takes you longer to find that optimal combination).
If you program in C++ as you would in Java, e.g. everything on the heap, all methods virtual, and lots of runtime checks that do nothing and could be optimised away dynamically, it will be slower. Java has optimised these things further because a) they are the only thing Java does, b) they can be optimised more efficiently at run time, and c) Java has fewer features and side effects, so it is easier for the optimiser to get decent speeds.

and Collections are fast. Standard Java beats standard C/C++ in this area, even for most optimized C code.
This may be true for particular collections, but most certainly isn't true for all collections in all usage patterns.
For instance, a java.util.HashMap will outperform a std::map, because the latter is required to be sorted. That is, the fastest Map in the Java Standard Library is faster than the fastest map in the C++ one (at least prior to C++11, which added std::unordered_map).
On the other side, a std::vector<int> is far more efficient than a java.util.ArrayList<Integer> (due to type erasure, you can't use a java.util.ArrayList<int>, and therefore end up with about 4 times the memory consumption, possibly poorer cache locality, and correspondingly slower iteration).
In short, like most sweeping generalizations, this one doesn't always apply. However, neither would the opposite assertion (that Java is always slower than C++). It really depends on the details, such as how you use the collection, or even which versions of the languages you compare.
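The ArrayList<Integer> point is easy to see in code. In this sketch (BoxingCost and its methods are hypothetical names, and the timing is only indicative, not a proper benchmark) the same values are summed from an int[] and from an ArrayList<Integer>; the second loop has to follow a reference and unbox on every element.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingCost {
    // Values are stored inline in one contiguous array.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Each element is a reference to an Integer object that is unboxed on read.
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // illustrative size
        int[] primitive = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed.add(i); // autoboxing: most values get their own Integer object
        }

        long t0 = System.nanoTime();
        long a = sumPrimitive(primitive);
        long t1 = System.nanoTime();
        long b = sumBoxed(boxed);
        long t2 = System.nanoTime();
        System.out.println("int[]: " + a + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("ArrayList<Integer>: " + b + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```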

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
Something along the lines of http://wwwasd.web.cern.ch/wwwasd/lhc++/Objectivity/V5.2/Java/guide/jgdStorage.fm.html and specifically non-garbage-collectible containers there (non-garbage-collectable?).
The problem is that I have lots of ordinary temporary objects, but I have even bigger (several Gigs) of objects that are stored for Cache purposes. For no reason should the Java GC traverse all those Cache gigabytes trying to find anything to collect, because they contain cached data which have their own timeouts.
This way I could partition my data in a custom way into infinite-lived and normal-lived objects, and hopefully GC would be quite fast because normal objects don't live so long and amount to smaller amounts.
There are some workarounds to this problem, such as Apache DirectMemory and Commercial Terracotta BigMemory(http://terracotta.org/products/bigmemory), but a java-native solution would be nicer (I mean free and probably more reliable?). Also I want to avoid serialization overhead which means it should happen within same jvm. To my understanding DirectMemory and BigMemory operate mainly off heap which means that the objects must be serialized/deserialized to/from memory outside jvm. Simply marking non-gc regions within the jvm would seem a better solution. Using Files for cache is not an option either, it has the same unaffordable serialization/deserialization overhead - use case is a HA server with lots of data used in random (human) order and low latency needed.

Any memory the JVM manages is also garbage-collected by the JVM. And any “live” objects which are directly available to Java methods without deserialization have to live in JVM memory. Therefore in my understanding you cannot have live objects which are immune to garbage collection.
On the other hand, the usage you describe should make the generational approach to garbage collection quite efficient. If your big objects stay around for a while, they will be checked for reclamation less often. So I doubt there is much to be gained from avoiding those checks.

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
No it is not possible.
You can prevent objects from being garbage collected by keeping them reachable, but the GC will still need to trace them to check reachability on each full GC (at least).
Is simply my assumption, that when the jvm is starving it begins scanning all those unnecessary objects too.
Yes. That is correct. However, unless you've got LOTS of objects that you want to be treated this way, the overhead is likely to be insignificant. (And anyway, a better idea is to give the JVM more memory ... if that is possible.)

Quite simply, for you to be able to do this, the garbage collection algorithm would need to be aware of such a flag, and take it into account when doing its work.
I'm not aware of any of the standard GC algorithms having such a flag, so for this to work you would need to write your own GC algorithm (after deciding on some feasible way to communicate this information to it).
In principle, in fact, you've already started down this track - you're deciding how garbage collection should be done rather than being happy to leaving it to the JVM's GC algo. Is the situation you describe a measurable problem for you; something for which the existing garbage collection is insufficient, but your plan would work? Garbage collectors are extremely well-tuned, so I wouldn't be surprised if the "inefficient" default strategy is actually faster than your naively-optimal one.
(Doing manual memory management is tricky and error-prone at the best of times; managing some memory yourself while using a stock garbage collector to handle the rest seems even worse. I expect you'd run into a lot of edge cases where the GC assumes it "knows" what's happening with the whole heap, which would no longer be true. Steer clear if you can...)

The recommended approaches would be to use either a commercial RTSJ implementation to avoid GC, or to use off-heap memory. One could also look into soft references for caches (they do get collected).
This is not recommended:
If for some reason you do not believe these options are sufficient, you could look into direct memory access, which is unsafe (it goes through sun.misc.Unsafe). You can read the 'theUnsafe' field to get the 'Unsafe' instance. Unsafe lets you allocate and deallocate memory via 'allocateMemory' and 'freeMemory'. This memory is not under GC control and is not limited by the JVM heap size. The impact on the GC and the application, once you go down this route, is not guaranteed, which is why using byte buffers might be the way to go (if you're not using an RTSJ-like implementation).
Hope this helps.
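The soft-reference option mentioned above could look roughly like this. It is a minimal sketch, assuming a hypothetical SoftCache class with a caller-supplied loader; note that softly referenced values are still traced by the collector, so this helps with memory pressure rather than with GC scanning time.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // recomputes a value after the GC has cleared it

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get(); // null if never cached or already cleared
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<Integer, byte[]> cache = new SoftCache<Integer, byte[]>(key -> new byte[1024 * 1024]);
        byte[] block = cache.get(1);
        // While `block` is strongly reachable, the cached value cannot be cleared.
        System.out.println(block == cache.get(1));
    }
}
```

The sketch never removes map entries whose referent was cleared; a fuller version would purge them, for example with a ReferenceQueue.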

Living Java objects will always be part of the GC life cycle. Put another way, marking an object as non-GC would carry the same order of overhead as having your object referenced by a root reference (a static final map, for instance).
But thinking a bit further, data put in a cache are most likely to be temporary, and would eventually be evicted. At that point you will start again to like the JVM and the GC.
If you have 100s of GBs of permanent data, you may want to rethink the architecture of your application, and try to shard and distribute your data (horizontal scalability).
Last but not least, lots of work has been done around serialization, and the overhead of serialization (I'm not speaking about the poor reputation of ObjectInputStream and ObjectOutputStream) is not that big.
More than that, if your data is mainly composed of primitive types (including byte arrays), there are efficient ways to readInt() or readBytes() from off-heap buffers (for instance netty.io's ChannelBuffer). This could be a way to go.
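As an illustration of keeping primitive data off-heap without general-purpose serialization, here is a minimal sketch using the JDK's own direct ByteBuffer rather than Netty. The record layout (an int id followed by a double price) and the OffHeapQuotes name are assumptions made up for the example.

```java
import java.nio.ByteBuffer;

class OffHeapQuotes {
    // Assumed fixed-size record: 4-byte int id followed by 8-byte double price.
    private static final int RECORD_BYTES = Integer.BYTES + Double.BYTES;
    private final ByteBuffer buffer;

    OffHeapQuotes(int capacity) {
        // Native memory: the GC tracks only the small ByteBuffer object, not the payload.
        buffer = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    void put(int index, int id, double price) {
        int base = index * RECORD_BYTES;
        buffer.putInt(base, id);                      // absolute write, position is untouched
        buffer.putDouble(base + Integer.BYTES, price);
    }

    int id(int index) {
        return buffer.getInt(index * RECORD_BYTES);
    }

    double price(int index) {
        return buffer.getDouble(index * RECORD_BYTES + Integer.BYTES);
    }

    public static void main(String[] args) {
        OffHeapQuotes quotes = new OffHeapQuotes(1_000);
        quotes.put(7, 42, 9.5);
        System.out.println(quotes.id(7) + " @ " + quotes.price(7));
    }
}
```

Reading a field is a single bounds-checked primitive access, so no intermediate objects are created on the way in or out.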

How to memory profile in Java?

I'm still learning the ropes of Java, so sorry if there's an obvious answer to this. I have a program that is taking a ton of memory and I want to figure out a way to reduce its usage, but after reading many SO questions I have the idea that I need to prove where the problem is before I start optimizing it.
So here's what I did, I added a break point to the start of my program and ran it, then I started visualVM and had it profile the memory(I also did the same thing in netbeans just to compare the results and they are the same). My problem is I don't know how to read them, I got the highest area just saying char[] and I can't see any code or anything(which makes sense because visualvm is connecting to the jvm and can't see my source, but netbeans also does not show me the source as it does when doing cpu profiling).
Basically, what I want to know is which variable (and hopefully more details, like which method) is using all the memory, so I can focus on working there. Is there an easy way to do this? Right now I am using Eclipse and Java to develop (and installed VisualVM and NetBeans specifically for profiling, but am willing to install anything else that you feel gets this job done).
EDIT: Ideally, I'm looking for something that will take all my objects and sort them by size(so I can see which one is hogging memory). Currently it returns generic information such as string[] or int[] but I want to know which object its referring to so I can work on getting its size more optimized.

Strings are problematic
Basically in Java, String references ( things that use char[] behind the scenes ) will dominate most business applications memory wise. How they are created determines how much memory they consume in the JVM.
This is simply because strings are so fundamental to most business applications as a data type, and they are also among the most memory-hungry. This isn't just a Java thing: String data types take up lots of memory in pretty much every language and runtime library, because at the least they are arrays of 1 byte per character and at worst (Unicode) they are arrays of multiple bytes per character.
Once when profiling CPU usage on a web app that also had an Oracle JDBC dependency I discovered that StringBuffer.append() dominated the CPU cycles by many orders of magnitude over all other method calls combined, much less any other single method call. The JDBC driver did lots and lots of String manipulation, kind of the trade off of using PreparedStatements for everything.
What you are concerned about you can't control, not directly anyway
What you should focus on is what is in your control, which is making sure you don't hold on to references longer than you need to, and that you are not duplicating things unnecessarily. The garbage collection routines in Java are highly optimized, and if you learn how their algorithms work, you can make sure your program behaves in the optimal way for those algorithms to work.
Java Heap Memory isn't like manually managed memory in other languages, those rules don't apply
What are considered memory leaks in other languages aren't the same thing/root cause as in Java with its garbage collection system.
Most likely in Java memory isn't consumed by one single uber-object that is leaking ( dangling reference in other environments ).
It is most likely lots of smaller allocations caused by StringBuffer/StringBuilder objects that were not sized appropriately on first instantiation and then had to automatically grow their char[] arrays to hold subsequent append() calls.
These intermediate objects may be held around longer than expected by the garbage collector because of the scope they are in and lots of other things that can vary at run time.
EXAMPLE: the garbage collector may decide that there are candidates, but because it considers that there is plenty of memory still to be had that it might be too expensive time wise to flush them out at that point in time, and it will wait until memory pressure gets higher.
The garbage collector is really good now, but it isn't magic, if you are doing degenerate things, it will cause it to not work optimally. There is lots of documentation on the internet about the garbage collector settings for all the versions of the JVMs.
These un-referenced objects may just have not reached the time that the garbage collector thinks it needs them to for them to be expunged from memory, or there could be references to them held by some other object ( List ) for example that you don't realize still points to that object. This is what is most commonly referred to as a leak in Java, which is a reference leak more specifically.
EXAMPLE: If you know you need to build a 4K String using a StringBuilder, create it with new StringBuilder(4096), not the default capacity, which is only 16 characters and will immediately start creating garbage that can amount to many times what you think the object should be size-wise.
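A small sketch of that presizing advice follows. BuilderSizing is a hypothetical name; the only JDK facts relied on are that new StringBuilder() is documented to start with a capacity of 16 characters and that capacity() reports the current size of the internal buffer.

```java
class BuilderSizing {
    // Appends `length` characters and returns the resulting string.
    static String build(StringBuilder sb, int length) {
        for (int i = 0; i < length; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder defaultSized = new StringBuilder();  // starts small and regrows repeatedly
        StringBuilder presized = new StringBuilder(4096);  // one internal array, never regrown here

        System.out.println("initial capacities: " + defaultSized.capacity() + " vs " + presized.capacity());
        build(defaultSized, 4096);
        build(presized, 4096);
        // Every regrowth of the default builder left an abandoned internal array behind as garbage.
        System.out.println("final capacities: " + defaultSized.capacity() + " vs " + presized.capacity());
    }
}
```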
You can discover how many of what types of objects are instantiated with VisualVM, this will tell you what you need to know. There isn't going to be one big flashing light that points at a single instance of a single class that says, "This is the big memory consumer!", that is unless there is only one instance of some char[] that you are reading some massive file into, and this is not possible either, because lots of other classes use char[] internally; and then you pretty much knew that already.
I don't see any mention of OutOfMemoryError
You probably don't have a problem in your code, the garbage collection system just might not be getting put under enough pressure to kick in and deallocate objects that you think it should be cleaning up. What you think is a problem probably isn't, not unless your program is crashing with OutOfMemoryError. This isn't C, C++, Objective-C, or any other manual memory management language / runtime. You don't get to decide what is in memory or not at the detail level you are expecting you should be able to.

In JProfiler, you can go to the heap walker and activate the biggest objects view. You will see the objects that retain the most memory. "Retained" memory is the memory that would be freed by the garbage collector if you removed the object.
You can then open the object nodes to see the reference tree of the retained objects. Here's a screen shot of the biggest object view:
Disclaimer: My company develops JProfiler

I would recommend capturing heap dumps and using a tool like Eclipse MAT that lets you analyze them. There are many tutorials available. It provides a view of the dominator tree to provide insight into the relationships between the objects on the heap. Specifically for what you mentioned, the "path to GC roots" feature of MAT will tell you where the majority of those char[], String[] and int[] objects are being referenced. JVisualVM can also be useful in identifying leaks and allocations, particularly by using snapshots with allocation stack traces. There are quite a few walk-throughs of the process of getting the snapshots and comparing them to find the allocation point.
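Heap dumps can also be captured from inside the application. The sketch below assumes a HotSpot-based JVM, since it uses com.sun.management.HotSpotDiagnosticMXBean, which is not part of the standard java.* API; HeapDumper and the file names are illustrative. The resulting .hprof file can be opened in MAT or VisualVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {
    // Writes an .hprof file; live=true dumps only objects that are still reachable.
    // The target file must not exist yet, otherwise dumpHeap throws an IOException.
    static void dump(Path target, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempDirectory("dumps").resolve("app.hprof");
        dump(target, true);
        System.out.println("Wrote " + target + " (" + Files.size(target) + " bytes)");
    }
}
```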

The Java JDK comes with JVisualVM in its bin folder. Once your application (an application server, for example) is running, you can start VisualVM and connect it to the process on localhost; it will show you memory allocation and let you take a heap dump.

If you use visualVM to check your memory usage, it focuses on the data, not the methods. Maybe your big char[] data is caused by many String values? Unless you are using recursion, the data will not be from local variables. So you can focus on the methods that insert elements into large data structures. To find out what precise statements cause your "memory leakage", I suggest you additionally
read Josh Bloch's Effective Java Item 6: (Eliminate obsolete object references)
use a logging framework and log instance creations at the highest verbosity level.

There are generally two distinct approaches to analyse Java code to gain an understanding of its memory allocation profile. If you're trying to measure the impact of a specific, small section of code – say you want to compare two alternative implementations in order to decide which one gives better runtime performance – you would use a microbenchmarking tool such as JMH.
While you can pause the running program, the JVM is a sophisticated runtime that performs a variety of housekeeping tasks and it's really hard to get a "point in time" snapshot and an accurate reading of the "level of memory usage". It might allocate/free memory at a rate that does not directly reflect the behaviour of the running Java program. Similarly, performing a Java object heap dump does not fully capture the low-level machine specific memory layout that dictates the actual memory footprint, as this could depend on the machine architecture, JVM version, and other runtime factors.
Tools like JMH get around this by repeatedly running a small section of code, and observing a long-running average of memory allocations across a number of invocations. E.g. in the GC profiling sample JMH benchmark the derived *·gc.alloc.rate.norm metric gives a reasonably accurate per-invocation normalised memory cost.
In the more general case, you can attach a profiler to a running application and get JVM-level metrics, or perform a heap dump for offline analysis. Some commonly used tools for profiling full applications are Async Profiler and the newly open-sourced Java Flight Recorder in conjunction with Java Mission Control to visualise results.

Java allocation : allocating objects from a pre-existing/allocated pool

In a Java program when it is necessary to allocate thousands of similar-size objects, it would be better (in my mind) to have a "pool" (which is a single allocation) with reserved items that can be pulled from when needed. This single large allocation wouldn't fragment the heap as much as thousands of smaller allocations.
Obviously, there isn't a way to specifically point an object reference to an address in memory (for its member fields) to set up a pool. Even if the new object referenced an area of the pool, the object itself would still need to be allocated. How would you handle many allocations like this without resorting to native OS libraries?

You could try using the Commons Pool library.
That said, unless I had proof the JVM wasn't doing what I needed, I'd probably hold off on optimizing object creation.

Don't worry about it. Unless you have done a lot of testing and analysis on the actual code being run and know that it is a problem with garbage collection and that the JVM isn't doing a good enough job, spend your time elsewhere.

If you are building an application, where a predictable response time is very important, then pooling of objects, no matter how small they are will pay you dividends. Again, pooling is also a factor of how big of a data set you are trying to pool and how much physical memory your machine has.
There is ample proof on the web that shows that object pooling, no matter how small the objects are, is beneficial for application performance.
There are two levels of pooling you could do:
Pooling of the basic objects such as Vectors, which you retrieve from the pool each time you have to use the vector to form a map or such.
Pooling of the higher-level composite objects that are most commonly used.
This is generally an application design decision.
Also, in a multi-threaded application, you would like to be sensitive about how many different threads are going to be allocating and returning to the pool. You certainly do not want your application to be bogged down by contention - especially if you are dealing with thousands of objects at the same time.
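A minimal sketch of the kind of pool described above, assuming a hypothetical SimplePool class. It uses a single lock for simplicity, which is exactly the contention point the last paragraph warns about; per-thread pools are one common way around it.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    // Creates `preallocate` instances up front so later borrows do not allocate.
    SimplePool(Supplier<T> factory, int preallocate) {
        this.factory = factory;
        for (int i = 0; i < preallocate; i++) {
            free.push(factory.get());
        }
    }

    // Hands out a pooled instance, or creates a fresh one if the pool is empty.
    synchronized T borrow() {
        T item = free.poll();
        return (item != null) ? item : factory.get();
    }

    // The caller is responsible for resetting the object's state before returning it.
    synchronized void release(T item) {
        free.push(item);
    }

    synchronized int available() {
        return free.size();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 2);
        StringBuilder sb = pool.borrow();
        sb.append("work");
        sb.setLength(0); // reset before handing the object back
        pool.release(sb);
        System.out.println("available: " + pool.available());
    }
}
```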

@Dave and Casey, you don't need any proof to show that contiguous memory layout improves cache efficiency, which is the major bottleneck in most OOP apps that need high performance but follow a "too idealistic" OOP-design trajectory.
People often think of the GC as the culprit causing low performance in high performance Java applications and after fixing it, just leave it at that, without actually profiling memory-behavior of the application. Note though that un-cached memory instructions are inherently more expensive than arithmetic instructions (and are getting more and more expensive due to the memory access <-> computation gap). So if you care about performance, you should certainly care about memory management.
Cache-aware, or more general, data-oriented programming, is the key to achieving high performance in many kinds of applications, such as games, or mobile apps (to reduce power consumption).
Here is a SO thread on DOP.
Here is a slideshow from the Sony R&D department that shows the usefulness of DOP as applied to a playstation game (high performance required).
So how to solve the problem that Java, does not, in general allow you to allocate a chunk of memory? My guess is that when the program is just starting, you can assume that there is very little internal fragmentation in the already allocated pages. If you now have a loop that allocates thousands or millions of objects, they will probably all be as contiguous as possible. Note that you only need to make sure that consecutive objects stretch out over the same cacheline, which in many modern systems, is only 64 bytes. Also, take a look at the DOP slides, if you really care about the (memory-) performance of your application.
In short: Always allocate multiple objects at once (increase temporal locality of allocation), and, if your GC has defragmentation, run it beforehand, else try to reduce such allocations to the beginning of your program.
I hope, this is of some help,
-Domi
PS: @Dave, the commons pool library does not allocate objects contiguously. It only keeps track of the allocations by putting them into a reference array, embedded in a stack, linked list, or similar.

Is Java Native Memory Faster than the heap?

I'm exploring options to help my memory-intensive application, and in doing so I came across Terracotta's BigMemory. From what I gather, they take advantage of non-garbage-collected, off-heap "native memory," and apparently this is about 10x slower than heap-storage due to serialization/deserialization issues. Prior to reading about BigMemory, I'd never heard of "native memory" outside of normal JNI. Although BigMemory is an interesting option that warrants further consideration, I'm intrigued by what could be accomplished with native memory if the serialization issue could be bypassed.
Is Java native memory faster (I think this entails ByteBuffer objects?) than traditional heap memory when there are no serialization issues (for instance if I am comparing it with a huge byte[])? Or do the vagaries of garbage collection, etc. render this question unanswerable? I know "measure it" is a common answer around here, but I'm afraid I would not set up a representative test as I don't yet know enough about how native memory works in Java.

Direct memory is faster when performing IO because it avoids one copy of the data. However, for 95% of applications you won't notice the difference.
You can store data in direct memory, but it won't be faster than storing the data in POJOs (nor as safe, readable, or maintainable). If you are worried about GC, try creating your objects in advance (they have to be mutable) and reusing them without discarding them. If you don't discard your objects, there is nothing to collect.
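The "create in advance and reuse" idea could look roughly like this. `Tick`, `ReusedObjects`, and `record` are hypothetical names. The sketch assumes callers are done with an instance before its slot comes around again.

```java
final class ReusedObjects {
    // Hypothetical mutable value object; its fields are overwritten, never replaced.
    static final class Tick {
        long timestamp;
        double price;

        void set(long timestamp, double price) {
            this.timestamp = timestamp;
            this.price = price;
        }
    }

    private final Tick[] ring;
    private int next;

    ReusedObjects(int capacity) {
        // All allocation happens here, once.
        ring = new Tick[capacity];
        for (int i = 0; i < capacity; i++) {
            ring[i] = new Tick();
        }
    }

    // Hands out a recycled instance; nothing is allocated on this path.
    Tick record(long timestamp, double price) {
        Tick t = ring[next];
        next = (next + 1) % ring.length;
        t.set(timestamp, price);
        return t;
    }

    public static void main(String[] args) {
        ReusedObjects buffer = new ReusedObjects(4);
        Tick first = buffer.record(1L, 10.0);
        for (int i = 2; i <= 5; i++) {
            buffer.record(i, 10.0 + i);
        }
        // The fifth call wrapped around and overwrote the first slot.
        System.out.println(first.timestamp); // prints 5
    }
}
```

The price of this pattern is visible in `main`: a reference held too long silently sees new data, which is part of why it is less safe than plain immutable POJOs.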
Is Java native memory faster (I think this entails ByteBuffer objects?) than traditional heap memory when there are no serialization issues (for instance if I am comparing it with a huge byte[])?
Direct memory can be faster than using a byte[] if you use non-byte types like int, as it can read/write all four bytes at once without breaking the value into individual bytes.
However it is slower than using POJOs as it has to bounds check every access.
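To make the byte[] versus direct buffer comparison concrete, here is a small sketch. `IntAccess` and both `readInt` helpers are hypothetical names; only the `java.nio.ByteBuffer` calls are standard API. It shows the difference in access style and is not a benchmark.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IntAccess {
    // Reads a big-endian int from a byte[] by assembling four separate bytes.
    static int readInt(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 24)
             | ((data[offset + 1] & 0xFF) << 16)
             | ((data[offset + 2] & 0xFF) << 8)
             | (data[offset + 3] & 0xFF);
    }

    // Reads the same value from a buffer in a single call.
    // The call is bounds checked on every access.
    static int readInt(ByteBuffer buffer, int offset) {
        return buffer.getInt(offset);
    }

    public static void main(String[] args) {
        byte[] heap = {0x12, 0x34, 0x56, 0x78};
        ByteBuffer direct = ByteBuffer.allocateDirect(4).order(ByteOrder.BIG_ENDIAN);
        direct.putInt(0, 0x12345678);
        System.out.println(readInt(heap, 0) == readInt(direct, 0)); // prints true
    }
}
```

If the data never leaves the machine, setting the buffer to `ByteOrder.nativeOrder()` avoids a byte swap on little-endian hardware.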
Or do the vagaries of garbage collection, etc. render this question unanswerable?
The speed has nothing to do with the GC. The GC only matters when creating or discarding objects.
BTW: If you minimise the number of objects you discard and increase your Eden size, you can prevent even minor collections from occurring for a long time, e.g. a whole day.

The point of BigMemory is not that native memory is faster, but rather, it's to reduce the overhead of the garbage collector having to go through the effort of tracking down references to memory and cleaning it up. As your heap size increases, so do your GC intervals and CPU commitment. Depending upon the situation, this can create a sort of "glass ceiling" where the Java heap gets so big that the GC turns into a hog, taking up huge amounts of processor power each time the GC kicks in. Also, many GC algorithms require some level of locking, which means nobody can do anything until that portion of the GC reference tracking algorithm finishes, though many JVMs have gotten much better at handling this. Where I work, with our app server and JVMs, we found that the "glass ceiling" is about 1.5 GB. If we try to configure the heap larger than that, the GC routine starts eating up more than 50% of total CPU time, so it's a very real cost. We've determined this through various forms of GC analysis provided by our JVM vendor.
BigMemory, on the other hand, takes a more manual approach to memory management. It reduces the overhead and sort of takes us back to having to do our own memory cleanup, as we did in C, albeit in a much simpler approach akin to a HashMap. This essentially eliminates the need for a traditional garbage collection routine, and as a result, we eliminate that overhead. I believe that the Terracotta folks used native memory via a ByteBuffer as it's an easy way to get out from under the Java garbage collector.
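As a rough illustration of the "HashMap over native memory" idea (this is not BigMemory's actual implementation, and `OffHeapStore` is a hypothetical class), values can be copied into a direct ByteBuffer while only a small index stays on the heap. The sketch assumes values arrive already serialized as byte arrays and never reclaims space.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class OffHeapStore {
    // Value bytes live here, outside the Java heap.
    private final ByteBuffer memory;
    // Small on-heap index: key -> {offset, length}.
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapStore(int capacityBytes) {
        memory = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Appends the value; space of overwritten keys is not reclaimed in this sketch.
    void put(String key, byte[] value) {
        if (value.length > memory.remaining()) {
            throw new IllegalStateException("store is full");
        }
        int offset = memory.position();
        memory.put(value);
        index.put(key, new int[] {offset, value.length});
    }

    // Copies the value back onto the heap, or returns null if the key is unknown.
    byte[] get(String key) {
        int[] entry = index.get(key);
        if (entry == null) {
            return null;
        }
        byte[] out = new byte[entry[1]];
        ByteBuffer view = memory.duplicate(); // independent position, same memory
        view.position(entry[0]);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(1024);
        store.put("greeting", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("greeting"), StandardCharsets.UTF_8)); // prints hello
    }
}
```

Every `get` copies bytes back onto the heap, which is the kind of serialization cost the question refers to.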
The following whitepaper has some good info on how they architected BigMemory and some background on the overhead of the GC: http://www.terracotta.org/resources/whitepapers/bigmemory-whitepaper.

I'm intrigued by what could be accomplished with native memory if the serialization issue could be bypassed.
I think that your question is predicated on a false assumption. AFAIK, it is impossible to bypass the serialization issue that they are talking about here. The only thing you could do would be to simplify the objects that you put into BigMemory and use custom serialization / deserialization code to reduce the overheads.
While benchmarks might give you a rough idea of the overheads, the actual overheads will be very application specific. My advice would be:
Only go down this route if you know you need to. (You will be tying your application to a particular implementation technology.)
Be prepared for some intrusive changes to your application if the data involved isn't already managed as a cache.
Be prepared to spend some time in (re-)tuning your caching code to get good performance with BigMemory.
If your data structures are complicated, expect proportionately larger runtime overheads and tuning effort.
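To illustrate the "simplify the objects and use custom serialization" suggestion, here is a sketch with a hypothetical flat `Quote` object and a hand-written fixed-layout codec over a direct ByteBuffer. It is not tied to BigMemory's API, and the field layout is an assumption chosen for the example.

```java
import java.nio.ByteBuffer;

final class QuoteCodec {
    // Hypothetical flat object: primitives only, so it maps to a fixed-size record.
    static final class Quote {
        long timestamp;
        double price;
        int volume;
    }

    // 8 (long) + 8 (double) + 4 (int) bytes per record.
    static final int SIZE = Long.BYTES + Double.BYTES + Integer.BYTES;

    // Writes the fields at fixed offsets inside the given slot.
    static void write(ByteBuffer buf, int slot, Quote q) {
        int base = slot * SIZE;
        buf.putLong(base, q.timestamp);
        buf.putDouble(base + 8, q.price);
        buf.putInt(base + 16, q.volume);
    }

    // Reads into a caller-supplied instance so that no object is allocated.
    static void read(ByteBuffer buf, int slot, Quote into) {
        int base = slot * SIZE;
        into.timestamp = buf.getLong(base);
        into.price = buf.getDouble(base + 8);
        into.volume = buf.getInt(base + 16);
    }

    public static void main(String[] args) {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(SIZE * 10);
        Quote in = new Quote();
        in.timestamp = 42L;
        in.price = 101.5;
        in.volume = 300;
        write(offHeap, 3, in);

        Quote out = new Quote();
        read(offHeap, 3, out);
        System.out.println(out.timestamp + " " + out.price + " " + out.volume);
    }
}
```

This only stays simple while the object is flat. Nested or variable-length fields need length prefixes or indirection, which is where the larger overheads and tuning effort come in.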
