Best practice for creating millions of small temporary objects

Best practice for creating millions of small temporary objects - java

What are the "best practices" for creating (and releasing) millions of small objects?
I am writing a chess program in Java and the search algorithm generates a single "Move" object for each possible move, and a nominal search can easily generate over a million move objects per second. The JVM GC has been able to handle the load on my development system, but I'm interested in exploring alternative approaches that would:
Minimize the overhead of garbage collection, and
reduce the peak memory footprint for lower-end systems.
A vast majority of the objects are very short-lived, but about 1% of the moves generated are persisted and returned as the persisted value, so any pooling or caching technique would have to provide the ability to exclude specific objects from being re-used.
I don't expect fully-fleshed out example code, but I would appreciate suggestions for further reading/research, or open source examples of a similar nature.

Run the application with verbose garbage collection:
java -verbose:gc
And it will tell you when it collects. There would be two types of sweeps, a fast and a full sweep.
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
The arrow is before and after size.
As long as it is just doing GC and not a full GC you are home safe. The regular GC is a copy collector in the 'young generation', so objects that are no longer referenced are simply just forgotten about, which is exactly what you would want.
Reading Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning is probably helpful.

Since version 6, the server mode of JVM employs an escape analysis technique. Using it you can avoid GC all together.

Well, there are several questions in one here !
1 - How are short-lived objects managed ?
As previously stated, the JVM can perfectly deal with a huge amount of short lived object, since it follows the Weak Generational Hypothesis.
Note that we are speaking of objects that reached the main memory (heap). This is not always the case. A lot of objects you create does not even leave a CPU register. For instance, consider this for-loop
for(int i=0, i<max, i++) {
// stuff that implies i
}
Let's not think about loop unrolling (an optimisations that the JVM heavily performs on your code). If max is equal to Integer.MAX_VALUE, you loop might take some time to execute. However, the i variable will never escape the loop-block. Therefore the JVM will put that variable in a CPU register, regularly increment it but will never send it back to the main memory.
So, creating millions of objects are not a big deal if they are used only locally. They will be dead before being stored in Eden, so the GC won't even notice them.
2 - Is it useful to reduce the overhead of the GC ?
As usual, it depends.
First, you should enable GC logging to have a clear view about what is going on. You can enable it with -Xloggc:gc.log -XX:+PrintGCDetails.
If your application is spending a lot of time in a GC cycle, then, yes, tune the GC, otherwise, it might not be really worth it.
For instance, if you have a young GC every 100ms that takes 10ms, you spend 10% of your time in the GC, and you have 10 collections per second (which is huuuuuge). In such a case, I would not spend any time in GC tuning, since those 10 GC/s would still be there.
3 - Some experience
I had a similar problem on an application that was creating a huge amount of a given class. In the GC logs, I noticed that the creation rate of the application was around 3 GB/s, which is way too much (come on... 3 gigabytes of data every second ?!).
The problem : Too many frequent GC caused by too many objects being created.
In my case, I attached a memory profiler and noticed that a class represented a huge percentage of all my objects. I tracked down the instantiations to find out that this class was basically a pair of booleans wrapped in an object. In that case, two solutions were available :
Rework the algorithm so that I do not return a pair of booleans but instead I have two methods that return each boolean separately
Cache the objects, knowing that there were only 4 different instances
I chose the second one, as it had the least impact on the application and was easy to introduce. It took me minutes to put a factory with a not-thread-safe cache (I did not need thread safety since I would eventually have only 4 different instances).
The allocation rate went down to 1 GB/s, and so did the frequency of young GC (divided by 3).
Hope that helps !

If you have just value objects (that is, no references to other objects) and really but I mean really tons and tons of them, you can use direct ByteBuffers with native byte ordering [the latter is important] and you need some few hundred lines of code to allocate/reuse + getter/setters. Getters look similar to long getQuantity(int tupleIndex){return buffer.getLong(tupleInex+QUANTITY_OFFSSET);}
That would solve the GC problem almost entirely as long as you do allocate once only, that is, a huge chunk and then manage the objects yourself. Instead of references you'd have only index (that is, int) into the ByteBuffer that has to be passed along. You may need to do the memory align yourself as well.
The technique would feel like using C and void*, but with some wrapping it's bearable. A performance downside could be bounds checking if the compiler fails to eliminate it. A major upside is the locality if you process the tuples like vectors, the lack of the object header reduces the memory footprint as well.
Other than that, it's likely you'd not need such an approach as the young generation of virtually all JVM dies trivially and the allocation cost is just a pointer bump. Allocation cost can be a bit higher if you use final fields as they require memory fence on some platforms (namely ARM/Power), on x86 it is free, though.

Assuming you find GC is an issue (as others point out it might not be) you will be implementing your own memory management for you special case i.e. a class which suffers massive churn. Give object pooling a go, I've seen cases where it works quite well. Implementing object pools is a well trodden path so no need to re-visit here, look out for:
multi-threading: using thread local pools might work for your case
backing data structure: consider using ArrayDeque as it performs well on remove and has no allocation overhead
limit the size of your pool :)
Measure before/after etc,etc

I've met a similar problem. First of all, try to reduce the size of the small objects. We introduced some default field values referencing them in each object instance.
For example, MouseEvent has a reference to Point class. We cached Points and referenced them instead of creating new instances. The same for, for example, empty strings.
Another source was multiple booleans which were replaced with one int and for each boolean we use just one byte of the int.

I dealt with this scenario with some XML processing code some time ago. I found myself creating millions of XML tag objects which were very small (usually just a string) and extremely short-lived (failure of an XPath check meant no-match so discard).
I did some serious testing and came to the conclusion that I could only achieve about a 7% improvement on speed using a list of discarded tags instead of making new ones. However, once implemented I found that the free queue needed a mechanism added to prune it if it got too big - this completely nullified my optimisation so I switched it to an option.
In summary - probably not worth it - but I'm glad to see you are thinking about it, it shows you care.

Given that you are writing a chess program there are some special techniques you can use for decent performance. One simple approach is to create a large array of longs (or bytes) and treat it as a stack. Each time your move generator creates moves it pushes a couple of numbers onto the stack, e.g. move from square and move to square. As you evaluate the search tree you will be popping off moves and updating a board representation.
If you want expressive power use objects. If you want speed (in this case) go native.

One solution I've used for such search algorithms is to create just one Move object, mutate it with new move, and then undo the move before leaving the scope. You are probably analyzing just one move at a time, and then just storing the best move somewhere.
If that's not feasible for some reason, and you want to decrease peak memory usage, a good article about memory efficiency is here: http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

Just create your millions of objects and write your code in the proper way: don't keep unnecessary references to these objects. GC will do the dirty job for you. You can play around with verbose GC as mentioned to see if they are really GC'd. Java IS about creating and releasing objects. :)

I think you should read about stack allocation in Java and escape analysis.
Because if you go deeper into this topic you may find that your objects are not even allocated on the heap, and they are not collected by GC the way that objects on the heap are.
There is a wikipedia explanation of escape analysis, with example of how this works in Java:
http://en.wikipedia.org/wiki/Escape_analysis

I am not a big fan of GC, so I always try finding ways around it. In this case I would suggest using Object Pool pattern:
The idea is to avoid creating new objects by store them in a stack so you can reuse it later.
Class MyPool
{
LinkedList<Objects> stack;
Object getObject(); // takes from stack, if it's empty creates new one
Object returnObject(); // adds to stack
}

Object pools provide tremendous (sometime 10x) improvements over object allocation on the heap. But the above implementation using a linked list is both naive and wrong! The linked list creates objects to manage its internal structure nullifying the effort.
A Ringbuffer using an array of objects work well. In the example give (a chess programm managing moves) the Ringbuffer should be wrapped into a holder object for the list of all computed moves. Only the moves holder object references would then be passed around.

Related

Is it normal when Java Heap grows during the game and then resets?

Should it remain constant? Native Heap does not change. I use Gdx.app.getJavaHeap(); Reset doesn't happen often, and about every few minutes, depending on the game. While the game objects are moving, it grows faster.

That behavior is perfectly normal, assuming it does not start to affect your runtime performance (Garbage collection can affect your frame time).
There are many ways to mitigate runtime allocation of objects, one of which is to avoid creating new ones at a time that can affect game-play.
[ Loading screens ]
Use a loading screen to pre-allocate and dispose of large objects or vast amounts of small ones, this will ensure that when you get to processing the game's rendering and logic your program will not have to wait for memory allocation/GC pauses for those.
[ Object pooling ]
You can, for example, pre-instantiate objects in a cache pool, and instead of creating new ones simply borrow those instances, use them as you see fit, and when you don't need them anymore you can release them back to the pool to be used once again.
LibGDX has an effective tool for this, right out of the box, and in fact makes extensive use of it. You can read more about it and how to use it here.
For example, I often do a lot of coordinate projections back and forth between screen space and world space, and this involves using a lot of temporary Vector2 objects to store intermediate values. At the end of the program execution I get a report of usage (custom debug code, not included in release builds) that tells me if all borrowers from a specific Pool have returned their instances, how many times the type was borrowed, and how many actual instances were allocated:
TYPE <Vector2Poolable> [OK] -> Borrowed [45948], Returned [45948], Free [38], Peak free [38]
This is from 60 seconds of execution, as you can see I only ever instantiated 38 Vector2Poolable objects, which have been used ~46k times in that time span.
Make no mistake, the goal here is not strictly on saving memory usage, the key takeaway here is that I avoided dumping on the JVM the burden of figuring out and managing ~50k small instances per minute. This can quickly pile up and start affecting your frame time.
[ Strings are your enemy ]
Strings in Java are Objects that are backed internally by a char array. Since arrays are immutable (cannot grow), Strings are immutable as well. Whenever a change to a String is made, an entirely new String is created. This means that what appear as very inexpensive operations, repeated at 60 times per second, can generate a lot of work for the GC.
This is more evident when concatenating strings via the + operator, this causes at least one allocation.
The best practice is to do concatenations using something like LibGDX's StringBuilder class, and reuse it as much as possible (you can have one per class if you're not concerned with concurrency, and you can wrap one in a ThreadLocal variable otherwise). StringBuilder's approach is similare to Pooling as it grows the backing store only when needed so that at some point during execution it will not have to reallocate memory for your string manipulation needs.
Do not underestimate the impact of strings manipulation on Garbage Collection, it's not by chance that most Scene2D.ui classes within Libgdx (such as Label) accept a StringBuilder instance directly, so that updating them does not even require allocating a String from the StringBuilder's buffer in order to use it (as seen here, allocations are minimized by simply copying data around in existing buffers)
[ Final thoughts ]
You don't know what you don't know and this is why, especially in game development and regardless of the language used, you are strongly encouraged to Profile your code and memory allocation. Solutions to have a look at what's going on during runtime in your program exist, these are some of my favorites:
JVisualVM Completely free, sufficient for most profiling tasks.
YourKit Java Profiler Requires a license and can easily be too expensive for non professional developers, but it's one of the most powerful tools I've ever used to profile java applications. Definitely worth having a look with a trial.

This is normal behaviour when you are creating a lot of new objects and then not using them anymore. After there is enough objects like this garbage collector gets executed and cleans them up. (Better description is here.)
It is perfectly fine and works without problems, just when this garbage collection happens you might experience small lag. If you wish to solve this problem then LibGDX have object pooling for this. You can read more about it here.

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
Something along the lines of http://wwwasd.web.cern.ch/wwwasd/lhc++/Objectivity/V5.2/Java/guide/jgdStorage.fm.html and specifically non-garbage-collectible containers there (non-garbage-collectable?).
The problem is that I have lots of ordinary temporary objects, but I have even bigger (several Gigs) of objects that are stored for Cache purposes. For no reason should the Java GC traverse all those Cache gigabytes trying to find anything to collect, because they contain cached data which have their own timeouts.
This way I could partition my data in a custom way into infinite-lived and normal-lived objects, and hopefully GC would be quite fast because normal objects don't live so long and amount to smaller amounts.
There are some workarounds to this problem, such as Apache DirectMemory and Commercial Terracotta BigMemory(http://terracotta.org/products/bigmemory), but a java-native solution would be nicer (I mean free and probably more reliable?). Also I want to avoid serialization overhead which means it should happen within same jvm. To my understanding DirectMemory and BigMemory operate mainly off heap which means that the objects must be serialized/deserialized to/from memory outside jvm. Simply marking non-gc regions within the jvm would seem a better solution. Using Files for cache is not an option either, it has the same unaffordable serialization/deserialization overhead - use case is a HA server with lots of data used in random (human) order and low latency needed.

Any memory the JVM manages is also garbage-collected by the JVM. And any “live” objects which are directly available to Java methods without deserialization have to live in JVM memory. Therefore in my understanding you cannot have live objects which are immune to garbage collection.
On the other hand, the usage you describe should make the generational approach to garbage collection quite efficient. If your big objects stay around for a while, they will be checked for reclamation less often. So I doubt there is much to be gained from avoiding those checks.

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
No it is not possible.
You can prevent objects from being garbage collected by keeping them reachable, but the GC will still need to trace them to check reachability on each full; GC (at least).
Is simply my assumption, that when the jvm is starving it begins scanning all those unnecessary objects too.
Yes. That is correct. However, unless you've got LOTS of objects that you want to be treated this way, the overhead is likely to be insignificant. (And anyway, a better idea is to give the JVM more memory ... if that is possible.)

Quite simply, for you to be able to do this, the garbage collection algorithm would need to be aware of such a flag, and take it into account when doing its work.
I'm not aware of any of the standard GC algorithms having such a flag, so for this to work you would need to write your own GC algorithm (after deciding on some feasible way to communicate this information to it).
In principle, in fact, you've already started down this track - you're deciding how garbage collection should be done rather than being happy to leaving it to the JVM's GC algo. Is the situation you describe a measurable problem for you; something for which the existing garbage collection is insufficient, but your plan would work? Garbage collectors are extremely well-tuned, so I wouldn't be surprised if the "inefficient" default strategy is actually faster than your naively-optimal one.
(Doing manual memory management is tricky and error-prone at the best of times; managing some memory yourself while using a stock garbage collector to handle the rest seems even worse. I expect you'd run into a lot of edge cases where the GC assumes it "knows" what's happening with the whole heap, which would no longer be true. Steer clear if you can...)

The recommended approaches would be to use either a commerical RTSJ implementation to avoid GC, or to use off heap memory. One could also look into soft references for caches as well (they do get collected).
This is not recommended:
If for some reason you do not believe these options are sufficient, you could look into direct memory access which is UNSAFE (part of sun.misc.Unsafe). You can use the 'theUnsafe' field to get the 'Unsafe' instance. Unsafe allows to allocation/deallocate memory via 'allocateMemory' and 'freeMemory'. This is not under GC control nor limited by JVM heap size. The impact on GC/application, once you go down this route, is not guaranteed - which is why using byte buffers might be the way to go (if you're not using a RTSJ like implementation).
Hope this helps.

Living Java objects will always be part of the GC life cycle. Or said another way, marking an object to be non-gc is the same order of overhead than having your object referenced by a root reference (a static final map for instance).
But thinking a bit further, data put in a cache are most likely to be temporary, and would eventually be evicted. At that point you will start again to like the JVM and the GC.
If you have 100's of GBs of permanent data, you may want to rethink the architecture of your application, and try to shard and distribute your data (horizontally scalability).
Last but not least, lots of work has been done around serialization, and the overhead of serialization (I'm not speaking about the poor reputation of ObjectInputStream and ObjectOutputStream) is not that big.
More than that, if your data is mainly composed of primitive types (including bytes array), there is efficient way to readInt() or readBytes() from off heap buffers (for instannce netty.io's ChannelBuffer). This could be a way to go.

How to memory profile in Java?

I'm still learning the ropes of Java so sorry if there's a obvious answer to this. I have a program that is taking a ton of memory and I want to figure a way to reduce its usage, but after reading many SO questions I have the idea that I need to prove where the problem is before I start optimizing it.
So here's what I did, I added a break point to the start of my program and ran it, then I started visualVM and had it profile the memory(I also did the same thing in netbeans just to compare the results and they are the same). My problem is I don't know how to read them, I got the highest area just saying char[] and I can't see any code or anything(which makes sense because visualvm is connecting to the jvm and can't see my source, but netbeans also does not show me the source as it does when doing cpu profiling).
Basically what I want to know is which variable(and hopefully more details like in which method) all the memory is being used so I can focus on working there. Is there a easy way to do this? I right now I am using eclipse and java to develop(and installed visualVM and netbeans specifically for profiling but am willing to install anything else that you feel gets this job done).
EDIT: Ideally, I'm looking for something that will take all my objects and sort them by size(so I can see which one is hogging memory). Currently it returns generic information such as string[] or int[] but I want to know which object its referring to so I can work on getting its size more optimized.

Strings are problematic
Basically in Java, String references ( things that use char[] behind the scenes ) will dominate most business applications memory wise. How they are created determines how much memory they consume in the JVM.
Just because they are so fundamental to most business applications as a data type, and they are one of the most memory hungry as well. This isn't just a Java thing, String data types take up lots of memory in pretty much every language and run time library, because at the least they are just arrays of 1 byte per character or at the worse ( Unicode ) they are arrays of multiple bytes per character.
Once when profiling CPU usage on a web app that also had an Oracle JDBC dependency I discovered that StringBuffer.append() dominated the CPU cycles by many orders of magnitude over all other method calls combined, much less any other single method call. The JDBC driver did lots and lots of String manipulation, kind of the trade off of using PreparedStatements for everything.
What you are concerned about you can't control, not directly anyway
What you should focus on is what in in your control, which is making sure you don't hold on to references longer than you need to, and that you are not duplicating things unnecessarily. The garbage collection routines in Java are highly optimized, and if you learn how their algorithms work, you can make sure your program behaves in the optimal way for those algorithms to work.
Java Heap Memory isn't like manually managed memory in other languages, those rules don't apply
What are considered memory leaks in other languages aren't the same thing/root cause as in Java with its garbage collection system.
Most likely in Java memory isn't consumed by one single uber-object that is leaking ( dangling reference in other environments ).
It is most likely lots of smaller allocations because of StringBuffer/StringBuilder objects not sized appropriately on first instantantations and then having to automatically grow the char[] arrays to hold subsequent append() calls.
These intermediate objects may be held around longer than expected by the garbage collector because of the scope they are in and lots of other things that can vary at run time.
EXAMPLE: the garbage collector may decide that there are candidates, but because it considers that there is plenty of memory still to be had that it might be too expensive time wise to flush them out at that point in time, and it will wait until memory pressure gets higher.
The garbage collector is really good now, but it isn't magic, if you are doing degenerate things, it will cause it to not work optimally. There is lots of documentation on the internet about the garbage collector settings for all the versions of the JVMs.
These un-referenced objects may just have not reached the time that the garbage collector thinks it needs them to for them to be expunged from memory, or there could be references to them held by some other object ( List ) for example that you don't realize still points to that object. This is what is most commonly referred to as a leak in Java, which is a reference leak more specifically.
EXAMPLE: If you know you need to build a 4K String using a StringBuilder create it with new StringBuilder(4096); not the default, which is like 32 and will immediately start creating garbage that can represent many times what you think the object should be size wise.
You can discover how many of what types of objects are instantiated with VisualVM, this will tell you what you need to know. There isn't going to be one big flashing light that points at a single instance of a single class that says, "This is the big memory consumer!", that is unless there is only one instance of some char[] that you are reading some massive file into, and this is not possible either, because lots of other classes use char[] internally; and then you pretty much knew that already.
I don't see any mention of OutOfMemoryError
You probably don't have a problem in your code, the garbage collection system just might not be getting put under enough pressure to kick in and deallocate objects that you think it should be cleaning up. What you think is a problem probably isn't, not unless your program is crashing with OutOfMemoryError. This isn't C, C++, Objective-C, or any other manual memory management language / runtime. You don't get to decide what is in memory or not at the detail level you are expecting you should be able to.

In JProfiler, you can take go to the heap walker and activate the biggest objects view. You will see the objects the retain most memory. "Retained" memory is the memory that would be freed by the garbage collector if you removed the object.
You can then open the object nodes to see the reference tree of the retained objects. Here's a screen shot of the biggest object view:
Disclaimer: My company develops JProfiler

I would recommend capturing heap dumps and using a tool like Eclipse MAT that lets you analyze them. There are many tutorials available. It provides a view of the dominator tree to provide insight into the relationships between the objects on the heap. Specifically for what you mentioned, the "path to GC roots" feature of MAT will tell you where the majority of those char[], String[] and int[] objects are being referenced. JVisualVM can also be useful in identifying leaks and allocations, particularly by using snapshots with allocation stack traces. There are quite a few walk-throughs of the process of getting the snapshots and comparing them to find the allocation point.

Java JDK comes with JVisualVM under bin folder, once your application server (for example is running) you can run visualvm and connect it to your localhost, which will provide you memory allocation and enable you to perform heap dump

If you use visualVM to check your memory usage, it focuses on the data, not the methods. Maybe your big char[] data is caused by many String values? Unless you are using recursion, the data will not be from local variables. So you can focus on the methods that insert elements into large data structures. To find out what precise statements cause your "memory leakage", I suggest you additionally
read Josh Bloch's Effective Java Item 6: (Eliminate obsolete object references)
use a logging framework an log instance creations on the highest verbosity level.

There are generally two distinct approaches to analyse Java code to gain an understanding of its memory allocation profile. If you're trying to measure the impact of a specific, small section of code – say you want to compare two alternative implementations in order to decide which one gives better runtime performance – you would use a microbenchmarking tool such as JMH.
While you can pause the running program, the JVM is a sophisticated runtime that performs a variety of housekeeping tasks and it's really hard to get a "point in time" snapshot and an accurate reading of the "level of memory usage". It might allocate/free memory at a rate that does not directly reflect the behaviour of the running Java program. Similarly, performing a Java object heap dump does not fully capture the low-level machine specific memory layout that dictates the actual memory footprint, as this could depend on the machine architecture, JVM version, and other runtime factors.
Tools like JMH get around this by repeatedly running a small section of code, and observing a long-running average of memory allocations across a number of invocations. E.g. in the GC profiling sample JMH benchmark the derived *·gc.alloc.rate.norm metric gives a reasonably accurate per-invocation normalised memory cost.
In the more general case, you can attach a profiler to a running application and get JVM-level metrics, or perform a heap dump for offline analysis. Some commonly used tools for profiling full applications are Async Profiler and the newly open-sourced Java Flight Recorder in conjunction with Java Mission Control to visualise results.

Is object creation a bottleneck in Java in multithreaded environment?

Based on the understanding from the following:
Where is allocated variable reference, in stack or in the heap?
I was wondering since all the objects are created on the common heap. If multiple threads create objects then to prevent data corruption there has to be some serialization that must be happening to prevent the multiple threads from creating objects at same locations. Now, with a large number of threads this serialization would cause a big bottleneck. How does Java avoid this bottleneck? Or am I missing something?
Any help appreciated.

Modern VM implementations reserve for each thread an own area on the heap to create objects in. So, no problem as long as this area does not get full (then the garbage collector moves the surviving objects).
Further read: how TLAB works in Sun's JVM. Azul's VM uses slightly different approach (look at "A new thread & stack layout"), the article shows quite a few tricks JVMs may perform behind the scenes to ensure nowadays Java speed.
The main idea is keeping per thread (non-shared) area to allocate new objects, much like allocating on the stack with C/C++. The copy garbage collection is very quick to deallocate the short-lived objects, the few survivors, if any, are moved into different area. Thus, creating relatively small objects is very fast and lock free.
The lock free allocation is very important, especially since the question regards multithreaded environment. It also allows true lock-free algorithms to exist. Even if an algorithm, itself, is a lock-free but allocation of new objects is synchronized, the entire algorithm is effectively synchronized and ultimately less scalable.
java.util.concurrent.ConcurrentLinkedQueue which is based on the work of Maged M. Michael Michael L. Scott is a classic example.
What happens if an object is referenced by another thread? (due to discussion request)
That object (call it A) will be moved to some "survivor" area. The survivor area is checked less often than the ThreadLocal areas. It contains, like the name suggests, objects whose references managed to escape, or in particular A managed to stay alive. The copy (move) part occurs during some "safe point" (safe point excludes properly JIT'd code), so the garbage collector is sure the object is not being referenced. The references to the object are updated, the necessary memories fences issued and the application (java code) is free to continue. Further read to this simplistic scenario.
To the very interested reader and if possible to chew it: the highly advanced Pauseless GC Algorithm

No. The JVM has all sorts of tricks up its sleeves to avoid any sort of simpleminded serialization at the point of 'new'.

Sometimes. I wrote a recursive method that generates integer permutations and creates objects from those. The multithreaded version (every branch from root = task, but concurrent thread count limited to number of cores) of that method wasn't faster. And the CPU load wasn't higher. The tasks didn't share any object. After I removed the object creation from both methods the multithreaded method was ~4x faster (6 cores) and used 100% CPU. In my test case the methods generated ~4,500,000 permutations, 1500 per task.
I think TLAB didn't work because it's space is limited (see: Thread Local Allocation Buffers).

frequent garbage collection java web app

I have a web app that serializes a java bean into xml or json according to the user request.
I am facing a mind bending problem when I put a little bit of load on it, it quickly uses all allocated memory, and reach max capacity. I then observe full GC working really hard every 20-40 seconds.
Doesnt look like a memory leak issue... but I am not quite sure how to trouble shoot this?
The bean that is serialized to xml/json has reference to other beans and those to others. I use json-lib and jaxb to serialize the beans.
yourkit memory profiler is telling me that a char[] is the most memory consuming live object...
any insight is appreciated.

There are two possibilities: you've got a memory leak, or your webapp is just generating lots of garbage.
The brute-force way to tell if you've got a memory leak is to run it for a long time and see if it falls over with an OOME. Or turn on GC logging, and see if the average space left after garbage collection continually trends upwards over time.
Whether or not you have a memory leak, you can probably improve performance (reduce the percentage GC time) by increasing the max heap size. The fact that your webapp is seeing lots of full GCs suggests to me that it needs more heap. (This is just a bandaid solution if you have a memory leak.)
If it turns out that you are not suffering from a memory leak, then you should take a look at why your application is generating so much garbage. It could be down to the way that you are doing the XML and JSON serialization.

Why do you think you have a problem? GC is a natural and normal thing to happen. We have customers that GC every second (for less than 100ms duration), and that's fine as long as memory keeps getting reclaimed.
GCing every 20-40 seconds isn't a problem IMO - as long as it doesn't take a large % of that 20-40s. Most major commercial JVMs aim to keep GC in the 5-10% of time range (so 1-4 seconds of that 20-40s). Posting more data in the form of the GC logs might help, and I'd also suggest tools like GCMV would help you visualize and get recommendations on what your GC profile looks like.

It's impossible to diagnose this without a lot more information - code and GC logs - but my guess would be that you're reading data in as large strings, then chopping out little bits with substring(). When you do that, the substring string is made using the same underlying character array as the parent string, and so as long as it's alive, will keep that array in memory. That means code like this:
String big = a string of one million characters;
String small = big.substring(0, 1);
big = null;
Will still keep the huge string's character data in memory. If this is the case, then you can address it by forcing the small strings to use fresh, smaller, character arrays by constructing new instances:
small = new String(small);
But like i said, this is just a guess.

I'm not sure how much of it is in your code and how much might be in the tools you are using, but there are some key things to watch for.
One of the worst is if you constantly add to strings in loops. A simple "hello"+"world" is no problem at all, it's actually very smart about that, but if you do it in a loop it will constantly reallocate the string. Use StringBuilder where you can.
There are profilers for Java that should quickly point you to where the allocations are taking place. Just fool around with a profiler for a while while your java app is running and you will probably be able to reduce your GCs to virtually nothing unless the problem is inside your libraries--and even then you may figure out some way to fix it.
Things you allocate and then free quickly don't require time in the GC phase--it's pretty much free. Be sure you aren't keeping Strings around longer than you need them. Bring them in, process them and return to your previous state before returning from your request handler.

You should attach yourkit and record allocations (e.g., every 10th allocation; including all large ones). They have a step by step guide on diagnosing excessive gc:
http://www.yourkit.com/docs/90/help/excessive_gc.jsp

To me that sounds like you are trying to serialize a recursive object by some encoder which is not prepared for it.
(or at least: very deep/almost recursive)

Java's native XML API is really "noisy" and generally wasteful in terms of resources which means that if your requests and XML/JSON generation cycles are short-lived, the GC will have lots to clean up for.
I have debugged a very similar case and found out this the hard way, only way I could at least somewhat improve the situation without major refactorings was implicitly calling GC with the appropriate VM flags which actually turn System.gc(); from a non-op call to maybe-op call.

I would start by inspecting my running application to see what was being created on the heap.
HPROF can collect this information for you, which you can then analyse using HAT.

To debug issues with memory allocations, InMemProfiler can be used at the command line. Collected object allocations can be tracked and collected objects can be split into buckets based on their lifetimes.
In trace mode this tool can be used to identify the source of memory allocations.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.