Ways to reduce memory churn - java

Background
I have a Spring batch program that reads a file (example file I am working with is ~ 4 GB in size), does a small amount of processing on the file, and then writes it off to an Oracle database.
My program uses 1 thread to read the file, and 12 worker threads to do the processing and database pushing.
I am churning lots and lots and lots of young gen memory, which is causing my program to go slower than I think it should.
Setup
JDK 1.6.18
Spring batch 2.1.x
4 Core Machine w 16 GB ram
-Xmx12G
-Xms12G
-XX:NewRatio=1
-XX:+UseParallelGC
-XX:+UseParallelOldGC
Problem
With these JVM params, I get somewhere around 5.x GB of memory for Tenured Generation, and around 5.X GB of memory for Young Generation.
In the course of processing this one file, my Tenured Generation is fine. It grows to a max of maybe 3 GB, and I never need to do a single full GC.
However, the Young Generation hits its max many times. It goes up to the 5 GB range, and then a parallel minor GC happens and clears Young Gen down to 500MB used. Minor GCs are good and better than a full GC, but it still slows down my program a lot (I am pretty sure the app still freezes when a young gen collection occurs, because I see the database activity die off). I am spending well over 5% of my program time frozen for minor GCs, and this seems excessive. I would say over the course of processing this 4 GB file, I churn through 50-60GB of young gen memory.
I don't see any obvious flaws in my program. I am trying to obey the general OO principles and write clean Java code. I am trying not to create objects for no reason. I am using thread pools, and whenever possible passing objects along instead of creating new objects. I am going to start profiling the application, but I was wondering if anyone had some good general rules of thumb or anti-patterns to avoid that lead to excessive memory churn? Is 50-60GB of memory churn to process a 4GB file the best I can do? Do I have to revert to JDK 1.2 tricks like Object Pooling? (although Brian Goetz gave a presentation that included why object pooling is stupid, and we don't need to do it anymore. I trust him a lot more than I trust myself .. :) )

I have a feeling that you are spending time and effort trying to optimize something that you should not bother with.
I am spending well over 5% of my program time frozen for minor GCs, and this seems excessive.
Flip that around. You are spending just under 95% of your program time doing useful work. Or put it another way, even if you managed to optimize the GC to run in ZERO time, the best you can get is something over 5% improvement.
If your application has hard timing requirements that are impacted by the pause times, you could consider using a low-pause collector. (Be aware that reducing pause times increases the overall GC overheads ...) However for a batch job, the GC pause times should not be relevant.
What probably matters most is the wall clock time for the overall batch job. And the (roughly) 95% of the time spent doing application specific stuff is where you are likely to get more pay-off for your profiling / targeted optimization efforts. For example, have you looked at batching the updates that you send to the database?
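For example, a minimal sketch of plain JDBC batch inserts (the table and column names here are hypothetical; a Spring Batch chunk-oriented JDBC writer does essentially the same thing per chunk):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class BatchedInsert {
        // Sketch only: MY_TABLE / COL_A / COL_B are placeholders.
        public void write(Connection conn, List<String[]> rows) throws Exception {
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO MY_TABLE (COL_A, COL_B) VALUES (?, ?)");
            try {
                for (String[] row : rows) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();      // queue the row locally instead of a round trip per insert
                }
                ps.executeBatch();      // send the whole chunk in one round trip
            } finally {
                ps.close();
            }
        }
    }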
So.. 90% of my total memory is in char[] in "oracle.sql.converter.toOracleStringWithReplacement"
That would tend to indicate that most of your memory usage occurs in the Oracle JDBC drivers while preparing stuff to be sent to the database. There's very little you can do about that. I'd chalk it up as an unavoidable overhead.

It would be really useful if you clarified your terms "young" and "tenured" generation, because Java 6 has a slightly different GC model: Eden, S0+S1, Old, Perm.
Have you experimented with the different garbage collection algorithms? How have "UseConcMarkSweepGC" or "UseParNewGC" performed?
And don't forget that simply increasing the available space is NOT the solution, because a GC run will take much longer; decrease the size to normal values ;)
Are you sure you have no memory leaks? In the consumer-producer pattern you describe, data should seldom end up in the Old Gen because those jobs are processed really fast and then "thrown away" - or is your work queue filling up?
You should definitely observe your program with a memory analyzer.

I think a session with a memory profiler will shed a lot of light on the subject. It gives a nice overview of how many objects are created, and that is sometimes revealing.
I am always amazed how many strings are generated.
For domain objects, cross-referencing them is also revealing. If you suddenly see 3 times more objects derived from a source object than there are source objects, then something is going on there.
NetBeans has a nice one built in. I used JProfiler in the past. I think if you bang long enough on Eclipse you can get the same info from the TPTP tools.

You need to profile your application to see what is happening exactly. And I would also try first to use the ergonomics feature of the JVM, as recommended:
2. Ergonomics
A feature referred to here as ergonomics was introduced in J2SE 5.0. The goal of ergonomics is to provide good performance with little or no tuning of command line options by selecting the garbage collector, heap size, and runtime compiler at JVM startup, instead of using fixed defaults. This selection assumes that the class of the machine on which the application is run is a hint as to the characteristics of the application (i.e., large applications run on large machines). In addition to these selections is a simplified way of tuning garbage collection. With the parallel collector the user can specify goals for a maximum pause time and a desired throughput for an application. This is in contrast to specifying the size of the heap that is needed for good performance. This is intended to particularly improve the performance of large applications that use large heaps. The more general ergonomics is described in the document entitled “Ergonomics in the 5.0 Java Virtual Machine”. It is recommended that the ergonomics as presented in this latter document be tried before using the more detailed controls explained in this document. Included in this document are the ergonomics features provided as part of the adaptive size policy for the parallel collector. This includes the options to specify goals for the performance of garbage collection and additional options to fine tune that performance.
See the more detailed section about Ergonomics in the Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning guide.

In my opinion, the young generation should not be as big as the old generation, so that the minor garbage collections stay fast.
Do you have many objects that represent the same value? If you do, merge these duplicate objects using a simple HashMap:
    import java.util.concurrent.ConcurrentHashMap;

    public class MemorySavingUtils {
        private final ConcurrentHashMap<String, String> knownStrings =
                new ConcurrentHashMap<String, String>();

        // putIfAbsent returns null the first time a key is added,
        // so fall back to s itself in that case.
        public String unique(String s) {
            String existing = knownStrings.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }

        public void clear() {
            knownStrings.clear();
        }
    }
With the Sun Hotspot JVM, the native String.intern() is really slow for large numbers of Strings, which is why I suggest building your own String interner.
Using this method, strings from the old generation are reused and strings from the new generation can be garbage collected quickly.
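For example (a hypothetical sketch; the class and column index are made up), a worker thread parsing a line could intern only the low-cardinality columns that repeat across the file:

    public class LineParser {
        private final MemorySavingUtils interner = new MemorySavingUtils();

        public String[] parse(String line) {
            String[] cols = line.split(",");
            cols[3] = interner.unique(cols[3]);   // e.g. a status code with few distinct values
            return cols;                          // numeric columns need no interning at all
        }
    }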

Read a line from a file, store it as a string, and put it in a list. When the list has 1000 of these strings, put it in a queue to be read by worker threads. Have said worker thread make a domain object, peel a bunch of values off the string to set the fields (int, long, java.util.Date, or String), and pass the domain object along to a default Spring Batch JDBC writer.
if that's your program, why not set a smaller memory size, like 256MB?

I'm guessing with a memory limit that high you must be reading the file entirely into memory before doing the processing. Could you consider using a java.io.RandomAccessFile instead?
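A minimal sketch of reading the file in fixed-size chunks with RandomAccessFile rather than loading it whole (the 1 MB chunk size is an arbitrary assumption):

    import java.io.RandomAccessFile;

    public class ChunkedReader {
        public static void main(String[] args) throws Exception {
            RandomAccessFile raf = new RandomAccessFile(args[0], "r");
            byte[] buffer = new byte[1 << 20];      // 1 MB chunk; tune as needed
            try {
                int read;
                while ((read = raf.read(buffer)) != -1) {
                    // hand the first 'read' bytes of 'buffer' to the processing code here
                }
            } finally {
                raf.close();
            }
        }
    }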

Related

java full gc taking too long

I have a Java client which consumes a large amount of data from a server. If the client does not keep up with the data stream at a fast enough rate, the server disconnects the socket connection. My client gets disconnected a few times per day. I ran jconsole to see the memory usage, and the heap space graph looks like a fairly well defined sawtooth pattern, oscillating between about 0.5GB and 1.8GB (2GB of heap space is allocated). But every time I get disconnected is during a full GC (but not on every full GC). I see the full GC takes a bit over 1 second on average. Depending on the time of day, full GC happens as often as every 5 minutes when busy, or up to 30 minutes can go by in between full GCs during the slow periods.
I suspect if I can reduce the full GC time, the client will be able to better keep up with the incoming data, but I do not have much experience with GC tuning. Does anyone have some insight on if this might be a good idea, and how to do it? Or is there an alternative idea which may work as well?
** UPDATE **
I used -XX:+UseConcMarkSweepGC and it improved, but I still got disconnected during the very busy moments. So I increased the heap allocation to 3GB to help weather through the busy moments and it seems to be chugging along pretty well now, but it's only been 1 day without a disconnection. Maybe if I get some time I will go through and try to reduce the amount of garbage created which I'm confident will help as well. Thanks for all the suggestions.
Full GC could take very long to complete, and is not that easy to tune.
One way to (easily) tune it is to increase the heap space - generally speaking, doubling the heap space can double the interval between two GCs, but will also double the time consumed by a GC. If the program you are running has very clear usage patterns, maybe you can consider increasing the heap space to make the interval so large that you can guarantee some idle time in which the system can perform a GC. On the other hand, following this logic, if the heap is small a full garbage collection will finish in an instant, but that seems like inviting more trouble than it helps.
Also, -XX:+UseConcMarkSweepGC might help since it will try to perform the GC operations concurrently (not stopping your program; see here).
Here's a very nice talk by Gil Tene (CTO of Azul Systems, maker of a high-performance JVM, who has published several GC algorithms), about GC in the JVM in general.
It is not easy to tune away the Full GC. A much better approach is to produce less garbage. Producing less garbage reduces pressure on the collector to pass objects into the tenured space, where they are more expensive to collect.
I suggest you use a memory profiler to
reduce the amount of garbage produced. In many applications this can be reduced by a factor of 2 - 10x relatively easily.
reduce the size of the objects you are creating e.g. use primitive and smaller datatypes like double instead of BigDecimal.
recycle mutable objects instead of discarding them (a short sketch follows this list).
retain less data on the client if you can.
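A minimal sketch of the recycling idea (the class and fields are made up for illustration): one StringBuilder reused per thread instead of a new builder, and a new backing char[], for every message built.

    public class MessageBuilder {
        // Intended for use by a single thread; size it once and reuse it.
        private final StringBuilder sb = new StringBuilder(4096);

        public String build(String id, long timestamp) {
            sb.setLength(0);                          // reset instead of allocating a new builder
            sb.append(id).append('|').append(timestamp);
            return sb.toString();
        }
    }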
By reducing the amount of garbage you create, objects are more likely to die in the eden or survivor spaces, meaning you have far fewer Full collections, which can be shorter as well.
Don't take it for granted that you have to live with lots of collections; in extreme cases you can avoid them almost completely: http://vanillajava.blogspot.ro/2011/06/how-to-avoid-garbage-collection.html
Take out calls to Runtime.getRuntime().gc() - When garbage collection is triggered manually it either does nothing or it does a full stop-the-world garbage collection. You want incremental GC to happen.
Have you tried using the server jvm from a jdk install? It changes a bunch of the default configuration settings (including garbage collection) and is easy to try - just add -server to your java command.
java -server
What is all the garbage that gets created? Can you generate less of it? Where possible, try to use the valueOf methods. By using less memory you'll save yourself time in gc AND in memory allocation.
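For example, the wrapper-type valueOf factories can return cached instances instead of always allocating (the exact cache ranges are an implementation detail):

    public class ValueOfDemo {
        public static void main(String[] args) {
            // valueOf may return a cached instance; the constructor always allocates.
            Integer cached = Integer.valueOf(42);
            Integer fresh  = new Integer(42);
            System.out.println(cached == Integer.valueOf(42));  // true: same cached object
            System.out.println(fresh == new Integer(42));       // false: two separate allocations
        }
    }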

Long GC pauses in application

I am currently running an application which requires a maximum heap size of 16GB.
Currently I use the following flags to handle garbage collection.
-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=50, -XX:+DisableExplicitGC, -XX:+PrintGCDateStamps, -XX:+PrintGCDetails, -Xloggc:/home/user/logs/gc.log
However, I have noticed that during some garbage collections, the application locks up for a few seconds and then carries on - This is completely unacceptable as it's a game server.
An excerpt from my garbage collection logs can be found here.
Any advice on what I should change in order to reduce these long pauses would be greatly appreciated.
Any advice on what I should change in order to reduce these long pauses would be greatly appreciated.
The chances are that the CMS GC cannot keep up with the amount of garbage your system is generating. But the work that the GC has to perform is actually more closely related to the amount of NON-garbage that your system is retaining.
So ...
Try to reduce the actual memory usage of your application; e.g. by not caching so much stuff, or reducing the size of your "world".
Try to reduce the rate at which your application generates garbage.
Upgrade to a machine with more cores so that there are more cores available to run the parallel GC threads when necessary.
To Mysticial:
Yes, in hindsight it might have been better to implement the server in C++. However, we don't know anything about "the game". If it involves a complicated world model with complicated heterogeneous data structures, then implementing it in C++ could mean that you replace the "GC pause" problem with the problem that the server crashes all the time due to problems with the way it manages its data structures.
Looking at your logs, I don't see any long pauses. But young GC is very frequent. The promotion rate is very low, though (most garbage is cleared by young GC, as it should be). At the same time your old space utilization is low.
BTW are we talking about minecraft server?
To reduce the frequency of young GC you should increase its size. I would suggest starting with -XX:NewSize=8G -XX:MaxNewSize=8G
For such a large young space, you should also reduce the survivor space size: -XX:SurvivorRatio=512
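Putting those together with the flags already in the question might look like the line below; this is only a starting point to measure from, and the -Xmx value and jar name are placeholders, not part of the original setup:

    java -Xmx16G -XX:NewSize=8G -XX:MaxNewSize=8G -XX:SurvivorRatio=512 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50 -XX:+DisableExplicitGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:/home/user/logs/gc.log -jar gameserver.jar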
GC tuning is a path of trial and error, so you may need some more iterations and tweaking.
You can find a couple of useful articles at my blog:
HotSpot JVM GC options cheatsheet
Understanding young GC pauses in HotSpot JVM
I'm not an expert on Java garbage collection, but it looks like you're doing the right thing by using the concurrent collector (the UseConcMarkSweepGC flag), assuming the server has multiple processors. Follow the suggestions for troubleshooting at http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#cms. If you already have, let us know what happened when you tried them.
Which version of Java are you using? http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html
For better performance, try to minimize the use of instance variables in a class. It is better to work with local variables than instance variables; this helps performance and keeps you safe from synchronization problems. At the end of an operation, before the program exits, reset the instance variables you used and set them again when required. Also, newer Java versions implement better garbage collection policies, so it would be worth moving to a newer version if that is feasible.
You can also monitor the garbage collector pause times via VisualVM, which will give you a better idea of when it is performing more garbage collection.

How can I analyze what objects are in which generation?

I have a long-running (multiple days) application with objects that I expect to stay around for varying lengths of time before they can be garbage collected. Let's say there's four categories:
Very short-lived (<1s)
Alive for the duration of user attention (between 1 s and 18 h)
Daily data (~24h)
'Eternal' (very few, life of the application)
To help with tuning, I'd like to find a way of checking what actual data is getting in to the tenured generation, using the Java 6 Hotspot VM. Using jmap to generate HPROF files doesn't seem to include generational information. Is there another way of getting this information?
No, there is no simple way to get generation information for an individual object. In fact, if you ask for "live" objects, this will trigger a Full GC and place all objects into the old generation (so now you know where all the objects are, but not where they were).
Objects which survive a full GC are likely to be in the old generation, so if your system does a full GC every 5 minutes, anything which lasts longer than that amounts to much the same thing.
The best thing you can do is to minimise discarded objects (use a memory profiler to help). This will improve GC performance and decrease how often collections occur. In extreme examples you can use off-heap memory, which is difficult to work with but uses next to no heap; this is especially useful if you have many GB of data.
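A minimal sketch of the off-heap idea using a direct ByteBuffer; the fixed-width record layout (one long plus one int) is just an illustration, not a recommended design:

    import java.nio.ByteBuffer;

    public class OffHeapStore {
        private static final int RECORD_SIZE = 12;   // one long + one int
        private final ByteBuffer buffer;             // direct buffer lives outside the Java heap

        public OffHeapStore(int records) {
            buffer = ByteBuffer.allocateDirect(records * RECORD_SIZE);
        }

        public void put(int index, long key, int value) {
            int offset = index * RECORD_SIZE;
            buffer.putLong(offset, key);
            buffer.putInt(offset + 8, value);
        }

        public long key(int index)  { return buffer.getLong(index * RECORD_SIZE); }
        public int value(int index) { return buffer.getInt(index * RECORD_SIZE + 8); }
    }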

How to memory profile in Java?

I'm still learning the ropes of Java, so sorry if there's an obvious answer to this. I have a program that is taking a ton of memory and I want to figure out a way to reduce its usage, but after reading many SO questions I have the idea that I need to prove where the problem is before I start optimizing it.
So here's what I did: I added a breakpoint at the start of my program and ran it, then I started VisualVM and had it profile the memory (I also did the same thing in NetBeans just to compare the results, and they are the same). My problem is I don't know how to read them. The highest area just says char[] and I can't see any code or anything (which makes sense because VisualVM is connecting to the JVM and can't see my source, but NetBeans also does not show me the source as it does when doing CPU profiling).
Basically what I want to know is which variable (and hopefully more details, like in which method) all the memory is being used, so I can focus on working there. Is there an easy way to do this? Right now I am using Eclipse and Java to develop (and installed VisualVM and NetBeans specifically for profiling, but am willing to install anything else that you feel gets this job done).
EDIT: Ideally, I'm looking for something that will take all my objects and sort them by size (so I can see which ones are hogging memory). Currently it returns generic information such as String[] or int[], but I want to know which objects it is referring to so I can work on getting their size more optimized.
Strings are problematic
Basically in Java, String references ( things that use char[] behind the scenes ) will dominate most business applications memory wise. How they are created determines how much memory they consume in the JVM.
This is simply because they are so fundamental to most business applications as a data type, and they are one of the most memory-hungry as well. This isn't just a Java thing: String data types take up lots of memory in pretty much every language and runtime library, because at the least they are arrays of 1 byte per character, or at worst (Unicode) arrays of multiple bytes per character.
Once when profiling CPU usage on a web app that also had an Oracle JDBC dependency I discovered that StringBuffer.append() dominated the CPU cycles by many orders of magnitude over all other method calls combined, much less any other single method call. The JDBC driver did lots and lots of String manipulation, kind of the trade off of using PreparedStatements for everything.
What you are concerned about you can't control, not directly anyway
What you should focus on is what is in your control, which is making sure you don't hold on to references longer than you need to, and that you are not duplicating things unnecessarily. The garbage collection routines in Java are highly optimized, and if you learn how their algorithms work, you can make sure your program behaves in the optimal way for those algorithms to work.
Java Heap Memory isn't like manually managed memory in other languages, those rules don't apply
What are considered memory leaks in other languages aren't the same thing/root cause as in Java with its garbage collection system.
Most likely in Java, memory isn't consumed by one single uber-object that is leaking (a dangling reference in other environments).
It is most likely lots of smaller allocations because of StringBuffer/StringBuilder objects not sized appropriately on first instantiation and then having to automatically grow their char[] arrays to hold subsequent append() calls.
These intermediate objects may be held around longer than expected by the garbage collector because of the scope they are in and lots of other things that can vary at run time.
EXAMPLE: the garbage collector may decide that there are candidates, but because it considers that there is plenty of memory still to be had, it might be too expensive time-wise to flush them out at that point in time, and it will wait until memory pressure gets higher.
The garbage collector is really good now, but it isn't magic, if you are doing degenerate things, it will cause it to not work optimally. There is lots of documentation on the internet about the garbage collector settings for all the versions of the JVMs.
These un-referenced objects may simply not have reached the point at which the garbage collector decides to expunge them from memory, or there could be references to them held by some other object (a List, for example) that you don't realize still points to that object. This is what is most commonly referred to as a leak in Java, or more specifically a reference leak.
EXAMPLE: If you know you need to build a 4K String using a StringBuilder, create it with new StringBuilder(4096); not the default, which is 16 and will immediately start creating garbage that can represent many times what you think the object should be size-wise.
You can discover how many of what types of objects are instantiated with VisualVM; this will tell you what you need to know. There isn't going to be one big flashing light that points at a single instance of a single class and says, "This is the big memory consumer!", unless there is only one instance of some char[] that you are reading some massive file into, and even that is unlikely, because lots of other classes use char[] internally; and then you would pretty much know it already.
I don't see any mention of OutOfMemoryError
You probably don't have a problem in your code, the garbage collection system just might not be getting put under enough pressure to kick in and deallocate objects that you think it should be cleaning up. What you think is a problem probably isn't, not unless your program is crashing with OutOfMemoryError. This isn't C, C++, Objective-C, or any other manual memory management language / runtime. You don't get to decide what is in memory or not at the detail level you are expecting you should be able to.
In JProfiler, you can go to the heap walker and activate the biggest objects view. You will see the objects that retain the most memory. "Retained" memory is the memory that would be freed by the garbage collector if you removed the object.
You can then open the object nodes to see the reference tree of the retained objects.
Disclaimer: My company develops JProfiler
I would recommend capturing heap dumps and using a tool like Eclipse MAT that lets you analyze them. There are many tutorials available. It provides a view of the dominator tree to provide insight into the relationships between the objects on the heap. Specifically for what you mentioned, the "path to GC roots" feature of MAT will tell you where the majority of those char[], String[] and int[] objects are being referenced. JVisualVM can also be useful in identifying leaks and allocations, particularly by using snapshots with allocation stack traces. There are quite a few walk-throughs of the process of getting the snapshots and comparing them to find the allocation point.
The Java JDK comes with JVisualVM under the bin folder; once your application (for example, an application server) is running, you can start VisualVM and connect it to your localhost, which will show you memory allocation and enable you to perform a heap dump.
If you use VisualVM to check your memory usage, it focuses on the data, not the methods. Maybe your big char[] data is caused by many String values? Unless you are using recursion, the data will not be from local variables. So you can focus on the methods that insert elements into large data structures. To find out which precise statements cause your "memory leakage", I suggest you additionally
read Josh Bloch's Effective Java Item 6: (Eliminate obsolete object references)
use a logging framework and log instance creations at the highest verbosity level.
There are generally two distinct approaches to analyse Java code to gain an understanding of its memory allocation profile. If you're trying to measure the impact of a specific, small section of code – say you want to compare two alternative implementations in order to decide which one gives better runtime performance – you would use a microbenchmarking tool such as JMH.
While you can pause the running program, the JVM is a sophisticated runtime that performs a variety of housekeeping tasks and it's really hard to get a "point in time" snapshot and an accurate reading of the "level of memory usage". It might allocate/free memory at a rate that does not directly reflect the behaviour of the running Java program. Similarly, performing a Java object heap dump does not fully capture the low-level machine specific memory layout that dictates the actual memory footprint, as this could depend on the machine architecture, JVM version, and other runtime factors.
Tools like JMH get around this by repeatedly running a small section of code, and observing a long-running average of memory allocations across a number of invocations. E.g. in the GC profiling sample JMH benchmark the derived *·gc.alloc.rate.norm metric gives a reasonably accurate per-invocation normalised memory cost.
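A minimal JMH sketch with the GC profiler attached; the benchmark body is an arbitrary placeholder workload, and JMH itself (plus its annotation processor) is a separate dependency you would need to add:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.profile.GCProfiler;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class AllocationBench {
        @Benchmark
        public String buildKey() {
            // placeholder workload: each invocation allocates a new String
            return "key-" + System.nanoTime();
        }

        public static void main(String[] args) throws Exception {
            Options opts = new OptionsBuilder()
                    .include(AllocationBench.class.getSimpleName())
                    .addProfiler(GCProfiler.class)   // reports gc.alloc.rate.norm per invocation
                    .build();
            new Runner(opts).run();
        }
    }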
In the more general case, you can attach a profiler to a running application and get JVM-level metrics, or perform a heap dump for offline analysis. Some commonly used tools for profiling full applications are Async Profiler and the newly open-sourced Java Flight Recorder in conjunction with Java Mission Control to visualise results.

Java 6 Excessive Memory Usage

Does Java 6 consume more memory than you expect for largish applications?
I have an application I have been developing for years, which has, until now, taken about 30-40 MB in my particular test configuration; now with Java 6u10 and 11 it is taking several hundred MB while active. It bounces around a lot, anywhere between 50M and 200M, and when it idles, it does GC and drops the memory right down. In addition it generates millions of page faults. All of this is observed via Windows Task Manager.
So, I ran it up under my profiler (jProfiler) and using jVisualVM, and both of them indicate the usual moderate heap and perm-gen usages of around 30M combined, even when fully active doing my load-test cycle.
So I am mystified! And it is not just requesting more memory from the Windows virtual memory pool - this is showing up as 200M of "Mem Usage".
CLARIFICATION: I want to be perfectly clear on this - observed over an 18 hour period with Java VisualVM, the class heap and perm gen heap have been perfectly stable. The allocated volatile heap (eden and tenured) sits unmoved at 16MB (which it reaches in the first few minutes), and the use of this memory fluctuates in a perfect pattern of growing evenly from 8MB to 16MB, at which point GC kicks in and drops it back to 8MB. Over this 18 hour period, the system was under constant maximum load since I was running a stress test. This behavior is perfectly and consistently reproducible, seen over numerous runs. The only anomaly is that while this is going on the memory taken from Windows, observed via Task Manager, fluctuates all over the place from 64MB up to 900+MB.
UPDATE 2008-12-18: I have run the program with -Xms16M -Xmx16M without any apparent adverse effect - performance is fine, total run time is about the same. But memory use in a short run still peaked at about 180M.
Update 2009-01-21: It seems the answer may be in the number of threads - see my answer below.
EDIT: And I mean millions of page faults literally - in the region of 30M+.
EDIT: I have a 4G machine, so the 200M is not significant in that regard.
In response to a discussion in the comments to Ran's answer, here's a test case that proves that the JVM will release memory back to the OS under certain circumstances:
    public class FreeTest
    {
        public static void main(String[] args) throws Exception
        {
            byte[][] blob = new byte[60][1024 * 1024];
            for (int i = 0; i < blob.length; i++)
            {
                Thread.sleep(500);
                System.out.println("freeing block " + i);
                blob[i] = null;
                System.gc();
            }
        }
    }
I see the JVM process' size decrease when the count reaches around 40, on both Java 1.4 and Java 6 JVMs (from Sun).
You can even tune the exact behaviour with the -XX:MaxHeapFreeRatio and -XX:MinHeapFreeRatio options -- some of the options on that page may also help with answering the original question.
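For example, running the test case above with illustrative (not recommended) values for those two flags:

    java -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 FreeTest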
I don't know about the page faults, but about the huge memory allocated for Java:
Sun's JVM only allocates memory, and deallocates it only after a specific ratio between internal memory needs and allocated memory drops beneath a (tunable) value. The JVM starts with the amount specified in -Xms and can be extended up to the amount specified in -Xmx. I'm not sure what the defaults are. Whenever the JVM needs more memory (new objects / primitives / arrays) it allocates an entire chunk from the OS. However, when the need subsides (a momentary need, see point 2 as well) it doesn't deallocate the memory back to the OS immediately, but keeps it to itself until that ratio has been reached. I was once told that JRockit behaves better, but I can't verify it.
Sun's JVM runs a full GC based on several triggers. One of them is the amount of available memory: when it falls too low, the JVM tries to perform a full GC to free some more. So, when more memory is allocated from the OS (for a momentary need), the chance of a full GC is lowered. This means that while you may see 30MB of "live" objects, there might be a lot more "dead" objects (not reachable), just waiting for a GC to happen. I know YourKit has a great view called "dead objects" where you may see these left-overs.
In "-server" mode, Sun's JVM runs GC in parallel mode (as opposed the older serial "stop the world" GC). This means that while there may be garbage to collect, it might not be collected immediately because of other threads taking all available CPU time. It will be collected before reaching out of memory (well, kinda. see http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html), if more memory can be allocated from the OS, it might be before the GC runs.
Combined, a large initial memory configuration and short bursts creating a lot of short-lived objects might create a scenario as described.
edit: changed "never deallocates" to "only after ratio reached".
Excessive thread creation explains your problem perfectly:
Each Thread gets its own stack, which is separate from heap memory and therefore not registered by profilers
The default thread stack size is quite large, IIRC 256KB (at least it was for Java 1.3)
Thread stack memory is probably not reused, so if you create and destroy lots of threads, you'll get lots of page faults
If you ever really need to have hundreds of threads around, the thread stack size can be configured via the -Xss command line parameter.
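For example, to use 256 KB stacks instead of the platform default (whether that is safe depends on how deep your call stacks go; the class name here is a placeholder):

    java -Xss256k MyServer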
Garbage collection is a rather arcane science. As the state of the art develops, un-tuned behaviour will change in response.
Java 6 has different default GC behaviour and different "ergonomics" to earlier JVM versions. If you tell it that it can use more memory (either explicitly on the command line, or implicitly by failing to specify anything more explicit), it will use more memory if it believes that this is likely to improve performance.
In this case, Java 6 appears to believe that reserving the extra space which the heap could grow into will give it better performance - presumably because it believes that this will cause more objects to die in Eden space, and limit the number of objects promoted to the tenured generation space. And from the specifications of your hardware, the JVM doesn't think that this extra reserved heap space will cause any problems. Note that many (though not all) of the assumptions the JVM makes in reaching its conclusion are based on "typical" applications, rather than your specific application. It also makes assumptions based on your hardware and OS profile.
If the JVM has made the wrong assumptions, you can influence its behaviour through the command line, though it is easy to get things wrong...
Information about performance changes in java 6 can be found here.
There is a discussion about memory management and performance implications in the Memory Management White Paper.
Over the last few weeks I had cause to investigate and correct a problem with a thread pooling object (a pre-Java 6 multi-threaded execution pool), where it was launching far more threads than required. In the jobs in question there could be up to 200 unnecessary threads. And the threads were continually dying and new ones replacing them.
Having corrected that problem, I thought to run a test again, and now it seems the memory consumption is stable (though 20 or so MB higher than with older JVMs).
So my conclusion is that the spikes in memory were related to the number of threads running (several hundred). Unfortunately I don't have time to experiment.
If someone would like to experiment and answer this with their conclusions, I will accept that answer; otherwise I will accept this one (after the 2 day waiting period).
Also, the page fault rate is way down (by a factor of 10).
Also, the fixes to the thread pool corrected some contention issues.
Lots of memory allocated outside Java's heap after upgrading to Java 6u10? It can only be one thing:
Java6 u10 Release Notes: "New Direct3D Accelerated Rendering Pipeline (...) Enabled by Default"
Sun enabled Direct 3D accelerations by default in Java 6u10. This option creates lots of (temporary?) native memory buffers, which are allocated outside the Java Heap. Add the following vm argument to disable it again:
-Dsun.java2d.d3d=false
Note that this will NOT disable 2D hardware acceleration, just some features that can make use of 3D hardware acceleration. You will see that your Java heap usage will increase by up to 7MB, but that's a good trade-off because you'll save ~100MB(+) of this temporary volatile memory.
I did a fair amount of testing within 2 Swing desktop application, on two platforms:
a high-end Intel-i7 with nVidia GTX 260 graphics card,
a 3-year laptop with Intel graphics.
On both hardware platforms the option made practically zero subjective difference. (Tests included: scrolling tables, zooming graphical flowsheets, charts, etc.). On the few tests where something was subtly different, disabling d3d counter-intuitively increased performance. I suspect that memory management/bandwidth problems counteracted whatever benefits the d3d accelerated functions were supposed to achieve. (Your mileage may vary!)
If you need to do some performance tuning, here's an excellent reference (e.g. "Troubleshooting Java 2D")
Are you using the ConcMarkSweep collector? It can increase the amount of memory required for your application due to increased memory fragmentation, and "floating garbage" - objects that become unreachable only after the collector has examined them, and therefore are not collected until the next pass.
