Why does my Spring Boot web app take so long to respond after sitting idle?

I usually use Spring Boot + JPA + Hibernate + Postgres.
When a web application is finished, I package it as a JAR, run it directly with Java, and put Apache (httpd) in front of it as a reverse proxy.
I have noticed that right after startup there are no problems or latency, and the website responds very quickly. But when several hours pass without anyone making a request and I then access the site, I have to wait at least 20 seconds for the server to respond. After that, I can keep using the site normally.
Why does this happen? It is as if Spring goes into standby mode whenever it detects that there is no request load, but I am not sure whether that is the case or whether it is a problem. If it is some native Spring functionality, how can I disable it?
Even if it costs a little more memory while idle, I want responses to be fast regardless of whether the server is under load.

Without knowing more, it is likely that while your webapp is sitting idle, other programs on your server are using memory and causing the JVM's memory to be swapped to disk.
When you then access the webapp again, the OS has to swap that JVM memory back into RAM, one page at a time. That takes time, but once the memory is back in RAM, your webapp will run normally.
Unfortunately, given the way Java memory works, swapping JVM memory to disk is very bad for performance: the garbage collector walks large parts of the heap, so most swapped-out pages have to be read back in. That is an issue for most languages that rely on garbage collectors to free memory. Languages with manual memory management, e.g. C++, will usually not be hit as badly when memory is swapped to disk, because memory use is more "focused" in those languages.
Solution: If my guess at the cause of your problem is correct, reconfigure your server so the JVM memory won't be swapped to disk.
Note that when I say server, I mean the physical machine. The "other programs" that your JVM is competing with for memory might be running in different VMs, i.e. not in the same OS.
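One way to check this guess on Linux: the kernel reports how much of a process is currently swapped out in the `VmSwap` line of `/proc/<pid>/status`. The sketch below (class name hypothetical, Linux only, Java 8+) reads that value for the Spring Boot JVM. A large number right after a long idle period would support the swapping explanation, while a value near zero would point elsewhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class SwapCheck {
    /** Returns the VmSwap value in kB from the text of /proc/<pid>/status, or -1 if the line is absent. */
    public static long parseVmSwapKb(String statusText) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith("VmSwap:")) {
                // The line looks like "VmSwap:      1234 kB"
                String[] parts = line.substring("VmSwap:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    /** Reads /proc/<pid>/status for the given process (Linux only). */
    public static long vmSwapKb(long pid) throws IOException {
        Path status = Paths.get("/proc", Long.toString(pid), "status");
        return parseVmSwapKb(new String(Files.readAllBytes(status)));
    }

    public static void main(String[] args) throws IOException {
        // Pass the PID of the Spring Boot JVM, e.g. as listed by `jps`
        long pid = Long.parseLong(args[0]);
        System.out.println("VmSwap: " + vmSwapKb(pid) + " kB");
    }
}
```

On the OS side, the usual levers are adding RAM, moving the competing programs elsewhere, or lowering the Linux `vm.swappiness` setting; which one fits depends on your host.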

Related

Storing 1 MB byte array as session attribute

I am running a Java web app.
A user uploads a file (max 1 MB) and I would like to store that file until the user completes an entire process (which consists of multiple requests).
Is it OK to store the file as a byte array in the session until the user completes the entire process? Or is this expensive in terms of resources?
The reason I am doing this is that I ultimately store the file on an external server (e.g. AWS S3), but I only want to send it there if the whole process is completed.
Another option would be to write the file to a temporary file on my server. However, that means I would need to remove the file if the user leaves the website. It seems excessive to add code to the sessionDestroyed method in my SessionListener that removes the file just for this one case (sessions are created throughout my entire application, and elsewhere I don't need to check for temp files).
Thanks.

Maybe Yes, maybe No
Certainly it is reasonable to store such data in memory in a session if that fits your deployment constraints.
Remember that each user has their own session. So if all of your users have such a file in their session, multiply the file size by the number of concurrent sessions to estimate the impact on memory usage.
If you exceed the amount of memory available at runtime, there will be consequences. Your Servlet container may serialize less-used sessions to storage, which is a problem if you’ve not programmed all of your objects to support serialization. The JVM and OS may use a swap file to move contents out of real memory as part of the virtual memory system. That swapping may impact or even cripple performance.
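If the container may persist sessions, whatever you put in them must be serializable. A plain `byte[]` already is. Below is a sketch of a small holder (the class `PendingUpload` is hypothetical) that you might store with `session.setAttribute(...)`, plus a round trip through Java serialization, which is roughly what a container does when it moves a session out of memory.

```java
import java.io.*;

/** Hypothetical session attribute that keeps an uploaded file until the multi-step process finishes. */
public final class PendingUpload implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String fileName;
    private final byte[] content; // at most ~1 MB per the question

    public PendingUpload(String fileName, byte[] content) {
        this.fileName = fileName;
        this.content = content.clone(); // defensive copy
    }

    public String getFileName() { return fileName; }
    public byte[] getContent() { return content.clone(); }

    /** Serializes and deserializes the object, roughly what a container does when it persists a session. */
    public static PendingUpload roundTrip(PendingUpload upload) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(upload);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (PendingUpload) in.readObject();
        }
    }
}
```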
You must consider your runtime deployment constraints, which you did not disclose. Are you running on a Raspberry Pi or inexpensive little cloud server with little memory available? Or will you run on an enterprise-class server with half a terabyte of RAM? Do you have 3 users, 300, or 30,000? You need to crunch the numbers and determine your needs, and maybe do some runtime profiling to see actual usage.
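As a rough worked example of that multiplication, assuming every user holds one 1 MB file (1,048,576 bytes) and ignoring per-session overhead: 3 users need about 3 MB, 300 users about 300 MB, and 30,000 users about 29.3 GiB. A tiny helper for the same estimate (names hypothetical):

```java
public final class SessionMemoryEstimate {
    /** Bytes needed if each of `sessions` concurrent sessions holds `bytesPerSession` of payload. */
    public static long totalBytes(long bytesPerSession, long sessions) {
        return Math.multiplyExact(bytesPerSession, sessions); // throws instead of silently overflowing
    }

    public static void main(String[] args) {
        long oneMiB = 1024L * 1024L;
        for (long users : new long[] {3, 300, 30_000}) {
            long bytes = totalBytes(oneMiB, users);
            System.out.printf("%,d users -> %,d MiB (%.1f GiB)%n",
                    users, bytes / oneMiB, bytes / (double) (1L << 30));
        }
    }
}
```

Real usage will be higher, since the session itself and the rest of the app also take memory; profiling, as suggested above, gives the actual numbers.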
For example… I write web apps using the Vaadin Framework, a sophisticated package for creating desktop-style apps within a web browser. Being Servlet-based, Vaadin maintains a complete representation of each user's entire work data on the server side in the Servlet session. Multiplied by the number of users, and depending on the complexity of the app, this may require much memory. So I need to account for this and run my server on sufficient hardware with 64-bit Java tuned to run with a large amount of memory. Or take other approaches, such as load-balancing across multiple servers with sticky sessions.
Fortunately, RAM is quite cheap nowadays. And 64-bit hardware with large physical support for RAM modules, 64-bit operating systems, and 64-bit JVM implementations (Azul, others) are all readily available.

Does the application server affect Java memory usage?

Let's say I have a very large Java application that's deployed on Tomcat. Over the course of a few weeks, the server will run out of memory, application performance is degraded, and the server needs a restart.
Obviously the application has some memory leaks that need to be fixed.
My question is: if the application were deployed to a different server, would there be any change in memory utilization?

Certainly the services offered by the application server might vary in their memory utilization, and if the server includes its own unique VM -- i.e., if you're using J9 or JRockit with one server and Oracle's JVM with another -- there are bound to be differences. One relevant area that does matter is class loading: some app servers have better behavior than others with regard to administration. Warm-starting the application after a configuration change can result in serious memory leaks due to class loading problems on some server/VM combinations.
But none of these are really going to help you with an application that leaks. It's the program using the memory, not the server, so changing the server isn't going to affect much of anything.

There will probably be a slight difference in memory utilisation, but only in as much as the footprint differs between servlet containers. There is also a slight chance that you've encountered a memory leak with the container - but this is doubtful.
The most likely issue is that your application has a memory leak. In any case, finding the cause is more important than a quick fix: what would you do if the 'new' container just happens to last an extra week? Moving the problem rarely solves it.
You need to start analysing the application's heap memory to locate the source of the problem. If your application is crashing with an OutOfMemoryError (OOME), you can add this to the JVM arguments:
-XX:+HeapDumpOnOutOfMemoryError
If the performance is just degrading until you restart the container manually, you should get into the routine of triggering periodic heap dumps. A timeline of dumps is often the most help, as you can see which object stores just grow over time.
To do this, you'll need a heap analysis tool:
JHat or IBM Heap Analyser or whatever your preference :)
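For the periodic dumps, `jmap` on a schedule is the usual route, but a HotSpot JVM can also write a dump from inside the process through `com.sun.management.HotSpotDiagnosticMXBean`. A sketch assuming a HotSpot/OpenJDK JVM on Java 8 or later (not J9); the class name, file naming scheme and interval are hypothetical. `dumpHeap` refuses to overwrite an existing file, hence the timestamped names, and some JDK versions also insist on a `.hprof` extension.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class PeriodicHeapDumper {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /** Builds a timestamped .hprof path inside dir, e.g. heap-20240102-030405.hprof. */
    public static Path dumpPath(Path dir, LocalDateTime time) {
        return dir.resolve("heap-" + STAMP.format(time) + ".hprof");
    }

    /** Writes a heap dump; live=true includes only objects that are still reachable. */
    public static void dumpHeap(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), live);
    }

    /** Schedules a dump every `hours` hours; the caller shuts the returned executor down. */
    public static ScheduledExecutorService schedule(Path dir, long hours) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heap-dumper");
            t.setDaemon(true); // don't keep the JVM alive just for dumps
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            try {
                dumpHeap(dumpPath(dir, LocalDateTime.now()), true);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, hours, hours, TimeUnit.HOURS);
        return timer;
    }
}
```

Each dump pauses the application while it is written, so on a large heap an interval of hours rather than minutes is probably the safer starting point.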
Also see this question:
Recommendations for a heap analysis tool for Java?
Update:
And this may help (for obvious reasons):
How do I analyze a .hprof file?

Tracking down a memory leak / garbage-collection issue in Java

This is a problem I have been trying to track down for a couple months now. I have a java app running that processes xml feeds and stores the result in a database. There have been intermittent resource problems that are very difficult to track down.
Background:
On the production box (where the problem is most noticeable), I do not have particularly good access, and I have been unable to get JProfiler running. That box is a 64-bit quad-core, 8 GB machine running CentOS 5.2, Tomcat 6, and Java 1.6.0.11. It starts with these JAVA_OPTS:
JAVA_OPTS="-server -Xmx5g -Xms4g -Xss256k -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC -XX:+PrintTenuringDistribution -XX:+UseParNewGC"
The technology stack is the following:
Centos 64-bit 5.2
Java 6u11
Tomcat 6
Spring/WebMVC 2.5
Hibernate 3
Quartz 1.6.1
DBCP 1.2.1
Mysql 5.0.45
Ehcache 1.5.0
(and of course a host of other dependencies, notably the jakarta-commons libraries)
The closest I can get to reproducing the problem is a 32-bit machine with lower memory requirements. That I do have control over. I have probed it to death with JProfiler and fixed many performance problems (synchronization issues, precompiling/caching xpath queries, reducing the threadpool, and removing unnecessary hibernate pre-fetching, and overzealous "cache-warming" during processing).
In each case, the profiler showed these as taking up huge amounts of resources for one reason or another, and that these were no longer primary resource hogs once the changes went in.
The Problem:
The JVM seems to completely ignore the memory usage settings, fills all memory and becomes unresponsive. This is an issue for the customer-facing end, which expects a regular poll (every 5 minutes, with a 1-minute retry), as well as for our operations teams, who are constantly notified that a box has become unresponsive and have to restart it. There is nothing else significant running on this box.
The problem appears to be garbage collection. We are using the ConcurrentMarkSweep (CMS) collector (as noted above) because the original stop-the-world collector was causing JDBC timeouts and became increasingly slow. The logs show that as memory usage increases, it begins to throw CMS failures and falls back to the original stop-the-world collector, which then does not seem to collect properly.
However, when running with JProfiler, the "Run GC" button seems to clean up the memory nicely rather than showing an increasing footprint. But since I cannot connect JProfiler directly to the production box, and resolving proven hotspots doesn't seem to be working, I am left with the voodoo of tuning garbage collection blind.
What I have tried:
Profiling and fixing hotspots.
Using STW, Parallel and CMS garbage collectors.
Running with min/max heap sizes at 1/2, 2/4, 4/5, 6/6 increments.
Running with permgen space in 256 MB increments up to 1 GB.
Many combinations of the above.
I have also consulted the JVM [tuning reference](http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html) , but can't really find anything explaining this behavior or any examples of _which_ tuning parameters to use in a situation like this.
I have also (unsuccessfully) tried JProfiler in offline mode and connecting with JConsole and VisualVM, but I can't seem to find anything that will interpret my GC log data.
Unfortunately, the problem also pops up sporadically, it seems to be unpredictable, it can run for days or even a week without having any problems, or it can fail 40 times in a day, and the only thing I can seem to catch consistently is that garbage collection is acting up.
Can anyone give any advice as to:
a) Why a JVM is using 8 GB of physical memory and 2 GB of swap space when it is configured to max out at less than 6 GB.
b) A reference to GC tuning that actually explains, or gives reasonable examples of, when and how to use the advanced collectors.
c) A reference to the most common Java memory leaks (I understand unclaimed references, but I mean at the library/framework level, or something more inherent in data structures, like HashMaps).
Thanks for any and all insight you can provide.
EDIT
Emil H:
1) Yes, my development cluster is a mirror of production data, down to the media server. The primary difference is the 32/64bit and the amount of RAM available, which I can't replicate very easily, but the code and queries and settings are identical.
2) There is some legacy code that relies on JAXB, but in reordering the jobs to try to avoid scheduling conflicts, I have mostly eliminated that execution, since it runs once a day. The primary parser uses XPath queries that call down to the javax.xml.xpath package. This was the source of a few hotspots: first, the queries were not being precompiled, and second, the references to them were hardcoded strings. I created a thread-safe cache (hashmap) and refactored the references to the XPath queries into static final Strings, which lowered resource consumption significantly. The querying is still a large part of the processing, but it should be, because that is the main responsibility of the application.
3) An additional note, the other primary consumer is image operations from JAI (reprocessing images from a feed). I am unfamiliar with java's graphic libraries, but from what I have found they are not particularly leaky.
(thanks for the answers so far, folks!)
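The XPath precompile-and-cache fix from item 2 might look roughly like this. One caveat: the `javax.xml.xpath.XPathExpression` javadoc states that compiled expressions are not thread-safe, so instead of one shared hashmap this sketch keeps a small cache per thread. The class name and the query constant are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public final class XPathCache {
    // Queries kept as constants instead of scattered hardcoded strings (this one is hypothetical)
    public static final String TITLE_QUERY = "/feed/item/title";

    // XPathExpression is not thread-safe, so each thread gets its own compiled copies.
    private static final ThreadLocal<Map<String, XPathExpression>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    /** Compiles a query once per thread and reuses it afterwards. */
    public static XPathExpression compiled(String query) {
        return CACHE.get().computeIfAbsent(query, q -> {
            try {
                return XPathFactory.newInstance().newXPath().compile(q);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath: " + q, e);
            }
        });
    }

    public static String evaluate(String query, Document doc) throws XPathExpressionException {
        return compiled(query).evaluate(doc);
    }

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The per-thread map only grows with the number of distinct queries, so it stays small even on long-lived pooled threads.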
UPDATE:
I was able to connect to the production instance with VisualVM, but it had disabled the GC visualization / run-GC option (though I could view it locally). The interesting thing is that the VM's heap allocation obeys the JAVA_OPTS: the actual allocated heap sits comfortably at 1-1.5 GB and doesn't seem to be leaking. The box-level monitoring still shows a leak pattern, but it is not reflected in the VM monitoring. There is nothing else running on this box, so I am stumped.

Well, I finally found the issue that was causing this, and I'm posting a detail answer in case someone else has these issues.
I tried jmap while the process was acting up, but this usually caused the JVM to hang further, and I would have to run it with -F (force). This resulted in heap dumps that seemed to be missing a lot of data, or at least the references between objects. For analysis, I tried jhat, which presents a lot of data but not much guidance on how to interpret it. Next, I tried the Eclipse-based Memory Analyzer Tool (http://www.eclipse.org/mat/), which showed that the heap was mostly classes related to Tomcat.
The issue was that jmap was not reporting the actual state of the application, and was only catching the classes on shutdown, which was mostly tomcat classes.
I tried a few more times, and noticed that there were some very high counts of model objects (actually 2-3x more than were marked public in the database).
Using this, I analyzed the slow query logs and found a few unrelated performance problems. I tried extra-lazy loading (http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html), replaced a few Hibernate operations with direct JDBC queries (mostly where it was loading and operating on large collections; the JDBC replacements worked directly on the join tables), and replaced some other inefficient queries that MySQL was logging.
These steps improved pieces of the frontend performance, but still did not address the issue of the leak, the app was still unstable and acting unpredictably.
Finally, I found the option -XX:+HeapDumpOnOutOfMemoryError. This produced a very large (~6.5 GB) hprof file that accurately showed the state of the application. Ironically, the file was so large that jhat could not analyze it, even on a box with 16 GB of RAM. Fortunately, MAT was able to produce some nice-looking graphs and showed better data.
This time what stuck out was that a single Quartz thread was taking up 4.5 GB of the 6 GB heap, and the majority of that was a Hibernate StatefulPersistenceContext (https://www.hibernate.org/hib_docs/v3/api/org/hibernate/engine/StatefulPersistenceContext.html). Hibernate uses this class internally as its primary cache (I had disabled the second-level and query caches backed by Ehcache).
This class enables most of Hibernate's features, so it can't be disabled directly (you can work around it with a stateless session, but Spring doesn't support stateless sessions), and I would be very surprised if a mature product had such a major memory leak. So why was it leaking now?
Well, it was a combination of things:
The Quartz thread pool is instantiated with certain things being thread-local. Spring was injecting a session factory, which created a session at the start of each Quartz thread's lifecycle; that session was then reused to run the various Quartz jobs that used Hibernate. Hibernate then cached entities in the session, which is its expected behavior.
The problem is that the thread pool never released the session, so Hibernate stayed resident and maintained the cache for the lifetime of the session. Since we were using Spring's HibernateTemplate support, there was no explicit use of sessions (we use a dao -> manager -> driver -> quartz-job hierarchy; the DAO is injected with Hibernate configuration through Spring, so the operations are done directly on the templates).
So the session was never closed and Hibernate kept references to the cached objects, so they were never garbage collected. Each time a new job ran, it kept filling up the cache local to the thread, without even any sharing between the different jobs. And since this is a write-intensive job (very little reading), the cache was mostly wasted while the objects kept getting created.
The solution: create a dao method that explicitly calls session.flush() and session.clear(), and invoke that method at the beginning of each job.
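To make the mechanism concrete, here is a small self-contained model, not Hibernate code. `FakeSession` is a hypothetical stand-in for the session's first-level cache, bound to a pooled worker thread through a `ThreadLocal` that is never closed, as described above. Running the same write-heavy jobs with and without a flush/clear at the start of each job shows how much stays reachable. In the real application the equivalent is the DAO method from the post that calls `session.flush()` and `session.clear()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class ThreadBoundSessionDemo {
    /** Hypothetical stand-in for a Hibernate Session: a first-level cache plus pending writes. */
    static final class FakeSession {
        final Map<Long, Object> firstLevelCache = new HashMap<>();
        final List<Object> pendingWrites = new ArrayList<>();
        int flushedCount = 0;

        void save(long id, Object entity) {
            firstLevelCache.put(id, entity); // the session keeps a reference to every entity it touches
            pendingWrites.add(entity);
        }
        void flush() { flushedCount += pendingWrites.size(); pendingWrites.clear(); } // models writing SQL
        void clear() { firstLevelCache.clear(); pendingWrites.clear(); }               // drops references for GC
    }

    // One session per pooled thread, created once and never closed (the situation in the post).
    static final ThreadLocal<FakeSession> SESSION = ThreadLocal.withInitial(FakeSession::new);

    /** Runs `jobs` write-heavy jobs on one pooled thread and returns how many entities the cache still holds. */
    public static int runJobs(int jobs, int entitiesPerJob, boolean clearAtStart) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int j = 0; j < jobs; j++) {
                final int job = j;
                pool.submit(() -> {
                    FakeSession s = SESSION.get();
                    if (clearAtStart) { s.flush(); s.clear(); } // the fix: start every job with an empty session
                    for (int i = 0; i < entitiesPerJob; i++) {
                        s.save((long) job * entitiesPerJob + i, new byte[16]);
                    }
                }).get();
            }
            return pool.submit(() -> SESSION.get().firstLevelCache.size()).get();
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the clear, the cache holds every entity from every job that ran on that thread; with it, only the current job's entities remain.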
The app has been running for a few days now with no monitoring issues, memory errors or restarts.
Thanks for everyone's help on this, it was a pretty tricky bug to track down, as everything was doing exactly what it was supposed to, but in the end a 3 line method managed to fix all the problems.

Can you run the production box with JMX enabled?
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=<port>
...
Monitoring and Management Using JMX
And then attach with JConsole, VisualVM?
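Once the port is open, the same numbers JConsole shows can also be pulled from a small client, which is handy for logging over a long period. A sketch, Java 8+; the class name is hypothetical, and the remote JVM may also need the authentication/SSL properties elided above. It reports non-heap memory too, which matters here because the heap reportedly stays stable.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public final class RemoteMemoryProbe {
    /** Standard RMI URL for a JVM started with -Dcom.sun.management.jmxremote.port=<port>. */
    public static JMXServiceURL url(String host, int port) throws IOException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    /** Summarizes heap and non-heap usage for any MBean server connection (remote or local). */
    public static String describe(MBeanServerConnection connection) throws IOException {
        MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        return "heap used=" + heap.getUsed() + " committed=" + heap.getCommitted() + " max=" + heap.getMax()
                + "; non-heap used=" + nonHeap.getUsed() + " committed=" + nonHeap.getCommitted();
    }

    public static void main(String[] args) throws IOException {
        // args: host port
        try (JMXConnector connector = JMXConnectorFactory.connect(url(args[0], Integer.parseInt(args[1])))) {
            System.out.println(describe(connector.getMBeanServerConnection()));
        }
    }
}
```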
Is it ok to do a heap dump with jmap?
If so, you could then analyze the heap dump for leaks with JProfiler (which you already have), jhat, VisualVM, or Eclipse MAT. Comparing several heap dumps taken over time may also help you find leaks and patterns.
And since you mentioned jakarta-commons: there is a known problem with jakarta-commons-logging holding onto the classloader. For a good read on that, see
A day in the life of a memory leak hunter (release(Classloader))

It seems like memory other than the heap is leaking, since you mention that the heap remains stable. A classic candidate is permgen (the permanent generation), which holds two things: loaded class objects and interned strings. Since you've connected with VisualVM, you should be able to see the number of loaded classes; check whether it keeps increasing. (Important: VisualVM also shows the total number of classes ever loaded. It's okay if that goes up, but the number of currently loaded classes should stabilize after a while.)
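The counters VisualVM displays come from `ClassLoadingMXBean`, so they can also be logged for days without a GUI attached. A minimal in-process sketch (Java 8+, class name hypothetical); over JMX you would obtain the same bean with `ManagementFactory.newPlatformMXBeanProxy` and `ManagementFactory.CLASS_LOADING_MXBEAN_NAME`.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class ClassCountSampler {
    /** One sample: {currently loaded, total ever loaded, unloaded so far}. */
    public static long[] sample() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        long loaded = classes.getLoadedClassCount();     // should level off over time
        long total = classes.getTotalLoadedClassCount(); // may keep rising, which is fine
        long unloaded = classes.getUnloadedClassCount();
        return new long[] {loaded, total, unloaded};
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            long[] s = sample();
            System.out.printf("loaded=%d total=%d unloaded=%d%n", s[0], s[1], s[2]);
        }, 0, 10, TimeUnit.MINUTES);
        Thread.sleep(TimeUnit.HOURS.toMillis(24)); // sample for a day, then stop
        timer.shutdown();
    }
}
```

A steadily climbing `loaded` value with `unloaded` stuck near zero would be the permgen-leak signature described above.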
If it does turn out to be a permgen leak then debugging gets trickier since tooling for permgen analysis is rather lacking in comparison to the heap. Your best bet is to start a small script on the server that repeatedly (every hour?) invokes:
jmap -permstat <pid> > somefile<timestamp>.txt
With that parameter, jmap generates an overview of loaded classes together with an estimate of their size in bytes. This report can help you identify whether certain classes are not getting unloaded. (Note: by <pid> I mean the process ID, and <timestamp> should be some generated timestamp to distinguish the files.)
Once you've identified certain classes that are loaded and never unloaded, you can try to work out where they might be generated; otherwise, you can use jhat to analyze dumps generated with jmap -dump. I'll save that for a future update should you need the info.

I would look for directly allocated ByteBuffers.
From the javadoc:
A direct byte buffer may be created by invoking the allocateDirect factory method of this class. The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers. The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious. It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations. In general it is best to allocate direct buffers only when they yield a measureable gain in program performance.
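Perhaps the Tomcat code uses this to do I/O; if so, configure Tomcat to use a different connector.
Failing that, you could have a thread that periodically executes System.gc(), since the native memory behind a direct buffer is only released when the buffer object itself is garbage collected. "-XX:+ExplicitGCInvokesConcurrent" might be an interesting option to try.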
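To see whether direct buffers are actually the culprit, the JVM reports their native memory through `BufferPoolMXBean`. Note this API appeared in Java 7, so it is not available on the 1.6 JVM from the question; the sketch (class name hypothetical) assumes Java 8+.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public final class DirectBufferReport {
    /** Returns the platform buffer pool with the given name ("direct" or "mapped"), or null. */
    public static BufferPoolMXBean pool(String name) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(name)) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            // Memory counted here lives outside the -Xmx heap
            System.out.printf("%s: count=%d capacity=%d bytes used=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getTotalCapacity(), pool.getMemoryUsed());
        }
    }
}
```

If the "direct" pool keeps growing while the heap stays flat, that matches the "box leaks but the VM doesn't" pattern from the question's update.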

Any JAXB? I find that JAXB is a perm space stuffer.
Also, I find that visualgc, now shipped with JDK 6, is a great way to see what's going on in memory. It shows the eden, generational, and perm spaces and the transient behavior of the GC beautifully. All you need is the PID of the process. Maybe that will help while you work on JProfiler.
And what about the Spring tracing/logging aspects? Maybe you can write a simple aspect, apply it declaratively, and do a poor man's profiler that way.
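A Spring aspect would do this declaratively; to keep the sketch runnable without Spring, it swaps in a plain JDK dynamic proxy (`java.lang.reflect.Proxy`) that times every call made through an interface. `TimingProxy` and its static maps are hypothetical names; you would wrap, for example, a DAO or manager interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public final class TimingProxy {
    /** Total nanoseconds and call counts per "Interface.method", shared by all proxies. */
    public static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();
    public static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();

    /** Wraps target so that every call made through `type` is timed. */
    public static <T> T wrap(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            String key = type.getSimpleName() + "." + method.getName();
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the real exception, not the reflection wrapper
            } finally {
                NANOS.computeIfAbsent(key, k -> new LongAdder()).add(System.nanoTime() - start);
                CALLS.computeIfAbsent(key, k -> new LongAdder()).increment();
            }
        };
        ClassLoader loader = type.getClassLoader() != null ? type.getClassLoader() : TimingProxy.class.getClassLoader();
        return type.cast(Proxy.newProxyInstance(loader, new Class<?>[] {type}, handler));
    }

    /** Prints call counts and total time per method, largest total first. */
    public static void report() {
        NANOS.entrySet().stream()
                .sorted((a, b) -> Long.compare(b.getValue().sum(), a.getValue().sum()))
                .forEach(e -> System.out.printf("%s: %d calls, %.3f ms total%n", e.getKey(),
                        CALLS.getOrDefault(e.getKey(), new LongAdder()).sum(), e.getValue().sum() / 1e6));
    }
}
```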

"Unfortunately, the problem also pops up sporadically, it seems to be unpredictable, it can run for days or even a week without having any problems, or it can fail 40 times in a day, and the only thing I can seem to catch consistently is that garbage collection is acting up."
It sounds like this is tied to a use case that is executed up to 40 times a day and then not at all for days. I hope you are not tracking only the symptoms. This is something you should be able to narrow down by tracing the actions of the application's actors (users, jobs, services).
If this happens during XML imports, you should compare the XML data from a day with 40 crashes against the data imported on a day with no crashes. Maybe it's some sort of logical problem that you won't find by looking only at your code.

I had the same problem, with a couple of differences.
My technology is the following:
grails 2.2.4
tomcat7
quartz-plugin 1.0
I use two datasources in my application. That detail turned out to be decisive for the bug.
Another thing to consider is that the Quartz plugin injects a Hibernate session into Quartz threads, just as @liam says, and the Quartz threads stay alive until the application shuts down.
My problem was a bug in Grails ORM combined with the way the plugin handles sessions and my two datasources.
The Quartz plugin has a listener that initializes and destroys Hibernate sessions:
public class SessionBinderJobListener extends JobListenerSupport {
    public static final String NAME = "sessionBinderListener";

    private PersistenceContextInterceptor persistenceInterceptor;

    public String getName() {
        return NAME;
    }

    public PersistenceContextInterceptor getPersistenceInterceptor() {
        return persistenceInterceptor;
    }

    public void setPersistenceInterceptor(PersistenceContextInterceptor persistenceInterceptor) {
        this.persistenceInterceptor = persistenceInterceptor;
    }

    public void jobToBeExecuted(JobExecutionContext context) {
        if (persistenceInterceptor != null) {
            persistenceInterceptor.init();
        }
    }

    public void jobWasExecuted(JobExecutionContext context, JobExecutionException exception) {
        if (persistenceInterceptor != null) {
            persistenceInterceptor.flush();
            persistenceInterceptor.destroy();
        }
    }
}
In my case, persistenceInterceptor is an instance of AggregatePersistenceContextInterceptor, which holds a List of HibernatePersistenceContextInterceptor, one for each datasource.
Every operation on the AggregatePersistenceContextInterceptor is passed through to each HibernatePersistenceContextInterceptor, without any modification or special handling.
When init() is called on HibernatePersistenceContextInterceptor, it increments the static variable below:
private static ThreadLocal<Integer> nestingCount = new ThreadLocal<Integer>();
I don't know the purpose of that static counter. I just know it is incremented twice, once per datasource, because of the AggregatePersistenceContextInterceptor implementation.
So far I have only described the scenario.
Now for the problem.
When my Quartz job finishes, the plugin calls the listener to flush and destroy the Hibernate sessions, as you can see in the source code of SessionBinderJobListener.
The flush works perfectly, but the destroy does not, because HibernatePersistenceContextInterceptor performs a check before closing the Hibernate session: it examines nestingCount to see whether the value is greater than 1. If it is, it does not close the session.
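Simplified, this is what the interceptor does on destroy():
    int count = nestingCount.get() - 1; // nestingCount is the ThreadLocal<Integer> above
    nestingCount.set(count);
    if (count > 0) {
        // still "nested": do nothing, the session stays open
    } else {
        // close the session
    }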
That was the root of my memory leak.
The Quartz threads stayed alive holding every object used in the session, because Grails ORM did not close the session, due to a bug triggered by having two datasources.
To solve that, I customized the listener to call clear before destroy, and to call destroy twice (once for each datasource). That ensures the session is cleared and destroyed, and if the destroy fails, it is at least cleared.

CPU Usage Spikes in WebSphere 6.1

First, just a bit of background:
One of our customers is experiencing CPU usage spikes for WebSphere instances running one of our web apps (other instances with other apps are fine). They have a test environment and a live environment (both iSeries) which both experience the problem - with a single app per instance setup. We have deployed this application locally in our own test environments and also for many other customers all on iSeries with no similar problems.
What's actually happening:
Every second or so, the WebSphere process's CPU usage jumps to anywhere from 7% to 20%, even though no requests are being processed at the time. The customer has reported seeing spikes as high as 30%. These spikes average out to 1.5% CPU overall; the other WebSphere instances typically use 0% to 0.1% when idle.
My investigations so far
So, I had a look at the threads. One thread in their test environment was using ~350 CPU cycles per second. A similar thread in their live environment was using ~1500 CPU cycles per second (reflecting the live box's bigger CPU). The call stack for these threads looks like this:
Type Program Statement Procedure
QLESPI QSYS 17 LE_Create_Thread2__FP12crtt >
QJVALIBJVM QSYS 7 startThread__FPv
J com/ibm/ws/util/Threa > run
J com/ibm/ws/util/Threa > run
J com/ibm/ws/util/Threa > getTask
J com/ibm/ws/util/Bound > poll
The entire class name from the bottom line is com/ibm/ws/util/BoundedBuffer. I asked the customer to do a JVM Dump for me - the only additional information I got from this was the thread name:
Thread: 00002F82 Deferrable Alarm : 11
Now for my questions:
Can any of you identify the problem, given these symptoms? (Maybe that's a long shot!)
What is Deferrable Alarm? From the JVM dump, I can see 4 threads with this name; the other three seem to be doing just fine. By debugging my local WebSphere (on Windows) and adding breakpoints in the BoundedBuffer class, I see that BoundedBuffers are polling and periodically invoking some listener.
I don't have access to the WebSphere console for the customer machines, and they aren't owning up to having made any config changes. I can ask them to check the console for me though - what should I be asking them to look at?
I have telnet access to the customer boxes, is there anything else I can investigate here? Looking at the WebSphere profile files, etc? Which files should I be looking at?
Because the Call Stack and JVM Dump don't explicitly reference our code, is it safe to assume that this is a configuration problem?
It's been a long question, so thanks for reading this far.
30 April Update (1)
This morning I've noticed that this behaviour only happens after the first request of the day has been processed (irrespective of which Web Service is invoked). This points the finger back at our application or Apache Axis. Could it be that this is just normal behaviour?!
30 April Update (2)
So it seems that this CPU activity is some kind of housekeeping activity for the web-container or maybe something within Apache Axis. I've now observed this happening on a few different web-applications on a few different servers. Applications with no web component don't suffer the same additional CPU overhead.
I'd imagine that if it is housekeeping work, "tuning" it somehow could be counterproductive: making the app server idle better would probably reduce the amount of "real" work it can do.

You could try to profile and do heap dumps of the application, that could answer a few questions related to memory and cpu usage.

I would recommend following the MustGather documentation provided by IBM and raising a PMR, alongside your own investigation. Things you might suspect:
Garbage collection (unlikely on low application utilization)
Timers or tasks (such as java.util.Timer or commonj work manager)
Pretest connection that has a complex SQL query (in the DataSource's WebSphere Application Server data source properties)
I would also recommend using the profiler to determine the cause, YourKit profiler is a pretty decent one.
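Before reaching for a full profiler, the per-thread CPU time that `ThreadMXBean` exposes is often enough to confirm which Java thread (for example a "Deferrable Alarm" thread) is burning CPU. A sketch under some assumptions: it must run inside the JVM it measures (e.g. from a diagnostic page or over JMX, a deployment choice not covered here), it is written for Java 8+ while WAS 6.1 ships an older IBM JVM, and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class BusyThreadFinder {
    /** Samples per-thread CPU time over `millis` and returns the `limit` busiest threads as "name: N ms". */
    public static List<String> busiest(long millis, int limit) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not supported on this JVM");
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : threads.getAllThreadIds()) {
            before.put(id, threads.getThreadCpuTime(id)); // -1 if the thread has already died
        }
        Thread.sleep(millis);
        List<long[]> deltas = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        for (long id : threads.getAllThreadIds()) {
            long now = threads.getThreadCpuTime(id);
            Long then = before.get(id);
            if (now >= 0 && then != null && then >= 0) {
                deltas.add(new long[] {id, now - then});
            }
        }
        deltas.sort((a, b) -> Long.compare(b[1], a[1]));
        List<String> result = new ArrayList<>();
        for (long[] d : deltas.subList(0, Math.min(limit, deltas.size()))) {
            ThreadInfo info = threads.getThreadInfo(d[0]);
            String name = info == null ? "thread-" + d[0] : info.getThreadName();
            result.add(name + ": " + d[1] / 1_000_000 + " ms");
        }
        return result;
    }
}
```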

Very instinctively (being unfamiliar with iSeries platforms) I would look at disk IO related issues. Can you describe the disk subsystem? Can you see if your app is spending an unusually large amount of time in iowait?

I know this doesn't quite match your problem, but it might be worth a look if you're running a version prior to WAS 6.1 patch 17.
http://www-01.ibm.com/support/docview.wss?uid=swg24018437
Hope this helps. Cheers John

My best guess is that some type of monitoring is being done on the instance, like Tivoli. Have you ruled out any GC activity?
HTH Tom

Most application servers are implemented in Java, and WebSphere is no exception. Apart from serving client requests, these servers have to do other periodic jobs, such as resource pool management. Performing these jobs creates temporary objects that need to be garbage collected.
Depending on how much heap you have allocated, its usage, and your garbage collector settings, the garbage collector will be invoked. I'd suggest checking whether it is the garbage collector thread that is taking up your CPU. To do this, connect the jconsole utility to the remote WebSphere process for a day and see whether there is any correlation between heap usage and CPU usage.

I am also experiencing this very same issue: [Deferrable Alarm:x] with BoundedBuffer. The only difference is that mine is on a Windows 7 64-bit machine. There is absolutely no Tivoli or other batch process running and no requests being made; the single instance is just idle.
I can run the application in DEBUG mode and pause the Deferrable Alarm thread and the CPU spikes stop, resume and they start again.
I've checked disk activity and network activity, and there is nothing happening there.
I am running WebSphere 6.1.0.27 .

Java Application Server Performance

I've got a somewhat dated Java EE application running on Sun Application Server 8.1 (aka SJSAS, precursor to Glassfish). With 500+ simultaneous users the application becomes unacceptably slow and I'm trying to assist in identifying where most of the execution time is spent and what can be done to speed it up. So far, we've been experimenting and measuring with LoadRunner, the app server logs, Oracle statpack, snoop, adjusting the app server acceptor and session (worker) threads, adjusting Hibernate batch size and join fetch use, etc but after some initial gains we're struggling to improve matters more.
Ok, with that introduction to the problem, here's the real question: If you had a slow Java EE application running on a box whose CPU and memory use never went above 20% and while running with 500+ users you showed two things: 1) that requesting even static files within the same app server JVM process was exceedingly slow, and 2) that requesting a static file outside of the app server JVM process but on the same box was fast, what would you investigate?
My thoughts initially jumped to the application server threads, both acceptor and session threads, thinking that even requests for static files were being queued, waiting for an available thread, and if the CPU/memory weren't really taxed then more threads were in order. But then we upped both the acceptor and session threads substantially and there was no improvement.
Clarification Edits:
1) Static files should be served by a web server rather than an app server. I am using the fact that in our case this (unfortunately) is not the configuration so that I can see the app server performance for files that it doesn't execute -- therefore excluding any database performance costs, etc.
2) I don't think there is a proxy between the requesters and the app server but even if there was it doesn't seem to be overloaded because static files requested from the same application server machine but outside of the application's JVM instance return immediately.
3) The JVM heap size (Xmx) is set to 1GB.
Thanks for any help!

SunONE itself is a pain in the ass. I had the very same problem, and you know what? Simply redeploying the same application to WebLogic reduced memory and CPU consumption by about 30%.
SunONE is a reference implementation server, and shouldn't be used for production (don't know about Glassfish).
I know this answer doesn't really help, but I've noticed considerable pauses even in very simple operations, such as getting a bean instance from a pool.
Maybe trying to deploy JBoss or WebLogic on the same machine would give you a hint?
P.S. You shouldn't serve static content from under application server (though I do it too sometimes, when CPU is abundant).
P.P.S. 500 concurrent users is quite a high load; I'd definitely put SunONE behind a caching proxy or an Apache server that serves static content.

After using a Sun performance monitoring tool, we found that the garbage collector was running every couple of seconds and that only about 100 MB of the 1 GB heap was being used. So we tried adding the following JVM options, and so far this new configuration has greatly improved performance.
-XX:+DisableExplicitGC -XX:+AggressiveHeap
See http://java.sun.com/docs/performance/appserver/AppServerPerfFaq.html
Our lesson: don't leave JVM option tuning and garbage collection adjustments to the end. If you're having performance trouble, look at these settings early in your troubleshooting process.
