Google App Engine - Cloud Console Stackdriver trace details - java

I'm trying to better understand the way Google's Cloud Console Stackdriver Trace shows call details and to debug some performance issues for my app.
Most requests work heavily with memcache set/get operations and I'm having some issues here, but what I don't understand is why there's a long time gap between calls. I have uploaded 2 screenshots.
So, as you can see, the call #1025ms took 2ms, but there's more than 4 seconds between it and the urlfetch call #5235ms.
First of all, my code is not intensive at that point (and the full request shows about 9000ms of untraced time), and second, most similar requests that run the same code do not have these gaps (i.e. repeating the request doesn't produce the same behavior). But I also see this issue on other requests as well and I cannot reproduce them.
Please advise!
EDIT:
I have uploaded another screenshot from appstats. It is a "normal" request that usually takes a few hundred ms to run (max 1s), and it also runs fine on localhost (development). I cannot manage to find anything to take the debugging further. I feel like I am missing something simple, something at base level, regarding the DOs and DON'Ts of App Engine.

I'm aware of the following common causes of such gaps ("untraced time"):
The request is actually CPU-bound during these gaps.
To check for this issue, go to the logs viewer and view the details of the affected incoming HTTP request. Note that there's also a convenient direct link from the trace details to the log entry. In the request log entry, look for the cpu_ms field, which states the CPU milliseconds required to fulfill the request. This is the number of milliseconds spent by the CPU actually executing your application code, expressed in terms of a baseline 1.2 GHz Intel x86 CPU. If the CPU actually used is faster than the baseline, the CPU milliseconds can be larger than the actual clock time [..]. (doc).
This metric is also available in protoPayload.megaCycles.
Here's an example log entry of a slow request with substantial untraced time:
2001:... - - [02/Mar/2017:19:20:22 +0100] "GET / HTTP/1.1" 200 660 - "Mozilla/5.0 ..." "example.com" ms=4966 cpu_ms=11927 cpm_usd=7.376e-8 loading_request=1 instance=... app_engine_release=1.9.48 trace_id=...
The cpu_ms field is unusually high (11927) for this example request, and indicates that most of the untraced time was spent in the application itself (or the runtime).
Why is the request handler using that much CPU? Typically, it's next to impossible to tell exactly where the CPU time was spent, but if you know what is supposed to happen in a given request, you can narrow it down more easily. Two common causes are:
It's the very first request to a newly started App Engine instance. The JVM needs to load classes and JIT-compile hot methods - this is expected to significantly impact the first request (and potentially a few more). Look for loading_request=1 in the request log entry to check if your request was slow because of this. Consider Configuring Warmup Requests to Improve Performance.
Pro tip: if you want to focus your investigation, filter out such loading requests in the logs viewer by applying this advanced filter:
protoPayload.megaCycles > 10000 and protoPayload.wasLoadingRequest=false
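Back to the warmup suggestion: if warmup requests are not enabled yet, a minimal sketch for the Java Standard Environment could look like the following (the servlet class name is a placeholder; /_ah/warmup is the path App Engine calls before routing traffic to a new instance):
<!-- appengine-web.xml -->
<inbound-services>
  <service>warmup</service>
</inbound-services>
<!-- web.xml: map a servlet to the warmup path and preload caches, hot code paths, etc. there -->
<servlet>
  <servlet-name>warmup</servlet-name>
  <servlet-class>com.example.WarmupServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>warmup</servlet-name>
  <url-pattern>/_ah/warmup</url-pattern>
</servlet-mapping>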
Some parts of the application code are massively slowed down by excessive use of reflection. This is specific to the App Engine Standard Environment, where a security manager restricts the usage of reflection. The only mitigation is to use less reflection. Note that the App Engine serving infrastructure is constantly evolving, so this hint will hopefully be outdated sooner rather than later.
If the issue is reproducible locally in the dev appserver, you can use a profiler (or maybe just jstack) to narrow it down. In some other cases, I literally had to incrementally bisect the application code, add more log statements, redeploy, etc., until the offending code was identified.
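For the jstack route, a low-tech approach that is often enough (assuming the dev appserver runs as an ordinary local JVM) is to take a handful of thread dumps a few seconds apart while the slow request is in flight and look for stack frames that keep reappearing:
jps -lv                      # find the PID of the dev appserver JVM
jstack -l <PID> > dump1.txt  # repeat a few times during the slow request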
There are actually untraced calls to backends that are not covered out of the box by Stackdriver Trace in the App Engine Standard Environment. The only example I'm aware of so far is Cloud SQL. Consider using Google Cloud Trace for JDBC to get interactions with Cloud SQL traced, too.
The application is multithreaded (great!) and experiences some self-inflicted synchronisation issues. Examples I've seen in the wild:
Application-specific synchronization forces all requests to the storage backend to be serialized (for a given App Engine instance). Nothing sticks out in the traces, except those mysterious gaps...
The application uses a database connection pool. The number of parallel requests exceeds the capacity of the pool (for a given App Engine instance), so some requests have to wait until a connection becomes available. This is a more sophisticated variation of the previous item (see the sketch below).
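A sketch of that second scenario, assuming HikariCP as the pool (the class name and numbers are invented): if an instance handles more concurrent requests than the pool has connections, the surplus requests block in getConnection(), and that wait shows up as an unexplained gap in the trace.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;

public class PoolBoundHandler {
    // Only 5 connections, while the instance may serve far more concurrent requests.
    private static final HikariDataSource DS = createPool();

    private static HikariDataSource createPool() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:mysql://example/db"); // placeholder URL
        cfg.setMaximumPoolSize(5);                 // smaller than the request concurrency
        cfg.setConnectionTimeout(10_000);          // waiters give up after 10s
        return new HikariDataSource(cfg);
    }

    public void handleRequest() throws Exception {
        // If all 5 connections are checked out, this call blocks - untraced time.
        try (Connection c = DS.getConnection()) {
            // ... run queries ...
        }
    }
}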

Given that this is occurring infrequently and that the actual processing times (indicated by the span lengths) are short, my suspicion is that some kind of App Engine scaling action is occurring in the background. For example, the slowdown may be caused by a new instance being added to your application. You can dig into this further by looking at the activity graph on the App Engine dashboard or by using AppStats (see this SO post).
Showing App Engine events in the trace timeline view is something that we've been wanting to do for a while, as it would dramatically shorten the analysis process for situations like this.

Related

How to run a very long request that uses high memory in Tomcat?

I have a Tomcat server.
In the Tomcat server, I handle some RESTful requests that call a very memory-intensive service; a call can last 15 minutes and can ultimately crash Tomcat.
How can I run this request:
1. without crashing Tomcat?
2. without exceeding the 3-minute limit on RESTful requests?
Thank you.
Try another architectural approach.
REST is designed to be stateless, so you have to introduce status tracking explicitly.
I suggest you implement:
the long-running task as a batch job in the background (as #kamran-ghiasvand suggests);
a submit request that starts the batch and returns a unique ID;
a status request that reports the status of the task (auto-refresh the screen every 5s, for example). You can do that on an HTML/page basis or via Ajax.
To give you an idea of what you might need on the backend, I quote our PaymentService interface below.
public interface PaymentService {
    PnExecution createPaymentExecution(List<Period> periods, Date calculationDate) throws PnValidationException;
    Long createPaymentExecutionAsync(List<Period> periods, Date calculationDate);
    PnExecution simulatePaymentExecution(Period period, Date calculationDate) throws PnValidationException;
    Void deletePaymentExecution(Long pnExecutionId, AsyncTaskListener<?, ?> listener);
    Long deletePaymentExecutionAsync(Long pnExecutionId);
    void removePaymentNotificationFromPaymentExecution(Long pnExecutionId, Pn paymentNotification);
}
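As an illustration of the submit/status pattern (not our actual code; class names and paths are made up), a minimal Spring MVC sketch might look like this:
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/payments")
public class PaymentExecutionController {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();

    // Submit: kick off the long-running batch and return a unique ID immediately.
    @PostMapping("/executions")
    public String submit() {
        String id = UUID.randomUUID().toString();
        jobs.put(id, executor.submit(this::runLongBatch));
        return id;
    }

    // Status: the client polls this (page refresh or ajax) until the job is done.
    @GetMapping("/executions/{id}")
    public String status(@PathVariable String id) {
        Future<?> job = jobs.get(id);
        if (job == null) return "UNKNOWN";
        return job.isDone() ? "DONE" : "RUNNING";
    }

    private void runLongBatch() {
        // the 15-minute, memory-heavy work goes here
    }
}
The important point is that the HTTP request returns within milliseconds, so neither the 3-minute limit nor the client connection is tied to the lifetime of the batch.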
About performance:
Try to find the memory consumers and try to sequentialize the problem, cutting it into steps. Make sure you have not created memory leaks by keeping references to unused objects. A last resort would be concurrency (of independent tasks) or parallelism (processing similar tasks in parallel). But most of these problems are the result of an overly straightforward architectural approach.
A Tomcat crash has nothing to do with request processing time, though; it might occur due to JVM heap memory exhaustion (or thousands of other reasons). You should determine the reason for the crash by investigating the Tomcat logs carefully. If the reason is lack of memory, you can allocate more memory to the JVM when starting Tomcat using the '-Xmx' flag. For example, you can add the following line to your setenv.sh to allocate 2 GB of RAM to Tomcat:
CATALINA_OPTS="-Xmx2048m"
As for the request timeout, there are also many factors at play here: for example, the connectionTimeout of your HTTP connector (see server.xml), network, browser, or web client limitations, and many other reasons.
Generally speaking, it's very bad practice to make such a long-running request synchronously via REST. I suggest you consider other approaches such as WebSockets or push notifications to announce to the user that his time-consuming request has completed on the server side.
Basically what you are asking boils down to this:
For some task running on Tomcat, that I have not told you anything about, how do I make it run faster, use less memory and not crash.
In the general case, you need to analyze your code to work out why it is taking so long and using so much memory. Then you need to modify or rewrite it as required to reduce memory utilization and improve its efficiency.
I don't think we can offer sensible advice on how to make the request faster, etc. without more details. For example, the advice that some people have been offering to split the request into smaller requests, or to perform the large request asynchronously, won't necessarily help. You should not try these ideas without first understanding what the real problem is.
It is also possible that your task is taking too long and crashing Tomcat for a specific reason:
It is possible that the request's use of (too much) memory is actually causing the requests to take too long. If a JVM is running out of heap memory, it will spend more and more time running the GC. Ultimately it will fail with an OutOfMemoryError.
The excessive memory use could be related to the size of the task that the request is performing.
The excessive memory use could be caused by a bug (memory leak) in your code or some 3rd party library that you are using.
Depending on the above, the problem could be solved by:
increasing Tomcat's heapsize,
fixing the memory leak, or
limiting the size of the "problem" that the request is trying to solve.
It is also possible that you just have a bug in your code; e.g. an infinite loop.
In summary, you have not provided enough information to allow a proper diagnosis. The best we can do is suggest possible causes. Guesses, really.

Jetty server unexpectedly trades cpu to memory and vise versa

I have a REST service based on Spark Java 2.5, which uses the Jetty server under the hood.
My problem is that it doesn't maintain constant performance and at some point suddenly "decides to trade" CPU for memory, and vice versa some time later.
Plots are created using Java melody.
As you can see, at about 18:00 the performance plots changed abruptly. Memory consumption began to grow and processor load went down. At the same time, request latencies didn't change, nor did requests per second or request types. Additional parameters also changed, especially used buffer memory and the number of open files.
A week later everything changes back, and maybe two weeks or a month later the cycle repeats; I have observed this cycle for the last three months.
I tried to use profiler, but didn't find anything useful.
I'm pretty sure this change is not provoked by business logic, because nothing changed in the users' interaction with the web server and no background tasks are active, so it's probably Jetty's or Java's internals, or a misconfiguration.
The server runs on Java 8 in a Docker container on AWS EC2 (we use AWS ECS for Docker autoscaling). There is a load of about 50 requests per second. The API itself uses Spring and Hibernate with the PostgreSQL 9.4 driver. Ehcache is used as the Hibernate second-level cache. Some API requests are multipart, about 100 KB in size, and they are uniformly distributed on the request timeline. Java is started with the parameters: -server -Xmx6000m -XX:+UseG1GC. If more details are needed, please ask.
What I want is a constant performance. If my problem resonates with your experience, please reply.
P.S.: the change at ~23:30 is not related to the problem, so don't analyze it.
In the end the solution was simple: increasing -Xms to 500 MB helped, and after that the behavior became stable.
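For reference, a minimal sketch of the resulting startup flags (the -Xmx and GC flags are the ones quoted above; treat the exact -Xms value as something to tune for your own workload):
java -server -Xms500m -Xmx6000m -XX:+UseG1GC -jar app.jar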

Managing Google App Engine java instances with quite heavy load to avoid 500 errors

We have a Google App Engine Java app with 50 - 120 req/s depending on the hour of the day.
Our frontend appengine-web.xml looks like this:
<instance-class>F1</instance-class>
<automatic-scaling>
  <min-idle-instances>3</min-idle-instances>
  <max-idle-instances>3</max-idle-instances>
  <min-pending-latency>300ms</min-pending-latency>
  <max-pending-latency>1.0s</max-pending-latency>
  <max-concurrent-requests>100</max-concurrent-requests>
</automatic-scaling>
Usually 1 frontend instance manages to handle around 20 req/s. Start up time is around 15s.
I have a few questions :
When I change the frontend Default version, I get thousands of Error 500 - Request was aborted after waiting too long to attempt to service your request.
So, to avoid that, I switch from one version to the other using the traffic splitting feature by IP address, going from 1 to 100% in steps of 5%; it takes around 5 minutes to do it properly and avoid massive 500 errors. Moreover, that feature seems to be available only for the default frontend module.
-> Is there a better way to switch versions ?
To avoid thousands of Error 500 - Request was aborted after waiting too long to attempt to service your request, we must use at least 3 resident (min-idle) instances. And as our traffic grows, even with 3 we sometimes still get massive bursts of Error 500. Am I supposed to go to 4 residents? I thought App Engine was nice because you only pay for the instances you use; if we need at least half our running instances sitting idle in order to work properly, that's not great, is it? It's not really cost effective, because when the load is low, still having 4 idle instances is a big waste :( What's weird is that they seem to wait only 10s before responding 500: pending_ms=10248
-> Do you have advices to avoid that ?
Quite often, we also get thousands of Error 500 - A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104). I don't understand, there aren't any exceptions, and we get hundreds of them for a few seconds.
-> Do you have advices to avoid that ?
Thanks a lot in advance for your help ! ;)
Those error messages are mostly related to loading requests which take too long to load and therefore end in something similar to a DeadlineExceededException, which dramatically affects performance and user experience, as you probably already know.
This is a very common issue, especially when using DI frameworks with Google App Engine, and so far it's an unavoidable and serious unfixed issue when using automatic scaling, which is the scaling policy App Engine has provided for handling public requests since its inception.
Try changing the frontend instance class to F2, especially if your memory consumption is higher than 128 MB per instance, and set the min/max pending latency to 15s so your requests get more chances to be processed by a resident instance. However, you will still get long response times for some requests, since Google App Engine may not issue a warmup request every time your application needs a new instance, and I understand that F4 would break the bank.
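For illustration, the adjusted appengine-web.xml could look roughly like this (a sketch based on the block quoted in the question; the 15s latencies follow the suggestion above and should be tuned to your traffic):
<instance-class>F2</instance-class>
<automatic-scaling>
  <min-idle-instances>3</min-idle-instances>
  <max-idle-instances>3</max-idle-instances>
  <min-pending-latency>15s</min-pending-latency>
  <max-pending-latency>15s</max-pending-latency>
  <max-concurrent-requests>100</max-concurrent-requests>
</automatic-scaling>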

What could cause global Tomcat/JVM slowdown?

I'm experiencing a strange but severe problem running several (about 15) instances of a Java EE-ish web applications (Hibernate 4+Spring+Quartz+JSF+Facelets+Richfaces) on Tomcat 7/Java 7.
The system runs just fine, but after a greatly varying amount of time all instances of the application suddenly suffer from rising response times at the same moment. Basically the application still works, but the response times are about three times higher.
These are two diagrams displaying the response time of two particular short workflows/actions (log in, access list of seminars, ajax-refresh this list, log out; the lower line is just the request time for the ajax refresh) of two example instances of the application:
As you can see both instances of the application "explode" at the exact same time and stay slow. After restarting the server everything's back to normal. All the instances of the application "explode" simultaneously.
We're storing the session data in a database and use this for clustering. We checked session size and count, and both are rather low (on other servers with other applications we sometimes have larger and more sessions). The other Tomcat in the cluster usually stays fast for a few more hours, and after this random-ish amount of time it also "dies". We checked the heap sizes with jconsole: the main heap stays between 1 and 2.5 GB, the DB connection pool is basically full of free connections, and so are the thread pools. Max heap size is 5 GB, and there's also plenty of perm gen space available. The load is not especially high; there's only about 5% load on the main CPU. The server does not swap. It's also not a hardware issue, as we additionally deployed the applications to a VM where the problems remain the same.
I don't know where to look anymore; I am out of ideas. Does anyone have an idea where to look?
2013-02-21 Update: New Data!
I added two more timing traces to the application. As for the measurement: the monitoring system calls a servlet that performs two tasks, measures execution time for each on the server and writes the time taken as response. These values are logged by the monitoring system.
I have several interesting new facts: a hot redeployment of the application causes this single instance on the current Tomcat to go nuts. This also seems to affect raw CPU calculation performance (see below). This individual-context-explosion is different from the overall-context-explosion that occurs randomly.
Now for some data:
First the individual lines:
Light blue is total execution time of a small workflow (details see above), measured on the client
Red is "part" of light blue and is the time taken to perform a special step of that workflow, measured on the client
Dark blue is measured in the application and consists of reading a list of entities from the DB through Hibernate and iterating over that list, fetching lazy collections and lazy entities.
Green is a small CPU benchmark using floating point and integer operations. As far as I see no object allocation, so no garbage.
Now for the individual stages of explosion: I marked each image with three black dots. The first one is a "small" explosion in more or less only one application instance - in Inst1 it jumps (especially visible in the red line), while Inst2 below more or less stays calm.
After this small explosion the "big bang" occurs and all application instances on that Tomcat explode (2nd dot). Note that this explosion affects all high level operations (request processing, DB access), but not the CPU benchmark. It stays low in both systems.
After that I hot-redeployed Inst1 by touching the context.xml file. As I said earlier, this instance goes from exploded to completely devastated now (the light blue line is off the chart - it is at about 18 secs). Note how a) this redeployment does not affect Inst2 at all and b) the raw DB access of Inst1 is also not affected - but the CPU suddenly seems to have become slower! This is crazy, I say.
Update of update
The leak prevention listener of Tomcat does not whine about stale ThreadLocals or Threads when the application is undeployed. There obviously seems to be some cleanup problem (which is I assume not directly related to the Big Bang), but Tomcat doesn't have a hint for me.
2013-02-25 Update: Application Environment and Quartz Schedule
The application environment is not very sophisticated. Network components aside (I don't know enough about those) there's basically one application server (Linux) and two database servers (MySQL 5 and MSSQL 2008). The main load is on the MSSQL server, the other one merely serves as a place to store the sessions.
The application server runs an Apache as a load balancer between two Tomcats, so we have two JVMs running on the same hardware (two Tomcat instances). We use this configuration not to actually balance load, as the application server is capable of running the application just fine (which it did for years), but to enable small application updates without downtime. The web application in question is deployed as separate contexts for different customers, about 15 contexts per Tomcat. (I seem to have mixed up "instances" and "contexts" in my posting - here in the office they're often used synonymously and we usually magically know what the colleague is talking about. My bad, I'm really sorry.)
To clarify the situation with better wording: the diagrams I posted show response times of two different contexts of the same application on the same JVM. The Big Bang affects all contexts on one JVM but doesn't happen on the other one (the order in which the Tomcats explode is random btw). After hot-redeployment one context on one Tomcat instance goes nuts (with all the funny side effects, like seemingly slower CPU for that context).
The overall load on the system is rather low. It's an internal core-business application with about 30 simultaneously active users. Application-specific requests (server touches) are currently at about 130 per minute. The number of individual requests is low, but the requests themselves often require several hundred selects against the database, so they're rather expensive. But usually everything's perfectly acceptable. The application also does not build up large unbounded caches - some lookup data is cached, but only for a short amount of time.
Above I wrote that the servers were capable of running the application just fine for several years. I know that the best way to find the problem would be to find out exactly when things went wrong for the first time and see what was changed in that timeframe (in the application itself, the associated libraries or the infrastructure); however, the problem is that we don't know when the problems first occurred. Let's just call that suboptimal (in the sense of absent) application monitoring... :-/
We ruled out some aspects, but the application has been updated several times during the last months and thus we e.g. cannot simply deploy an older version. The largest update that wasn't a feature change was a switch from JSP to Facelets. But still, "something" must be the cause of all the problems, yet I have no idea why Facelets, for instance, should influence pure DB query times.
Quartz
As for the Quartz schedule: there's a total of 8 jobs. Most of them run only once per day and have to do with large-volume data synchronization (absolutely not "large" as in "big data"; it's just more than the average user sees through their usual daily work). However, those jobs of course run at night, and the problems occur during the daytime. I omit a detailed job listing here (if beneficial I can provide more details, of course). The jobs' source code has not been altered during the last months. I already checked whether the explosions align with the jobs - yet the results are inconclusive at best. I'd actually say that they don't align, but as there are several jobs that run every minute I can't rule it out just yet. The actual jobs that run every minute are pretty lightweight in my opinion; they usually check whether data is available (in different sources: DB, external systems, email account) and if so write it to the DB or push it to another system.
However, I'm currently enabling logging of individual job executions so that I can see the exact start and end timestamps of each single job execution. Perhaps this provides more insight.
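One way to get those timestamps, assuming plain Quartz 2.x (the class name is made up), is a global JobListener registered on the scheduler:
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.listeners.JobListenerSupport;

public class TimingJobListener extends JobListenerSupport {

    @Override
    public String getName() {
        return "timingJobListener";
    }

    @Override
    public void jobToBeExecuted(JobExecutionContext ctx) {
        getLog().info("START " + ctx.getJobDetail().getKey() + " at " + System.currentTimeMillis());
    }

    @Override
    public void jobWasExecuted(JobExecutionContext ctx, JobExecutionException e) {
        getLog().info("END   " + ctx.getJobDetail().getKey() + " took " + ctx.getJobRunTime() + " ms");
    }
}
// registration: scheduler.getListenerManager().addJobListener(new TimingJobListener());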
2013-02-28 Update: JSF Phases and Timing
I manually added a JSF phase listener to the application. I executed a sample call (the ajax refresh) and this is what I got (left: normally running Tomcat instance, right: Tomcat instance after the Big Bang - the numbers were taken almost simultaneously from both Tomcats and are in milliseconds):
RESTORE_VIEW: 17 vs 46
APPLY_REQUEST_VALUES: 170 vs 486
PROCESS_VALIDATIONS: 78 vs 321
UPDATE_MODEL_VALUES: 75 vs 307
RENDER_RESPONSE: 1059 vs 4162
The ajax refresh itself belongs to a search form and its search result. There's also another delay between the application's outermost request filter and the point where web flow starts its work: there's a FlowExecutionListenerAdapter that measures time taken in certain phases of web flow. This listener reports 1405 ms for "Request submitted" (which is, as far as I know, the first web flow event) out of a total of 1632 ms for the complete request on an un-exploded Tomcat; thus I estimate about 200 ms of overhead.
But on the exploded Tomcat it reports 5332 ms for request submitted (meaning all JSF phases happen in those 5 seconds) out of a total request duration of 7105ms, thus we're up to almost 2 seconds overhead for everything outside of web flow's request submitted.
Below my measurement filter the filter chain contains a org.ajax4jsf.webapp.BaseFilter, then the Spring servlet is called.
2013-06-05 Update: All the stuff going on in the last weeks
A small and rather late update... the application performance still sucks after some time and the behaviour remains erratic. Profiling has not helped much yet; it just generated an enormous amount of data that's hard to dissect. (Try poking around in performance data of, or profiling, a production system... sigh.) We conducted several tests (ripping out certain parts of the software, undeploying other applications, etc.) and actually made some improvements that affect the whole application. The default flush mode of our EntityManager is AUTO, and during view rendering lots of fetches and selects are issued, always including the check whether flushing is necessary.
So we built a JSF phase listener that sets the flush mode to COMMIT during RENDER_RESPONSE. This improved overall performance a lot and seems to have mitigated the problems somewhat.
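For illustration, such a listener might look roughly like the sketch below (not our original code; how you obtain the current Hibernate Session depends on your setup, so currentSession() is a placeholder to wire up yourself):
import javax.faces.event.PhaseEvent;
import javax.faces.event.PhaseId;
import javax.faces.event.PhaseListener;

import org.hibernate.FlushMode;
import org.hibernate.Session;

public class RenderResponseFlushModeListener implements PhaseListener {

    @Override
    public PhaseId getPhaseId() {
        return PhaseId.RENDER_RESPONSE;
    }

    @Override
    public void beforePhase(PhaseEvent event) {
        // During rendering, skip the automatic dirty-check/flush before every query.
        currentSession().setFlushMode(FlushMode.COMMIT);
    }

    @Override
    public void afterPhase(PhaseEvent event) {
        // Restore the default once the response has been rendered.
        currentSession().setFlushMode(FlushMode.AUTO);
    }

    private Session currentSession() {
        // Placeholder: look up the Session bound to the current request,
        // e.g. via sessionFactory.getCurrentSession() or an injected EntityManager.
        throw new UnsupportedOperationException("wire this to your persistence setup");
    }
}
The listener would be registered in faces-config.xml via a <lifecycle><phase-listener> entry.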
Yet our application monitoring keeps yielding completely insane results and performance numbers for some contexts on some Tomcat instances - like an action that should finish in under a second (and actually does so right after deployment) now taking more than four seconds. (These numbers are supported by manual timing in the browsers, so it's not the monitoring that causes the problems.)
See the following picture for example:
This diagram shows two tomcat instances running the same context (meaning same db, same configuration, same jar). Again the blue line is the amount of time taken by pure DB read operations (fetch a list of entities, iterate over them, lazily fetch collections and associated data). The turquoise-ish and red line are measured by rendering several views and doing an ajax refresh, respectively. The data rendered by two of the requests in turquoise-ish and red is mostly the same as is queried for the blue line.
Now around 0700 on instance 1 (right) there's this huge increase in pure DB time which seems to affect actual render response times as well, but only on tomcat 1. Tomcat 0 is largely unaffected by this, so it cannot be caused by the DB server or network with both tomcats running on the same physical hardware. It has to be a software problem in the Java domain.
During my last tests I found out something interesting: All responses contain the header "X-Powered-By: JSF/1.2, JSF/1.2". Some (the redirect responses produced by WebFlow) even have "JSF/1.2" three times in there.
I traced down the code parts that set those headers and the first time this header is set it's caused by this stack:
... at org.ajax4jsf.webapp.FilterServletResponseWrapper.addHeader(FilterServletResponseWrapper.java:384)
at com.sun.faces.context.ExternalContextImpl.<init>(ExternalContextImpl.java:131)
at com.sun.faces.context.FacesContextFactoryImpl.getFacesContext(FacesContextFactoryImpl.java:108)
at org.springframework.faces.webflow.FlowFacesContext.newInstance(FlowFacesContext.java:81)
at org.springframework.faces.webflow.FlowFacesContextLifecycleListener.requestSubmitted(FlowFacesContextLifecycleListener.java:37)
at org.springframework.webflow.engine.impl.FlowExecutionListeners.fireRequestSubmitted(FlowExecutionListeners.java:89)
at org.springframework.webflow.engine.impl.FlowExecutionImpl.resume(FlowExecutionImpl.java:255)
at org.springframework.webflow.executor.FlowExecutorImpl.resumeExecution(FlowExecutorImpl.java:169)
at org.springframework.webflow.mvc.servlet.FlowHandlerAdapter.handle(FlowHandlerAdapter.java:183)
at org.springframework.webflow.mvc.servlet.FlowController.handleRequest(FlowController.java:174)
at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:925)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:856)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:920)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:827)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
... several thousands ;) more
The second time this header is set by
at org.ajax4jsf.webapp.FilterServletResponseWrapper.addHeader(FilterServletResponseWrapper.java:384)
at com.sun.faces.context.ExternalContextImpl.<init>(ExternalContextImpl.java:131)
at com.sun.faces.context.FacesContextFactoryImpl.getFacesContext(FacesContextFactoryImpl.java:108)
at org.springframework.faces.webflow.FacesContextHelper.getFacesContext(FacesContextHelper.java:46)
at org.springframework.faces.richfaces.RichFacesAjaxHandler.isAjaxRequestInternal(RichFacesAjaxHandler.java:55)
at org.springframework.js.ajax.AbstractAjaxHandler.isAjaxRequest(AbstractAjaxHandler.java:19)
at org.springframework.webflow.mvc.servlet.FlowHandlerAdapter.createServletExternalContext(FlowHandlerAdapter.java:216)
at org.springframework.webflow.mvc.servlet.FlowHandlerAdapter.handle(FlowHandlerAdapter.java:182)
at org.springframework.webflow.mvc.servlet.FlowController.handleRequest(FlowController.java:174)
at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:925)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:856)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:920)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:827)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
I have no idea if this could indicate a problem, but I did not notice this with other applications that are running on any of our servers, so this might as well provide some hints. I really have no idea what that framework code is doing (admittedly I did not dive into it yet)... perhaps someone has an idea? Or am I running into a dead end?
Appendix
My CPU benchmark code consists of a loop that calculates Math.tan and uses the result value to modify some fields on the servlet instance (no volatile/synchronized there), and then performs several raw integer calculations. This is not terribly sophisticated, I know, but well... it seems to show something in the charts; however, I am not sure what it shows. I do the field updates to prevent HotSpot from optimizing away all my precious code ;)
long time2 = System.nanoTime();
// Part 1: floating-point work. The field updates (l1/l2) keep HotSpot from
// optimizing the loop away.
for (int i = 0; i < 5000000; i++) {
    double tan = Math.tan(i);
    if (tan < 0) {
        this.l1++;
    } else {
        this.l2++;
    }
}
// Part 2: raw integer work (Collatz-style iteration).
for (int i = 1; i < 7500; i++) {
    int n = i;
    while (n != 1) {
        this.steps++;
        if (n % 2 == 0) {
            n /= 2;
        } else {
            n = n * 3 + 1;
        }
    }
}
// This execution time is written to the client.
time2 = System.nanoTime() - time2;
Solution
Increase the maximum size of the Code Cache:
-XX:ReservedCodeCacheSize=256m
Background
We are using ColdFusion 10 which runs on Tomcat 7 and Java 1.7.0_15. Our symptoms were similar to yours. Occasionally the response times and the CPU usage on the server would go up by a lot for no apparent reason. It seemed as if the CPU got slower. The only solution was to restart ColdFusion (and Tomcat).
Initial analysis
I started by looking at the memory usage and the garbage collector log. There was nothing there that could explain our problems.
My next step was to schedule a heap dump every hour and to regularly perform sampling using VisualVM. The goal was to get data from before and after a slowdown so that it could be compared. I managed to accomplish that.
There was one function in the sampling that stood out: get() in coldfusion.runtime.ConcurrentReferenceHashMap. A lot of time was spent in it after the slowdown compared to very little before. I spent some time on understanding how the function worked and developed a theory that maybe there was a problem with the hash function resulting in some huge buckets. Using the heap dumps I was able to see that the largest buckets only contained 6 elements so I discarded that theory.
Code Cache
I finally got on the right track when I read "Java Performance: The Definitive Guide". It has a chapter on the JIT Compiler which talks about the Code Cache which I had not heard of before.
Compiler disabled
Monitoring the number of compilations performed (with jstat) and the size of the Code Cache (with the Memory Pools plugin of VisualVM), I saw that the size increased up to the maximum size (which is 48 MB by default in our environment -- the default varies depending on the Java version and the Java compiler). When the Code Cache became full, the JIT Compiler was turned off. I have read that "CodeCache is full. Compiler has been disabled." should be printed when that happens, but I did not see that message; maybe the version we are using does not have it. I know that the compiler was turned off because the number of compilations performed stopped increasing.
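If you want to reproduce that observation, the stock JDK tools are enough (PID is the Tomcat/JVM process; the interval is arbitrary):
jstat -compiler <PID> 5000          # cumulative number of JIT compilations; it stops growing once the compiler is disabled
jstat -printcompilation <PID> 5000  # prints the most recently compiled method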
Deoptimization continues
The JIT Compiler can deoptimize previously compiled functions, which will cause the function to be executed by the interpreter again (unless the function is replaced by an improved compilation). The deoptimized function can be garbage collected to free up space in the Code Cache.
For some reason functions continued to be deoptimized even though nothing was compiled to replace them. More and more memory would become available in the Code Cache but the JIT Compiler was not restarted.
I never had -XX:+PrintCompilation enabled when we experienced a slowdown, but I am quite sure that I would have seen either ConcurrentReferenceHashMap.get(), or a function that it depends on, be deoptimized at that time.
Result
We have not seen any slowdowns since we increased the maximum size of the Code Cache to 256 MB and we have also seen a general performance improvement. There is currently 110 MB in our Code Cache.
First, let me say that you have done an excellent job grabbing detailed facts about the problem; I really like how you make it clear what you know and what you are speculating - it really helps.
EDIT 1 Massive edit after the update on context vs. instance
We can rule out:
GCs (that would affect the CPU benchmark service thread and spike the main CPU)
Quartz jobs (that would either affect both Tomcats or the CPU benchmark)
The database (that would affect both Tomcats)
Network packet storms and similar (that would affect both Tomcats)
I believe that what you are suffering from is an increase in latency somewhere in your JVM. Latency is where a thread is waiting (synchronously) for a response from somewhere - it increases your servlet response time but at no cost to the CPU. Typical latencies are caused by:
Network calls, including
JDBC
EJB or RMI
JNDI
DNS
File shares
Disk reading and writing
Threading
Reading from (and sometimes writing to) queues
synchronized method or block
futures
Thread.join()
Object.wait()
Thread.sleep()
Confirming that the problem is latency
I suggest using a commercial profiling tool. I like JProfiler (http://www.ej-technologies.com/products/jprofiler/overview.html; a 15-day trial version is available), but YourKit is also recommended by the StackOverflow community. In this discussion I will use JProfiler terminology.
Attach to the Tomcat process while it is performing fine and get a feel for how it looks under normal conditions. In particular, use the high-level JDBC, JPA, JNDI, JMS, servlet, socket and file probes to see how long the JDBC, JMS, etc. operations take (screencast). Run this again when the server is exhibiting problems and compare. Hopefully you will see what precisely has been slowed down. In the product screenshot below, you can see the SQL timings using the JPA probe:
(source: ej-technologies.com)
However, it's possible that the probes did not isolate the issue - for example it might be a threading issue. Go to the Threads view for the application; this displays a running chart of the states of each thread and whether it is executing on the CPU, in an Object.wait(), waiting to enter a synchronized block, or waiting on network I/O. When you know which thread or threads are exhibiting the issue, go to the CPU views, select the thread and use the thread states selector to immediately drill down to the expensive methods and their call stacks (screencast). You will be able to drill up into your application code.
This is a call stack for runnable time:
And this is the same one, but showing network latency:
When you know what is blocking, hopefully the path to resolution will be clearer.
We had the same problem, running on Java 1.7.0_u101 (one of Oracle's supported versions, since the latest public JDK/JRE 7 is 1.7.0_u79), running on G1 garbage collector. I cannot tell if the problem appears in other Java 7 versions or with other GCs.
Our process was Tomcat running Liferay Portal (I believe the exact version of Liferay is of no interest here).
This is the behavior we observed: using an -Xmx of 5GB, the initial Code Cache pool size right after startup was about 40MB. After a while, it dropped to about 30MB (which is kind of normal, since there is a lot of code running during startup which will never be executed again, so it is expected to be evicted from the cache after some time). We observed that there was some JIT activity, so the JIT actually populated the cache (compared to the sizes I am mentioning later, it seems that the small cache size relative to the overall heap size places stringent requirements on the JIT, and this makes the latter evict the cache rather nervously). However, after a while, no more compilations ever took place, and the JVM got painfully slow. We had to kill our Tomcats every now and then to get back adequate performance, and as we added more code to our portal, the problem got worse and worse (since the Code Cache got saturated more quickly, I guess).
It seems that there are several bugs in JDK 7 JVM that cause it to not restart the JIT (look at this blog post: https://blogs.oracle.com/poonam/entry/why_do_i_get_message), even in JDK 7, after an emergency flush (the blog mentions Java bugs 8006952, 8012547, 8020151 and 8029091).
This is why manually increasing the Code Cache to a level where an emergency flush is unlikely to ever occur "fixes" the issue (I guess this is the case with JDK 7).
In our case, instead of trying to adjust the Code Cache pool size, we chose to upgrade to Java 8. This seems to have fixed the issue. Also, the Code Cache now seems to be quite a bit larger (the startup size is about 200MB, and the cruising size about 160MB). As expected, after some idle time the cache pool size drops, only to grow again if some user (or robot, or whatever) browses our site, causing more code to be executed.
I hope you find the above data helpful.
Forgot to say: I found the exposition, the supporting data, the inference logic and the conclusion of this post very, very helpful. Thank you, really!
Does anyone have an idea where to look?
The issue could be outside of Tomcat/the JVM - do you have some batch job which kicks in and stresses the shared resource(s), like a common database?
Take a thread dump and see what the Java processes are doing when the application response time explodes.
If you are using Linux, use a tool like strace and check what the Java process is doing.
Have you checked JVM GC times? Some GC algorithms might 'pause' the application threads and increase the response time.
You can use jstat utility to monitor garbage collection statistics:
jstat -gcutil <pid of tomcat> 1000 100
The above command prints GC statistics every second, 100 times. Look at the FGC/YGC columns; if the numbers keep rising, there is something wrong with your GC options.
You might want to switch to CMS GC if you want to keep response time low:
-XX:+UseConcMarkSweepGC
You can check more GC options here.
What happens after your app is performing slow for a while, does it get back to performing well?
If so then I would check if there is any activity that is not related to your app taking place at this time.
Something like an antivirus scan or a system/db backup.
If not, then I would suggest running it with a profiler (JProfiler, YourKit, etc.); these tools can point you to your hotspots very easily.
You are using Quartz, which manages timed processes, and this seems to take place at particular times.
Post your Quartz schedule and let us know if that aligns, and if so, you can determine which internal application process may be kicking off to consume your resources.
Alternatively, it is possible a portion of your application code has finally been activated and decides to load data into the memory cache. You're using Hibernate; check the calls to your database and see if anything coincides.

Heroku, Cedar stack: What requests take up dyno time

I have given some thought on how to calculate how many users I can handle with Heroku and one dyno.
But to figure it out I need some input.
And I must say that the official documentation isn't easy to navigate and interpret, so I haven't read it all. My complaint is that it doesn't describe things very well: sometimes it describes old stacks, sometimes it's Ruby-specific, sometimes things aren't described at all, and so on.
So, to make my calculations, I need some input on how Heroku's Cedar stack handles requests.
You are more than welcome to correct me on my assumptions as I am relatively new to dyno theory.
Let's say I have a controller that takes a request and calculates a JSON response in 10 ms locally; will I be able to serve 100 requests a second?
As I understand it, the Cedar stack doesn't have a fronting caching solution, so many questions arise.
Do static content requests take up dyno time?
Does transfer time count toward request time?
Can a single dyno serve many responses at the same time if the requests require little CPU?
Some of the questions are intertwined, so a combined answer or other thoughts are welcome.
An example:
Static HTML page.
<HTML>...<img><css><script>...
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
...</HTML>
Can I serve 1000 ms / (10 ms x 4) = 25 HTML pages a second?
This assumes that static content isn't provided by a dyno.
This assumes that transfer time isn't blamed on the dyno.
If this isn't the case, it would be a catastrophe. Let's say a mobile phone in Africa makes 10 requests with a 10-second transfer time each; then my app would be unavailable for over 1½ minutes.
I can only really answer the first question: Static assets most certainly do take up dyno time. In fact, I think it's best to keep all static assets, including stylesheets and JS on an asset server when using heroku's free package. (If everyone did that, heroku would benefit and so would you). I recommend using the asset_sync gem to handle that. The Readme does explain that there are one or two, easily resolved, current issues.
Regarding your last point, sorry if I'm misinterpreting here, but a user in South Africa might take 10 seconds to have their request routed to Heroku, yet most of that time is probably spent trafficking around the maze of telephone exchanges between SA and the USA. Your dyno is only tied up for the portion of the request that takes place inside Heroku's servers, not the 9.9 seconds your request spent getting there. So effectively Heroku is oblivious to whether your request is coming from South Africa or Sweden.
There are all sorts of things you can do to speed your app up: Caching, more dynos, Unicorn with several workers
You're making two wrong assumptions. The good news is that your problem becomes much simpler once you think about things differently.
First off remember that a dyno is a single process, not a single thread. If you're using Java then you'll be utilizing many request threads. Therefore you don't have to worry about your application being unavailable while a request is being processed. You'll be able to process requests in parallel.
Also, dyno time refers to the amount of time your process is running, not just request processing time. So a web process that is waiting for a request still consumes dyno time, since the process is up while it waits for requests. This is why you get 750 free dyno hours a month - enough to run a single dyno for the entire month (720 hours).
As far as computing how many requests your application can serve per second the best way to do that is to test it. You can use New Relic to monitor your application while you load test it with JMeter or whatever your favorite load testing program is: http://devcenter.heroku.com/articles/newrelic
