I have a Java application running in Kubernetes which listens on port 8080. When I connect JProfiler to the JVM and run a few requests sequentially, everything works fine. But as soon as I fire some load using JMeter, my application stops responding on port 8080 and I get request timeouts.
When JProfiler is detached from the JVM, everything starts working fine again.
I have searched a lot but couldn't find anything about what in JProfiler is blocking my application from responding.
From the feedback you have sent me by email, the overhead becomes noticeable when you switch on allocation recording. With just CPU and probe recording you don't experience any problem.
Allocation recording is an expensive operation that should only be used when you have a related problem. The added overhead can be lowered by reducing the allocation sampling rate.
We're using the AppDynamics Java agent to monitor our production applications. We have noticed slow growth in memory, and the application eventually stalls. We ran a heap dump on one of the JVMs and got the reports below.
Problem Suspect 1:
The thread com.singularity.ee.agent.appagent.kernel.config.xml.a# 0x1267......
AD thread config Poller keeps local variable config size of 28546.79(15.89%) KB
Problem Suspect 2:
280561 Instances of
com.singularity.ee.agent.appagent.services.transactionmonitor.com.exitcall.p loaded by com.singularity.ee.agent.appagent.kernel.classloader.d# 0x6c000....
occupy 503413.3(28.05%) KB. These instances are referenced from one instance of java.util.HashMap$Node[]...
We figured that these classes come from the AppDynamics APM agent that hooks onto the running JVM and sends monitored events to the controller. Reaching out to the vendor is a convoluted process, so I am wondering whether there are any workarounds, such as enabling JMX in our Java apps and having AppDynamics read the monitoring events from JMX rather than hooking directly into the application's JVM. Thanks for your suggestions.
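For reference, exposing the JVM over JMX is not much work. Below is a minimal sketch (class name, port and service URL are made up for illustration) of starting a JMX connector server programmatically so that an external monitor can poll the platform MXBeans instead of an in-process agent instrumenting the JVM; the same effect can be had with the standard com.sun.management.jmxremote.* system properties. Whether AppDynamics can consume plain JMX in your setup is something to confirm with their documentation.

    import java.lang.management.ManagementFactory;
    import java.rmi.registry.LocateRegistry;
    import javax.management.MBeanServer;
    import javax.management.remote.JMXConnectorServer;
    import javax.management.remote.JMXConnectorServerFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxExposer {
        public static void main(String[] args) throws Exception {
            // RMI registry for the JMX connector to bind to (port 9010 is an example)
            LocateRegistry.createRegistry(9010);

            // The platform MBeanServer already exposes memory, thread and GC MXBeans
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();

            // Service URL an external monitor would connect to
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");

            JMXConnectorServer connector =
                    JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
            connector.start();

            System.out.println("JMX connector listening at " + url);
            Thread.currentThread().join(); // keep the demo JVM alive
        }
    }

In a real deployment you would of course enable authentication and SSL on the connector rather than leaving it open.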
I have a very simple Java REST service. At lower traffic volumes, the service runs perfectly with response times of ~1ms and zero server backlog.
When traffic rises past a certain threshold, response times skyrocket from 1ms to 2.0 seconds, the HTTP active session queue and open file counts spike, and the server performs unacceptably. I posted a metrics graph of a typical six-hour window where traffic starts low and rises above the problem threshold.
Any ideas on what could be causing this or how to diagnose further?
Your webapp uses a thread (borrowed from a thread pool) to serve each request.
Under heavy load many threads are busy at once; if the number of requests exceeds the capacity of the pool, requests have to queue and wait until a thread becomes available again.
If your service is not fast enough (especially if it does I/O, such as opening files), the wait time increases, which leads to slow responses.
The CPU also has to switch between many threads, so CPU usage spikes under load.
That's why you need load balancing and multiple webapp instances behind the service: the load is distributed across the instances, which improves the end-user experience.
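To make the queueing concrete, here is a minimal sketch (class name, pool and queue sizes are made up for illustration) of a bounded thread pool: once all workers are busy new tasks wait in the queue, and once the queue is also full they are rejected, which is roughly what a servlet container's maxThreads and accept queue do under load.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedPoolDemo {
        public static void main(String[] args) {
            // 10 workers plus a queue of 100: at most 110 tasks can be held at once
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    10, 10, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100));

            for (int i = 0; i < 150; i++) {
                final int id = i;
                try {
                    pool.execute(() -> {
                        try {
                            Thread.sleep(500); // simulate slow work, e.g. file or network I/O
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    System.out.println("Request " + id + " rejected: pool and queue are full");
                }
            }
            pool.shutdown();
        }
    }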
The usual approach to diagnosing this is to create load with JMeter and investigate the results with Java VisualVM, Eclipse Memory Analyzer and so on. I don't know whether you have tried that.
I am currently load testing my web application (Spring + Hibernate based) on a standalone Tomcat server (v7.0.27) on a Windows Server 2008 machine. I need to know how Tomcat behaves as bulk requests come in, e.g.:
300 requests received - current heap size, whether the server is hung or unable to process, size of objects, number of objects, and so forth.
Is there a way to see this already? (Info from the manager app is insufficient; "current threads active" and "memory occupied" are not what I need.)
P.S. The maxThreads property for the Connector element is 350.
Update: Another issue I faced while load testing - in some cases Tomcat hangs when I send 300 requests.
Any help would be greatly appreciated.
You can use JConsole, which ships with the JDK.
http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html
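If you want the same numbers programmatically, for example to log them during a JMeter run, the platform MXBeans expose heap usage and thread counts. A minimal sketch (class name is made up):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import java.lang.management.ThreadMXBean;

    public class HeapAndThreadSnapshot {
        public static void main(String[] args) {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();

            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("Heap used: %d MB of %d MB committed%n",
                    heap.getUsed() / (1024 * 1024), heap.getCommitted() / (1024 * 1024));
            System.out.println("Live threads: " + threads.getThreadCount());
        }
    }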
If the server hangs, there might be a deadlock.
You can try to attach with JProfiler; the monitoring section will show you the current locking situation and a possible deadlock.
Disclaimer: My company develops JProfiler.
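If you just want a quick check for a deadlock without attaching a profiler, the JDK's ThreadMXBean can report deadlocked threads. A minimal sketch (class name is made up):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DeadlockCheck {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            long[] ids = threads.findDeadlockedThreads(); // null if no deadlock
            if (ids == null) {
                System.out.println("No deadlocked threads found");
                return;
            }
            for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
                System.out.println("Deadlocked: " + info.getThreadName()
                        + " waiting on " + info.getLockName());
            }
        }
    }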
I'm having issues with FIN_WAIT1 on my RHEL 5.4 box running Introscope. What I have observed so far: whenever the target JVM we are monitoring with Introscope hangs, the agent running on that host stops sending data, and after some time the socket on the Introscope server goes into the FIN_WAIT1 state and stays there for a long time; it only gets cleaned up if we restart the target JVM.
I would like to know whether this is happening because of a bug in Introscope or something at the TCP layer.
FIN_WAIT1 is at the TCP layer - it means your machine's TCP stack is waiting for the connection-close handshake messages from the other side's TCP stack. It usually doesn't cause much harm, other than holding a small amount of kernel state until it times out. However, it can sometimes prevent you from restarting a server on the same port, in which case you can set the SO_REUSEADDR (and/or SO_REUSEPORT, where available) option on the socket before binding it. (This does have some security implications if you're sharing the machine.)
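In Java, for example, the equivalent of SO_REUSEADDR has to be set on an unbound socket before bind(); a minimal sketch (the port is just an example):

    import java.net.InetSocketAddress;
    import java.net.ServerSocket;

    public class ReuseAddressServer {
        public static void main(String[] args) throws Exception {
            ServerSocket server = new ServerSocket();  // create unbound
            server.setReuseAddress(true);              // SO_REUSEADDR
            server.bind(new InetSocketAddress(8080));
            System.out.println("Listening on " + server.getLocalSocketAddress());
        }
    }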
I've seen several StackOverflow posts that discuss what tools to use to monitor web application performance, but none that talk about what metrics to focus on.
What web server metrics should be monitored, and which should have alerts set up on them?
Here are some I currently have in mind:
request timeouts (alerts)
requests queued (alerts)
time to first byte (may need to be monitored externally)
requests / second
Also, how can these be measured on a Java web application server?
You're off to a good start. I would monitor:
Total response time
Total bytes
Throughput (reqs/sec)
Server CPU overhead
Errors (by error code)
I would also alert on the following:
Application/page not responding
Excessive response time (this depends upon your app; you'll have to figure out the normal SLA)
Excessive throughput (this will alert you to a DoS attack so that you can take action)
50x errors (such as 500, 503, etc.)
Excessive server CPU load (again, you'll have to determine what typical is and configure your tool to alert you when things are abnormal, another indicator of a DoS attack or a runaway process)
Errors in log files (if your tool supports it, configure it to send alerts when errors/exceptions show up in log files)
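As for measuring these on a Java web application server: most of them can be captured with a simple servlet filter or read from the container's JMX MBeans. Below is a minimal sketch of a filter that accumulates request count, total response time and 50x errors; the class and field names are made up, and you would still need to register the filter and export the counters to whatever monitoring tool you use.

    import java.io.IOException;
    import java.util.concurrent.atomic.LongAdder;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletResponse;

    public class MetricsFilter implements Filter {

        private final LongAdder requestCount = new LongAdder();
        private final LongAdder errorCount = new LongAdder();
        private final LongAdder totalTimeMillis = new LongAdder();

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            long start = System.nanoTime();
            try {
                chain.doFilter(req, res);
            } finally {
                totalTimeMillis.add((System.nanoTime() - start) / 1_000_000);
                requestCount.increment();
                if (res instanceof HttpServletResponse
                        && ((HttpServletResponse) res).getStatus() >= 500) {
                    errorCount.increment(); // count 50x responses
                }
            }
        }

        @Override
        public void init(FilterConfig filterConfig) { }

        @Override
        public void destroy() { }
    }

Requests per second and average response time then fall out of these counters when a reporter thread samples them periodically.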