Application TPS getting dropped while taking JFR - java

My Spring Boot application is designed to support 500 TPS.
The application sustains 500 TPS continuously, but when we take a JFR recording using the command below, the TPS drops to less than 100.
JFR command:
/opt/drutt/local/jdk1.8.0_112/bin/jcmd JFR.start settings=profile duration=1m filename=/tmp/my_file_1.jfr
Is there any problem with the JFR command?
Does JFR contribute to the performance drop?

The profile setting can create significant overhead in certain applications, typically due to the TLAB allocation events (or possibly exception events), and it is not recommended to keep it always on in production environments.
Remove settings=profile and the default configuration is used, which is safe in production (< 1% overhead).
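For example, dropping settings=profile and starting a default recording might look like the line below (a sketch only: the <pid> placeholder and output file name are illustrative, not taken from the original question):
/opt/drutt/local/jdk1.8.0_112/bin/jcmd <pid> JFR.start duration=1m filename=/tmp/my_file_default.jfr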

Related

How to prevent a Spring Boot / Tomcat (Java8) process be OOM-killed?

Since moving to Tomcat 8 / Java 8, now and then the Tomcat server is OOM-killed. OOM = out-of-memory kill by the Linux kernel.
How can I prevent the Tomcat server from being OOM-killed?
Can this be the result of a memory leak? I guess I would then get a normal out-of-memory message, but no OOM kill. Correct?
Should I change the heap size settings?
Should I change the MetaSpace size settings?
Knowing which Tomcat process was killed, how can I retrieve info so that I can reconfigure the Tomcat server?
First, check that the OOM kill isn't being triggered by another process in the system, or that the server isn't overloaded with other processes. It could be that Tomcat is being unfairly targeted by the OOM killer when some other greedy process is the culprit.
The maximum heap size (-Xmx) should be set smaller than the physical RAM on the server. If it is more than this, paging will cause desperately poor performance when garbage collecting.
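As a sketch, on a server with, say, 2 GB of physical RAM you might cap Tomcat's heap well below that in bin/setenv.sh (the sizes are illustrative, not a recommendation for your workload):
# bin/setenv.sh - illustrative heap sizing; leave headroom for the OS and other processes
export CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx1024m"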
If it's caused by the metaspace growing in an unbounded fashion, then you need to find out why that is happening. Simply setting the maximum size of metaspace will cause an OutOfMemoryError once you reach the limit you've set. And raising the limit will be pointless, because eventually you'll hit any higher limit you set.
Run your application and, before it crashes (not easy of course, you'll need to judge it), kill -3 the Tomcat process. Then analyse the heap and try to find out why metaspace is growing so big. It's usually caused by dynamically loading classes. Is this something your application is doing? More likely, it's some framework doing this. (N.B. the OOM killer will kill -9 the Tomcat process, and you won't be able to gather diagnostics after that, so you need to let the app run and intervene before this happens.)
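A rough sketch of gathering diagnostics before the OOM killer strikes (the <tomcat-pid> placeholder and file paths are illustrative; jmap ships with the JDK):
kill -3 <tomcat-pid>                                                 # thread dump, written to catalina.out
jmap -clstats <tomcat-pid>                                           # class loader statistics, useful for metaspace growth
jmap -dump:live,format=b,file=/tmp/tomcat-heap.hprof <tomcat-pid>    # heap dump for offline analysis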
Also check out this question - there's an intriguing answer which claims that an obscure fix to an XML binding setting cleared the problem (highly questionable, but may be worth a try): java8 "java.lang.OutOfMemoryError: Metaspace"
Another very good solution is transforming your application into a Spring Boot JAR (Docker) application. Normally such an application has much lower memory consumption.
So, the steps to get huge improvements (if you can move to a Spring Boot application):
Migrate to a Spring Boot application. In my case, this took 3 simple actions.
Use a light-weight base image. See below.
VERY IMPORTANT - use the Java memory balancing options. See the last line of the Dockerfile below. This reduced my running container's RAM usage from over 650MB to ONLY 240MB. Running smoothly. So, SAVING over 400MB of 650MB!!
This is my Dockerfile:
FROM openjdk:8-jdk-alpine
ENV JAVA_APP_JAR your.jar
ENV AB_OFF true
EXPOSE 8080
ADD target/$JAVA_APP_JAR /deployments/
CMD ["java","-XX:+UnlockExperimentalVMOptions", "-XX:+UseCGroupMemoryLimitForHeap", "-jar","/deployments/your.jar"]

Jenkins java.lang.OutOfMemoryError: GC overhead limit exceeded

I am currently working on creating a performance framework using Jenkins, executing the performance tests from Jenkins. I am using this plugin: https://github.com/jmeter-maven-plugin/jmeter-maven-plugin. A sanity test with a single user in this performance framework worked well, so I went ahead with an actual performance test of 200 users, and within 2 minutes received the error
java.lang.OutOfMemoryError: GC overhead limit exceeded
I tried the following in jenkins.xml
<arguments>-Xrs -Xmx2048m -XX:MaxPermSize=512m -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "%BASE%\jenkins.war" --httpPort=8080 --prefix=/jenkins --webroot="%BASE%\war"</arguments>
but it didn't work. I also noted that whenever I increased the memory the Jenkins service stopped, and I had to reduce the memory back to 1 GB before the service would restart.
I increased the memory for JMeter and Java as well, but it did not help.
In the .jmx file, View Results Tree and every other listener is disabled, but the issue still persists.
Since I am doing a POC, Jenkins is hosted on my laptop; the high-level specs are as follows:
System Model: Latitude E7270, Processor: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (4 CPUs), ~2.5GHz, Memory: 8192MB RAM
Any help please?
The error about GC overhead implies that Jenkins is thrashing in garbage collection. This means it's probably spending more time doing garbage collection than doing useful work.
This situation normally comes about when the heap is too small for the application. With modern multi-generational heap layouts it's difficult to say exactly what needs changing.
I would suggest you enable verbose GC with the following options: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Then follow the advice here: http://www.oracle.com/technetwork/articles/javase/gcportal-136937.html
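For example, the GC logging flags could simply be added to the existing <arguments> element shown in the question (a sketch only; you could also add an -Xloggc:<file> option to send the output to a file instead of the service log):
<arguments>-Xrs -Xmx2048m -XX:MaxPermSize=512m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "%BASE%\jenkins.war" --httpPort=8080 --prefix=/jenkins --webroot="%BASE%\war"</arguments>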
A few points to note:
You are using the integrated Maven goal to run your JMeter tests. This uses Jenkins as the container to launch the JMeter tests, thereby impacting not only your job but also other users of Jenkins.
It is better to defer the execution to a different client machine, such as a dedicated JMeter machine, which uses its own JVM with parameters to launch your tests (or the ones that you provide).
In summary:
1. Move the test execution out of Jenkins.
2. Provide the output report as an input to your performance plug-in. [This can also crash, since it will need more JVM memory when you process endurance test results such as an 8-hour result file.]
This way, your tests will have a better chance of scaling. Also, you haven't mentioned what type of scripting engine you are using. As per the JMeter documentation, JSR223 with Groovy has a memory leak. Please refer to
http://jmeter.apache.org/usermanual/component_reference.html#JSR223_Sampler
Try adding -Dgroovy.use.classvalue=true to see if that helps (provided you are using Groovy). If you are using Java 8, there is a high chance that it is creating a unique class for each of your scripts in JMeter, and that grows the metaspace, which sits outside your JVM heap. In that case, restrict the metaspace and use class unloading and a 64-bit JVM, e.g.
-d64 -XX:+CMSClassUnloadingEnabled
Also, what is your new generation size (-XX:NewSize=1024m -XX:MaxNewSize=1024m)? Please note that JMeter loads all the files permanently, and they go directly to the old generation, thereby shrinking any available space for the new generation.
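A hedged sketch of how those flags might be combined when launching standalone JMeter on Java 8 via the JVM_ARGS environment variable, which jmeter.bat honours (sizes are illustrative; drop -Dgroovy.use.classvalue=true if you are not using Groovy, and -XX:+UseConcMarkSweepGC is added here only because -XX:+CMSClassUnloadingEnabled takes effect with the CMS collector):
set JVM_ARGS=-d64 -Xms2g -Xmx2g -XX:MaxMetaspaceSize=512m -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:NewSize=512m -XX:MaxNewSize=512m -Dgroovy.use.classvalue=true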

Error in logs when ramp up period and/or number of threads is large

I am doing a performance test using JMeter and I have the following configuration:
Threads: 100
Loop Count:1
If my Ramp-Up period is 100, not all users are logged in (the test script involves logging in and doing a transaction); that is, only 91 threads log in successfully. Also, error messages such as NullPointerException are printed in the logs. But if my Ramp-Up period is 500, all of them log in successfully. I'm just confused. What is the reason behind this?
This may be an issue with Java heap space. Check the jmeter.log file for OutOfMemoryError, which indicates that JMeter does not have sufficient memory to perform its tasks.
Increase it so that JMeter can accommodate more threads. (When you give more ramp-up time, fewer threads run at once, so JMeter may have no issue dealing with those threads.)
in jmeter.bat file:
default values:
set HEAP=-Xms512m -Xmx512m
increase heap space (to 1 GB or more based on available memory):
set HEAP=-Xms512m -Xmx1024m
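(If you launch JMeter through the shell script on Linux/macOS instead of jmeter.bat, the equivalent line in bin/jmeter is, with the same illustrative sizes:)
HEAP="-Xms512m -Xmx1024m"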
Restart JMeter and run the test.
If the issue still persists, it might be that the server cannot handle more than x parallel clients/threads at the same time, which is called the breaking point of the system.
Possible Reasons:
Improper configuration of the server (minThreads, connectTimeOut, etc.)
Lack of resources (CPU, memory, disk, network, etc.). Monitor the server for these resources during the load test: the nmon tool for Unix-based servers and PerfMon for Windows-based servers.
Possible Solutions:
Tweak the server configuration to match your needs.
Scale up or scale out to add additional resources.

Java profiling - detect what causes a spike

I am trying to detect what causes massive spikes in our Java Struts-based web application deployed in JBoss. I have used YourKit and VisualVM to take dumps and have analysed them, but these spikes are momentary, and by the time the dump is taken nothing remains.
Question is - is there a way to detect what is causing a spike in the runtime?
Here are a couple of ideas:
Examine your request logs to see if there is any correlation with the spikes and either request volumes or specific request types.
Run the JVM with GC logging enabled and look for correlations.
Enable your debug-level logging in your application and look for correlations. (Be cautious with this one because turning on more application logging could change performance characteristics.)
(On Linux / Unix) run vmstat and iostat and look for correlations with extra disk activity or swapping/paging; a sketch of these commands follows this list.
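A minimal sketch of that kind of sampling (the 5-second interval is illustrative):
vmstat 5       # memory, swap and CPU activity every 5 seconds
iostat -x 5    # extended per-device disk statistics every 5 seconds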
If you have a spike in the object creation rate or in the number / size of non-garbage objects, this is most likely caused by your application rather than the JVM or operating system. There is a good chance that it is due to a transient change in the nature of the application's workload; e.g. it is getting a spike in requests, or there is some unusual request that involves creating a lot of objects. Focus on the request and application logs.
As garbage collection is the most likely cause of such an issue, I'd recommend enabling garbage collection logging in the JVM using these command-line options:
-Xloggc:<path and filename to log to>
-XX:+PrintGCDetails
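For instance, for a standalone launch it might look like the line below (the jar name and log path are placeholders; for an app deployed in JBoss you would add the same flags to its JAVA_OPTS instead):
java -Xloggc:/var/log/myapp/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar myapp.jar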

Alfresco Community on Tomcat starts very slow

We're currently testing out Alfresco Community on an old server (only 1GB of RAM). Because this is the Community version, we need to restart it every time we change the configuration (we're trying to add some features like generating previews of DWG files, etc.). However, restarting takes a very long time (about 4 minutes, I think). This is probably due to the limited amount of memory available. Does anybody know of features or settings that can improve this restart time?
As with all performance issues there is rarely a magic bullet.
Memory pressure - the app is starting up, but the 512m heap is only just enough to fit the application in, and it is spending half of the start-up time running GC.
Have a look at any of the following:
1. -verbose:gc
2. jstat -gcutil
3. jvisualvm - much nicer UI
You are trying to see how much time is being spent in GC; look for many full garbage collection events that don't reclaim much of the heap, i.e. 99% -> 95%.
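A quick sketch of watching this from the command line with jstat (the PID and interval are placeholders):
jstat -gcutil <tomcat-pid> 5000    # heap-region occupancy percentages and GC counts/times, every 5 seconds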
Solution - more heap, nothing else for it really.
You may want to try -XX:+AggressiveHeap in order to get the JVM to max out its memory usage on the box; the only trouble is that with only 1GB of memory it's going to be limited. List of all JVM options
Disk IO - if the box itself is not running at close to 100% CPU during startup (assuming 100% of a single core; startup is normally single-threaded), then there may be some disk IO that the application is doing that is the bottleneck.
Use operating system tools such as Windows Performance Monitor to check for disk IO. It may be that it isn't the application causing the IO; it could be swap activity (page faulting).
Solution: either fix the app (not too likely), get faster disks / a faster computer, or add more physical memory to the box.
Two of the most common reasons why Tomcat loads slowly:
You have a lot of web applications. Tomcat takes some time to create the web context for each of those.
Your webapp has a large number of files in its web application directory. Tomcat scans the web application directories at startup.
Also have a look at the Java performance tuning whitepaper; furthermore, I would recommend Lambda Probe (www.lambdaprobe.org/d/index.htm) to see whether you are satisfied with your GC settings - it has nice real-time GC and memory tracking for Tomcat.
I myself have Alfresco running with example 4.2.6 from the Java performance tuning whitepaper:
4.2.6 Tuning Example 6: Tuning for low pause times and high throughput
Memory settings are also very nicely explained in that paper.
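I don't have the exact flags from example 4.2.6 to hand; purely as an illustration (not copied from the whitepaper, and the sizes are placeholders for a small box), a low-pause / high-throughput profile of that JVM generation typically combined CMS with an explicitly sized young generation, e.g. in bin/setenv.sh:
# illustrative CMS low-pause profile only - tune the sizes to your own box and workload
export CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx512m -XX:NewSize=128m -XX:MaxNewSize=128m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"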
kind regards Mahatmanich
