java server application performance issues - java

i am monitoring a java process that runs for a long period of time, the process uses G1 garbage collector.
those are my java options(runs in server mode):
-Xmx2048m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=32M -XX:MetaspaceSize=100m -XX:CompressedClassSpaceSize=400m
the process is a java websocket server which involves a lot of I/O with embedded tomcat in front of it.
according to java mission control the top heap usage reaches 1.7GB and after every GC cycle it decreases back to ~900MB.
the process runs in a docker container with 3.8GB of memory restriction.
when using top -o %MEM it shows that total RES is 2.6GB and never more.
i am noticing that after a heavy load of websockets connect/disconnect the process becomes really slow and unresponsive.
when i try to load this process again after 6-7 hours, while it's idle and "clean" of connections it responses after 6-7 seconds, while in the first load the response time was much lower ~2-3 seconds.
i thought it's file descriptors related but while checking with:
ls /proc/1/fd | wc -l
from inside the docker it shows all file descriptors released.
java version: 8u131
tomcat version: 8.5.60
this is how my heap looks like after 6 hours of load with no connections on the server:
and this is from JMC:
how can i investigate it further?

Don't focus on the memory & garbage collection (yet) but identify what takes time in the request processing.
Is the time spent in your request handler code? Is it somewhere else?
If you can modify the code, you can simply add some println statements to start.
And/Or you can use a lightweight cpu profiler like https://github.com/jvm-profiling-tools/async-profiler to get a CPU flame graph - wall-clock profiling might be especially helpful for "off-cpu" analysis.

Related

Java service auto kill on linux

I have ran nohup java -jar sample.jar & on my linux server as a backend service of web application but service is automatically killed after few hours and need to re-run the command.
I want to know what is the reason and how to resolve the issue
On Linux, one possibility is that it is the oomkiller process. This watches for processes that appears to be triggering excessive paging activity, and sends them a SIGKILL signal. The oom killer's purpose is to prevent the paging system from thrashing ... which can bring the entire system to its knees.
If it is the oom killer killing your Java app, you should see a log message in /var/log/messages or /var/log/syslog.
The solution may be as simple as reducing the Java app's heap size so that it all fits in the available physical memory. Alternatively, if your Java app's heap usage is growing all of the time, you may need to look for a possible memory leak in your app. (Search for articles on finding Java memory leaks ...)
I run Spring Boot via the same command and only a handful of times have I seen this behavior. It turned out that the JVM ran out of memory and it crashed. When I changed the heap size settings it was rock solid again.

Application TPS getting dropped while taking JFR

My Springboot application is designed for supporting 500 tps.
The application giving 500 TPS continuously, but when we take the JFR using below command, the application TPS is getting dropped to less than 100.
JFR command:
/opt/drutt/local/jdk1.8.0_112/bin/jcmd JFR.start settings=profile duration=1m filename=/tmp/my_file_1.jfr
Is there any problem in JFR command?
Does JFR contribute in perfromance drop?
The profile setting could create significant overhead in certain applications, typically due to TLAB allocation event (or possibly exceptions), and is not recommended to have always on in production environments.
Remove settings=profile and the default configuration is used, which is safe in production (< 1% overhead)

Problems with jetty crashing intermittently

I'm having problems with jetty crashing intermittently, I'm using Jetty 6.1.24.
I'm running a neo4j Spring MVC webapp, Jetty will stay running for approx 1 hour and then I have to restart Jetty. It is running on small amazon ec2 instance, debian with 1.7gb of RAM.
I start Jetty using java -Xmx900m -server -jar start.jar
I am connecting to the server using putty, when Jetty crashes the putty session disconnects, I cannot see what error caused it to crash.
I would like to be able to see if it is an error generated by Spring, I'm not sure how to log the output from the spring app with Jetty. Or if it is Jetty or a memory issue, what would be the best way to monitor Jetty? I cannot recreate this on my local machine running windows. What do you think would be the best way to approach this? Thanks
This isn't really a programmer question; perhaps it'll be moved over to ServerFault.
You didn't specifically state which operating system you're using, but I'm hazarding a guess at some Linux distribution. You have two options of figuring out what's wrong:
Start your session in screen. Screen will live for as long as the actual machine is powered on, until you reboot the operating system (or you exit screen).
you start screen like this
screen
and you get a new prompt where you can start your program (cd foo, jetty, etc). When you're happy and you just need to go somewhere, you can disconnect the screen by hitting CTRL+A and then CTRL+D. you'll drop back to the place you were before you invoked screen.
To get back to seeing the screen you type screen -R which means to resume an existing screen. you should see jetty again.
The nice thing is that if you lose connection (or you close putty by accident or whatever) then you can use screen -list to get a list of running screens, and then forcibly detach them -D and reattach them to the current putty -R, no harm done!
Use nohup. Nohup more or less detaches the process you're running from the console, so none of its output comes to the terminal. You start your program in the normal fashion, but you add the word nohup to your command.
For example:
nohup ls -l &
After ls -l is complete, your output is stored in nohup.out.
When you say crash do you mean the JVM segfaults and disappears? If that's the case I'd check and make sure you aren't exhausting the machine's available memory. Java on linux will crash when the system memory gets so low the JVM cannot allocate up to its maximum memory. For example, you've set the max JVM memory to 500MB of which it's using 250MB at the moment. However, the Linux OS only has 128MB available. This produces unstable results and the JVM will segfault.
On windows the JVM is more well behaved in this scenario and throws OutOfMemoryError when the system is running low on memory.
Validate how much system memory is available around the time of your crashes.
Verify if other processes on your box are eating up a lot of memory. Turn off anything that could be competing with the JVM.
Run jconsole and connect it to your JVM. That will tell you how memory is being used in your JVM process and give you a history to look back through when it does crash.
Eliminate any native code you might be loading into the JVM when doing this type of testing.
I believe Jetty has some native code to do high volume request processing. Make sure that's not being used. You want to isolate the crashes to Java and NOT some strange native lib. If you take out the native stuff and find it works then you have your answer as to what's causing it. If it continues to crash then it very well could be what I'm describing.
You can force the JVM to allocate all the memory at startup with -Xms900m that can make sure the JVM doesn't fight with other processes for memory. Once it has the full Xmx amount allocated it won't crash. Not a solution, but you can easily test it this way.
When you start java, redirect both outputs (stdout and stderr) to a file:
Using Bash:
java -Xmx900m -server -jar start.jar > stdout.txt 2> stderr.txt
After the crash, inspect those files.
If the crash is due to a signal (like SEGV=segmentation fault), there should be a file dump by the JVM at the location you've started java. For Sun VM (hotspot), it's something like hs_err_pid12121.log (here 12121 is the process ID).
Putty disconnecting STRONGLY hints that the server is running out of memory and starts shutting down processes left and right. It is probably your jetty instance growing too big.
The easiest thing to do now, is adding 1-2 Gb more swap space and do it again. Also note that you can use the jvisualvm to attach to the jetty instance to get runtime information directly.

Alfresco Community on Tomcat starts very slow

We're currently testing out Alfresco Community on an old server (only 1GB of RAM). Because this is the Community version we need to restart it every time we change the configuration (we're trying to add some features like generating previews of DWG files etc). However, restarting takes a very long time (about 4 minutes I think). This is probably due to the limit amount of memory available. Does anybody know some features or settings that can improve this restart time?
As with all performance issues there is rarely a magic bullet.
Memory pressure - the app is starting up but the 512m heap is only just enough to fit the applications in and it is spending half of the start up time running GC.
Have a look at any of the following:
1. -verbose:gc
2. jstat -gcutil
2. jvisualvm - much nicer UI
You are trying to see how much time is being spent in GC, look for many full garbage collection events that don't reclaim much of the heap ie 99% -> 95%.
Solution - more heap, nothing else for it really.
You may want to try -XX:+AggressiveHeap in order to get the JVM to max out it's memory usage on the box, only trouble is with only 1gb of memory it's going to be limited. List of all JVM options
Disk IO - the box it's self is not running at close to 100% CPU during startup (assuming 100% of a single core, startup is normally single threaded) then there may be some disk IO that the application is doing that is the bottle neck.
Use the operating system tools such as Windows Performance monitor to check for disk IO. It maybe that it isn't the application causing the IO it could be swap activity (page faulting)
Solution: either fix the app (not to likely) or get faster disks/computer or more physical memory for the box
Two of the most common reasons why Tomcat loads slowly:
You have a lot of web applications. Tomcat takes some time to create the web context for each of those.
Your webapp have a large number of files in a web application directory. Tomcat scans the web application directories at startup
also have a look at java performance tuning whitepaper, further I would recomend to you Lambda Probe www.lambdaprobe.org/d/index.htm to see if you are satisfied with your gcc settings, it has nice realtime gcc and memory tracking for tomcat.
I myself have Alfresco running with the example 4.2.6 from java performance tuning whitepaper:
4.2.6 Tuning Example 6: Tuning for low pause times and high throughput
Memory settings are also very nicely explained in that paper.
kind regards Mahatmanich

Why does the Sun JVM continue to consume ever more RSS memory even when the heap, etc sizes are stable?

Over the past year I've made huge improvements in my application's Java heap usage--a solid 66% reduction. In pursuit of that, I've been monitoring various metrics, such as Java heap size, cpu, Java non-heap, etc. via SNMP.
Recently, I've been monitoring how much real memory (RSS, resident set) by the JVM and am somewhat surprised. The real memory consumed by the JVM seems totally independent of my applications heap size, non-heap, eden space, thread count, etc.
Heap Size as measured by Java SNMP
Java Heap Used Graph http://lanai.dietpizza.ch/images/jvm-heap-used.png
Real Memory in KB. (E.g.: 1 MB of KB = 1 GB)
Java Heap Used Graph http://lanai.dietpizza.ch/images/jvm-rss.png
(The three dips in the heap graph correspond to application updates/restarts.)
This is a problem for me because all that extra memory the JVM is consuming is 'stealing' memory that could be used by the OS for file caching. In fact, once the RSS value reaches ~2.5-3GB, I start to see slower response times and higher CPU utilization from my application, mostly do to IO wait. As some point paging to the swap partition kicks in. This is all very undesirable.
So, my questions:
Why is this happening? What is going on "under the hood"?
What can I do to keep the JVM's real memory consumption in check?
The gory details:
RHEL4 64-bit (Linux - 2.6.9-78.0.5.ELsmp #1 SMP Wed Sep 24 ... 2008 x86_64 ... GNU/Linux)
Java 6 (build 1.6.0_07-b06)
Tomcat 6
Application (on-demand HTTP video streaming)
High I/O via java.nio FileChannels
Hundreds to low thousands of threads
Low database use
Spring, Hibernate
Relevant JVM parameters:
-Xms128m
-Xmx640m
-XX:+UseConcMarkSweepGC
-XX:+AlwaysActAsServerClassMachine
-XX:+CMSIncrementalMode
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+CMSLoopWarn
-XX:+HeapDumpOnOutOfMemoryError
How I measure RSS:
ps x -o command,rss | grep java | grep latest | cut -b 17-
This goes into a text file and is read into an RRD database my the monitoring system on regular intervals. Note that ps outputs Kilo Bytes.
The Problem & Solutions:
While in the end it was ATorras's answer that proved ultimately correct, it kdgregory who guided me to the correct diagnostics path with the use of pmap. (Go vote up both their answers!) Here is what was happening:
Things I know for sure:
My application records and displays data with JRobin 1.4, something I coded into my app over three years ago.
The busiest instance of the application currently creates
Over 1000 a few new JRobin database files (at about 1.3MB each) within an hour of starting up
~100+ each day after start-up
The app updates these JRobin data base objects once every 15s, if there is something to write.
In the default configuration JRobin:
uses a java.nio-based file access back-end. This back-end maps MappedByteBuffers to the files themselves.
once every five minutes a JRobin daemon thread calls MappedByteBuffer.force() on every JRobin underlying database MBB
pmap listed:
6500 mappings
5500 of which were 1.3MB JRobin database files, which works out to ~7.1GB
That last point was my "Eureka!" moment.
My corrective actions:
Consider updating to the latest JRobinLite 1.5.2 which is apparently better
Implement proper resource handling on JRobin databases. At the moment, once my application creates a database and then never dumps it after the database is no longer actively used.
Experiment with moving the MappedByteBuffer.force() to database update events, and not a periodic timer. Will the problem magically go away?
Immediately, change the JRobin back-end to the java.io implementation--a line line change. This will be slower, but it is possibly not an issue. Here is a graph showing the immediate impact of this change.
Java RSS memory used graph http://lanai.dietpizza.ch/images/stackoverflow-rss-problem-fixed.png
Questions that I may or may not have time to figure out:
What is going on inside the JVM with MappedByteBuffer.force()? If nothing has changed, does it still write the entire file? Part of the file? Does it load it first?
Is there a certain amount of the MBB always in RSS at all times? (RSS was roughly half the total allocated MBB sizes. Coincidence? I suspect not.)
If I move the MappedByteBuffer.force() to database update events, and not a periodic timer, will the problem magically go away?
Why was the RSS slope so regular? It does not correlate to any of the application load metrics.
Just an idea: NIO buffers are placed outside the JVM.
EDIT:
As per 2016 it's worth considering #Lari Hotari comment [ Why does the Sun JVM continue to consume ever more RSS memory even when the heap, etc sizes are stable? ] because back to 2009, RHEL4 had glibc < 2.10 (~2.3)
Regards.
RSS represents pages that are actively in use -- for Java, it's primarily the live objects in the heap, and the internal data structures in the JVM. There's not much that you can do to reduce its size except use fewer objects or do less processing.
In your case, I don't think it's an issue. The graph appears to show 3 meg consumed, not 3 gig as you write in the text. That's really small, and is unlikely to be causing paging.
So what else is happening in your system? Is it a situation where you have lots of Tomcat servers, each consuming 3M of RSS? You're throwing in a lot of GC flags, do they indicate the process is spending most of its time in GC? Do you have a database running on the same machine?
Edit in response to comments
Regarding the 3M RSS size - yeah, that seemed too low for a Tomcat process (I checked my box, and have one at 89M that hasn't been active for a while). However, I don't necessarily expect it to be > heap size, and I certainly don't expect it to be almost 5 times heap size (you use -Xmx640) -- it should at worst be heap size + some per-app constant.
Which causes me to suspect your numbers. So, rather than a graph over time, please run the following to get a snapshot (replace 7429 by whatever process ID you're using):
ps -p 7429 -o pcpu,cutime,cstime,cmin_flt,cmaj_flt,rss,size,vsize
(Edit by Stu so we can have formated results to the above request for ps info:)
[stu#server ~]$ ps -p 12720 -o pcpu,cutime,cstime,cmin_flt,cmaj_flt,rss,size,vsize
%CPU - - - - RSS SZ VSZ
28.8 - - - - 3262316 1333832 8725584
Edit to explain these numbers for posterity
RSS, as noted, is the resident set size: the pages in physical memory. SZ holds the number of pages writable by the process (the commit charge); the manpage describes this value as "very rough". VSZ holds the size of the virtual memory map for the process: writable pages plus shared pages.
Normally, VSZ is slightly > SZ, and very much > RSS. This output indicates a very unusual situation.
Elaboration on why the only solution is to reduce objects
RSS represents the number of pages resident in RAM -- the pages that are actively accessed. With Java, the garbage collector will periodically walk the entire object graph. If this object graph occupies most of the heap space, then the collector will touch every page in the heap, requiring all of those pages to become memory-resident. The GC is very good about compacting the heap after each major collection, so if you're running with a partial heap, there most of the pages should not need to be in RAM.
And some other options
I noticed that you mentioned having hundreds to low thousands of threads. The stacks for these threads will also add to the RSS, although it shouldn't be much. Assuming that the threads have a shallow call depth (typical for app-server handler threads), each should only consume a page or two of physical memory, even though there's a half-meg commit charge for each.
Why is this happening? What is going on "under the hood"?
JVM uses more memory than just the heap. For example Java methods, thread stacks and native handles are allocated in memory separate from the heap, as well as JVM internal data structures.
In your case, possible causes of troubles may be: NIO (already mentioned), JNI (already mentioned), excessive threads creation.
About JNI, you wrote that the application wasn't using JNI but... What type of JDBC driver are you using? Could it be a type 2, and leaking? It's very unlikely though as you said database usage was low.
About excessive threads creation, each thread gets its own stack which may be quite large. The stack size actually depends on the VM, OS and architecture e.g. for JRockit it's 256K on Linux x64, I didn't find the reference in Sun's documentation for Sun's VM. This impacts directly the thread memory (thread memory = thread stack size * number of threads). And if you create and destroy lots of thread, the memory is probably not reused.
What can I do to keep the JVM's real memory consumption in check?
To be honest, hundreds to low thousands of threads seems enormous to me. That said, if you really need that much threads, the thread stack size can be configured via the -Xss option. This may reduce the memory consumption. But I don't think this will solve the whole problem. I tend to think that there is a leak somewhere when I look at the real memory graph.
The current garbage collector in Java is well known for not releasing allocated memory, although the memory is not required anymore. It's quite strange however, that your RSS size increases to >3GB although your heap size is limited to 640MB. Are you using any native code in your application or are you having the native performance optimization pack for Tomcat enabled? In that case, you may of course have a native memory leak in your code or in Tomcat.
With Java 6u14, Sun introduced the new "Garbage-First" garbage collector, which is able to release memory back to the operating system if it's not required anymore. It's still categorized as experimental and not enabled by default, but if it is a feasible option for you, I would try to upgrade to the newest Java 6 release and enable the new garbage collector with the command line arguments "-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC". It might solve your problem.

Categories