I often run workloads against my own web applications to try to find performance issues.
Sometimes I see memory leaks and similar issues that appear after variable durations.
So I created a bash script that takes javacores (kill -3 pid) every minute for 10 minutes; the script is executed on the hour.
For a workload that runs for 120 hours, this will produce 1200 javacores.
I'm wondering:
Is this overkill? I'd like a continuous view of the system (a javacore every 5 minutes for 120 hours), but I don't want to impact performance.
What is a reasonable frequency for automatically capturing javacores against a servlet-based app?
It looks like we are dealing with two issues:
Performance
OutOfMemoryError
Performance: for performance, determine the longest request you can tolerate and generate the javacores at 3 to 5 times that amount. (Anything below 5 minutes is fine-tuning to me and can be difficult.)
Let's say the longest request you can tolerate is 3 minutes; I would generate 3 javacores evenly spaced from 9 minutes to 15 minutes.
I usually suggest the link below (see the section on collecting data manually), but if you already wrote your own script, use it:
"MustGather: Performance, Hang or High CPU Issues on Linux"
http://www.ibm.com/support/docview.wss?rs=180&uid=swg21115785
OutOfMemoryError: to see if your product is leaking, follow the steps in the URL below (go to the section on collecting data manually), then search for IBM HeapAnalyzer (standalone and free) and review the heap dump for potential leak suspects.
"MustGather: Native Memory Issues on Linux"
http://www.ibm.com/support/docview.wss?rs=180&uid=swg21138462
Personally, I prefer looking at heap dumps where memory use equals the Xmx, or nearly so.
Since this is an IBM JVM, you could try using Health Center instead of taking javacores regularly:
http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/
This has profiling and memory-monitoring views, so it should give you the data you are looking for and save you from analysing the javacore files yourself.
Related
My goal
I am trying to figure out the best practices or optimal methods for properly setting up resources for both lightweight and performance-hungry Java Spring Boot apps.
Examples
First example:
Take a lightweight Spring Boot app which only computes a key for a cache and calls Redis for data with that key. I have tried two configurations of resources and Java opts.
replicaCount: 4
memory: 1.9Gi
cpuRequest: 800m
cpuLimit: 1200m
JAVA_OPTS: "-XX:+UseG1GC -XX:MaxRAM=1792m -Xmx768m -Xms768m -XX:MaxMetaspaceSize=1024m -XshowSettings:vm -XX:ActiveProcessorCount=2"
Performance is good: the application has a median response time of around 3.5 ms, and the 0.99 percentile is 90 ms. The GC pause count was 0.4 per minute with a 20 ms pause duration. There was also little throttling.
Then I have tried this setup:
replicaCount: 4
memory: 3Gi
cpuRequest: 800m
cpuLimit: 10000m
JAVA_OPTS: "-XX:+UseG1GC -XX:MaxRAMPercentage=80 -XX:InitialRAMPercentage=80 -XshowSettings:vm"
The application was more memory-hungry, but the median response time was the same, 3.5 ms; only the 0.99 percentile improved, to 72 ms. The GC pause count was lower, about 0.1 per minute, with a 5 ms pause duration. There was throttling during startup but none during the run.
Second example:
A performance-hungry Spring Boot application which loads data from a DB and calculates multiple things: the distance between two points, the price for multiple deliveries, and filtering the possible transport or pickup points for a package.
It was running on 4 VPSes with 4 CPUs and 4 GB each, but on Kubernetes it needs more resources than I expected.
replicaCount: 4
memory: 7.5Gi
cpuRequest: 2000m
cpuLimit: 50000m
JAVA_OPTS: "-XX:+UseG1GC -XX:MaxRAMPercentage=80 -XX:InitialRAMPercentage=80 -XX:+UseStringDeduplication -XshowSettings:vm"
Performance is good, but vertical scaling did nothing; only horizontal scaling provided better performance. CPU usage reported by Kubernetes is about 1, with no throttling.
What I have done so far
I read multiple articles found via Google, but none gave me a proper answer or explanation.
I have tried various settings for the CPU and memory resource limits in the Kubernetes spec, but they did not do what I expected, which was lowering response time and allowing the app to process more requests.
Scaling vertically did not help either; everything was still slow.
Scaling horizontally with low-CPU, low-memory pods and explicit Xms, Xmx, etc. worked: the pods were stable, but not as performant as possible (I think). There was also some CPU throttling even when the CPU was not fully used.
Question
I have a big problem with properly setting memory and CPU on Kubernetes. I do not understand why memory usage increases when I give the pod more CPU (Xss is the default value) while the load stays the same. The pod avoids being OOM-killed only if the gap between committed memory and used memory is about 1 GB (for the second application example) or 500 MB (for the first).
If I set Xmx, Xms, MaxMetaspaceSize, and Xss, then I can achieve lower memory and CPU usage. But in this case, increasing the pod memory limit is complicated, because those options are not defined as percentages and I must recalculate each Java opt every time.
Also, if I give the application too much memory, it starts at some level, but after some time it always climbs to the limit (until the gap between committed and heap memory is 1 GB to 500 MB) and stays there.
So what is the proper way to find the best resource settings and Java opts for Spring Boot applications running on Kubernetes? Should I give the application generous resources and, after some time (say 7 days), lower them to the maximum values observed in the metrics?
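One approach, consistent with the percentage-based flags already tried above, is to size the JVM from the container limit so that raising the pod's memory limit never requires recalculating individual Java opts. A sketch, under the assumption of a container-aware JVM (JDK 10+, or 8u191+); the concrete numbers are illustrative, not a recommendation:

```yaml
replicaCount: 4
memory: 3Gi          # container limit; the heap scales with it
cpuRequest: 800m
cpuLimit: 2000m
JAVA_OPTS: "-XX:+UseG1GC -XX:MaxRAMPercentage=75 -XX:InitialRAMPercentage=75 -XshowSettings:vm"
```

Keeping MaxRAMPercentage well below 100 leaves headroom for metaspace, thread stacks, and other native memory, which is exactly the committed-versus-used "gap" described above.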
I have a big list (up to 500,000) of functions.
My task is to generate a graph for each function (this can be done independently of the other functions) and dump the output to a file (it can be several files).
The process of generating graphs can be time consuming.
I also have a server with 40 physical cores and 128 GB of RAM.
I have tried to implement parallel processing using Java threads and an ExecutorService, but it does not seem to use all of the processor's resources.
On some inputs the program takes up to 25 hours to run, and only 10-15 cores are working according to htop.
So the second thing I tried was to create 40 distinct processes (using Runtime.exec) and split the list among them.
This method uses all processor resources (100% load on all 40 cores) and speeds things up by a factor of 5 on the previous example (it takes only 5 hours, which is reasonable for my task).
But the problem with this method is that each Java process runs separately and consumes memory independently of the others. In some scenarios all 128 GB of RAM is consumed after 5 minutes of parallel work. One solution I am using now is to call System.gc() in each process whenever Runtime.totalMemory() > 2 GB. This slows down overall performance a bit (8 hours on the previous input) but keeps memory usage within reasonable bounds.
But this configuration works only for my server. If you run it on a server with 40 cores and 64 GB of RAM, you need to tune the Runtime.totalMemory() > 2 GB condition.
So the question is: what is the best way to avoid such aggressive memory consumption?
Is it normal practice to run separate processes to do parallel jobs?
Is there any other parallel method in Java (maybe fork/join?) which uses 100% of the processor's physical resources?
You don't need to call System.gc() explicitly! The JVM will do it automatically when needed, and almost always does it better. You should, however, set the max heap size (-Xmx) to a value that works well.
If your program won't scale further, you have some kind of contention. You can either analyse your program and your Java and system settings to figure out why, or run it as multiple processes. If each process is multi-threaded, then you may get better performance using 5-10 processes instead of 40.
Note that you may get higher performance with more than one thread per core. Fiddle around with 1-8 threads per core and see if throughput increases.
From your description it sounds like you have 500,000 completely independent items of work and that each work item doesn't really need a lot of memory. If that is true, then memory consumption isn't really an issue. As long as each process has enough memory so it doesn't have to gc very often then gc isn't going to affect the total execution time by much. Just make sure you don't have any dangling references to objects you no longer need.
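As a minimal sketch of the single-process, thread-pool approach for independent work items (generateGraph here is a hypothetical stand-in for the real graph-generation work, which would render a graph and write it to a file):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GraphBatch {
    // Hypothetical stand-in for the real per-function work; the actual task
    // would generate a graph for one function and dump it to a file.
    static int generateGraph(int functionId) {
        return functionId * 2;
    }

    // Submit every work item to a fixed-size pool and collect the results
    // in submission order. Future.get() propagates failures from any item.
    static List<Integer> processAll(List<Integer> functionIds, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = functionIds.stream()
                    .map(id -> pool.submit(() -> generateGraph(id)))
                    .collect(Collectors.toList());
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ids = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        // One thread per reported core is a reasonable starting point;
        // per the note above, 1-8 threads per core is worth experimenting with.
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("processed " + processAll(ids, threads).size() + " items");
    }
}
```

Because all threads share one heap, one -Xmx bounds the total memory instead of 40 independent per-process limits, which is what made the multi-process variant exhaust 128 GB.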
One of the problems here: it is still very hard to understand how many threads, cores, etc. are actually available.
My personal suggestion: there are several articles in the Java Specialists' Newsletter which take a very deep dive into this subject.
For example this one: http://www.javaspecialists.eu/archive/Issue135.html
Or a more recent one, on "the number of available processors": http://www.javaspecialists.eu/archive/Issue220.html
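A quick way to see what the JVM itself believes is Runtime.availableProcessors(); note that inside a container, or with -XX:ActiveProcessorCount set, the reported value can differ from the host's physical core count:

```java
public class CpuInfo {
    // The number of processors the JVM thinks it can use. Thread pools sized
    // from this value inherit any container or command-line restriction.
    static int reportedProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors: " + reportedProcessors());
    }
}
```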
I have a Java app. It automates logins to websites and creates wifi hotspots. It is a GUI app with many features, such as a notification manager and a system tray icon. My JAR file has a size of 3 MB, but the app consumes about 100 MB of RAM. Should I be worried?
I checked if any of my methods were recursive and I could not find any.
My java app's code can be found here : https://github.com/mavrk/bitm-cyberoam-client/tree/master/JavaApplication13
A one line program can use 8 GB of memory.
If you are concerned about the size of the heap, you can either:
reduce it further, though this might slow the application or prevent it from working;
use a memory profiler to see where the memory is being utilised; or
not worry about 50 cents' worth of memory. If you are on minimum wage you shouldn't spend more than 6 minutes on it, or your time will be worth more than the memory you save.
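Before reaching for a full memory profiler, the Runtime API gives a rough first look at the heap figures (a sketch; the actual numbers vary per JVM and run):

```java
public class HeapFootprint {
    // Approximate bytes of heap currently holding live (or not-yet-collected)
    // objects: committed heap minus the free portion of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Note: the OS-level figure (e.g. 100 MB in Task Manager) also includes
        // the JVM itself, metaspace, and thread stacks, not just this heap number.
        System.out.printf("max=%d MB committed=%d MB used=%d MB%n",
                rt.maxMemory() / (1024 * 1024),
                rt.totalMemory() / (1024 * 1024),
                usedHeapBytes() / (1024 * 1024));
    }
}
```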
I'm using JMeter to inject a workload into an application deployed on an AWS EC2 instance. The test is very large: it lasts 10 hours, and the workload profile has a bimodal shape with a peak of about 2600 requests in 5 minutes. Currently I have one m3.xlarge instance on which the application is deployed, and 8 m3.xlarge instances each running a JMeter instance. A Python script splits the workload to inject among the 8 client instances, so, for example, if the original workload has to inject 800 requests, each JMeter instance will inject 100. The full test, as I said, lasts 10 hours and is divided into timesteps of 5 minutes each; every 5 minutes a small workload variation is applied.
At the moment I get the java.lang.OutOfMemoryError: GC overhead limit exceeded error from each JMeter instance immediately after the test starts, and no requests arrive at the application. I read a lot online and on Stack Overflow, and I concluded the possible mistakes could be:
JVM heap size too low -> I addressed this by setting the following in the jmeter.bat files on each JMeter instance:
set HEAP=-Xms4g -Xmx4g
set NEW=-XX:NewSize=4g -XX:MaxNewSize=4g
Some mistake in the test that results in continuous, useless garbage-collector work. So I removed all the listeners from my test. In particular, I was using TableVisualizer, ViewResultsFullVisualizer, StatVisualizer, and GraphVisualizer.
Anyway, the problem persists. I really have no idea how to solve it. I know a 10-hour test with a 2600-request peak can be very heavy, but I think there should be a way to perform it. I'm using EC2 m3.xlarge instances, so I could raise the heap size to 8 GB if that would be useful, or split the workload among even more clients (I'm using spot instances, so it will not cost much more). But since I have already doubled the number of client instances from 4 to 8 to solve the problem and it doesn't work, I'm a little confused and would like suggestions before acquiring more and more resources.
Thanks a lot in advance.
Your heap settings look wrong:
set HEAP=-Xms4g -Xmx4g
set NEW=-XX:NewSize=4g -XX:MaxNewSize=4g
Your NEW (young generation) size is equal to the heap size, which is wrong. Comment out the NEW part first.
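Concretely, commenting out the NEW part while keeping the 4 GB heap from the question would look like this in jmeter.bat:

```
set HEAP=-Xms4g -Xmx4g
rem set NEW=-XX:NewSize=4g -XX:MaxNewSize=4g
```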
Can you run ps -eaf | grep java and show the output?
Also check that you follow these recommendations:
http://jmeter.apache.org/usermanual/best-practices.html
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Finally, show an overview of your test plan and the number of threads that you start.
I am doing machine learning in Java using GATE Learning. I have a huge dataset of documents to learn from. While using NetBeans, I was getting a Java heap space error, so I provided 1600 MB in the -Xmx parameter. Now I do not get the heap space error, but it takes a very long time to run! (It ran for 90 minutes and I had to stop the process because I lost patience!)
I do not understand whether I should increase my RAM (currently 4 GB), upgrade my OS (currently XP SP3; I have heard Vista and Windows 7 utilize RAM and the processor better), or upgrade my processor (currently a Dual Core E5500 at 2.80 GHz).
Please throw some insight into what I can do to make this process run faster!
Thanks Rishabh
Before you can answer what will make it run faster, you have to find the bottleneck.
I'm not very familiar with Windows, but there is some sort of system load monitoring widget, IIRC.
What I would do is as follows:
Create some datasets of increasing sizes (more documents)
Run your program against those datasets
On each run, work out whether the CPU maxes out, whether the memory maxes out and starts swapping, or whether the whole thing is I/O bound
Then fix the one that is causing the problem.
Just for context, it's not that unusual for ML algorithms to take a long time to run on large data sets. You can use the above approach to plot out the run time as the size of the input datasets increase, at least then you'll know if your program would have stopped in 100 minutes or 100 centuries.
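The measurement loop described above can be sketched as simple wall-clock timing over increasing input sizes (runWorkload is a hypothetical stand-in for the actual learning run):

```java
public class ScalingProbe {
    // Hypothetical stand-in for the real workload (e.g. the GATE learning run
    // over a given number of documents).
    static long runWorkload(int documents) {
        long acc = 0;
        for (int i = 0; i < documents * 1_000; i++) {
            acc += i % 7;
        }
        return acc;
    }

    // Wall-clock time for one run at a given input size, in milliseconds.
    static double timeMillis(int documents) {
        long start = System.nanoTime();
        runWorkload(documents);
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Doubling input sizes: if the time roughly doubles too, the workload
        // scales linearly; a much steeper curve points at swapping or a
        // super-linear algorithm.
        for (int docs : new int[]{100, 200, 400, 800}) {
            System.out.printf("%d documents -> %.1f ms%n", docs, timeMillis(docs));
        }
    }
}
```

Plotting these points is enough to extrapolate whether the full dataset would finish in 100 minutes or 100 centuries.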
Get a profiler such as VisualVM or YourKit, start your program, connect the profiler to the running program, and find out which methods and objects are your bottleneck. Then at least you know where to start improving your program.