I'm using JMeter to inject a workload into an application deployed on an AWS EC2 instance. The test is very large: it lasts for 10 hours and the workload profile has a bimodal shape with a peak of about 2600 requests in 5 minutes. Currently I have one m3.xlarge instance on which the application is deployed and 8 m3.xlarge instances, each one running a JMeter instance. A Python script splits the workload to inject among the 8 client instances, so for example if the original workload has to inject 800 requests, each JMeter instance will inject 100 of them. The full test, as I said, lasts for 10 hours and is divided into timesteps of 5 minutes each; every 5 minutes a small workload variation is applied.
At the moment every JMeter instance throws java.lang.OutOfMemoryError: GC overhead limit exceeded immediately after the test is started, and no request reaches the application. I read a lot online and on Stack Overflow, and I concluded the possible causes could be:
JVM heap size too low -> I addressed this by setting the following in the jmeter.bat file on each JMeter instance:
set HEAP=-Xms4g -Xmx4g
set NEW=-XX:NewSize=4g -XX:MaxNewSize=4g
Some mistake in the test plan that results in continuous, useless garbage collector work. So I removed all the JMeter listeners from my test. In particular I was using TableVisualizer, ViewResultsFullVisualizer, StatVisualizer, and GraphVisualizer.
Anyway, the problem persists and I really have no idea how to solve it. I know a 10-hour test with a 2600-request peak could be very heavy, but I think there should be a way to perform it. Since I'm using EC2 m3.xlarge instances I could even raise the heap size to 8 GB if that would help, or split the workload among even more clients (I'm using spot instances, so it would not cost much more). However, I have already doubled the number of client instances from 4 to 8 to solve this problem and it didn't work, so I'm a bit confused and would like your suggestions before throwing more and more resources at it.
Thank you a lot in advance.
Your heap settings look wrong:
set HEAP=-Xms4g -Xmx4g
set NEW=-XX:NewSize=4g -XX:MaxNewSize=4g
Your NEW size is equal to the heap size, which is wrong: the young generation should not take the whole heap. Comment out the NEW part first.
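For example, a first attempt could look like this in jmeter.bat (a minimal sketch: keep your 4g heap and let the JVM size the young generation itself):
set HEAP=-Xms4g -Xmx4g
rem set NEW=-XX:NewSize=4g -XX:MaxNewSize=4g
If you later want to set NEW explicitly, keep it to a fraction of the heap (for example 512m to 1g), not the whole 4g.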
Can you run ps -eaf | grep java and show the output?
Also check that you follow these recommendations:
http://jmeter.apache.org/usermanual/best-practices.html
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Finally, show an overview of your Test Plan and the number of threads that you start.
I am wondering if someone can help me, please. I am trying to load test a Java REST application with thousands of requests per minute, but something is amiss and I can't quite figure out what's happening.
I am executing the load test from my laptop (via terminal) and it's hitting a number of servers in AWS.
I am using the infinite loop function to fire in a lot of requests repeatedly. The 3 thread groups have the same config.
The problem I am having is that the CPU rises very high, and the numbers do not match what I see in production in the same environment with regard to CPU etc.; the JMeter load test seems to be making the CPU work harder in my test environment.
Question 1 - Is my load test too fast for the server?
Question 2 - Is there a way to space out the load test so that I can say 10k rpm exactly?
Question 1 - Is my load test too fast for the server? - We don't know. If you see high CPU usage with only 60 threads and the application responds slowly because of it, it looks like a bottleneck. Another question is the size of the machine, the number of processors and their frequency. So you need to find which function is consuming the CPU cycles using a profiler tool and look for a way to optimize that function.
Question 2 - Is there a way to space out the load test so that I can say 10k rpm exactly? - There is: check out the Constant Throughput Timer, but be aware of the next 2 facts:
The Constant Throughput Timer can only pause threads to limit JMeter's request execution rate to the specified number of requests per minute. So you need to make sure you create a sufficient number of threads in your Thread Group(s) to produce the desired load; 60 might not be enough.
The application needs to be able to respond fast enough: 10,000 requests per minute is approximately 166 requests per second; with 60 threads this means each thread needs to execute about 2.7 requests per second, which means the response time needs to be 370 ms or less.
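To illustrate the arithmetic with a rough sketch (plain Java, not JMeter code; the throughput target and the assumed response time are just example numbers):
public class ThreadEstimate {
    public static void main(String[] args) {
        double targetPerMinute = 10_000;            // desired load: 10k requests per minute
        double responseTimeSec = 0.5;               // assumed average response time, adjust to your app
        double targetPerSecond = targetPerMinute / 60.0;   // ~166.7 requests per second
        // Little's law: concurrent threads needed = arrival rate * response time
        double threads = targetPerSecond * responseTimeSec;
        System.out.printf("~%.1f req/s requires roughly %d threads at %.0f ms response time%n",
                targetPerSecond, (int) Math.ceil(threads), responseTimeSec * 1000);
    }
}
With a 500 ms response time that comes out at roughly 84 threads, which is why 60 threads plus a Constant Throughput Timer may simply never reach 10k rpm.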
There are different aspects to consider before we go for 10k requests.
Configure the test for one user (thread) and execute it. Check that we get a proper response for every request.
Incrementally increase the number of threads: 1 user, 5 users, 10 users, 20 users, 50 users, etc.
Try different duration scenarios, like 10 mins, 20 mins, 30 mins, 1 hour, etc.
Collect metrics like error %, response time, number of requests, etc.
You can check for probable breaking points like:
CPU utilisation getting high (100%) on the machine from which you are executing the tests; in this case, you can set up a few machines in a master-slave configuration.
Error % getting high; the server may not be able to respond, so it might have crashed.
Response time getting high; the server may be getting busy due to the load.
Also, make sure you have reliable connectivity and bandwidth. Just imagine you want to generate a huge load but the connection you have is a few kbps; your tests will fail because of this.
I am trying to use JMeter to test an ActiveMQ cluster. As per requirements, I need to get at least 2k messages per second as a test. The issue is that I can't get to the required number of messages.
I am trying to test it against a local queue before going into the cluster, and the results are not good. In a PC (quite beefy) with Windows 10 installed, the best I can do is a few hundred messages per second. In a Mac (Macbook Pro) with OSX 10, I can pump it up to around 1.5k.
I have tried different configurations in JMeter: varying the number of threads, size of messages, Request&Response mode vs Request only... But nothing does the trick.
When I run custom code, I can push around 10k messages into the queue in a second. Are there any particular configurations that I might be missing? I have been through the tutorials online, but I can't find anything that fixes the issue.
JMeter default configuration is good for tests development and debugging, but when it comes to conducting the high load you need to remember several important points:
Don't use the GUI for test execution; you are supposed to run tests in non-GUI mode.
The default JVM heap allocation is only 512 MB; you will definitely need to raise this setting in the JMeter startup script. The same applies to the stack size and garbage collector settings. See the JVM Tuning: Heapsize, Stacksize and Garbage Collection Fundamental article to learn more about JVM internals.
Don't use Listeners during the load test; they cause huge overhead in terms of resource utilization and don't add any value.
Reduce usage of Pre/Post Processors and Assertions to the absolute minimum.
See 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure for the above points explained and a few more tips.
As a last resort in case you hit the hardware limits of a single load generator machine you can always consider running JMeter in distributed mode and add more JMeter engines.
I found the answer after fiddling with it for hours. It turns out there is a checkbox, unticked by default, which makes all messages persistent. When I ticked it, I got the throughput I was looking for.
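For anyone hitting the same wall outside JMeter: the checkbox presumably maps to the JMS delivery mode, and in plain Java the non-persistent equivalent looks roughly like this (a sketch using the ActiveMQ 5.x client; the broker URL and queue name are placeholders):
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class NonPersistentSend {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("test.queue")); // placeholder
        // Non-persistent delivery skips the broker's per-message disk sync,
        // which is what unlocks much higher throughput (at the cost of durability).
        producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
        producer.send(session.createTextMessage("hello"));
        connection.close();
    }
}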
I have a big list (up to 500,000) of functions.
My task is to generate a graph for each function (this can be done independently of the other functions) and dump the output to a file (it can be several files).
The process of generating the graphs can be time-consuming.
I also have a server with 40 physical cores and 128 GB of RAM.
I have tried to implement parallel processing using Java threads and an ExecutorPool, but it does not seem to use all the processor resources.
On some inputs the program takes up to 25 hours to run, and only 10-15 cores are working according to htop.
So the second thing I've tried is to create 40 distinct processes (using Runtime.exec) and split the list among them.
This method uses all the processor resources (100% load on all 40 cores) and speeds things up by a factor of about 5 compared to the previous approach (it takes only 5 hours, which is reasonable for my task).
But the problem with this method is that each Java process runs separately and consumes memory independently of the others. In some scenarios all 128 GB of RAM are consumed after 5 minutes of parallel work. The one solution I am using now is to call System.gc() in each process if Runtime.totalMemory > 2GB. This slows down overall performance a bit (8 hours on the previous input) but keeps memory usage within reasonable boundaries.
But this configuration only works for my server. If you run it on a server with 40 cores and 64 GB of RAM, you need to tune the Runtime.totalMemory > 2GB condition.
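For clarity, the workaround described above is essentially this check inside each worker process:
// Ask for a GC only when this JVM's committed heap has grown past a machine-specific threshold.
if (Runtime.getRuntime().totalMemory() > 2L * 1024 * 1024 * 1024) {
    System.gc();
}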
So the question is: what is the best way to avoid such aggressive memory consumption?
Is it normal practice to run separate processes to do parallel jobs?
Is there any other parallel method in Java (maybe fork/join?) which uses 100% of the processor's physical resources?
You don't need to call System.gc() explicitly! The JVM will do it automatically when needed, and almost always does it better. You should, however, set the max heap size (-Xmx) to a number that works well.
If your program won't scale further, you have some kind of congestion. You can either analyse your program and your Java and system settings to figure out why, or run it as multiple processes. If each process is multi-threaded, then you may get better performance using 5-10 processes instead of 40.
Note that you may get higher performance with more than one thread per core. Fiddle around with 1-8 threads per core and see if throughput increases.
From your description it sounds like you have 500,000 completely independent items of work and that each work item doesn't really need a lot of memory. If that is true, then memory consumption isn't really an issue. As long as each process has enough memory so it doesn't have to gc very often then gc isn't going to affect the total execution time by much. Just make sure you don't have any dangling references to objects you no longer need.
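As a rough sketch of that single-process, thread-pool approach (generateGraphAndDump and loadFunctions are placeholders for the asker's own code):
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GraphBatch {
    public static void main(String[] args) throws InterruptedException {
        List<String> functions = loadFunctions();                  // placeholder: the 500,000 items
        int workers = Runtime.getRuntime().availableProcessors();  // e.g. 40 on the server described
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String f : functions) {
            // Each work item is independent, so it can simply become its own task.
            pool.submit(() -> generateGraphAndDump(f));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    private static List<String> loadFunctions() { return List.of(); }   // stub
    private static void generateGraphAndDump(String f) { /* stub: graph generation + file dump */ }
}
A single process also means one -Xmx setting for the whole job instead of 40 heaps competing for the 128 GB; and if only 10-15 cores stay busy with this model, the tasks are probably blocking on I/O or a shared lock rather than on CPU.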
One of the problems here: it is still very hard to understand how many threads, cores, ... are actually available.
My personal suggestion: there are several articles in the Java Specialists newsletter which do a very deep dive into this subject.
For example this one: http://www.javaspecialists.eu/archive/Issue135.html
or a more recent one, on "the number of available processors": http://www.javaspecialists.eu/archive/Issue220.html
I often run workloads against my own web applications to try to find performance issues.
Sometimes I see memory leaks etc. after variable durations.
So I created a bash script that takes javacores (kill -3 pid) every minute for 10 minutes; the script is executed on the hour.
For a workload that runs for 120 hours, this will produce 1200 javacores.
I'm wondering:
Is this overkill? I'd like a continuous view of the system (a javacore every 5 minutes, for 120 hours), but I don't want to impact performance.
What is a reasonable frequency to automatically capture javacores against a servlet-based app?
Looks like we are looking at two issues:
Performance
OutOfMemoryError
Performance: for performance, determine the longest request you can tolerate and generate the javacores at 3 to 5 times that amount. (Anything below 5 minutes to me is fine tuning and can be difficult.)
Let's say the longest request you want is 3 minutes; I would generate 3 javacores evenly spaced from 9 minutes to 15 minutes.
I usually suggest the link below (collect manually), but if you have already written your own script, use it:
"MustGather: Performance, Hang or High CPU Issues on Linux"
http://www.ibm.com/support/docview.wss?rs=180&uid=swg21115785
OutOfMemoryError: to see if your product is leaking, follow the steps in the URL below (go to "collect manually"), then google IBM Heap Analyzer (standalone and free) and review the heap dump for potential leak suspects.
"MustGather: Native Memory Issues on Linux"
http://www.ibm.com/support/docview.wss?rs=180&uid=swg21138462
Personally, I prefer looking at heap dumps taken when memory use is equal to the Xmx value, or nearly so.
Since this is an IBM JVM you could try using Health Center instead of taking javacores regularly:
http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/
This has profiling and memory monitoring views, so it should give you the data you are looking for and save you from analysing the javacore files yourself.
I'm trying to speed test Jetty (to compare it with Apache) for serving dynamic content.
I'm testing this using three client threads requesting again as soon as a response comes back.
These are running on a local box (OSX 10.5.8 mac book pro). Apache is pretty much straight out of the box (XAMPP distribution) and I've tested Jetty 7.0.2 and 7.1.6
Apache is giving me spiky times: response times up to 2000 ms, but an average of 50 ms, and if you remove the spikes (about 2%) the average is 10 ms per call. (This was to a PHP hello world page.)
Jetty is giving me no spikes, but response times of about 200ms.
This was calling the localhost:8080/hello/ page that is distributed with Jetty, starting Jetty with java -jar start.jar.
This seems slow to me, and I'm wondering if it's just me doing something wrong.
Any suggestions on how to get better numbers out of Jetty would be appreciated.
Thanks
Well, since I am successfully running a site with some traffic on Jetty, I was pretty surprised by your observation.
So I just tried your test. With the same result.
So I decompiled the Hello servlet that comes with Jetty. And I had to laugh - it really includes the following line:
Thread.sleep(200L);
You can see for yourself.
My own experience with Jetty performance: I ran multi threaded load tests on my real-world app where I had a throughput of about 1000 requests per second on my dev workstation...
Note also that your speed test is really just a latency test, which is fine so long as you know what you are measuring. But Jetty does trade off latency for throughput, so often there are servers with lower latency, but also lower throughput as well.
Realistic traffic for a webserver is not 3 very busy connections - 1 browser will open 6 connections, so that represents half a user. More realistic traffic is many hundreds or thousands of connections, each of them mostly idle.
Have a read of my blogs on this subject:
https://webtide.com/truth-in-benchmarking/
and
https://webtide.com/lies-damned-lies-and-benchmarks-2/
You should definitely check it with a profiler. Here are instructions on how to set up remote profiling with Jetty:
http://sujitpal.sys-con.com/node/508048/mobile
Speeding up or performance-tuning any application or server is really hard to get right, in my experience. You'll need to benchmark several times with different workload models to define what your peak load is. Once you have defined the peak load for the configuration/environment mixture you need to tune and benchmark, you might have to run 5+ iterations of your benchmark. Check the configuration of both Apache and Jetty in terms of the number of worker threads that process requests, and get them to match if possible. Here are some recommendations:
Consider the differences between the two environments (GC in Jetty: consider tuning your min and max memory thresholds to the same size before executing your test).
The load should come from another box. If you don't have a second box/PC/server, take your CPU/core count into account and pin the test to a specific CPU; do the same for Jetty/Apache.
This is given that you can't get another machine to be the stress agent.
Run several workload models.
Moving on to modeling the test, do the following 2 stages:
Stage 1: one thread for each configuration for 30 minutes; start with 1 thread and go up to 5, with a 10-minute interval between increases.
Stage 2: based on those metrics, define a number of threads for the test and run that number of threads concurrently for 1 hour.
Correlate the metrics (response times) from your testing app with the server hosting the application resources (use sar, top and other Unix commands to track CPU and memory); some other process might be impacting your app. (Memory matters mostly for Apache; Jetty will be constrained by the JVM memory configuration, so its memory usage should not change once the server is up and running.)
Be aware of the HotSpot compiler.
Methods have to be called several times (1,000 times?) before they are compiled into native code.
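If you want to take that effect out of the measurement, a short warm-up pass before timing helps; a minimal sketch (handleRequest stands in for whatever code path is being measured):
public class WarmupTimer {
    public static void main(String[] args) {
        // Warm-up: give the JIT a chance to compile the hot path before measuring.
        for (int i = 0; i < 10_000; i++) {
            handleRequest();
        }
        long start = System.nanoTime();
        int calls = 1_000;
        for (int i = 0; i < calls; i++) {
            handleRequest();
        }
        double avgMs = (System.nanoTime() - start) / (double) calls / 1_000_000.0;
        System.out.println("average per call: " + avgMs + " ms");
    }

    private static void handleRequest() { /* placeholder for the code under test */ }
}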