I have a Java + Camel based application which is consuming a huge amount of CPU: it varies from 40-90% in production. I tried to replicate the issue in my local environment by starting 700 routes (file endpoints), and it consistently takes 70-80% CPU.
Is there any way I can reduce the CPU utilization by configuring some setting when starting up the routes?
regards
sanjay
My goal
I am trying to understand best practices for properly sizing resources for Java Spring Boot apps, both ones with low performance demands and ones with high performance demands.
Examples
First example:
Let's take a low-demand Spring Boot app which only computes a key for a cache and calls Redis for data with that key. I have tried two configurations of resources and Java opts.
replicaCount: 4
memory: 1.9Gi
cpuRequest: 800m
cpuLimit: 1200m
JAVA_OPTS: "-XX:+UseG1GC -XX:MaxRAM=1792m -Xmx768m -Xms768m -XX:MaxMetaspaceSize=1024m -XshowSettings:vm -XX:ActiveProcessorCount=2"
Performance is good: the application has a median response time of around 3.5 ms, and the 0.99 percentile is 90 ms. The GC pause count was 0.4 per minute with a pause duration of 20 ms. There was also a little CPU throttling.
Then I have tried this setup:
replicaCount: 4
memory: 3Gi
cpuRequest: 800m
cpuLimit: 10000m
JAVA_OPTS: "-XX:+UseG1GC -XX:MaxRAMPercentage=80 -XX:InitialRAMPercentage=80 -XshowSettings:vm"
The application was more memory-hungry, but the median response time was the same 3.5 ms; only the 0.99 percentile improved, to 72 ms. The GC pause count was lower, about 0.1 per minute, with a pause duration of 5 ms. There was throttling during startup but none during the run.
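Since the two flag sets size the heap differently (-Xmx768m is fixed, while -XX:MaxRAMPercentage=80 is relative to the container memory limit), it helps to verify what the JVM actually picked up inside the pod. A minimal sketch, assuming a hypothetical HeapCheck class run with the same JAVA_OPTS as the app:

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // With -Xmx768m this reports ~768 MB; with -XX:MaxRAMPercentage=80
        // it reports roughly 80% of the container's memory limit.
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);
        System.out.println("Max heap (MB): " + maxHeapMb);
        // With -XX:ActiveProcessorCount=2 this reports 2 regardless of the
        // cgroup CPU quota; otherwise the JVM derives it from the quota.
        System.out.println("Available processors: " + rt.availableProcessors());
    }
}
```

Running this in both configurations makes it obvious whether the heap and thread-pool sizing the JVM chose actually matches what the Kubernetes limits intended.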
Second example:
A performance-hungry Spring Boot application which loads data from a DB and calculates multiple things, like the distance between two points, the price for multiple deliveries, and filtering of possible transport or pickup points for a package.
It was running on 4 VPSes with 4 CPUs and 4 GB each. But on Kubernetes it needs more resources than I expected.
replicaCount: 4
memory: 7.5Gi
cpuRequest: 2000m
cpuLimit: 50000m
JAVA_OPTS: "-XX:+UseG1GC -XX:MaxRAMPercentage=80 -XX:InitialRAMPercentage=80 -XX:+UseStringDeduplication -XshowSettings:vm"
Performance is good, but vertical scaling did nothing; only horizontal scaling improved performance. The CPU usage reported by Kubernetes is about 1, with no throttling.
What I have done so far
I read multiple articles found via Google, but none gave me a proper answer or explanation.
I have tried various settings for CPU and memory resource limits in the Kubernetes spec, but they did not do what I expected, which was lowering response time and increasing the number of requests the app can process.
Scaling vertically did not help either; everything was still slow.
Scaling horizontally with low-CPU, low-memory pods and explicit Xms, Xmx, etc. worked in the sense that the pods were stable, but not as performant as they could be (I think). There was also some CPU throttling, even when the CPU was not fully used.
Question
I have a big problem with properly setting memory and CPU on Kubernetes. I do not understand why memory usage increases when I give the pod more CPU (Xss is at the default value) while the load is the same. The pod avoids being OOM-killed only if the gap between committed memory and used memory is about 1 GB (for the second application example) or 500 MB (for the first).
If I set Xmx, Xms, MaxMetaspaceSize, and Xss, then I can achieve lower memory and CPU usage. But in that case, increasing the pod memory limit is complicated, because these options are not defined as percentages and I must recalculate each Java opt every time.
Also, if I give the application too much memory, it will start at some level at the beginning, but after some time it always climbs to the limit (until the gap between committed and heap memory is 1 GB - 500 MB) and stays there.
So, what is the proper way to find the best resource settings and Java opts for Spring Boot applications running on Kubernetes? Should I give the application generous resources and, after some time (say 7 days), lower them to the maximum values observed in the metrics?
We are using Groovy, Grails, WebSocket, and a REST API in our application. Our production server shows high CPU as soon as 10+ users access the app simultaneously. On checking the server health, I can see some logging activity coinciding with the high CPU (screenshot attached). I wanted to know whether excessive logging contributes to high CPU utilization. There could certainly be other reasons for the high CPU as well.
I'm running hadoop and have 2 identically configured servers in the cluster. They're running the same task, same configuration, same everything, and both are totally dedicated as hadoop task nodes (workers).
The job I'm running through this cluster is highly IO bound.
On one server I see 60-100MB/sec of IO and a CPU load of 5-10, on the other server I see 40-60MB/sec of IO and a CPU load of 60-90 (and the box is almost unusable in terms of even running a simple shell).
I've run smartctl and don't get any disk warnings.
Any suggestions on what I might do next to identify the root difference between these boxes? These results have been consistent over many hours of processing.
It smells of partition misalignment on disks with 4096-byte physical / 512-byte logical sectors.
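The arithmetic behind that suspicion: with 512-byte logical sectors, a partition is aligned to 4096-byte physical sectors only when its start sector is a multiple of 8. A tiny sketch of the check (the start sectors 63 and 2048 are just the classic misaligned DOS default and the modern 1 MiB default, not values from your boxes):

```java
public class AlignmentCheck {
    // A partition on 512-byte logical sectors is aligned to 4096-byte
    // physical sectors only when its start sector is a multiple of 8
    // (8 * 512 = 4096).
    static boolean is4kAligned(long startSector) {
        return startSector % 8 == 0;
    }

    public static void main(String[] args) {
        // Old DOS-style default: partitions started at sector 63 -> misaligned,
        // so every 4K write straddles two physical sectors (read-modify-write).
        System.out.println("start=63   aligned? " + is4kAligned(63));   // false
        // Modern default: start at sector 2048 (1 MiB boundary) -> aligned.
        System.out.println("start=2048 aligned? " + is4kAligned(2048)); // true
    }
}
```

Comparing the partition start sectors of the two servers (e.g. from the partition table) would confirm or rule this out quickly.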
I'm trying to speed test jetty (to compare it with using apache) for serving dynamic content.
I'm testing this using three client threads requesting again as soon as a response comes back.
These are running on a local box (OS X 10.5.8 MacBook Pro). Apache is pretty much straight out of the box (XAMPP distribution), and I've tested Jetty 7.0.2 and 7.1.6.
Apache is giving me spikey times: response times up to 2000 ms, but an average of 50 ms, and if you remove the spikes (about 2%) the average is 10 ms per call. (This was to a PHP hello-world page.)
Jetty is giving me no spikes, but response times of about 200ms.
This was calling to the localhost:8080/hello/ that is distributed with jetty, and starting jetty with java -jar start.jar.
This seems slow to me, and I'm wondering if it's just me doing something wrong.
Any suggestions on how to get better numbers out of Jetty would be appreciated.
Thanks
Well, since I am successfully running a site with some traffic on Jetty, I was pretty surprised by your observation.
So I just tried your test. With the same result.
So I decompiled the Hello servlet that ships with Jetty. And I had to laugh - it really includes the following line:
Thread.sleep(200L);
You can see for yourself.
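That one line fully explains the measurements: the ~200 ms "response time" is dominated by the artificial sleep, not by Jetty's overhead. A standalone sketch of the effect (this is an illustration, not the actual decompiled Jetty class):

```java
public class SleepDemo {
    // Mimics the demo servlet's artificial delay: whatever Jetty itself
    // costs is dwarfed by this deliberate 200 ms pause per request.
    static String handle() {
        try {
            Thread.sleep(200L); // the line found in the decompiled Hello servlet
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Hello";
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        handle();
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        // Will always be at least ~200 ms, regardless of server performance.
        System.out.println("handle() took ~" + elapsedMs + " ms");
    }
}
```

So benchmarking against that demo page measures the sleep, not the server; point the test at a servlet without the delay.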
My own experience with Jetty performance: I ran multi threaded load tests on my real-world app where I had a throughput of about 1000 requests per second on my dev workstation...
Note also that your speed test is really just a latency test, which is fine so long as you know what you are measuring. But Jetty does trade off latency for throughput, so often there are servers with lower latency, but also lower throughput as well.
Realistic traffic for a webserver is not 3 very busy connections - 1 browser will open 6 connections, so that represents half a user. More realistic traffic is many hundreds or thousands of connections, each of them mostly idle.
Have a read of my blogs on this subject:
https://webtide.com/truth-in-benchmarking/
and
https://webtide.com/lies-damned-lies-and-benchmarks-2/
You should definitely check it with profiler. Here are instructions how to setup remote profiling with Jetty:
http://sujitpal.sys-con.com/node/508048/mobile
Speeding up or performance-tuning any application or server is really hard to get right, in my experience. You'll need to benchmark several times with different workload models to define what your peak load is. Once you define the peak load for the configuration/environment mixture you need to tune, you might have to run 5+ iterations of your benchmark. Check the configuration of both Apache and Jetty in terms of the number of worker threads processing requests, and get them to match if possible. Here are some recommendations:
Consider the differences between the two environments (GC in Jetty: consider tuning your min and max memory thresholds to the same size, then proceed to execute your test).
The load should come from another box. If you don't have a second box/PC/server, take your CPU/core count into account and pin the test to a specific CPU; do the same for Jetty/Apache. This is assuming you can't get another machine to act as the stress agent.
Run several workload models.
Moving on to modeling the test, do the following two stages:
Stage 1: one thread per configuration for 30 minutes; then start with 1 thread and go up to 5, increasing the count every 10 minutes.
Stage 2: based on the Stage 1 metrics, define a thread count for the test, and run that many threads concurrently for 1 hour.
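A staged workload like this can be sketched with a plain ExecutorService harness; the Runnable here is a stand-in for the real HTTP call, and the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LoadStage {
    // Runs `task` from `threads` workers, `iterations` times each, and
    // returns the observed latencies in milliseconds, sorted so that
    // percentiles can be read off by index.
    static List<Long> run(int threads, int iterations, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Long> latencies = Collections.synchronizedList(new ArrayList<>());
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    long t0 = System.nanoTime();
                    task.run(); // stand-in for the real request
                    latencies.add((System.nanoTime() - t0) / 1_000_000);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Collections.sort(latencies);
        return latencies;
    }

    public static void main(String[] args) {
        // Stage sketch: 3 concurrent clients, 50 requests each.
        List<Long> lat = run(3, 50, () -> { /* replace with HTTP request */ });
        System.out.println("median ms: " + lat.get(lat.size() / 2));
        System.out.println("p99 ms:    " + lat.get((int) (lat.size() * 0.99)));
    }
}
```

Repeating the run while stepping `threads` from 1 to 5 gives the staged ramp described above, with median and p99 per stage.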
Correlate the metrics (response times) from your testing app with the server hosting the application (use sar, top, and other Unix commands to track CPU and memory); some other process might be impacting your app. (Memory is mostly relevant for Apache; Jetty is constrained by the JVM memory configuration, so its memory usage should not change much once the server is up and running.)
Be aware of the HotSpot compiler: methods have to be called several times (1,000 times?) before they are compiled into native code.
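Because of that, any benchmark must discard the first few thousand calls or the interpreter dominates the numbers. A minimal warm-up sketch (the iteration counts are illustrative; actual compile thresholds vary by JVM and can be observed with -XX:+PrintCompilation):

```java
public class WarmupDemo {
    // The method under test: cheap enough to call many times.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (long) i * i;
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up: call the method enough times that HotSpot promotes it
        // from the interpreter to compiled native code before we measure.
        for (int i = 0; i < 20_000; i++) work(1_000);

        // Only measure after warm-up; the first calls run interpreted
        // and would inflate the measured latency.
        long t0 = System.nanoTime();
        long result = work(1_000_000);
        long micros = (System.nanoTime() - t0) / 1_000;
        System.out.println("warm call took " + micros + " us, result=" + result);
    }
}
```

The same logic applies to the Jetty-vs-Apache comparison: Jetty's early requests pay JIT warm-up costs that a steady-state run would not show.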
We're currently testing out Alfresco Community on an old server (only 1 GB of RAM). Because this is the Community version, we need to restart it every time we change the configuration (we're trying to add some features like generating previews of DWG files, etc.). However, restarting takes a very long time (about 4 minutes, I think). This is probably due to the limited amount of memory available. Does anybody know any features or settings that can improve this restart time?
As with all performance issues there is rarely a magic bullet.
Memory pressure - the app is starting up, but the 512 MB heap is only just enough to fit the application, and it is spending half of the startup time running GC.
Have a look at any of the following:
1. -verbose:gc
2. jstat -gcutil
3. jvisualvm - much nicer UI
You are trying to see how much time is being spent in GC; look for many full garbage collection events that don't reclaim much of the heap, i.e. 99% -> 95%.
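If attaching jstat or jvisualvm is awkward on that old box, the same numbers can be read from inside the JVM via the standard management beans. A sketch (class name is illustrative; the MXBean APIs are standard java.lang.management):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcStats {
    public static void main(String[] args) {
        // Per-collector invocation counts and accumulated pause time -
        // the same data jstat -gcutil summarizes. Many collections with
        // little heap reclaimed means the heap is simply too small.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        // Heap occupancy: used vs. max shows how close to the wall you are.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used %d MB of max %d MB%n",
                heap.getUsed() / (1 << 20), heap.getMax() / (1 << 20));
    }
}
```

Logging this periodically during Alfresco's startup shows whether GC really eats half of those 4 minutes.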
Solution - more heap, nothing else for it really.
You may want to try -XX:+AggressiveHeap to get the JVM to max out its memory usage on the box; the only trouble is that with only 1 GB of memory it's going to be limited. List of all JVM options
Disk IO - if the box itself is not running at close to 100% CPU during startup (assuming 100% of a single core; startup is normally single-threaded), then there may be some disk IO the application is doing that is the bottleneck.
Use operating system tools such as Windows Performance Monitor to check for disk IO. It may be that it isn't the application causing the IO; it could be swap activity (page faulting).
Solution: either fix the app (not too likely), or get faster disks, a faster computer, or more physical memory for the box.
Two of the most common reasons why Tomcat loads slowly:
You have a lot of web applications. Tomcat takes some time to create the web context for each of those.
Your webapps have a large number of files in their web application directories. Tomcat scans the web application directories at startup.
Also have a look at the Java performance tuning whitepaper. Further, I would recommend Lambda Probe (www.lambdaprobe.org/d/index.htm) to see if you are satisfied with your GC settings; it has nice real-time GC and memory tracking for Tomcat.
I myself have Alfresco running with example 4.2.6 from the Java performance tuning whitepaper:
4.2.6 Tuning Example 6: Tuning for low pause times and high throughput
Memory settings are also very nicely explained in that paper.
Kind regards, Mahatmanich