We have Linux servers, each running about 200 Java microservices, using cgroups for CPU and memory isolation. One thing we have discovered is that Java tends to require a full core of CPU to perform (major) garbage collection efficiently.
However, with 200 apps running and only 24 CPUs, if they all decided to GC at the same time they would obviously be throttled by their cgroups. Since typical application CPU usage is relatively small (roughly 15% of one CPU at peak), it would be nice to find a way to ensure they don't all GC at the same time.
I'm looking into how we can schedule GCs so that the microservices don't all GC at the same time, letting us keep running over 200 apps per host, but I was wondering if anybody had suggestions or experience on this topic before I try to reinvent the wheel.
I found that there are command-line tools we could use, as well as MBeans that can actually trigger a GC, but I've read that doing so is not advised because it interferes with the heuristics the JVM uses to decide when to collect.
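For reference, the MBean route I'm looking at is roughly this (a minimal local sketch; remotely it would be the same gc operation on java.lang:type=Memory invoked over a JMX connector):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class GcTrigger {
        public static void main(String[] args) {
            // The platform Memory MBean (java.lang:type=Memory) exposes a gc()
            // operation; calling it locally is effectively the same as System.gc().
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            memory.gc();

            // Note: this is only a *request* -- the JVM may coalesce, delay, or
            // (with -XX:+DisableExplicitGC) ignore it entirely.
        }
    }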
Something I'm considering is using performance metrics (CPU, memory, traffic) to predict an imminent GC; if several services look like they are about to GC, we could perhaps trigger them one at a time. However, this might be impractical, or simply a bad idea.
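The metrics I have in mind would come from the standard management beans, something like this sketch (the old-generation pool names are collector-specific, so matching on them is an assumption):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class GcMonitor {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                // Cumulative counts/times per collector (e.g. "PS MarkSweep" for major GC).
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("%s: count=%d time=%dms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                // Old-generation occupancy as a rough "a major GC is coming" signal.
                for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                    MemoryUsage u = pool.getUsage();
                    // Pool names vary by collector ("PS Old Gen", "Tenured Gen", ...),
                    // so matching on "Old"/"Tenured" is an assumption; getMax() can be
                    // -1 if the pool has no defined limit.
                    if ((pool.getName().contains("Old") || pool.getName().contains("Tenured"))
                            && u.getMax() > 0) {
                        System.out.printf("%s: %.0f%% full%n",
                                pool.getName(), 100.0 * u.getUsed() / u.getMax());
                    }
                }
                Thread.sleep(10_000);
            }
        }
    }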
We are running Java 7 and 8.
You can't schedule a GC, since it depends on the allocation rate - i.e. on the load and the application logic. Some garbage collectors try to control major GC duration or the total time consumed by GC, but not when collections happen. There is also no guarantee that an externally triggered GC (e.g. via an MBean) will actually run, and if it does run, it may run later than when it was triggered.
As others have pointed out, simultaneous major GCs are a very rare event under "normal" load (you can estimate the odds by gathering the average period, in seconds, between major GCs from all of your apps and building a histogram). Under "heavy" load you'll likely hit CPU shortage much earlier than a simultaneous GC caused by an increased allocation rate, because you would need a "lot" (depending on the size of your objects) of "long"-lived objects polluting the old generation before a major GC is triggered.
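To put numbers on that, here is a rough back-of-envelope you could run with your own measured period and duration (the values below are made-up placeholders, not measurements):

    // If each of N apps runs a major GC of duration d seconds every p seconds
    // (independently and out of phase), then at any instant each app is "in GC"
    // with probability d/p, so the number collecting concurrently is roughly
    // Binomial(N, d/p).
    public class GcOverlapEstimate {
        public static void main(String[] args) {
            int n = 200;        // apps per host (from the question)
            double d = 0.5;     // assumed major-GC duration in seconds
            double p = 600.0;   // assumed major-GC period in seconds (10 min)
            double q = d / p;   // probability a given app is collecting right now

            System.out.printf("Expected concurrent major GCs: %.2f%n", n * q);

            // P(more than k concurrent GCs) = 1 - sum_{i<=k} C(n,i) q^i (1-q)^(n-i)
            int k = 24;         // number of cores
            double cdf = 0.0, term = Math.pow(1 - q, n); // binomial pmf at i = 0
            for (int i = 0; i <= k; i++) {
                cdf += term;
                term *= (double) (n - i) / (i + 1) * q / (1 - q);
            }
            System.out.printf("P(more than %d concurrent GCs) = %.3g%n", k, 1 - cdf);
        }
    }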
We are using IBM AIX 5.3 OS.
We are using EBS Application 12.1.3 with Oracle 10g (10.2.0.4) in a RAC environment.
The EBS application is slow even though CPU and memory utilization is normal, and in the DB the CPU and memory utilization is also normal.
There are no locking or blocking sessions.
The concurrent programs are running slowly, and even the whole application is slow.
Can anyone please let me know why this kind of issue happens and what should be done in such cases?
Thanks!
I have a Java + Camel based application which is consuming a huge amount of CPU; it varies from 40-90% of CPU in production. I tried to replicate the issue in my local environment, started 700 routes (file endpoints), and it consistently takes 70-80% CPU.
I want to know: is there any way I can reduce the CPU utilization by configuring some settings when starting up the routes?
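For example, is something like this the right direction? (Just a sketch; the endpoint options are my guesses, and the backoff options depend on the Camel version.)

    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.impl.DefaultCamelContext;

    public class SlowPollingRoutes {
        public static void main(String[] args) throws Exception {
            DefaultCamelContext context = new DefaultCamelContext();
            context.addRoutes(new RouteBuilder() {
                @Override
                public void configure() {
                    // File consumers poll their directory on a timer; the default
                    // delay is 500ms, so 700 routes poll ~1400 times per second.
                    // Raising the delay (and backing off further when polls find
                    // nothing) should cut the idle-poll CPU cost.
                    from("file:inbox?delay=5000&backoffMultiplier=5&backoffIdleThreshold=3")
                        .to("log:processed");
                }
            });
            context.start();
        }
    }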
regards
sanjay
I'm reading about virtual memory swapping, and it says that pages of memory can be swapped out when an application becomes idle. I've tried to google what that means but haven't found much detailed information except for this Stack Overflow answer:
Your WinForms app is driven by a message loop that pulls messages out
of a queue. When that queue is emptied, the message loop enters a
quiet state, sleeping efficiently until the next message appears in
the message queue. This helps conserve CPU processing resources
(cycles wasted spinning in a loop takes CPU time away from other
processes running on the machine, so everything feels slower) and also
helps reduce power consumption / extend laptop battery life.
So does the application become idle when there are no messages in the message queue?
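In Java terms I picture that "sleeping efficiently" as a blocking wait rather than a busy spin, something like this sketch (my mental model, not the actual WinForms internals):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class MessageLoop {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        public void post(Runnable message) {
            queue.offer(message);
        }

        public void run() throws InterruptedException {
            while (true) {
                // take() blocks: the thread is descheduled by the OS and uses no
                // CPU until a message arrives -- the "quiet state" described above.
                Runnable message = queue.take();
                message.run();
            }
        }
    }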
The operating system decides what idle means. In general, that means that the application doesn't actively utilize system resources (like processor cycles, IO operations, etc).
However, that doesn't mean an application's pages won't be swapped out just because the application isn't 'idle'. There may be many 'active' applications contending for the same limited physical memory, and the OS may be forced to swap out some pages belonging to one active application to make room for another.
One of our servers is experiencing a very high CPU load with our application. We've looked at various stats and are having issues finding the source of the problem.
One of the current theories is that there are too many threads involved and that we should try to reduce the number of concurrently executing threads. There's just one main thread pool, with 3000 threads, and a WorkManager working with it (this is Java EE - GlassFish). At any given moment, there are about 620 separate network IO operations that need to be conducted in parallel (using java.nio is not an option either). Moreover, there are roughly 100 operations that have no IO involved and are also executed in parallel.
This structure is not efficient and we want to see if it is actually causing damage, or is simply bad practice. The reason is that any change to this system is quite expensive (in terms of man-hours), so we need some proof of an issue.
So now we're wondering if context switching of threads is the cause, given there are far more threads than the required concurrent operations. Looking at the logs, we see that on average there are 14 different threads executed in a given second. If we take into account the existence of two CPUs (see below), then it is 7 threads per CPU. This doesn't sound like too much, but we wanted to verify this.
So - can we rule out context switching or too-many-threads as the problem?
General Details:
Java 1.5 (yes, it's old), running on CentOS 5, 64-bit, Linux kernel 2.6.18-128.el5
There is only one single Java process on the machine, nothing else.
Two CPUs, under VMware.
8GB RAM
We don't have the option of running a profiler on the machine.
We don't have the option of upgrading the Java, nor the OS.
UPDATE
As advised below, we've conducted captures of load average (using uptime) and CPU (using vmstat 1 120) on our test server with various loads. We've waited 15 minutes between each load change and its measurements to ensure that the system stabilized around the new load and that the load average numbers are updated:
50% of the production server's workload: http://pastebin.com/GE2kGLkk
34% of the production server's workload: http://pastebin.com/V2PWq8CG
25% of the production server's workload: http://pastebin.com/0pxxK0Fu
CPU usage appears to be reduced as the load reduces, but not on a very drastic level (change from 50% to 25% is not really a 50% reduction in CPU usage). Load average seems uncorrelated with the amount of workload.
There's also a question: given our test server is also a VM, could its CPU measurements be impacted by other VMs running on the same host (making the above measurements useless)?
UPDATE 2
Attaching the snapshot of the threads in three parts (pastebin limitations)
Part 1: http://pastebin.com/DvNzkB5z
Part 2: http://pastebin.com/72sC00rc
Part 3: http://pastebin.com/YTG9hgF5
Seems to me the problem is the 100 CPU-bound threads more than anything else. The 3000-thread pool is basically a red herring, as idle threads don't consume much of anything. The I/O threads are likely sleeping most of the time, since I/O is measured on a geologic time scale in terms of computer operations.
You don't mention what the 100 CPU threads are doing or how long they last, but if you want to slow down a computer, dedicating 100 threads of "run until the time slice says stop" will most certainly do it. Because you have 100 threads that are always ready to run, the machine will context switch as fast as the scheduler allows, and there will be pretty much zero idle time. Context switching has an impact because you're doing it so often. Since the CPU threads are (likely) consuming most of the CPU time, your I/O-"bound" threads are going to spend longer waiting in the run queue than waiting for I/O. So even more threads end up waiting (the I/O threads just leave the run queue more often, since they quickly hit an I/O wait, which parks them and lets the next thread run).
No doubt there are tweaks here and there to improve efficiency, but 100 CPU threads are 100 CPU threads. Not much you can do there.
I think your constraints are unreasonable. Basically what you are saying is:
1. I can't change anything
2. I can't measure anything
Can you please speculate as to what my problem might be?
The real answer to this is that you need to hook a proper profiler to the application and you need to correlate what you see with CPU usage, Disk/Network I/O, and memory.
Remember the 80/20 rule of performance tuning: 80% of the improvement will come from tuning your application. You might simply have too much load for one VM instance, and it could be time to consider scaling horizontally, or vertically by giving the machine more resources. It could also be that any one of the 3 billion JVM settings is out of line with your application's execution profile.
I assume the 3000-thread pool came from the famous "more threads = more concurrency = more performance" theory. The real answer is that a tuning change isn't worth anything unless you measure throughput and response time before and after the change and compare the results.
If you can't profile, I'd recommend taking a thread dump or two and seeing what your threads are doing (a small in-process sketch follows the links below). Your app doesn't have to stop to do it:
http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/threads.html
http://java.net/projects/tda/
http://java.sys-con.com/node/1611555
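If even running jstack (or kill -3 <pid>) against the process is off-limits, a dump can also be taken from inside the application. A minimal Java 1.5-compatible sketch (the class name is just an example):

    import java.util.Map;

    public class InProcessThreadDump {
        // Prints a stack trace for every live thread without stopping the
        // application (jstack or `kill -3 <pid>` produce the same from outside).
        public static void dump() {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                Thread t = e.getKey();
                System.out.printf("\"%s\" state=%s%n", t.getName(), t.getState());
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }

        public static void main(String[] args) {
            dump();
        }
    }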
So - can we rule out context switching or too-many-threads as the problem?
I think your concerns over thrashing are warranted. A thread pool with 3000 threads (700+ concurrent operations) on a 2-CPU VMware instance certainly seems like it could cause context-switching overload and performance problems. Limiting the number of threads could give you a performance boost, although determining the right number will be difficult and will probably take a lot of trial and error.
we need some proof of an issue.
I'm not sure the best way to answer but here are some ideas:
Watch the load average of the VM OS and the JVM (a small sketch for polling it from inside the JVM follows these suggestions). If you are seeing high load values (20+), then this is an indicator that there are too many things in the run queues.
Is there no way to simulate the load in a test environment so you can play with the thread pool numbers? If you run simulated load in a test environment with pool size of X and then run with X/2, you should be able to determine optimal values.
Can you compare high load times of day with lower load times of day? Can you graph number of responses to latency during these times to see if you can see a tipping point in terms of thrashing?
If you can simulate load, then make sure you aren't just testing under the "drink from the fire hose" methodology. You need simulated load that you can dial up and down. Start at 10% and slowly increase the simulated load while watching throughput and latency. You should be able to see the tipping points by watching for throughput flattening out or otherwise deflecting.
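Here is a minimal sketch of polling the OS load average from inside the JVM on Linux; reading /proc/loadavg sidesteps getSystemLoadAverage(), which only appeared in Java 6 (the polling interval is arbitrary):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class LoadAverageWatcher {
        // Reads the Linux 1/5/15-minute load averages; works on Java 1.5.
        public static double[] loadAverages() throws IOException {
            BufferedReader in = new BufferedReader(new FileReader("/proc/loadavg"));
            try {
                String[] fields = in.readLine().split("\\s+");
                return new double[] {
                    Double.parseDouble(fields[0]),   // 1-minute
                    Double.parseDouble(fields[1]),   // 5-minute
                    Double.parseDouble(fields[2]) }; // 15-minute
            } finally {
                in.close();
            }
        }

        public static void main(String[] args) throws Exception {
            while (true) {
                double[] la = loadAverages();
                System.out.printf("load average: %.2f %.2f %.2f%n", la[0], la[1], la[2]);
                Thread.sleep(10000);
            }
        }
    }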
Usually, context switching between threads is computationally very cheap, but when it involves this many threads... you just can't know. You say upgrading Java is out of the question, but what about some hardware upgrades? That would probably provide a quick fix and shouldn't be that expensive...
e.g. run a profiler on a similar machine.
try a newer version of Java 6 or 7. (It may not make a difference, in which case don't bother upgrading production)
try CentOS 6.x
try not using VMware.
try reducing the number of threads. You only have two CPUs (see the sketch at the end of this answer).
You may find that all or none of the above options make a difference, but you won't know until you have a system you can test on with a known, repeatable workload.
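For the thread-count suggestion, a sketch of what "reducing the number of threads" might look like with plain executors (the pool sizes are assumptions based on the numbers in the question, not measured values):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BoundedPools {
        public static void main(String[] args) {
            int cpus = Runtime.getRuntime().availableProcessors();

            // CPU-bound work: little point in having many more runnable threads
            // than CPUs -- extra threads only add context-switch overhead.
            ExecutorService cpuPool = Executors.newFixedThreadPool(cpus + 1);

            // Blocking-I/O work: needs roughly one thread per concurrent blocking
            // operation (the question mentions ~620), but nowhere near 3000.
            ExecutorService ioPool = Executors.newFixedThreadPool(650);

            // Work would then be submitted to cpuPool or ioPool instead of the
            // single 3000-thread pool; wiring this into the WorkManager is left out.
        }
    }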