A thread dump from a Java-based application is easy to get but hard to analyze!
Still, there is something interesting we can see in the thread dump.
Suppose we are running a heavily loaded Java web app. I often take 10 or 15 thread dump files during peak time (under high load) to get a wide sample of data. First, there is no doubt that we need to tune the code whose threads show up as BLOCKED or waiting on a monitor. But I can't dig much deeper into the remaining RUNNABLE threads.
So, if a method appears in the thread dumps many times, can we say it is slower than the others on a heavily loaded server? Of course, we could use profiling tools to check that, but the thread dump may give us the same useful information, especially when we are in a production environment.
Thanks in advance!
Vance
I would look carefully at the call stack of the thread(s) in each dump, regardless of the thread's state, and ask "What exactly is it doing or waiting for at that point in time, and why?"
You don't just look at the functions on the call stack; you look at the lines of code where the functions are called. That tells you the local reason for the call. If you combine the local reasons for the calls on the stack, that gives you the complete reason (the "why chain") for what the thread was doing at that time.
What you're looking for is bad reasons that appear on more than one snapshot.
(It only takes one unnecessary call on the stack to make that whole sample improvable, so the deeper the stack, the better the hunting.)
Since they're bad, they can be fixed and you will get improved performance. The amount of the performance improvement is roughly the fraction of snapshots that show them.
That's this technique.
I'd say that if a method appears very often in a thread dump, you'd have to either
optimize that method, since it is called many times, or
check whether that method is called too often.
If you see that the threads spend lots of time in a particular method, there might also be a bug (like the one we had with a special regex that triggered a bug in the regex engine). So you'd need to investigate that.
Related
I am using NetBeans IDE 7.4. I want to find the lines of code where most of the running time is spent. I have heard a little about profilers that can be used for thread monitoring and so on.
But I don't know (exactly!) how to find the section(s) of code that are used most frequently in my program. I want to find out the mechanisms and facilities provided by the JVM for that, not only use third-party packages (profilers and so on).
You can profile the CPU with VisualVM and you'll find which methods are consuming CPU. You have to set up a filter (regex) to focus on your classes.
Suppose line L of your code, if removed, would cut the overall wall-clock time in its thread by 50%. Then if you, at a random time, dump the stacks of all running threads, locate your thread, and disregard all the levels that are not your code, there is a 50% chance that you will see line L in the remaining lines.
So if you do this 10 times, you will see line L about 5 times, give or take.
In fact, for any line of your code that you see on more than one stack sample, if you can remove or bypass it, you will save a healthy fraction of time, guaranteed.
What's more, this method (while crude) will find any speedup that profilers can find, and more that they can't.
The math behind it can be found here.
Example: A worker thread is spending 80% of its time doing I/O and allocating memory in the process of parsing XML to build a data structure. You can see that the XML comes from a data structure in a different piece of code in the same thread. It's big code - you wouldn't have known this without the samples pointing it out. You only have to take two or three samples before you see it twice. Just bypass the XML - 5x speedup.
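If you'd rather automate the counting than eyeball the dumps, here is a crude sketch of the idea in plain Java (an illustration, not a finished tool: run it on a daemon thread inside the loaded application, or just take repeated jstack dumps instead; "com.example." is a placeholder for your own package prefix):

    import java.util.HashMap;
    import java.util.Map;

    // Counts how often each stack frame from your own code shows up across a
    // handful of snapshots; frames seen in several snapshots are the "line L"
    // candidates described above.
    public class CrudeSampler implements Runnable {
        @Override
        public void run() {
            Map<String, Integer> hits = new HashMap<>();
            try {
                for (int sample = 0; sample < 10; sample++) {
                    for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                        for (StackTraceElement frame : stack) {
                            if (frame.getClassName().startsWith("com.example.")) {
                                hits.merge(frame.toString(), 1, Integer::sum);
                            }
                        }
                    }
                    Thread.sleep(1000); // space the samples out; the exact interval isn't critical
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            hits.entrySet().stream()
                .sorted((a, b) -> b.getValue() - a.getValue())
                .limit(20)
                .forEach(e -> System.out.println(e.getValue() + "x  " + e.getKey()));
        }
    }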
I'm currently running a highly concurrent benchmark which accesses a ConcurrentSkipListMap from the Java concurrent collections. I'm finding that threads are getting blocked within its get method, more precisely here:
java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:828)
java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1626)
(This was obtained by printing the stack trace of each individual thread at 10-second intervals.) The situation is still not resolved after several minutes.
Is this expected behaviour for these collections? Which other concurrent collections are likely to experience blocking?
Having tested it, I see similar behaviour with ConcurrentHashMap:
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:994)
This could well be a spurious result.
When you ask Java for a dump of all its current stack traces, it tells each thread to wait when it gets to a yield point, then it captures the traces, and then it resumes all the threads. As you can imagine, this means that yield points are over-represented in these traces; these include synchronized methods, volatile accesses, etc. ConcurrentSkipListMap.head, a volatile field, is accessed in doGet.
See this paper for a more detailed analysis.
Solaris Studio has a profiler that captures stack traces from the OS and translates them to Java stack traces. This does away with the bias toward yield points and gives you more accurate results; you might find that doGet goes away almost entirely. I've only had luck running it on Linux, and even then it's not out-of-the-box. If you're interested, ask me in the comments how to set it up, I'd be happy to help.
As an easier approach, you could wrap your calls to ConcurrentSkipListMap.get with System.nanoTime() to get a sanity check on whether this is really where your time is going. Figure out how much time you're spending in that method, and confirm whether it's about what you'd expect given that the profiler says you're spending such-and-such a percentage of your time there.
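For illustration only (the map type, key type and accumulator are placeholders, not your code), the wrapping could look something like this, with a LongAdder so the timing itself stays cheap under concurrency:

    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.LongAdder;

    class TimedLookup {
        // Total nanoseconds spent inside get(), summed across all threads.
        static final LongAdder NANOS_IN_GET = new LongAdder();

        static String timedGet(ConcurrentSkipListMap<Long, String> map, long key) {
            long start = System.nanoTime();
            try {
                return map.get(key);
            } finally {
                NANOS_IN_GET.add(System.nanoTime() - start);
            }
        }
    }

Compare NANOS_IN_GET at the end of the run against the benchmark's wall-clock time per thread; if the fraction is nowhere near what the stack traces suggest, that's a strong hint the traces are biased.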
Shameless self-plug: I created a simple program that demonstrates this a few months ago for a presentation at work. If you run it against a profiler, it should show that SpinWork.work appears a lot, while HardWork.work doesn't show up at all -- even though the latter actually takes a lot more time. It doesn't contain yield points.
Well, it isn't blocking in the truest sense. Blocking implies the suspension of thread activity. ConcurrentSkipListMap is non-blocking; it will spin until it succeeds. But it also guarantees that it will eventually succeed (that is, it shouldn't get into an infinite loop).
That being said, unless you are doing many, many gets and many, many puts concurrently, I don't see how you can be spending so much time in this method.
If you can re-create it with an example and share it with us, that may help.
ConcurrentHashMap.get performs a volatile read, which means the CPU must finish all outstanding writes before it can perform the read. This is called a STORE/LOAD barrier. Depending on how much is going on in the other threads/cores, this can take a long time. See https://www.cs.umd.edu/users/pugh/java/memoryModel/jsr-133-faq.html.
I am working on a system where I need a while(true) loop that constantly listens to a queue and increments counts in memory.
Data is constantly arriving on the queue, so I cannot avoid a while(true) condition. But naturally it pushes my CPU utilization to 100%.
So, how can I keep a thread alive that listens to the tail of the queue and performs some action, while keeping CPU utilization below 100%?
Blocking queues were invented exactly for this purpose.
Also see this: What are the advantages of Blocking Queue in Java?
LinkedBlockingQueue.take() is what you should be using. This waits for an entry to arrive on the queue, with no additional synchronization mechanism needed.
(There are one or two other blocking queues in Java, IIRC, but they have features that make them unsuitable in the general case. Don't know why such an important mechanism is buried so deeply in arcane classes.)
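For what it's worth, a minimal consumer built on take() could look like this (the String payload and the processing method are placeholders for your own message type and counting logic):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class QueueConsumer implements Runnable {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Producers call this from other threads.
        public void offer(String message) {
            queue.offer(message);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    // take() parks the thread (0% CPU) until an element is available.
                    String message = queue.take();
                    process(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // allow orderly shutdown
            }
        }

        private void process(String message) {
            // increment your in-memory counts here
        }
    }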
Usually a queue has a way to retrieve an item such that your thread will be descheduled (thus using 0% CPU) until something arrives in the queue...
Based on your comments on another answer, you want a queue that is driven by changes in HSQLDB.
Some quick googling turns up:
http://hsqldb.org/doc/guide/triggers-chapt.html
It appears you can set it up so that changes cause a trigger to fire, which will notify a class you write that implements the org.hsqldb.Trigger interface. Have that class hold a reference to a LinkedBlockingDeque from the java.util.concurrent package and have the trigger add the change to the queue.
You now have a blocking queue that your reading thread will block on until HSQLDB fires a trigger (from an update by a writer), which puts something in the queue. The waiting thread then unblocks and takes the item off the queue.
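Roughly like this, assuming the fire(...) signature documented for HSQLDB 2.x (check the trigger chapter linked above for your version; the static queue and the choice to enqueue the new row are just for illustration):

    import java.util.concurrent.BlockingDeque;
    import java.util.concurrent.LinkedBlockingDeque;

    public class ChangeTrigger implements org.hsqldb.Trigger {

        // Shared queue that the reading thread blocks on.
        public static final BlockingDeque<Object[]> CHANGES = new LinkedBlockingDeque<>();

        @Override
        public void fire(int type, String triggerName, String tableName,
                         Object[] oldRow, Object[] newRow) {
            // Called by HSQLDB when the trigger fires; hand the changed row to the queue.
            CHANGES.offer(newRow);
        }
    }

The reading thread then just calls ChangeTrigger.CHANGES.take() and stays parked until a change arrives.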
lbalazscs and Brain have excellent answers. Since I couldn't share my code, it was hard for them to give me the exact fix for my issue. And having a while(true) that constantly polls a queue is surely the wrong way to go about it. So, here is what I did:
I used a ScheduledExecutorService with a 10-second delay.
I read a block of messages (say 10k) and process those messages.
The thread is invoked again and the "loop" continues.
This considerably reduces my CPU usage. Suggestions welcome.
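For anyone curious, the shape of it is roughly this (fetchBatch() and process() stand in for my real queue-reading and counting code):

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class BatchPoller {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public void start() {
            // Wait 10 seconds after each run finishes before starting the next,
            // instead of spinning in a while(true) loop.
            scheduler.scheduleWithFixedDelay(this::drainOnce, 0, 10, TimeUnit.SECONDS);
        }

        private void drainOnce() {
            List<String> batch = fetchBatch(10_000); // read up to ~10k messages
            for (String message : batch) {
                process(message);
            }
        }

        private List<String> fetchBatch(int maxMessages) {
            return Collections.emptyList(); // placeholder: pull messages from the real queue here
        }

        private void process(String message) {
            // increment in-memory counts here
        }
    }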
I see a lot of answers here that quote books rather than apply direct logic to the question.
A while(true) loop will make your program use all the CPU time that Windows's scheduler allots to it, running whatever is in the loop as fast as possible, over and over. This doesn't mean that if your application shows 100% CPU, an empty-loop .exe will starve the rest of the OS; if you run a game at the same time, it should still run as intended. It is more of a visual artifact, similar to the Windows idle process and some other processes. The fix is to add a Sleep(1) (at least 1 millisecond), or better yet a Sleep(5), so that other work can run and the CPU is not spinning through your while(true) as fast as possible. This will generally drop the reported CPU usage to 0% or 1%, since one full millisecond is a long rest even for an older CPU.
Often while(true) or other endless loops are poor designs and can be slowed down drastically, even to Sleep(1000) (1-second interval checks) or longer. Endless loops are not always bad design, but they can usually be improved.
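In Java the equivalent of that Sleep() call is Thread.sleep(); a bare-bones sketch of the pattern (the queue type and the processing method are placeholders):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class SleepingPoller implements Runnable {
        private final Queue<String> queue = new ConcurrentLinkedQueue<>();

        @Override
        public void run() {
            try {
                while (true) {
                    String message = queue.poll();
                    if (message != null) {
                        process(message);
                    } else {
                        Thread.sleep(5); // rest ~5 ms when the queue is empty; keeps CPU near 0%
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        private void process(String message) {
            // handle the message
        }
    }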
Funny to see this issue, which I learned about when I was around 12 and learning C, pop up here along with all the 'dumb' answers.
Just know that if you try it, unless the slower scripting language you use has worked around this somewhere along the line by itself, Windows will report a lot of CPU being used on an empty loop even though the OS actually has free resources to spend.
I am using VisualVM to see where my application is slow. But it does not show all methods; in particular, it probably does not show all the methods that delay the application.
I have a real-time application (sound processing) and a time deficit of a few hundred microseconds.
Is it possible that VisualVM hides methods which are fast themselves?
UPDATE 1
I found the slow method with the sampler and some guessing. It was a toString() method called from debug logging that was turned off, but still consuming time.
The settings helped and now I know how to see it: it came down to the "Start profiling from" option.
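For anyone hitting the same thing, here is an illustrative sketch of why this happens (not my actual code, and using java.util.logging just as an example; the same applies to other logging frameworks): the argument string, and therefore toString(), is evaluated eagerly even when the log level is disabled.

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class LoggingCost {
        private static final Logger LOG = Logger.getLogger(LoggingCost.class.getName());

        void handle(Object expensiveObject) {
            // Bad: expensiveObject.toString() runs and the strings are concatenated
            // even if FINE logging is disabled; only the output is suppressed.
            LOG.fine("state = " + expensiveObject);

            // Better: guard the call (or use a lazy/Supplier-based logging API),
            // so toString() only runs when the level is actually enabled.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("state = " + expensiveObject);
            }
        }
    }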
Other than the filters mentioned by Ryan Stewart, here are a couple of additional reasons why methods may not appear in the profiler:
Sampling profilers are inherently stochastic: a sample of the current stack of all threads is taken every N ms. Some methods which actually executed but which weren't caught in any sample during your run just won't appear. This is generally not too problematic, since the very fact that they didn't appear in any sample means that, with very high probability, they aren't taking up a large part of your runtime.
When using instrumentation-based profiling in VisualVM (called "CPU profiling"), you need to define the entry point for profiled methods (the "Start profiling from" option). I have found this fails for methods in the default package, and it also won't pick up time in methods which are currently running when the profiler is attached (for the duration of the current invocation; it will catch later invocations). This is probably because the instrumented method won't be swapped in until the current invocation finishes.
Sampling is subject to a potentially serious issue with stack-trace-based profiling, which is that samples are only taken at safepoints in the code. When a trace is requested, each thread is forced to a safepoint, then the stack is captured. In some cases you may have a hot spot in your code which does no safepoint polling (common for simple loops that the JIT can prove terminate after a fixed number of iterations), interleaved with a bit of code that does have a safepoint poll. Your stacks will always show your process in the safepoint-bearing code, never in the safepoint-free code, even though the latter may be taking the majority of the CPU time.
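An illustrative sketch of that last point (placeholder code; the exact behaviour depends on the JVM version and JIT settings, and newer HotSpot releases mitigate it with loop strip mining):

    public class SafepointBias {

        // An int-counted loop: the JIT can prove it terminates, so it typically
        // emits no safepoint poll inside the loop body. A safepoint-based sampler
        // will rarely attribute time to these lines, even though this is where
        // the CPU time really goes.
        static long hotCountedLoop(int iterations) {
            long acc = 0;
            for (int i = 0; i < iterations; i++) {
                acc += (acc ^ i) * 31L;
            }
            return acc;
        }

        public static void main(String[] args) {
            long total = 0;
            while (true) {
                total += hotCountedLoop(1_000_000_000);
                System.out.println(total); // samples tend to pile up around the surrounding, poll-bearing code
            }
        }
    }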
I don't have it in front of me at the moment, but before you start profiling, there's a settings pane that's hidden by default and lets you enter regexes for filtering out methods. By default, it filters out a lot of the core JDK stuff.
I had the same problem with my pet project. I added a package name and the problem was solved. I don't understand why. VisualVM 1.4.1, jdk1.8.0_181 and jdk-10.0.2, Windows 10.
I want to get the process ID of a Thread to see how much memory it takes.
It depends a lot on the OS and how it manages threads. Theoretically it also depends on how the JVM implements threads, but all modern JVMs implement them as native threads.
On Linux, each thread used to get its own process ID, but most tools hide all but one thread per process (i.e. you don't usually see them unless you explicitly ask for them; ps uses the -m flag, for example). This is because the Linux kernel doesn't really distinguish much between threads and tasks.
Edit: as I just learned, this is no longer necessarily the case: you can create a thread with the exact same PID as the parent, in which case the threads are distinguished by different thread IDs.
However, since a thread shares its memory with all other threads in the same process, this doesn't help you find out "how much memory a thread takes": all threads in a process use the exact same memory, so the real memory used is shown_memory_use and not shown_memory_use * number_of_threads.
Threads do not have PIDs; processes do. As such, what you're asking is not possible. There is also no reliable way to retrieve your own PID from within a Java process (although the first part of the value returned by ManagementFactory.getRuntimeMXBean().getName() is usually the PID).
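If that's good enough for your purposes, the trick looks like this (note that the "pid@hostname" format is a common HotSpot convention, not something the API guarantees):

    import java.lang.management.ManagementFactory;

    public class PidGuess {
        public static void main(String[] args) {
            String jvmName = ManagementFactory.getRuntimeMXBean().getName(); // e.g. "12345@myhost"
            String pid = jvmName.split("@")[0];
            System.out.println("Probable PID: " + pid);
            System.out.println("Current thread ID: " + Thread.currentThread().getId());
        }
    }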
As the name implies, PID means process ID. Each process can spawn multiple threads, which all share the same PID. Are you sure you don't mean Thread ID?
A feature of a thread is that it shares the heap with all other threads. This means that any one thread can potentially use almost all the memory of the process. The only thing a thread doesn't have access to is the stack, and the local variables, of another thread.
As such it is not useful to try to determine how much memory an individual thread uses. Instead it can be useful to determine how much memory a data structure uses. (Although this can have similar difficulties)
It is worth noting that main memory is relatively cheap. Your situation may be different but a typical new server with 24 GB can cost as little as £1K. You can buy a 96 GB PC for around £2K. Sometimes it is not worth worrying about how much memory you are using until you know it is a problem.