Find the most executed section of Java source code - java

I am using NetBeans IDE 7.4. I want to find the lines of code where most of the running time is spent. I have heard a little about profilers, which can be used for thread monitoring and so on.
But I don't know exactly how to find the section(s) of code that are executed most frequently in my program. I want to find out the mechanisms and facilities the JVM itself provides for this, not just use third-party packages (profilers and the like).

You can profile the CPU with VisualVM and you'll find which methods are consuming CPU. You have to set a filter (a regex) to focus on your own classes.

Suppose line L of your code, if removed, would cut the overall wall-clock time in its thread by 50%. Then if you, at a random time, dump the stacks of all running threads, locate your thread, and disregard all the levels that are not your code, there is a 50% chance that you will see line L in the remaining lines.
So if you do this 10 times, you will see line L about 5 times, give or take.
In fact, if you see any line of your code on more than one stack sample, and you can remove or bypass it, it will save you a healthy fraction of time, guaranteed.
What's more, this method (while crude) will find any speedup that profilers can find, and more that they can't.
The math behind it can be found here.
Example: A worker thread is spending 80% of its time doing I/O and allocating memory in the process of parsing XML to build a data structure. You can see that the XML comes from a data structure in a different piece of code in the same thread. It's big code - you wouldn't have known this without the samples pointing it out. You only have to take two or three samples before you see it twice. Just bypass the XML - 5x speedup.
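For reference, here is a minimal sketch of that sampling done with nothing but the JDK (no third-party profiler), using Thread.getAllStackTraces(). It assumes you run it from a spare thread inside the application while it is busy; the "com.myapp" package prefix is only a placeholder for your own packages. The jstack tool or Ctrl+Break / kill -3 gives you the same information from outside the process.

    import java.util.Map;

    // Minimal sketch: take a handful of stack snapshots by hand, using only the JDK.
    // Run it while the program is busy, then look for lines of *your* code that
    // appear in more than one snapshot.
    public class StackSampler {
        static void dumpOnce() {
            Map<Thread, StackTraceElement[]> stacks = Thread.getAllStackTraces();
            for (Map.Entry<Thread, StackTraceElement[]> e : stacks.entrySet()) {
                Thread t = e.getKey();
                if (t == Thread.currentThread()) continue;          // skip the sampler itself
                System.out.println("--- " + t.getName() + " (" + t.getState() + ")");
                for (StackTraceElement frame : e.getValue()) {
                    if (frame.getClassName().startsWith("com.myapp")) {  // keep only your own frames
                        System.out.println("    at " + frame);
                    }
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 10; i++) {                          // ~10 samples, a few seconds apart
                System.out.println("=== sample " + i + " ===");
                dumpOnce();
                Thread.sleep(3000);
            }
        }
    }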

Related

Storing arrays in memory and using these arrays later

I am currently working on a program which requires preprocessing: filling multidimensional arrays with around 5,765,760*2 values.
My issue is that I have to run this preprocessing every time before I actually get to test the data and it takes around 2 minutes.
I don't want to have to wait 2 minutes each time I run a test, but I also don't want to store the values in a file.
Is there a way to store the values in a temporary memory rather than actually outputting them into a file?
I think, what you are asking for translates to: "can I make my JVM write data to some place in memory so that another JVM instance can later on read from there?"
And the simple answer is: no, that is not possible.
When the JVM dies, the memory consumed by the JVM is returned to the OS. That stuff is gone.
So even the infamous sun.misc.Unsafe with "direct" memory access does not allow you to do that.
The one thing that would work: if your OS is Linux, you could create a RAM disk and write your file to that.
So yes, you store your data in a file, but the file resides in memory; thus reading/writing is much faster than disk I/O. And that data stays available as long as you don't delete the RAM disk or restart your OS.
On the other hand, if your OS is Linux and you have enough RAM (a few GB should do!), you should first check whether an "ordinary disk" isn't already good enough.
You see, modern OSes do a lot of things in the background. It might look like "writing to disk", but in the end Linux just keeps the data in memory (in its page cache).
So, before you spend hours on bizarre solutions: measure the impact of writing to disk!
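As a rough way to do that measurement, here is a sketch that serializes a stand-in for the preprocessed arrays to a plain file and times the write and the read with System.nanoTime(). The file name, array shape, and element type are made up for illustration; on Linux the path could point at a tmpfs mount such as /dev/shm to get the "file in RAM" variant.

    import java.io.*;

    // Rough sketch: time what writing and reading the preprocessed data to a plain
    // file actually costs. The file name, array shape and element type are made up;
    // on Linux the path could point at a tmpfs mount such as /dev/shm instead.
    public class CacheTiming {
        public static void main(String[] args) throws IOException {
            int n = 5_765_760;
            double[][] data = new double[2][n];            // stand-in for the preprocessing result
            File cache = new File("preprocessed.bin");     // or new File("/dev/shm/preprocessed.bin")

            long t0 = System.nanoTime();
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(cache)))) {
                for (double[] row : data)
                    for (double v : row)
                        out.writeDouble(v);
            }
            long t1 = System.nanoTime();

            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(cache)))) {
                for (double[] row : data)
                    for (int i = 0; i < row.length; i++)
                        row[i] = in.readDouble();
            }
            long t2 = System.nanoTime();

            System.out.printf("write: %.2f s, read: %.2f s%n",
                    (t1 - t0) / 1e9, (t2 - t1) / 1e9);
        }
    }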
Run the preprocessing, save the result using a data structure of your choice, and keep your program running until you need the result.
Can it be stored in memory? Well, yes, it's already in memory! The obvious solution is to keep your program running. You can put your program in a loop with an option to repeat - "enter Y to test again, or N to quit." Then, your program can skip the preprocessing if it's already been done. Until you exit the program, you can do this as many times as you like.
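A bare-bones sketch of that loop might look like the following; preprocess() and runTest() are hypothetical stand-ins for your own code, and the point is simply that the preprocessed arrays stay in memory between test runs.

    import java.util.Scanner;

    // Sketch of the "keep the program running" idea. preprocess() and runTest()
    // are hypothetical stand-ins for your own code; the preprocessed arrays stay
    // in memory for as long as the program runs.
    public class TestDriver {
        public static void main(String[] args) {
            double[][] table = preprocess();               // done exactly once
            Scanner in = new Scanner(System.in);
            do {
                runTest(table);                            // reuses the in-memory result
                System.out.print("Enter Y to test again, or N to quit: ");
            } while (in.next().trim().equalsIgnoreCase("y"));
        }

        static double[][] preprocess() {                   // the ~2 minutes of work
            return new double[2][5_765_760];
        }

        static void runTest(double[][] table) {            // the actual test goes here
        }
    }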
Another thing you might consider is whether your code can be made more efficient. If your code takes less time to run, it won't be quite so annoying to wait for it. In general, if something can be done outside a loop, don't do it inside a loop. If you have an instruction being run five million times, that can add up. If this is homework, you'll likely use more time making it more efficient than you'd spend waiting for it - however, this isn't wasted time, as you're practicing skills you may need later. Obviously, I can't give specific suggestions without the code (and making specific code more efficient would probably be better suited for the Code Review stack exchange site.)

Timing a block of code with C++ and Java

I am trying to compare the accuracy of timing methods with C++ and Java.
With C++ I usually use CLOCKS_PER_SEC: I run the block of code I want to time for a certain amount of time and then calculate how long it took, based on how many times the block was executed.
With Java I usually use System.nanoTime().
Which one is more accurate, the one I use for C++ or the one I use for Java? Is there any other way to time in C++ so I don't have to repeat the piece of code to get a proper measurement? Basically, is there a System.nanoTime() method for C++?
I am aware that both use system calls which cause considerable latencies. How does this distort the real value of the timing? Is there any way to prevent this?
Every timing method has errors. Before you spend a great deal of time on this question, you have to ask yourself "how accurate do I need my answer to be?" Usually the solution is to run a loop / piece of code a number of times and keep track of the mean / standard deviation of the measurement. This is a good way to get a handle on the repeatability of your measurement. After that, assume that latency is "comparable" between the "start time" and "stop time" calls (regardless of which function you used), and you have a framework for understanding the issues.
Bottom line: the clock() function typically gives microsecond accuracy.
See https://stackoverflow.com/a/20497193/1967396 for an example of how to go about this in C (in that instance, using a microsecond-precision clock). There is also the ability to use nanosecond timing - see, for example, the answer to "clock_gettime() still not monotonic - alternatives?", which uses clock_gettime(CLOCK_MONOTONIC_RAW, &tSpec);
Note that you have to extract seconds and nanoseconds separately from that structure.
Be careful using System.nanoTime(), as it is still limited by the resolution that the machine you are running on can give you.
Also, there are complications when timing Java, as the first few passes through a method will be a lot slower until the JIT has optimized it for your system.
Virtually all modern systems use pre-emptive multithreading and multiple cores, so all timings will vary from run to run (for example, if control gets switched away from your thread while it is in the method).
To get reliable timings you need to:
Warm up the system by running the thing you are timing a few hundred times before starting.
Run the code a good number of times and average the results.
The reliability issues are the same for any language, so they apply just as well to C as to Java. C may not need the warm-up loop, but you will still need to take a lot of samples and average them.
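Here is a small Java harness along those lines, assuming workload() stands in for the block you want to time: warm it up first, then take many System.nanoTime() samples and report the mean and standard deviation.

    // Small harness along the lines described above: warm up first, then take many
    // System.nanoTime() samples and report mean and standard deviation. workload()
    // is a placeholder for the block being measured.
    public class NanoBench {
        static long blackhole;                             // keeps the JIT from discarding the workload

        static void workload() {
            long s = 0;
            for (int i = 0; i < 100_000; i++) s += (long) i * i;
            blackhole += s;
        }

        public static void main(String[] args) {
            for (int i = 0; i < 1_000; i++) workload();    // warm-up so the JIT compiles it

            int samples = 200;
            double[] times = new double[samples];
            for (int i = 0; i < samples; i++) {
                long t0 = System.nanoTime();
                workload();
                times[i] = System.nanoTime() - t0;
            }

            double mean = 0;
            for (double t : times) mean += t;
            mean /= samples;
            double var = 0;
            for (double t : times) var += (t - mean) * (t - mean);
            double sd = Math.sqrt(var / (samples - 1));

            System.out.printf("mean = %.0f ns, stddev = %.0f ns over %d samples%n", mean, sd, samples);
        }
    }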

Blocking Behaviour - Concurrent Data Structures Java

I'm currently running a highly concurrent benchmark which accesses a ConcurrentSkipListMap from the Java collections. I'm finding that threads are getting blocked within that map's get method, more precisely here:
java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:828)
java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1626)
(This was obtained by printing the stack trace of each individual thread at 10-second intervals.) It is still not resolved after minutes.
Is this expected behaviour of these collections? Which other concurrent collections are likely to experience blocking?
Having tested it, I see similar behaviour with ConcurrentHashMap:
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:994)
This could well be a spurious result.
When you ask Java for a dump of all its current stack traces, it tells each thread to wait when it gets to a yield point, then it captures the traces, and then it resumes all the threads. As you can imagine, this means that yield points are over-represented in these traces; these include synchronized methods, volatile accesses, etc. ConcurrentSkipListMap.head, a volatile field, is accessed in doGet.
See this paper for a more detailed analysis.
Solaris Studio has a profiler that captures stack traces from the OS and translates them to Java stack traces. This does away with the bias toward yield points and gives you more accurate results; you might find that doGet goes away almost entirely. I've only had luck running it on Linux, and even then it's not out-of-the-box. If you're interested, ask me in the comments how to set it up, I'd be happy to help.
As an easier approach, you could wrap your calls to ConcurrentSkipList.get with System.nanoTime() to get a sanity check on whether this is really where your time is going. Figure out how much time you're spending in that method, and confirm whether it's about what you'd expect given that the profiler says you're spending such-and-such percent of your time in that method.
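A minimal sketch of that sanity check, with the map and field names invented for illustration: accumulate the nanoseconds spent inside get() and compare the total against the overall run time (keeping in mind that the two nanoTime() calls add a little overhead per lookup).

    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Sketch of the nanoTime() sanity check suggested above; the map and field
    // names are invented. At the end of the run, compare nanosInGet with the
    // overall wall-clock time and with the percentage the profiler reports.
    class TimedLookups {
        static final ConcurrentSkipListMap<Long, String> map = new ConcurrentSkipListMap<>();
        static final AtomicLong nanosInGet = new AtomicLong();

        static String timedGet(long key) {
            long t0 = System.nanoTime();
            String value = map.get(key);
            nanosInGet.addAndGet(System.nanoTime() - t0);
            return value;
        }
    }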
Shameless self-plug: I created a simple program that demonstrates this a few months ago for a presentation at work. If you run it against a profiler, it should show that SpinWork.work appears a lot, while HardWork.work doesn't show up at all -- even though the latter actually takes a lot more time. It doesn't contain yield points.
Well, it isn't blocking in the truest sense. Blocking implies the suspension of thread activity. ConcurrentSkipListMap is non-blocking; it will spin until it succeeds. But it also guarantees that it will eventually succeed (that is, it shouldn't get into an infinite loop).
That being said, unless you are doing many, many gets and many, many puts asynchronously, I don't see how you can be spending so much time in this method.
If you can re-create it with an example and share it with us, that may help.
ConcurrentHashMap.get is a volatile read, which means the CPU must finish all outstanding writes before it can perform the read. This is called a STORE/LOAD barrier. Depending on how much is going on in the other threads/cores, this can take a long time. See https://www.cs.umd.edu/users/pugh/java/memoryModel/jsr-133-faq.html.

Why aren't all methods displayed in VisualVM profiler?

I am using VisualVM to see where my application is slow, but it does not show all methods; in particular, it probably does not show all the methods that slow the application down.
I have a real-time application (sound processing) and a time deficit of a few hundred microseconds.
Is it possible that VisualVM hides methods which are fast themselves?
UPDATE 1
I found the slow method with the sampler and some guessing. It was a toString() method called from debug logging which was turned off, but still consuming time.
The settings helped, and now I know how to see it: it depends on the "Start profiling from" option.
Other than the filters mentioned by Ryan Stewart, here are a couple of additional reasons why methods may not appear in the profiler:
Sampling profiles are inherently stochastic: a sample of the current stack of all threads is taken every N ms. Some methods which actually executed but which weren't caught in any sample during your run simply won't appear. This is generally not too problematic, since the very fact that they didn't appear in any sample means that, with very high probability, these are methods which aren't taking up a large part of your runtime.
When using instrumentation-based profiling in VisualVM (called "CPU profiling"), you need to define the entry point for profiled methods (the "Start profiling from" option). I have found this fails for methods in the default package, and it also won't pick up time in methods which are currently running when the profiler is attached (for the duration of the current invocation; it will pick up later invocations). This is probably because the instrumented method won't be swapped in until the current invocation finishes.
Sampling is subject to a potentially serious issue with stack-trace-based profiling: samples are only taken at safepoints in the code. When a trace is requested, each thread is forced to a safepoint, and then its stack is captured. In some cases you may have a hot spot in your code which does no safepoint polling (common for simple loops that the JIT can prove terminate after a fixed number of iterations), interleaved with a bit of code that does have a safepoint poll. Your stacks will always show your process in the safepoint-bearing code, never in the safepoint-free code, even though the latter may be taking the majority of the CPU time.
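To illustrate the kind of code where this bias shows up (a sketch, not a rigorous benchmark): an int-counted loop like the one below is typically JIT-compiled without a safepoint poll, so a safepoint-based sampler tends to attribute its time to the surrounding code rather than to the loop itself.

    // Illustration (a sketch, not a rigorous benchmark): the inner int-counted loop
    // is the kind of code the JIT can compile without a safepoint poll, so a
    // safepoint-based sampler tends to attribute its time to the surrounding code.
    public class SafepointBiasDemo {
        static long hotCountedLoop(int n) {
            long sum = 0;
            for (int i = 0; i < n; i++) {                  // counted loop: often no safepoint poll
                sum += i * 31L;
            }
            return sum;
        }

        public static void main(String[] args) {
            long total = 0;
            while (true) {
                total += hotCountedLoop(1_000_000_000);    // most CPU time is spent here...
                System.out.println(total);                 // ...but samples tend to land around here
            }
        }
    }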
I don't have it in front of me at the moment, but before you start profiling there's a settings pane, hidden by default, that lets you enter regexes for filtering out methods. By default, it filters out a lot of the core JDK stuff.
I had the same problem with my pet project. I added a package name and the problem was solved, though I don't understand why. VisualVM 1.4.1, jdk1.8.0_181 and jdk-10.0.2, Windows 10.

Analyze the "runnable" thread dump under high load server side

The thread dump from a Java-based application is easy to get but hard to analyze!
There is something interesting we can see from the thread dumps.
Suppose we have a heavily loaded Java web app. I often take 10 or 15 thread dump files during peak time (under high load) to get a broad data set. First, there is no doubt that we need to tune the code whose threads are in the Blocked or Monitor state. But I can't dig any deeper into the remaining Runnable threads.
So, if a method appears in the thread dumps many times, can we say it is slower than the others under high server-side load? Of course, we could use more profiling tools to check that, but the thread dumps may give us the same useful information, especially in a production environment.
Thank you in advance!
Vance
I would look carefully at the call stack of the thread(s) in each dump, regardless of the thread's state, and ask "What exactly is it doing or waiting for at that point in time, and why?"
You don't just look at functions on the call stack, you look at the lines of code where the functions are called. That tells you the local reason for the call. If you combine the local reasons for the calls on the stack, that gives you the complete reason (the "why chain") for what the thread was doing at that time.
What you're looking for is bad reasons that appear on more than one snapshot.
(It only takes one unnecessary call on the stack to make that whole sample improvable, so the deeper the stack, the better the hunting.)
Since they're bad, they can be fixed and you will get improved performance. The amount of the performance improvement is roughly the fraction of snapshots that show them.
That's this technique.
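If you want to automate the counting across snapshots, a rough in-process sketch follows; the "com.mycompany" prefix is just a placeholder for your own package names, and the same counting could of course be done over saved jstack output instead.

    import java.util.*;

    // Rough sketch of the counting step: take several snapshots of all thread
    // stacks and count how often each line of *your* code appears. "com.mycompany"
    // is just a placeholder for your own package prefix.
    public class DumpCounter {
        public static void main(String[] args) throws InterruptedException {
            Map<String, Integer> counts = new HashMap<>();
            int snapshots = 15;
            for (int s = 0; s < snapshots; s++) {
                for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                    for (StackTraceElement frame : stack) {
                        if (frame.getClassName().startsWith("com.mycompany")) {
                            counts.merge(frame.toString(), 1, Integer::sum);
                        }
                    }
                }
                Thread.sleep(2000);                        // space the snapshots out a little
            }
            counts.entrySet().stream()
                  .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                  .limit(20)
                  .forEach(e -> System.out.println(e.getValue() + "x  " + e.getKey()));
        }
    }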
I'd say that if a method appears very often in a thread dump you'd have to either
optimize that method, since it is called many times, or
check whether that method is called too often.
If you see that the threads spend lots of time in a particular method, there might also be a bug (like the one we had with a special regex that suffered from a bug in the regex engine). So you'd need to investigate that.
