Getting method execution time from Stack Trace

Getting method execution time from Stack Trace - java

I have a small system sampler project that I have made in Java.
I want to find a way to get the method execution time (self-time) of methods from all threads, similar to VisualVM. However, I simply don't want to use any instrumentation.
So I have two main questions, a broad question and something slightly more specific for my case:
Broad question: Is there a way to calculate self-time of a method using solely Java + JMX? If yes, how accurate is your implementation?
More specific to my problem question: In my project, I can get the CPU time spent per method by sampling all thread stack traces, getting the delta CPU time between the samples and applying that to the top frame of the stack (in my data structure).
Could I infer a basic execution from this data and the length between samples?
Here is a simplified version of my code:
private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();
private long lastCpuTime;
private Map<Long, ThreadTimerData> threadCache = new HashMap<Long, ThreadTimerData>();
public void sample()
{
final ThreadInfo[] threadInfos = MX.getThreadInfo( MX.getAllThreadIds(), Integer.MAX_VALUE );
for( ThreadInfo threadInfo : threadInfos )
{
final long threadId = threadInfo.getThreadId();
ThreadTimerData data = threadCache.get(threadId); // Just assume we already have this in our Map.
final StackTraceElement[] trace = threadInfo.getStackTrace();
if( trace == null )
{
continue;
}
final long cpuTime = MX.getThreadCpuTime( threadId );
data.update(trace[0].getClassName() + "." + trace[0].getMethodName(), cpuTime - lastCpuTime); // Another map, holding the name string against the delta.
}
lastCpuTime = cpuTime;
}
The sample method is being called at a 200ms interval (within it's own thread) -- this can be changed.

I believe I found the solution for those curious:
Essentially, if any element (method on the stack frame) is on top of the stack it is technically executing. We need to measure for how long this element has been there for, so we sample the stack trace at an interval. This also answers the 2nd part of my first question, the accuracy depends on the interval -- a shorter interval would mean a more meaningful self-time as there's a lesser chance another element appeared and disappeared on the stack between any two samples.
But I digress a bit, we get the time difference between the two sample times (ideally using nano second precision) and this gives us how long the method has been executing. We aggregate this after several samples until the method has stopped executing (when the element leaves the stack trace) or just called something else (a new element has been pushed onto the stack).
Once the stack has changed, we repeat the process. Everything on the stack is using CPU time, additionally. The tricky part of all of this is creating the most efficient data structure to store, retrieve and update the methods from the stack.

Related

Why is a particular Guava Stopwatch.elapsed() call much later than others? (output in post)

I am working on a small game project and want to track time in order to process physics. After scrolling through different approaches, at first I had decided to use Java's Instant and Duration classes and now switched over to Guava's Stopwatch implementation, however, in my snippet, both of those approaches have a big gap at the second call of runtime.elapsed(). That doesn't seem like a big problem in the long run, but why does that happen?
I have tried running the code below as both in focus and as a Thread, in Windows and in Linux (Ubuntu 18.04) and the result stays the same - the exact values differ, but the gap occurs. I am using the IntelliJ IDEA environment with JDK 11.
Snippet from Main:
public static void main(String[] args) {
MassObject[] planets = {
new Spaceship(10, 0, 6378000)
};
planets[0].run();
}
This is part of my class MassObject extends Thread:
public void run() {
// I am using StringBuilder to eliminate flushing delays.
StringBuilder output = new StringBuilder();
Stopwatch runtime = Stopwatch.createStarted();
// massObjectList = static List<MassObject>;
for (MassObject b : massObjectList) {
if(b!=this) calculateGravity(this, b);
}
for (int i = 0; i < 10; i++) {
output.append(runtime.elapsed().getNano()).append("\n");
}
System.out.println(output);
}
Stdout:
30700
1807000
1808900
1811600
1812400
1813300
1830200
1833200
1834500
1835500
Thanks for your help.

You're calling Duration.getNano() on the Duration returned by elapsed(), which isn't what you want.
The internal representation of a Duration is a number of seconds plus a nano offset for whatever additional fraction of a whole second there is in the duration. Duration.getNano() returns that nano offset, and should almost never be called unless you're also calling Duration.getSeconds().
The method you probably want to be calling is toNanos(), which converts the whole duration to a number of nanoseconds.
Edit: In this case that doesn't explain what you're seeing because it does appear that the nano offsets being printed are probably all within the same second, but it's still the case that you shouldn't be using getNano().
The actual issue is probably some combination of classloading or extra work that has to happen during the first call, and/or JIT improving performance of future calls (though I don't think looping 10 times is necessarily enough that you'd see much of any change from JIT).

execution of task in java within specified time

I want to execute few lines of code with 5ms in Java. Below is the snippet of my code:
public void delay(ArrayList<Double> delay_array, int counter_main) {
long start=System.currentTimeMillis();
ArrayList<Double> delay5msecs=new ArrayList<Double>();
int index1=0, i1=0;
while(System.currentTimeMillis() - start <= 5)
{
delay5msecs.add(i1,null);
//System.out.println("time");
i1++;
}
for(int i=0;i<counter_main-1;i++) {
if(delay5msecs.get(i)!=null) {
double x1=delay_array.get(i-index1);
delay5msecs.add(i,x1);
//System.out.println(i);
} else {
index1++;
System.out.println("index is :"+index1);
}
}
}
Now the problem is that the entire array is getting filled with null values and I am getting some exceptions related to index as well. Basically, I want to fill my array list with 0 till 5ms and post that fill the data from another array list in it. I've not done coding since a long time. Appreciate your help.
Thank You.

System.currentTimeMillis() will probably not have the resolution you need for 5ms. The granularity on Windows may not be better than 15ms anyway, so your code will be very platform sensitive, and may actually not do what you want.
The resolution you need might be doable with System.nanoTime() but, again, there are platform limitations you might have to research. I recall that you can't just scale the value you get and have it work everywhere.
If you can guarantee no other threads running this code, then I suppose a naive loop and fill will work, without having to implement a worker thread that waits for the filler thread to finish.
You should try to use the Collection utilities and for-each loops instead of doing all this index math in the second part.
I suppose I should also warn you that nothing in a regular JVM is guaranteed to be real-time. So if you need a hard, dependable, reproducible 5ms you might be out of luck.

Java: Why is calling a method for the first time slower?

Recently, I was writing a plugin using Java and found that retrieving an element(using get()) from a HashMap for the first time is very slow. Originally, I wanted to ask a question on that and found this (No answers though). With further experiments, however, I notice that this phenomenon happens on ArrayList and then all the methods.
Here is the code:
public class Test {
public static void main(String[] args) {
long startTime, stopTime;
// Method 1
System.out.println("Test 1:");
for (int i = 0; i < 20; ++i) {
startTime = System.nanoTime();
testMethod1();
stopTime = System.nanoTime();
System.out.println((stopTime - startTime) + "ns");
}
// Method 2
System.out.println("Test 2:");
for (int i = 0; i < 20; ++i) {
startTime = System.nanoTime();
testMethod2();
stopTime = System.nanoTime();
System.out.println((stopTime - startTime) + "ns");
}
}
public static void testMethod1() {
// Do nothing
}
public static void testMethod2() {
// Do nothing
}
}
Snippet: Test Snippet
The output would be like this:
Test 1:
2485ns
505ns
453ns
603ns
362ns
414ns
424ns
488ns
325ns
426ns
618ns
794ns
389ns
686ns
464ns
375ns
354ns
442ns
404ns
450ns
Test 2:
3248ns
700ns
538ns
531ns
351ns
444ns
321ns
424ns
523ns
488ns
487ns
491ns
551ns
497ns
480ns
465ns
477ns
453ns
727ns
504ns
I ran the code for a few times and the results are about the same. The first call would be even longer(>8000 ns) on my computer(Windows 8.1, Oracle Java 8u25).
Apparently, the first calls is usually slower than the following calls(Some calls may be longer in random cases).
Update:
I tried to learn some JMH, and write a test program
Code w/ sample output: Code
I don't know whether it's a proper benchmark(If the program has some problems, tell me), but I found that the first warm-up iterations spend more time(I use two warm-up iterations in case the warm-ups affect the results). And I think that the first warm-up should be the first call and is slower. So this phenomenon exists, if the test is proper.
So why does it happen?

You're calling System.nanoTime() inside a loop. Those calls are not free, so in addition to the time taken for an empty method you're actually measuring the time it takes to exit from nanotime call #1 and to enter nanotime call #2.
To make things worse, you're doing that on windows where nanotime performs worse compared to other platforms.
Regarding JMH: I don't think it's much help in this situation. It's designed to measure by averaging many iterations, to avoid dead code elimination, account for JIT warmup, avoid ordering dependence, ... and afaik it simply uses nanotime under the hood too.
Its design goals pretty much aim for the opposite of what you're trying to measure.
You are measuring something. But that something might be several cache misses, nanotime call overhead, some JVM internals (class loading? some kind of lazy initialization in the interpreter?), ... probably a combination thereof.
The point is that your measurement can't really be taken at face value. Even if there is a certain cost for calling a method for the first time, the time you're measuring only provides an upper bound for that.

This kind of behaviour is often caused by the compiler or RE. It starts to optimize the execution after the first iteration. Additionally class loading can have an effect (I guess this is not the case in your example code as all classes are loaded in the first loop latest).
See this thread for a similar problem.
Please keep in mind this kind of behaviour is often dependent on the environment/OS it's running on.

Different execution time for HashMap and HashSet based on the order of execution?

I am getting different execution time, if i interchange the HashMap and HashSet. The execution time is always high the one appears first ( either HashMap/ Hashset ). I am not sure about the reason behind this. Any help appreciated
Execution 1 - HashMap first , then HashSet ---
Time taken map add: 2071ms,
Time taken set add: 794ms
Execution 2 - HashSet first , then HashMap ---
Time taken set add: 2147ms,
Time taken map add: 781ms
private static Random secureRandom = new SecureRandom();
public static void main(String args[])
{
int testnumber = 1000000;
// HashMap
long starttimemap = System.currentTimeMillis();
Map<String, String> hashmap = new HashMap<String, String>();
for (int i = 0; i < testnumber; i++)
{
hashmap.put(Long.toHexString(secureRandom.nextLong()), "true");
}
long endtimemap = System.currentTimeMillis();
System.out.println("Time taken map add: " + (endtimemap - starttimemap) + "ms");
// HashSet
long starttimeset = System.currentTimeMillis();
Set<String> hashset = new HashSet<String>();
for (int i = 0; i < testnumber; i++)
{
hashset.add(Long.toHexString(secureRandom.nextLong()));
}
long endtimeset = System.currentTimeMillis();
System.out.println("Time taken set add: " + (endtimeset - starttimeset) + "ms");
}

The reason is the way the JVM works. The JIT compiler needs some time to kick in because it decides which code to compile based on execution count.
So, it's totally natural that the second pass is faster, because the JIT already compiled a lot of Java code to native code.
If you start the program using the -Xint option (which disables the JIT), both runs should be roughly equal in execution time.

One likely reason is that you're not warming up the JIT before performing the benchmarks.
Basically, Java executes bytecode (which is somewhat slower) for a while before figuring out what's used often enough to justify JIT compiling it into native machine code (which is faster). As such, whatever happens first will often be slower.
Run both things a bunch of times before starting the real benchmarks to give it a chance to JIT the relevant code.

You are not getting different execution times, you are getting the same execution times. Regardless of whether you use HashMap or HashSet you get the same time for the first loop and the same time for the second. The difference between the first and second has been explained already, it’s due to the JVM’s optimizations. It’s not surprising that it doesn’t matter whether you use HashMap or HashSet as HashSet uses a HashMap internally. You are executing the same code all the time.

Why does this method print 4?

I was wondering what happens when you try to catch an StackOverflowError and came up with the following method:
class RandomNumberGenerator {
static int cnt = 0;
public static void main(String[] args) {
try {
main(args);
} catch (StackOverflowError ignore) {
System.out.println(cnt++);
}
}
}
Now my question:
Why does this method print '4'?
I thought maybe it was because System.out.println() needs 3 segments on the call stack, but I don't know where the number 3 comes from. When you look at the source code (and bytecode) of System.out.println(), it normally would lead to far more method invocations than 3 (so 3 segments on the call stack would not be sufficient). If it's because of optimizations the Hotspot VM applies (method inlining), I wonder if the result would be different on another VM.
Edit:
As the output seems to be highly JVM specific, I get the result 4 using
Java(TM) SE Runtime Environment (build 1.6.0_41-b02)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)
Explanation why I think this question is different from Understanding the Java stack:
My question is not about why there is a cnt > 0 (obviously because System.out.println() requires stack size and throws another StackOverflowError before something gets printed), but why it has the particular value of 4, respectively 0,3,8,55 or something else on other systems.

I think the others have done a good job at explaining why cnt > 0, but there's not enough details regarding why cnt = 4, and why cnt varies so widely among different settings. I will attempt to fill that void here.
Let
X be the total stack size
M be the stack space used when we enter main the first time
R be the stack space increase each time we enter into main
P be the stack space necessary to run System.out.println
When we first get into main, the space left over is X-M. Each recursive call takes up R more memory. So for 1 recursive call (1 more than original), the memory use is M + R. Suppose that StackOverflowError is thrown after C successful recursive calls, that is, M + C * R <= X and M + C * (R + 1) > X. At the time of the first StackOverflowError, there's X - M - C * R memory left.
To be able to run System.out.prinln, we need P amount of space left on the stack. If it so happens that X - M - C * R >= P, then 0 will be printed. If P requires more space, then we remove frames from the stack, gaining R memory at the cost of cnt++.
When println is finally able to run, X - M - (C - cnt) * R >= P. So if P is large for a particular system, then cnt will be large.
Let's look at this with some examples.
Example 1: Suppose
X = 100
M = 1
R = 2
P = 1
Then C = floor((X-M)/R) = 49, and cnt = ceiling((P - (X - M - C*R))/R) = 0.
Example 2: Suppose that
X = 100
M = 1
R = 5
P = 12
Then C = 19, and cnt = 2.
Example 3: Suppose that
X = 101
M = 1
R = 5
P = 12
Then C = 20, and cnt = 3.
Example 4: Suppose that
X = 101
M = 2
R = 5
P = 12
Then C = 19, and cnt = 2.
Thus, we see that both the system (M, R, and P) and the stack size (X) affects cnt.
As a side note, it does not matter how much space catch requires to start. As long as there is not enough space for catch, then cnt will not increase, so there are no external effects.
EDIT
I take back what I said about catch. It does play a role. Suppose it requires T amount of space to start. cnt starts to increment when the leftover space is greater than T, and println runs when the leftover space is greater than T + P. This adds an extra step to the calculations and further muddies up the already muddy analysis.
EDIT
I finally found time to run some experiments to back up my theory. Unfortunately, the theory doesn't seem to match up with the experiments. What actually happens is very different.
Experiment setup:
Ubuntu 12.04 server with default java and default-jdk. Xss starting at 70,000 at 1 byte increments to 460,000.
The results are available at: https://www.google.com/fusiontables/DataSource?docid=1xkJhd4s8biLghe6gZbcfUs3vT5MpS_OnscjWDbM
I've created another version where every repeated data point is removed. In other words, only points that are different from the previous are shown. This makes it easier to see anomalies. https://www.google.com/fusiontables/DataSource?docid=1XG_SRzrrNasepwZoNHqEAKuZlHiAm9vbEdwfsUA

This is the victim of bad recursive call. As you are wondering why the value of cnt varies, it is because the stack size depends on the platform. Java SE 6 on Windows has a default stack size of 320k in the 32-bit VM and 1024k in the 64-bit VM. You can read more here.
You can run using different stack sizes and you will see different values of cnt before the stack overflows-
java -Xss1024k RandomNumberGenerator
You don't see the value of cnt being printed multiple times even though the value is greater than 1 sometimes because your print statement is also throwing error which you can debug to be sure through Eclipse or other IDEs.
You can change the code to the following to debug per statement execution if you'd prefer-
static int cnt = 0;
public static void main(String[] args) {
try {
main(args);
} catch (Throwable ignore) {
cnt++;
try {
System.out.println(cnt);
} catch (Throwable t) {
}
}
}
UPDATE:
As this getting a lot more attention, let's have another example to make things clearer-
static int cnt = 0;
public static void overflow(){
try {
overflow();
} catch (Throwable t) {
cnt++;
}
}
public static void main(String[] args) {
overflow();
System.out.println(cnt);
}
We created another method named overflow to do a bad recursion and removed the println statement from the catch block so it doesn't start throwing another set of errors while trying to print. This works as expected. You can try putting System.out.println(cnt); statement after cnt++ above and compile. Then run multiple times. Depending on your platform, you may get different values of cnt.
This is why generally we do not catch errors because mystery in code is not fantasy.

The behavior is dependent upon the stack size (which can be manually set using Xss. The stack size is architecture specific. From JDK 7 source code:
// Default stack size on Windows is determined by the executable (java.exe
// has a default value of 320K/1MB [32bit/64bit]). Depending on Windows version, changing
// ThreadStackSize to non-zero may have significant impact on memory usage.
// See comments in os_windows.cpp.
So when the StackOverflowError is thrown, the error is caught in catch block. Here println() is another stack call which throws exception again. This gets repeated.
How many times it repeates? - Well it depends on when JVM thinks it is no longer stackoverflow. And that depends on the stack size of each function call (difficult to find) and the Xss. As mentioned above default total size and size of each function call (depends on memory page size etc) is platform specific. Hence different behavior.
Calling the java call with -Xss 4M gives me 41. Hence the correlataion.

I think the number displayed is the number of time the System.out.println call throws the Stackoverflow exception.
It probably depend on the implementation of the println and the number of stacking call it is made in it.
As an illustration:
The main() call trigger the Stackoverflow exception at call i.
The i-1 call of main catch the exception and call println which trigger a second Stackoverflow. cnt get increment to 1.
The i-2 call of main catch now the exception and call println. In println a method is called triggering a 3rd exception. cnt get increment to 2.
this continue until println can make all its needed call and finally display the value of cnt.
This is then dependent of the actual implementation of println.
For the JDK7 either it detect cycling call and throws the exception earlier either it keep some stack resource and throw the exception before reaching the limit to give some room for remediation logic either the println implementation doesn't make calls either the ++ operation is done after the println call thus is by pass by the exception.

main recurses on itself until it overflows the stack at recursion depth R.
The catch block at recursion depth R-1 is run.
The catch block at recursion depth R-1 evaluates cnt++.
The catch block at depth R-1 calls println, placing cnt's old value on the stack. println will internally call other methods and uses local variables and things. All these processes require stack space.
Because the stack was already grazing the limit, and calling/executing println requires stack space, a new stack overflow is triggered at depth R-1 instead of depth R.
Steps 2-5 happen again, but at recursion depth R-2.
Steps 2-5 happen again, but at recursion depth R-3.
Steps 2-5 happen again, but at recursion depth R-4.
Steps 2-4 happen again, but at recursion depth R-5.
It so happens that there is enough stack space now for println to complete (note that this is an implementation detail, it may vary).
cnt was post-incremented at depths R-1, R-2, R-3, R-4, and finally at R-5. The fifth post-increment returned four, which is what was printed.
With main completed successfully at depth R-5, the whole stack unwinds without more catch blocks being run and the program completes.

After digging around for a while, I can't say that I find the answer, but I think it's quite close now.
First, we need to know when a StackOverflowError will be thrown. In fact, the stack for a java thread stores frames, which containing all the data needed for invoking a method and resume. According to Java Language Specifications for JAVA 6, when invoking a method,
If there is not sufficient memory available to create such an activation frame, an StackOverflowError is thrown.
Second, we should make it clear what is "there is not sufficient memory available to create such an activation frame". According to Java Virtual Machine Specifications for JAVA 6,
frames may be heap allocated.
So, when a frame is created, there should be enough heap space to create a stack frame and enough stack space to store the new reference which point to the new stack frame if the frame is heap allocated.
Now let's go back to the question. From the above, we can know that when a method is execute, it may just costs the same amount of stack space. And invoking System.out.println (may) needs 5 level of method invocation, so 5 frames need to be created. Then when StackOverflowError is thrown out, it has to go back 5 times to get enough stack space to store 5 frames' references. Hence 4 is print out. Why not 5? Because you use cnt++. Change it to ++cnt, and then you will get 5.
And you will notice that when the size of stack go to a high level, you will get 50 sometimes. That is because the amount of available heap space need to be taken into consideration then. When the stack's size is too large, maybe heap space will run out before stack. And (maybe) the actual size of stack frames of System.out.println is about 51 times of main, therefore it goes back 51 times and print 50.

This is not exactly an answer to the question but I just wanted to add something to the original question that I came across and how I understood the problem:
In the original problem the exception is caught where it was possible:
For example with jdk 1.7 it is caught at first place of occurence.
but in earlier versions of jdk it looks like the exception is not being caught at the first place of occurence hence 4, 50 etc..
Now if you remove the try catch block as following
public static void main( String[] args ){
System.out.println(cnt++);
main(args);
}
Then you will see all the values of cnt ant the thrown exceptions (on jdk 1.7).
I used netbeans to see the output, as the cmd will not show all the output and exception thrown.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.