Android: How much overhead is generated by running an empty method? - java

I have created a class to handle my debug outputs so that I don't need to strip out all my log outputs before release.
public class Debug {
    public static void debug(String module, String message) {
        if (Release.DEBUG)
            Log.d(module, message);
    }
}
After reading another question, I have learned that the contents of the if statement are not compiled if the constant Release.DEBUG is false.
What I want to know is how much overhead is generated by running this empty method? (Once the if clause is removed, there is no code left in the method.) Is it going to have any impact on my application? Obviously performance is a big issue when writing for mobile handsets =P
Thanks
Gary

Measurements done on Nexus S with Android 2.3.2:
10^6 iterations of 1000 calls to an empty static void function: 21s <==> 21ns/call
10^6 iterations of 1000 calls to an empty non-static void function: 65s <==> 65ns/call
10^6 iterations of 500 calls to an empty static void function: 3.5s <==> 7ns/call
10^6 iterations of 500 calls to an empty non-static void function: 28s <==> 56ns/call
10^6 iterations of 100 calls to an empty static void function: 2.4s <==> 24ns/call
10^6 iterations of 100 calls to an empty non-static void function: 2.9s <==> 29ns/call
control:
10^6 iterations of an empty loop: 41ms <==> 41ns/iteration
10^7 iterations of an empty loop: 560ms <==> 56ns/iteration
10^9 iterations of an empty loop: 9300ms <==> 9.3ns/iteration
I've repeated the measurements several times. No significant deviations were found.
You can see that the per-call cost can vary greatly depending on workload (possibly due to JIT compiling),
but 3 conclusions can be drawn:
Dalvik/Java sucks at optimizing dead code
static function calls can be optimized much better than non-static ones
(non-static methods are virtual and need to be looked up in a virtual table)
the cost on the Nexus S is not greater than 70ns/call (that's ~70 CPU cycles)
and is comparable with the cost of one empty for-loop iteration (i.e. one increment and one condition check on a local variable)
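The original benchmark code isn't shown, but a timing harness of roughly this shape (class name, method names, and iteration counts are illustrative assumptions, not the original test) would produce per-call numbers like those above:

```java
public class CallCostBenchmark {
    private static void emptyStatic() { }
    private void emptyInstance() { }

    // Time n calls to an empty static method, in nanoseconds.
    static long timeStaticCalls(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            emptyStatic();
        }
        return System.nanoTime() - start;
    }

    // Time n calls to an empty instance (virtual) method, in nanoseconds.
    static long timeInstanceCalls(int n) {
        CallCostBenchmark b = new CallCostBenchmark();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            b.emptyInstance();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("static:   " + timeStaticCalls(n) / n + " ns/call");
        System.out.println("instance: " + timeInstanceCalls(n) / n + " ns/call");
    }
}
```

Note that on a modern JIT the empty bodies may be inlined away entirely, so numbers from a harness like this are only a rough upper bound on call cost.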
Observe that in your case the string argument will always be evaluated. If you do string concatenation, this will involve creating intermediate strings. This will be very costly and involve a lot of GC. For example, executing a function:
void empty(String string) {
}
called with arguments such as
empty("Hello " + 42 + " this is a string " + count);
10^4 iterations of 100 such calls takes 10s. That is 10us/call, i.e. ~1000 times slower than just an empty call. It also produces a huge amount of GC activity. The only way to avoid this is to manually inline the function, i.e. use the if statement instead of the debug function call. It's ugly, but the only way to make it work.
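A plain-Java sketch of that manual inlining, with DEBUG and log() as stand-ins for the question's Release.DEBUG and Log.d (the logCalls counter is mine, added only to make the effect observable):

```java
public class GuardedLogging {
    // Compile-time constant: when false, javac omits the guarded block
    // entirely, so the argument string is never concatenated at runtime.
    static final boolean DEBUG = false;

    static int logCalls = 0; // counts how often log() actually runs

    static void log(String module, String message) {
        logCalls++;
        System.out.println(module + ": " + message);
    }

    public static void main(String[] args) {
        int count = 7;
        // Guard at the call site, not inside a helper method, so the
        // concatenation below is compiled out along with the call:
        if (DEBUG) {
            log("MyModule", "Hello " + 42 + " this is a string " + count);
        }
    }
}
```

With DEBUG false, no string is ever built; with the helper-method version, the concatenation happens on every call regardless.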

Unless you call this from within a deeply nested loop, I wouldn't worry about it.

A good compiler removes the entire empty method, resulting in no overhead at all. I'm not sure whether the Dalvik compiler already does this, but I suspect it's likely, at least since the arrival of the just-in-time compiler with Froyo.
See also: Inline expansion

In terms of performance, the overhead of generating the messages which get passed into the debug function is going to be a lot more serious, since they likely involve memory allocations, e.g.
Debug.debug(mymodule, "My error message" + myerrorcode);
which will still occur even though the message is binned.
Unfortunately you really need the "if( Release.DEBUG )" around the calls to this function, rather than inside the function itself, if your goal is performance, and you will see this in a lot of Android code.

This is an interesting question and I like @misiu_mp's analysis, so I thought I would update it with a 2016 test on a Nexus 7 running Android 6.0.1. Here is the test code:
public void runSpeedTest() {
    long startTime;
    long[] times = new long[100000];
    long[] staticTimes = new long[100000];
    for (int i = 0; i < times.length; i++) {
        startTime = System.nanoTime();
        for (int j = 0; j < 1000; j++) {
            emptyMethod();
        }
        times[i] = (System.nanoTime() - startTime) / 1000;
        startTime = System.nanoTime();
        for (int j = 0; j < 1000; j++) {
            emptyStaticMethod();
        }
        staticTimes[i] = (System.nanoTime() - startTime) / 1000;
    }
    int timesSum = 0;
    for (int i = 0; i < times.length; i++) {
        timesSum += times[i];
        Log.d("status", "time," + times[i]);
        sleep();
    }
    int timesStaticSum = 0;
    for (int i = 0; i < times.length; i++) {
        timesStaticSum += staticTimes[i];
        Log.d("status", "statictime," + staticTimes[i]);
        sleep();
    }
    sleep();
    Log.d("status", "final speed = " + (timesSum / times.length));
    Log.d("status", "final static speed = " + (timesStaticSum / times.length));
}

private void sleep() {
    try {
        Thread.sleep(10);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

private void emptyMethod() { }

private static void emptyStaticMethod() { }
The sleep() was added to prevent overflowing the Log.d buffer.
I played around with it many times and the results were pretty consistent with @misiu_mp's:
10^5 iterations of 1000 calls to an empty static void function: 29ns/call
10^5 iterations of 1000 calls to an empty non-static void function: 34ns/call
The static method call was always slightly faster than the non-static method call, but it would appear that a) the gap has closed significantly since Android 2.3.2, and b) there's still a cost to making calls to an empty method, static or not.
Looking at a histogram of times reveals something interesting, however. The majority of calls, whether static or not, take between 30-40ns, and looking closely at the data they are virtually all exactly 30ns.
Running the same code with empty loops (commenting out the method calls) produces an average speed of 8ns; however, about 3/4 of the measured times are 0ns, while the remainder are exactly 30ns.
I'm not sure how to account for this data, but I'm not sure that @misiu_mp's conclusions still hold. The difference between empty static and non-static methods is negligible, and the preponderance of measurements are exactly 30ns. That being said, it would appear that there is still some non-zero cost to running empty methods.
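A histogram like the one described can be produced with a simple bucketing routine; this sketch is mine, not part of the original test, and the bin width and sample values are illustrative:

```java
import java.util.Arrays;

public class TimeHistogram {
    // Bucket each measured time (ns) into fixed-width bins and count them.
    // Times past the last bin are clamped into it.
    static int[] histogram(long[] times, int binWidthNs, int binCount) {
        int[] bins = new int[binCount];
        for (long t : times) {
            int bin = (int) Math.min(t / binWidthNs, binCount - 1);
            bins[bin]++;
        }
        return bins;
    }

    public static void main(String[] args) {
        // Illustrative sample: mostly 0ns and exactly-30ns measurements.
        long[] times = {0, 0, 30, 30, 30, 40, 8};
        System.out.println(Arrays.toString(histogram(times, 10, 6)));
    }
}
```

Clusters at exactly 0ns and 30ns, as reported above, would suggest the clock's granularity is about 30ns on that device.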

Related

Array / Vector as method argument

I have always read that we should use Vector everywhere in Java and that there are no performance issues, which is certainly true. I'm writing a method to calculate the MSE (Mean Squared Error) and noticed that it was very slow - I was basically passing a Vector of values. When I switched to an Array, it was 10 times faster, but I don't understand why.
I have written a simple test:
public static void main(String[] args) throws IOException {
    Vector<Integer> testV = new Vector<Integer>();
    Integer[] testA = new Integer[1000000];
    for (int i = 0; i < 1000000; i++) {
        testV.add(i);
        testA[i] = i;
    }
    Long startTime = System.currentTimeMillis();
    for (int i = 0; i < 500; i++) {
        double testVal = testArray(testA, 0, 1000000);
    }
    System.out.println(String.format("Array total time %s", System.currentTimeMillis() - startTime));
    startTime = System.currentTimeMillis();
    for (int i = 0; i < 500; i++) {
        double testVal = testVector(testV, 0, 1000000);
    }
    System.out.println(String.format("Vector total time %s", System.currentTimeMillis() - startTime));
}
Which calls the following methods:
public static double testVector(Vector<Integer> data, int start, int stop) {
    double toto = 0.0;
    for (int i = start; i < stop; i++) {
        toto += data.get(i);
    }
    return toto / data.size();
}

public static double testArray(Integer[] data, int start, int stop) {
    double toto = 0.0;
    for (int i = start; i < stop; i++) {
        toto += data[i];
    }
    return toto / data.length;
}
The array one is indeed 10 times faster. Here is the output:
Array total time 854
Vector total time 9840
Can somebody explain to me why? I have searched for quite a while, but cannot figure it out. The vector method appears to be making a local copy of the vector, but I always thought that objects were passed by reference in Java.
"I have always read that we should use Vector everywhere in Java and that there are no performance issues" - wrong. A Vector is thread-safe, and thus it needs additional logic (code) to handle access/modification by multiple threads, so it is slow. An array, on the other hand, doesn't need additional logic to handle multiple threads. You should try ArrayList instead of Vector to increase the speed.
Note (based on your comment that you run the method 500 times each):
This is not the right way to measure performance/speed in Java. You should at least give a warm-up run so as to nullify the effect of JIT.
Yes, that's the eternal problem of poor microbenchmarking. The Vector itself is not SO slow.
Here is a trick:
add -XX:BiasedLockingStartupDelay=0 and now testVector "magically" runs 5 times faster than before!
Next, wrap testVector into synchronized (data) - and now it is almost as fast as testArray.
You are basically measuring the performance of object monitors in HotSpot, not the data structures.
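The synchronized variant described in that trick, sketched with the question's testVector loop (the class name is mine; holding the Vector's monitor once for the whole loop replaces the per-get() lock acquisition):

```java
import java.util.Vector;

public class SyncVectorTest {
    // Same loop as the question's testVector, but the Vector's monitor is
    // acquired once around the whole loop; each get() then re-enters the
    // already-held lock cheaply instead of contending for it per call.
    public static double testVectorSynchronized(Vector<Integer> data, int start, int stop) {
        double toto = 0.0;
        synchronized (data) {
            for (int i = start; i < stop; i++) {
                toto += data.get(i);
            }
        }
        return toto / data.size();
    }

    public static void main(String[] args) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < 1000; i++) {
            v.add(i);
        }
        System.out.println(testVectorSynchronized(v, 0, 1000));
    }
}
```

Since Java monitors are reentrant, the inner synchronized get() calls are safe while the outer lock is held.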
Simple thing: Vector is thread-safe, so it needs synchronization for adds and accesses. Use ArrayList, which is also backed by an array but is not thread-safe, and faster.
Note:
Please provide the size to the ArrayList if you know it in advance, since a normal ArrayList without an initial capacity will resize internally, which uses an array copy.
The performance of a normal array vs. an ArrayList without an initial capacity also varies drastically if the number of elements is large.
Poor code - instead of list.get(), rather use an iterator on the list. The array will still be faster, though.
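For illustration, here is what the ArrayList swap suggested above might look like (class and method names are mine); it also pre-sizes the list as recommended, so no internal resizing occurs while filling it:

```java
import java.util.ArrayList;
import java.util.List;

public class ListMeanTest {
    // Same logic as the question's testVector, but against a List:
    // an ArrayList does no per-call monitor acquisition, unlike Vector.
    public static double testList(List<Integer> data, int start, int stop) {
        double toto = 0.0;
        for (int i = start; i < stop; i++) {
            toto += data.get(i);
        }
        return toto / data.size();
    }

    public static void main(String[] args) {
        // Pre-size to the known element count to avoid internal array copies.
        List<Integer> list = new ArrayList<>(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        System.out.println(testList(list, 0, list.size()));
    }
}
```

Note that both versions still box every element as Integer; a plain int[] avoids that cost as well, which is part of why the array version in the question is so much faster.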

Does splitting logic into multiple methods in Java slow down execution? If yes, should we avoid refactoring code?

http://s24.postimg.org/9y073weid/refactor_vs_non_refactor.png
Well, here is the result for the execution time in nanoseconds for refactored and non-refactored code for a simple addition operation. 1 to 5 are consecutive runs of the code.
My intent was just to find out whether splitting up logic into multiple methods makes execution slower or not, and here is the result, which shows that yes, there is considerable time that goes into just putting the methods on the stack.
I invite people who have done some research on this before, or who want to investigate this area, to correct me if I am doing something wrong and to draw some conclusive results out of this.
In my opinion, yes, code refactoring does help in making code more structured and understandable, but in time-critical systems like real-time game engines I would prefer not to refactor.
The following is the simple code I used:
package com.sim;
public class NonThreadedMethodCallBenchMark{
public static void main(String args[]){
NonThreadedMethodCallBenchMark testObject = new NonThreadedMethodCallBenchMark();
System.out.println("************Starting***************");
long startTime =System.nanoTime();
for(int i=0;i<900000;i++){
//testObject.method(1, 2); // Uncomment this line and comment the line below to test refactor time
//testObject.method5(1,2); // uncomment this line and comment the above line to test non refactor time
}
long endTime =System.nanoTime();
System.out.println("Total :" +(endTime-startTime)+" nanoseconds");
}
public int method(int a , int b){
return method1(a,b);
}
public int method1(int a, int b){
return method2(a,b);
}
public int method2(int a, int b){
return method3(a,b);
}
public int method3(int a, int b){
return method4(a,b);
}
public int method4(int a, int b){
return method5(a,b);
}
public int method5(int a, int b){
return a+b;
}
public void run() {
int x=method(1,2);
}
}
You should consider that code which doesn't do anything useful can be optimised away. If you are not careful you can be timing how long it takes to detect the code doesn't do anything useful, rather than run the code. If you use multiple methods, it can take longer to detect useless code and thus give different results. I would always look at the steady state performance after the code has warmed up.
For the most expensive parts of your code, small methods will be inlined, so they won't add any performance cost. What can happen is:
smaller methods can be optimised better, as complex methods can defeat optimisation tricks;
smaller methods can be eliminated entirely once they are inlined.
If you don't ever warm up the code, it is likely to be slower. However, if the code is called rarely, it is unlikely to matter (except in a low-latency system, in which case I suggest you warm up your code on startup).
If you run
System.out.println("************Starting***************");
for (int j = 0; j < 10; j++) {
    long startTime = System.nanoTime();
    for (int i = 0; i < 1000000; i++) {
        testObject.method(1, 2);
        //testObject.method5(1,2); // uncomment this line and comment the above line to test non refactor time
    }
    long endTime = System.nanoTime();
    System.out.println("Total :" + (endTime - startTime) + " nanoseconds");
}
prints
************Starting***************
Total :8644835 nanoseconds
Total :3363047 nanoseconds
Total :52 nanoseconds
Total :30 nanoseconds
Total :30 nanoseconds
Note: the 30 nanoseconds is the time it takes to perform a System.nanoTime() call. The inner loop and the method calls have been eliminated.
Yes, extra method calls cost extra time (except in the circumstance where the compiler/JIT actually does inlining, which I think some of them will do sometimes, but under circumstances that you will find hard to control).
You shouldn't worry about this except for the most expensive part of your code, because you won't be able to see the difference in most cases. In those cases where performance doesn't matter, you should refactor to maximize code clarity.
I suggest you refactor at will, occasionally measuring performance with a profiler to find the expensive parts. Only when the profiler shows that some specific code is using a significant fraction of the time should you worry about function calls (or other speed vs. clarity tradeoffs) in that specific code. You will discover that the performance bottlenecks are often in places you wouldn't have guessed.

Java Reflection Performance Issue

I know there are a lot of topics about reflection performance.
Even the official Java docs say that reflection is slower, but I have this code:
public class ReflectionTest {
    public static void main(String[] args) throws Exception {
        Object object = new Object();
        Class<Object> c = Object.class;
        int loops = 100000;

        long start = System.currentTimeMillis();
        Object s;
        for (int i = 0; i < loops; i++) {
            s = object.toString();
            System.out.println(s);
        }
        long regularCalls = System.currentTimeMillis() - start;

        java.lang.reflect.Method method = c.getMethod("toString");
        start = System.currentTimeMillis();
        for (int i = 0; i < loops; i++) {
            s = method.invoke(object);
            System.out.println(s);
        }
        long reflectiveCalls = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (int i = 0; i < loops; i++) {
            method = c.getMethod("toString");
            s = method.invoke(object);
            System.out.println(s);
        }
        long reflectiveLookup = System.currentTimeMillis() - start;

        System.out.println(loops + " regular method calls:" + regularCalls + " milliseconds.");
        System.out.println(loops + " reflective method calls without lookup:" + reflectiveCalls + " milliseconds.");
        System.out.println(loops + " reflective method calls with lookup:" + reflectiveLookup + " milliseconds.");
    }
}
I don't think that is a valid benchmark, but it should at least show some difference.
I executed it expecting the reflective calls to be a bit slower than the regular ones.
But it prints this:
100000 regular method calls:1129 milliseconds.
100000 reflective method calls without lookup:910 milliseconds.
100000 reflective method calls with lookup:994 milliseconds.
Just to note: first I executed it without that bunch of sysouts, and then I realized that some JVM optimizations were just making it go faster, so I added these printlns to see if reflection was still faster.
The results without sysouts are:
100000 regular method calls:68 milliseconds.
100000 reflective method calls without lookup:48 milliseconds.
100000 reflective method calls with lookup:168 milliseconds.
I have seen on the internet that the same test executed on old JVMs makes the reflective calls without lookup two times slower than regular calls, and that this gap has been shrinking with newer updates.
If anyone can execute it and tell me I'm wrong, or at least show me if there's something different from the past that makes it faster, please do.
Following the instructions in the answers, I ran every loop separately, and the results are (without sysouts):
100000 regular method calls:70 milliseconds.
100000 reflective method calls without lookup:120 milliseconds.
100000 reflective method calls with lookup:129 milliseconds.
Never performance-test different bits of code in the same "run". The JVM has various optimisations that mean that, though the end result is the same, how the internals are performed may differ. In more concrete terms, during your test the JVM may have noticed you are calling Object.toString a lot and may have started to inline the method calls to Object.toString. It may have started to perform loop unrolling. Or there could have been a garbage collection in the first loop but not the second or third loops.
To get a more meaningful, but still not totally accurate, picture, you should separate your test into three separate programs.
The results on my computer (with no printing and 1,000,000 runs each):
All three loops run in the same program:
1000000 regular method calls: 490 milliseconds.
1000000 reflective method calls without lookup: 393 milliseconds.
1000000 reflective method calls with lookup: 978 milliseconds.
Loops run in separate programs:
1000000 regular method calls: 475 milliseconds.
1000000 reflective method calls without lookup: 555 milliseconds.
1000000 reflective method calls with lookup: 1160 milliseconds.
There's an article by Brian Goetz on microbenchmarks that's worth reading. It looks like you're not doing anything to warm up the JVM (meaning giving it a chance to do whatever inlining or other optimizations it's going to do) before taking your measurements, so it's likely the non-reflective test is still not warmed up yet, and that could skew your numbers.
When you have multiple long-running loops, the first loop can trigger the method to compile, resulting in the later loops being optimised from the start. However, the optimisation can be sub-optimal, as it has no runtime information for those loops. toString is relatively expensive and could be taking longer than the reflective calls.
You don't need separate programs to avoid a loop being optimised due to an earlier loop. You can run them in different methods.
The results I get are
Average regular method calls:2 ns.
Average reflective method calls without lookup:10 ns.
Average reflective method calls with lookup:240 ns.
The code
import java.lang.reflect.Method;

public class ReflectionTest {
    public static void main(String[] args) throws Exception {
        int loops = 1000 * 1000;
        Object object = new Object();

        long start = System.nanoTime();
        testMethodCall(object, loops);
        long regularCalls = System.nanoTime() - start;

        java.lang.reflect.Method method = Object.class.getMethod("getClass");
        method.setAccessible(true);
        start = System.nanoTime();
        testInvoke(object, loops, method);
        long reflectiveCalls = System.nanoTime() - start;

        start = System.nanoTime();
        testGetMethodInvoke(object, loops);
        long reflectiveLookup = System.nanoTime() - start;

        System.out.println("Average regular method calls:" + regularCalls / loops + " ns.");
        System.out.println("Average reflective method calls without lookup:" + reflectiveCalls / loops + " ns.");
        System.out.println("Average reflective method calls with lookup:" + reflectiveLookup / loops + " ns.");
    }

    private static Object testMethodCall(Object object, int loops) {
        Object s = null;
        for (int i = 0; i < loops; i++) {
            s = object.getClass();
        }
        return s;
    }

    private static Object testInvoke(Object object, int loops, Method method) throws Exception {
        Object s = null;
        for (int i = 0; i < loops; i++) {
            s = method.invoke(object);
        }
        return s;
    }

    private static Object testGetMethodInvoke(Object object, int loops) throws Exception {
        Method method;
        Object s = null;
        for (int i = 0; i < loops; i++) {
            method = Object.class.getMethod("getClass");
            s = method.invoke(object);
        }
        return s;
    }
}
Micro-benchmarks like this are never going to be accurate at all - as the VM "warms up" it'll inline and optimise bits of code as it goes along, so the same thing executed two minutes into a program could vastly outperform what runs right at the start.
In terms of what's happening here, my guess is that the first "normal" method call block warms the VM up, so the reflective blocks (and indeed all subsequent calls) are faster. The only overhead added by reflectively calling a method that I can see is looking up the pointer to that method, which is a nanosecond-scale operation anyway and would be easily cached by the JVM. The rest depends on how warmed up the VM is, which it is by the time you reach the reflective calls.
There is no inherent reason why a reflective call should be slower than a normal call. The JVM can optimize them into the same thing.
Practically, human resources are limited, and they had to optimize normal calls first. As time passes they can work on optimizing reflective calls, especially as reflection becomes more and more popular.
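One development along these lines, not mentioned in the question, is java.lang.invoke.MethodHandle (Java 7+): the method is resolved once and the JIT gets a call site it can optimize much more aggressively than Method.invoke. A minimal sketch (class and method names are mine):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class HandleDemo {
    // Resolve toString() once; subsequent invocations reuse the handle,
    // which the JIT can inline much like a direct call.
    static String callToString(Object target) {
        try {
            MethodHandle h = MethodHandles.lookup()
                    .findVirtual(Object.class, "toString",
                                 MethodType.methodType(String.class));
            return (String) h.invoke(target);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callToString(new Object()));
    }
}
```

In real use the handle would be cached in a static final field rather than looked up per call, since the lookup itself is the expensive part.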
I have been writing my own micro-benchmark, without loops, and with System.nanoTime():
public static void main(String[] args) throws NoSuchMethodException, IllegalArgumentException, IllegalAccessException, InvocationTargetException {
    Object obj = new Object();
    Class<Object> objClass = Object.class;
    String s;

    long start = System.nanoTime();
    s = obj.toString();
    long directInvokeEnd = System.nanoTime();
    System.out.println(s);

    long methodLookupStart = System.nanoTime();
    java.lang.reflect.Method method = objClass.getMethod("toString");
    long methodLookupEnd = System.nanoTime();
    s = (String) (method.invoke(obj));
    long reflectInvokeEnd = System.nanoTime();
    System.out.println(s);

    System.out.println(directInvokeEnd - start);
    System.out.println(methodLookupEnd - methodLookupStart);
    System.out.println(reflectInvokeEnd - methodLookupEnd);
}
I have been executing that in Eclipse on my machine a dozen times, and the results vary quite a bit, but here is what I typically get:
the direct method invocation clocks in at 40-50 microseconds
method lookup clocks in at 150-200 microseconds
reflective invocation with the method variable clocks in at 250-310 microseconds
Now, do not forget the caveats on microbenchmarks described in Nathan's reply - there are certainly a lot of flaws in this micro-benchmark - and trust the documentation if it says that reflection is a LOT slower than direct invocation.
It strikes me that you have placed a System.out.println(s) call inside your inner benchmark loop.
Since performing I/O is bound to be slow, it actually "swallows up" your benchmark, and the overhead of the invoke becomes negligible.
Try removing the println() call and running code like this - I'm sure you'd be surprised by the result (some of the silly calculations are needed to stop the compiler optimizing away the calls altogether):
public class Experius {
    public static void main(String[] args) throws Exception {
        Experius a = new Experius();
        int count = 10000000;
        int v = 0;

        long tm = System.currentTimeMillis();
        for (int i = 0; i < count; ++i) {
            v = a.something(i + v);
            ++v;
        }
        tm = System.currentTimeMillis() - tm;
        System.out.println("Time: " + tm);

        tm = System.currentTimeMillis();
        Method method = Experius.class.getMethod("something", Integer.TYPE);
        for (int i = 0; i < count; ++i) {
            Object o = method.invoke(a, i + v);
            ++v;
        }
        tm = System.currentTimeMillis() - tm;
        System.out.println("Time: " + tm);
    }

    public int something(int n) {
        return n + 5;
    }
}
-- TR
Even if you look up the method in both cases (i.e. before the 2nd and 3rd loops), the first lookup takes far less time than the second lookup, which should have been the other way around, and less than a regular method call on my machine.
Nevertheless, with the 2nd loop including the method lookup and the System.out.println statement, I get:
regular call: 740 ms
lookup (2nd loop): 640 ms
lookup (3rd loop): 800 ms
Without the System.out.println statement, I get:
regular call: 78 ms
lookup (2nd): 37 ms
lookup (3rd): 112 ms

Can this code be more efficient?

This program should produce this:
N 10*N 100*N 1000*N
1 10 100 1000
2 20 200 2000
3 30 300 3000
4 40 400 4000
5 50 500 5000
So here's my code:
public class ex_4_21 {
    public static void main(String Args[]) {
        int process = 1;
        int process2 = 1;
        int process22 = 1;
        int process3 = 1;
        int process33 = 2;
        System.out.println("N 10*N 100*N 1000*N");
        while (process <= 5) {
            while (process2 <= 3) {
                System.out.printf("%d ", process2);
                while (process22 <= 3) {
                    process2 = process2 * 10;
                    System.out.printf("%d ", process2);
                    process22++;
                }
                process2++;
            }
            process++;
        }
    }
}
Can my code be more efficient? I am currently learning while loops, and so far this is what I've got. Can anyone make this more efficient, or give me ideas on how to make my code more efficient?
This is not homework; I am self-studying Java.
You can use a single variable n to do this:
while (n is less than the maximum value that you wish n to be)
    print n and a tab
    print n * 10 and a tab
    print n * 100 and a tab
    print n * 1000 and a newline
    n++
If the power of 10 is variable, then you can try this:
while (n is less than the maximum value that you wish n to be)
    while (i is less than the max power of ten)
        print n * i * 10 and a tab
        i++
    print a newline
    n++
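A direct Java translation of the first pseudocode block above (the class name and the row() helper are mine; row() is split out only to make the line-building explicit):

```java
public class MultiplesTable {
    // Build one tab-separated row: n, 10*n, 100*n, 1000*n.
    static String row(int n) {
        return n + "\t" + 10 * n + "\t" + 100 * n + "\t" + 1000 * n;
    }

    public static void main(String[] args) {
        System.out.println("N\t10*N\t100*N\t1000*N");
        int n = 1;
        while (n <= 5) { // 5 rows, matching the question's table
            System.out.println(row(n));
            n++;
        }
    }
}
```

Computing each cell as a multiple of the single loop variable removes the need for the inner loops and the re-initialization bugs entirely.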
If you must use a while loop
public class ex_4_21 {
    public static void main(String Args[]) {
        int process = 1;
        System.out.println("N 10*N 100*N 1000*N");
        while (process <= 5) {
            System.out.println(process + " " + 10 * process + " " + 100 * process + " " + 1000 * process + "\n");
            process++;
        }
    }
}
You have one too many while loops (your "process2" while loop is unnecessary). You also appear to have some bugs related to the fact that the variables you are looping on in the inner loops are not re-initialized with each iteration.
I would also recommend against while loops for this; Your example fits a for loop much better; I understand you are trying to learn the looping mechanism, but part of learning should also be in deciding when to use which construct. This really isn't a performance recommendation, more an approach recommendation.
I don't have any further performance improvement suggestions, for what you are trying to do; You could obviously remove loops (dropping down to a single or even no loops), but two loops makes sense for what you are doing (allows you to easily add another row or column to the output with minimal changes).
You can try loop unrolling, similar to @Vincent Ramdhanie's answer.
However, loop unrolling and threading won't produce a significant performance improvement for such a small sample. The overhead involved in creating and launching threads takes more time than a simple while loop. The I/O overhead will take more time than the unrolled version saves. A complex program is harder to debug and maintain than a simple one.
What you're doing is called micro-optimization. Save the optimizations for larger programs, and only when the requirements cannot be met or the customer demands it.

Code inside thread slower than outside thread..?

I'm trying to alter some code so it can work with multithreading. I stumbled upon a performance loss when putting a Runnable around some code.
For clarification: The original code, let's call it
//doSomething
got a Runnable around it like this:
Runnable r = new Runnable() {
    public void run() {
        //doSomething
    }
};
Then I submit the Runnable to a CachedThreadPool ExecutorService. This is my first step towards multithreading this code, to see if the code runs as fast with one thread as the original code.
However, this is not the case: where //doSomething executes in about 2 seconds, the Runnable executes in about 2.5 seconds. I should mention that some other code, say //doSomethingElse, inside a Runnable had no performance loss compared to the original //doSomethingElse.
My guess is that //doSomething has some operations that are not as fast when working in a thread, but I don't know what they could be or what, in that respect, the difference from //doSomethingElse is.
Could it be the use of final int[]/float[] arrays that makes a Runnable so much slower? The //doSomethingElse code also used some finals, but //doSomething uses more. This is the only thing I could think of.
Unfortunately, the //doSomething code is quite long and out of context, but I will post it here anyway. For those who know the mean shift segmentation algorithm, this is the part of the code where the mean shift vector is calculated for each pixel. The for-loop for(int i=0; i<L; i++) runs through each pixel.
timer.start(); // this is where I start the timer

// Initialize mode table used for basin of attraction
char[] modeTable = new char[L]; // (L is a class property and is about 100,000)
Arrays.fill(modeTable, (char) 0);
int[] pointList = new int[L];

// Allocate memory for yk (current vector)
double[] yk = new double[lN]; // (lN is a final int, defined earlier)
// Allocate memory for Mh (mean shift vector)
double[] Mh = new double[lN];

int idxs2 = 0;
int idxd2 = 0;

for (int i = 0; i < L; i++) {
    // if a mode was already assigned to this data point
    // then skip this point, otherwise proceed to
    // find its mode by applying mean shift...
    if (modeTable[i] == 1) {
        continue;
    }

    // initialize point list...
    int pointCount = 0;

    // Assign window center (window centers are
    // initialized by createLattice to be the point
    // data[i])
    idxs2 = i * lN;
    for (int j = 0; j < lN; j++)
        yk[j] = sdata[idxs2 + j]; // (sdata is an earlier defined final float[] of about 100,000 items)

    // Calculate the mean shift vector using the lattice
    /*****************************************************/

    // Initialize mean shift vector
    for (int j = 0; j < lN; j++) {
        Mh[j] = 0;
    }
    double wsuml = 0;
    double weight;

    // find bucket of yk
    int cBucket1 = (int) yk[0] + 1;
    int cBucket2 = (int) yk[1] + 1;
    int cBucket3 = (int) (yk[2] - sMinsFinal) + 1;
    int cBucket = cBucket1 + nBuck1 * (cBucket2 + nBuck2 * cBucket3);
    for (int j = 0; j < 27; j++) {
        idxd2 = buckets[cBucket + bucNeigh[j]]; // (buckets is a final int[] of about 75,000 items)
        // list parse, crt point is cHeadList
        while (idxd2 >= 0) {
            idxs2 = lN * idxd2;
            // determine if inside search window
            double el = sdata[idxs2 + 0] - yk[0];
            double diff = el * el;
            el = sdata[idxs2 + 1] - yk[1];
            diff += el * el;
            //...
            idxd2 = slist[idxd2]; // (slist is a final int[] of about 100,000 items)
        }
    }
    //...
}
timer.end(); // this is where I stop the timer.
There is more code, but the last while loop was where I first noticed the difference in performance.
Could anyone think of a reason why this code runs slower inside a Runnable than the original?
Thanks.
Edit: The measured time is inside the code, so excluding startup of the thread.
All code always runs "inside a thread".
The slowdown you see is most likely caused by the overhead that multithreading adds. Try parallelizing different parts of your code - the tasks should be neither too large nor too small. For example, you'd probably be better off running each of the outer loops as a separate task, rather than the innermost loops.
There is no single correct way to split up tasks, though, it all depends on how the data looks and what the target machine looks like (2 cores, 8 cores, 512 cores?).
Edit: What happens if you run the test repeatedly? E.g., if you do it like this:
Executor executor = ...;
for (int i = 0; i < 10; i++) {
    final int lap = i;
    Runnable r = new Runnable() {
        public void run() {
            long start = System.currentTimeMillis();
            //doSomething
            long duration = System.currentTimeMillis() - start;
            System.out.printf("Lap %d: %d ms%n", lap, duration);
        }
    };
    executor.execute(r);
}
Do you notice any difference in the results?
I personally do not see any reason for this. Any program has at least one thread. All threads are equal; all threads are created by default with medium priority (5). So the code should show the same performance in both the main application thread and another thread that you open.
Are you sure you are measuring the time of "doSomething" and not the overall time that your program runs? I believe you are measuring the time of the operation together with the time required to create and start the thread.
When you create a new thread, you always have some overhead. If you have a small piece of code, you may experience a performance loss.
Once you have more code (bigger tasks), you may get a performance improvement from your parallelization (the code on the thread will not necessarily run faster, but you are doing two things at once).
Just a detail: deciding how small a task can be so that parallelizing it is still worthwhile is a well-known topic in parallel computation :)
You haven't explained exactly how you are measuring the time taken. Clearly there are thread start-up costs, but I infer that you are using some mechanism that ensures these costs don't distort your picture.
Generally speaking, when measuring performance it's easy to get misled when measuring small pieces of work. I would look to get a run at least 1,000 times longer, putting the whole thing in a loop or whatever.
Here the one difference between the "no thread" and "threaded" cases is that you have gone from having one thread (as has been pointed out, you always have a thread) to two threads, so now the JVM has to mediate between two threads. For this kind of work I can't see why that should make a difference, but it is a difference.
I would want to use a good profiling tool to really dig into this.
