Multi-Threading Mechanics? - java

I've been experimenting with multithreading and I'm really confused about a test I did.
I had been doing research and everything I read talked about how multithreading can allow two tasks to run at the same time.
I made this program so three different threads would each use a for loop to count 1-10, 11-20, and 21-30, so that I could see whether they actually ran at the same time the way I expected them to.
After running the program the output is something like 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25 26 27 28 29 30 11 12 13 14 15 16 17 18 19 20
Basically, I get some ordering of the three complete sets of numbers: they can all be in order, or 21-30 can come before 11-20 sometimes. That doesn't look like running at the same time, just running one after another.
But if I change the println(i) inside the for loop to println(i + "a") for 1-10, and "b", "c" for 11-20 and 21-30, the output is actually interleaved like I had expected, like this: 1a 11b 21c 2a 22c 12b 13b 23c 3a
Does the program know it's doing nothing but counting up and just throw all the numbers on the screen without actually doing it? Or does adding the string at the end make each iteration slow enough for the other threads to sneak in between the operations? I know nothing about this, ha.
public class Run {
    static Runnable updatePosition;
    static Runnable render;
    static Runnable checkCollisions;

    public static void main(String[] args) {
        // System.out.println(Runtime.getRuntime().availableProcessors());
        updatePosition = new Runnable() {
            @Override
            public void run() {
                for (int i = 1; i <= 10; i++) {
                    System.out.println(i);
                }
            }
        };
        render = new Runnable() {
            @Override
            public void run() {
                for (int i = 11; i <= 20; i++) {
                    System.out.println(i);
                }
            }
        };
        checkCollisions = new Runnable() {
            @Override
            public void run() {
                for (int i = 21; i <= 30; i++) {
                    System.out.println(i);
                }
            }
        };
        Thread updatePositionThread = new Thread(updatePosition);
        Thread renderThread = new Thread(render);
        Thread checkCollisionsThread = new Thread(checkCollisions);
        updatePositionThread.start();
        renderThread.start();
        checkCollisionsThread.start();
    }
}
Also, how do threads get assigned to CPU cores? In reasonable depth, please. What I mean by asking this is: if I were to use a single-threaded program with an update method and a draw method, and together they took too long and made my program lag, would putting them on separate threads help, or do they not actually run side by side? Assume I can deal with all of the concurrency.

Actually, JIT optimizations, processor architecture and several other things play a part in how these kinds of situations turn out.
The output does not have to be an orderly, one-thread-at-a-time sequence like 1-10, 21-30, 11-20.
Changing your code a little to:
for (int i = 1; i <= 10000; i++)
for (int i = 10001; i <= 20000; i++)
for (int i = 20001; i <= 30000; i++)
I get the output below: as expected, one thread does not run to completion before the others get a turn, as one might assume from your case. It is all about how much CPU time each thread gets.
1
2
3
...
250
251
252
253
20001
20002
20003
...
20127
20128
10001
10002
10003
..
Changing i to "a" + i leads to dynamic construction of new Strings using a StringBuilder, which takes some extra time (and hence CPU cycles). Printing a primitive int does not have that overhead, so you get the output you observed.
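To make the interleaving easier to see without relying on the string concatenation, here is a small sketch (my own example, not the original code): tag each line with the thread's name and give the loops more iterations, so the scheduler has more chances to switch threads mid-count.

public class InterleaveDemo {
    public static void main(String[] args) {
        Runnable counter = () -> {
            for (int i = 1; i <= 1000; i++) {
                // The name tag makes it obvious which thread printed each line.
                System.out.println(Thread.currentThread().getName() + ": " + i);
            }
        };
        new Thread(counter, "A").start();
        new Thread(counter, "B").start();
        new Thread(counter, "C").start();
    }
}

With only 10 iterations per thread, each loop can easily finish within a single time slice, which is why the untagged version often looks sequential.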

Related

Multi threaded matrix multiplication performance issue

I am using Java for multi-threaded matrix multiplication, as practice for multi-threaded programming. Following is the code, which I took from another Stack Overflow post.
public class MatMulConcur {
    private final static int NUM_OF_THREAD = 1;
    private static Mat matC;

    public static Mat matmul(Mat matA, Mat matB) {
        matC = new Mat(matA.getNRows(), matB.getNColumns());
        return mul(matA, matB);
    }

    private static Mat mul(Mat matA, Mat matB) {
        int numRowForThread;
        int numRowA = matA.getNRows();
        int startRow = 0;
        Worker[] myWorker = new Worker[NUM_OF_THREAD];
        // Give each worker a contiguous band of rows; the last worker also gets the remainder.
        for (int j = 0; j < NUM_OF_THREAD; j++) {
            if (j < NUM_OF_THREAD - 1) {
                numRowForThread = (numRowA / NUM_OF_THREAD);
            } else {
                numRowForThread = (numRowA / NUM_OF_THREAD) + (numRowA % NUM_OF_THREAD);
            }
            myWorker[j] = new Worker(startRow, startRow + numRowForThread, matA, matB);
            myWorker[j].start();
            startRow += numRowForThread;
        }
        for (Worker worker : myWorker) {
            try {
                worker.join();
            } catch (InterruptedException e) {
            }
        }
        return matC;
    }

    private static class Worker extends Thread {
        private int startRow, stopRow;
        private Mat matA, matB;

        public Worker(int startRow, int stopRow, Mat matA, Mat matB) {
            super();
            this.startRow = startRow;
            this.stopRow = stopRow;
            this.matA = matA;
            this.matB = matB;
        }

        @Override
        public void run() {
            for (int i = startRow; i < stopRow; i++) {
                for (int j = 0; j < matB.getNColumns(); j++) {
                    double sum = 0;
                    for (int k = 0; k < matA.getNColumns(); k++) {
                        sum += matA.get(i, k) * matB.get(k, j);
                    }
                    matC.set(i, j, sum);
                }
            }
        }
    }
}
I ran this program for 1, 10, 20, ..., 100 threads, but performance gets worse instead of better. Following is the time table:
Thread 1 takes 18 Milliseconds
Thread 10 takes 18 Milliseconds
Thread 20 takes 35 Milliseconds
Thread 30 takes 38 Milliseconds
Thread 40 takes 43 Milliseconds
Thread 50 takes 48 Milliseconds
Thread 60 takes 57 Milliseconds
Thread 70 takes 66 Milliseconds
Thread 80 takes 74 Milliseconds
Thread 90 takes 87 Milliseconds
Thread 100 takes 98 Milliseconds
Any Idea?
People think that using multiple threads will automatically (magically!) make any computation go faster. This is not so.[1]
There are a number of factors that can make the multi-threading speedup less than you expect, or indeed result in a slowdown:
1. A computer with N cores (or hyperthreads) can do computations at most N times as fast as a computer with 1 core. This means that when you have T threads where T > N, the computational performance will be capped at N. (Beyond that, the threads make progress because of time slicing.)
2. A computer has a certain amount of memory bandwidth; i.e. it can only perform a certain number of read/write operations per second on main memory. If you have an application whose demand exceeds what the memory subsystem can deliver, it will stall (for a few nanoseconds). If many cores are executing many threads at the same time, then it is the aggregate demand that matters.
3. A typical multi-threaded application working on shared variables or data structures will use either volatile or explicit synchronization to do this. Both of these increase the demand on the memory system.
4. When explicit synchronization is used and two threads want to hold a lock at the same time, one of them will be blocked. This lock contention slows down the computation. Indeed, the computation is likely to be slowed down if there was past contention on the lock.
5. Thread creation is expensive. Even acquiring an existing thread from a thread pool can be relatively expensive. If the task that you perform with the thread is too small, the setup costs can outweigh the possible speedup.
6. There is also the issue that you may be running into problems with a poorly written benchmark; e.g. the JVM may not be properly warmed up before the timing measurements are taken.
There is insufficient detail in your question to be sure which of the above factors is likely to affect your application's performance. But it is likely to be a combination of 1, 2, and 5 ... depending on how many cores are used, how big the CPU's memory caches are, how big the matrix is, and other factors.
[1] Indeed, if this were true then we would not need to buy computers with lots of cores. We could just use more and more threads. Provided you had enough memory, you could do an infinite amount of computation on a single machine. Bitcoin mining would be a doddle. Of course, it isn't true.
Using multi-threading is not primarily for performance, but for parallelization. There are cases where parallelization can benefit performance, though.
Your computer doesn't have infinite resources. Adding more and more threads will eventually decrease performance. It's like starting more and more applications: you wouldn't expect a program to run faster when you start another program, and you probably wouldn't be surprised if it ran slower.
Up to a certain point performance will remain constant (your computer still has resources to handle the demand), but at some point you reach the maximum your computer can handle and performance will drop. That's exactly what your results show: performance stays roughly constant with 1 or 10 threads, and then drops steadily.
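As a side note, rather than hard-coding NUM_OF_THREAD, you can derive the worker count from the hardware. A minimal sketch (my own example, using an ExecutorService instead of the raw Worker threads above):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSizing {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int w = 0; w < cores; w++) {
            final int worker = w;
            pool.execute(() -> {
                // Each task would multiply its own band of rows here.
                System.out.println("worker " + worker + " on " + Thread.currentThread().getName());
            });
        }
        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for the workers to finish
    }
}

Going beyond the number of cores cannot make a CPU-bound job like this faster, and for an 18 ms job the per-thread start-up cost quickly dominates.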

Java - Difference between Java 8 parallelStream and creating threads ourselves

I was trying to find the difference between using Java 8's parallelStream (method 1) and creating parallel threads myself (method 2).
I measured the time taken by method 1 and method 2 and found a huge difference: method 2 (~700 ms) is way faster than method 1 (~20 sec).
Method 1: (list has about 100 entries)
list.parallelStream()
    .forEach(ele -> {
        // Do something.
    });
Method 2:
for (int i = 0; i < 100; i++) {
    Runnable task = () -> {
        // Do something.
    };
    Thread thread = new Thread(task);
    thread.start();
}
NOTE: "Do something" is an expensive operation, like hitting a database.
I added System.out.println() messages to both. I found that method 1 (parallelStream) appeared to be executing sequentially, while in method 2 the messages were printed very quickly.
Can anyone explain what is happening?
Can anyone explain what is happening?
Most likely you are doing something wrong but it's not clear what.
for (int i = 0; i < 3; i++) {
    long start = System.currentTimeMillis();
    IntStream.range(0, 100).parallel()
            .forEach(ele -> {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException ignored) {
                }
            });
    long time = System.currentTimeMillis() - start;
    System.out.printf("Took %,d ms to perform 100 tasks of 100 ms on %d processors%n",
            time, Runtime.getRuntime().availableProcessors());
}
prints
Took 475 ms to perform 100 tasks of 100 ms on 32 processors
Took 401 ms to perform 100 tasks of 100 ms on 32 processors
Took 401 ms to perform 100 tasks of 100 ms on 32 processors
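One thing worth ruling out (an assumption on my part, since the questioner's timing code is not shown): the Thread-based loop in method 2 only starts 100 threads. If the clock is stopped right after the loop, none of the work has actually been waited for, which would make method 2 look far faster than it is. Joining the threads before stopping the clock gives a fair comparison, e.g. (assuming java.util.List/ArrayList are imported and InterruptedException is handled by the caller):

List<Thread> threads = new ArrayList<>();
for (int i = 0; i < 100; i++) {
    Thread t = new Thread(() -> {
        // Do something expensive, e.g. hit the database.
    });
    t.start();
    threads.add(t);
}
for (Thread t : threads) {
    t.join();   // wait for every task before stopping the clock
}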

Aparapi GPU execution slower than CPU

I am trying to test the performance of Aparapi.
I have seen some blogs where the results show that Aparapi does improve the performance while doing data parallel operations.
But I am not able to see that in my tests. Here is what I did: I wrote two programs, one using Aparapi and the other one using normal loops.
Program 1: In Aparapi
import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;

public class App
{
    public static void main(String[] args)
    {
        final int size = 50000000;
        final float[] a = new float[size];
        final float[] b = new float[size];
        for (int i = 0; i < size; i++) {
            a[i] = (float) (Math.random() * 100);
            b[i] = (float) (Math.random() * 100);
        }
        final float[] sum = new float[size];

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int gid = getGlobalId();
                sum[gid] = a[gid] + b[gid];
            }
        };

        long t1 = System.currentTimeMillis();
        kernel.execute(Range.create(size));
        long t2 = System.currentTimeMillis();

        System.out.println("Execution mode = " + kernel.getExecutionMode());
        kernel.dispose();
        System.out.println(t2 - t1);
    }
}
Program 2: using loops
public class App2 {
    public static void main(String[] args) {
        final int size = 50000000;
        final float[] a = new float[size];
        final float[] b = new float[size];
        for (int i = 0; i < size; i++) {
            a[i] = (float) (Math.random() * 100);
            b[i] = (float) (Math.random() * 100);
        }
        final float[] sum = new float[size];

        long t1 = System.currentTimeMillis();
        for (int i = 0; i < size; i++) {
            sum[i] = a[i] + b[i];
        }
        long t2 = System.currentTimeMillis();
        System.out.println(t2 - t1);
    }
}
Program 1 takes around 330ms whereas Program 2 takes only around 55ms.
Am I doing something wrong here? I did print out the execution mode in the Aparapi program, and it reports that the execution mode is GPU.
You did not do anything wrong - except for the benchmark itself.
Benchmarking is always tricky, particularly for cases where a JIT is involved (as for Java) and for libraries where many nitty-gritty details are hidden from the user (as for Aparapi). In both cases, you should at least execute the code section that you want to benchmark multiple times.
For the Java version, one might expect the computation time for a single execution of the loop to decrease when the loop itself is executed multiple times, due to the JIT kicking in. There are many additional caveats to consider - for details, you should refer to this answer. In this simple test, the effect of the JIT may not really be noticeable, but in more realistic or complex scenarios it will make a difference. Anyhow: when repeating the loop 10 times, the time for a single execution of the loop on my machine was about 70 milliseconds.
For the Aparapi version, the point of possible GPU initialization was already mentioned in the comments. And here, this is indeed the main problem: When running the kernel 10 times, the timings on my machine are
1248
72
72
72
73
71
72
73
72
72
You see that the initial call causes all the overhead. The reason for this is that, during the first call to Kernel#execute(), it has to do all the initialization (basically converting the bytecode to OpenCL, compiling the OpenCL code, etc.). This is also mentioned in the documentation of the KernelRunner class:
The KernelRunner is created lazily as a result of calling Kernel.execute().
The effect of this - namely, a comparatively large delay for the first execution - has led to this question on the Aparapi mailing list: A way to eagerly create KernelRunners. The only workaround suggested there was to create an "initialization call" like
kernel.execute(Range.create(1));
without a real workload, only to trigger the whole setup, so that the subsequent calls are fast. (This also works for your example).
You may have noticed that, even after the initialization, the Aparapi version is still not faster than the plain Java version. The reason for that is that the task of a simple vector addition like this is memory bound - for details, you may refer to this answer, which explains this term and some issues with GPU programming in general.
As a deliberately suggestive example of a case where you might benefit from the GPU, you can modify your test to create an artificial compute-bound task: change the kernel to involve some expensive trigonometric functions, like this
Kernel kernel = new Kernel() {
    @Override
    public void run() {
        int gid = getGlobalId();
        sum[gid] = (float) (Math.cos(Math.sin(a[gid])) + Math.sin(Math.cos(b[gid])));
    }
};
and the plain Java loop version accordingly, like this
for (int i = 0; i < size; i++) {
    sum[i] = (float) (Math.cos(Math.sin(a[i])) + Math.sin(Math.cos(b[i])));
}
then you will see a difference. On my machine (GeForce 970 GPU vs. AMD K10 CPU) the timings are about 140 milliseconds for the Aparapi version, and a whopping 12000 milliseconds for the plain Java version - that's a speedup of nearly 90 through Aparapi!
Also note that even in CPU mode, Aparapi may offer an advantage compared to plain Java. On my machine, in CPU mode, Aparapi needs only 2300 milliseconds, because it still parallelizes the execution using a Java thread pool.
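For a rough CPU-side comparison (my own sketch, not part of the original answer), the same compute-bound loop can also be spread across the cores in plain Java with a parallel stream; a, b, sum and size are the arrays and length from the programs above.

// Parallel CPU version of the trigonometric loop (sketch).
java.util.stream.IntStream.range(0, size).parallel().forEach(i ->
        sum[i] = (float) (Math.cos(Math.sin(a[i])) + Math.sin(Math.cos(b[i]))));

On a multi-core machine this typically lands somewhere between the single-threaded loop and the GPU timing, much like Aparapi's CPU mode.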
Just add the following before the main kernel execution:
kernel.setExplicit(true);
kernel.put(a);
kernel.put(b);
and
kernel.get(sum);
after it.
Although Aparapi does analyze the byte code of the Kernel.run()
method (and any method reachable from Kernel.run()) Aparapi has no
visibility to the call site. In the above code there is no way for
Aparapi to detect that that hugeArray is not modified within the for
loop body. Unfortunately, Aparapi must default to being ‘safe’ and
copy the contents of hugeArray backwards and forwards to the GPU
device.
https://github.com/aparapi/aparapi/blob/master/doc/ExplicitBufferHandling.md
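Putting those calls together around the timed section gives something like the following sketch (based on the snippets above and the linked documentation, not a tested drop-in):

kernel.setExplicit(true);            // we take responsibility for buffer transfers
kernel.put(a);                       // copy the input arrays to the device once
kernel.put(b);
long t1 = System.currentTimeMillis();
kernel.execute(Range.create(size));  // no implicit per-call copies back and forth
long t2 = System.currentTimeMillis();
kernel.get(sum);                     // fetch the result only when it is actually needed
System.out.println(t2 - t1);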

Finding Prime numbers n...n using a ThreadPool

So as the title suggests I am trying to find all the primes from 0 to MAX_LIMIT
sample input: java Main 8 100
this means create 8 threads and find primes from 0 to 100, including 100.
my program takes two command line arguments: the first is the number of threads, the second is the range of primes (0 to n).
sample output:
Prime Number: 2 Thread #: 13
Prime Number: 7 Thread #: 15
Prime Number: 7 Thread #: 16
Prime Number: 11 Thread #: 18
:
Then the system hangs and I'll have to stop the process:
Process finished with exit code 137
My question is:
Why does my thread pool go over its limit (thread numbers like 13 or 16, instead of 1-8)?
And how can I make the threads not all test the same number at the same time?
I'm thinking of using a cache of some sort like adding numbers to an array list or something
but I do not know if that would be the correct approach to use.
It is possible that I am misunderstanding what a ThreadPool is and am in fact using something completely unrelated to it.
I am also unsure of why it is hanging and not printing all the primes from 0 to 100 in this case.
If there is an easier way to do what I am trying to do I would be interested in hearing it.
I'll be here working on this and will check back on this thread frequently.
Yes this is homework for an operating systems class about threads, I wouldn't normally ask for help but I am at a loss. All Code is located in one file.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Main {
    private static int MAX_THREADS;
    private static int MAX_LIMIT;
    private static int numToTest = 0;

    public static void main(String[] args) {
        int max_threads = Integer.parseInt(args[0]);
        int max_limit = Integer.parseInt(args[1]);
        MAX_THREADS = max_threads;
        MAX_LIMIT = max_limit;
        Foo();
    }

    private static void Foo() {
        class PrimeNumberGen implements Runnable {
            int num = numToTest;

            PrimeNumberGen(int n) { num = n; }

            boolean isPrime(int n) { // first test is 0
                if (n < 2) return false;
                if (n == 2) return true;
                if (n % 2 == 0) return false;
                int max = n / 2;
                for (int i = 3; i < max; i = i + 2) {
                    if (n % i == 0)
                        return false;
                }
                return true;
            }

            public void run() {
                numToTest++;
                if (isPrime(num)) {
                    System.out.println("Prime Number: " + num + " Thread #: " + Thread.currentThread().getId());
                } else {
                    numToTest++;
                }
            }
        }
        // Thread t = new Thread(new PrimeNumberGen(num));
        // t.start();
        ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS);
        for (int i = 0; i <= MAX_LIMIT; i++) {
            Runnable worker = new PrimeNumberGen(numToTest);
            executor.execute(worker);
        }
    }
}
The thread ID is a unique number for a thread. It does not have to start at any particular value and does not have to be sequential. Over the life of a thread pool you can see more than the maximum number of distinct threads, but no more than the maximum number at any one time.
BTW, if you have to find multiple primes, using a Sieve of Eratosthenes will be much faster, as it has a lower time complexity. It is usually single-threaded, but it will still be faster.
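For reference, a minimal single-threaded sieve sketch (my own example, not from either answer): it marks every multiple of each prime as composite and then prints whatever is left unmarked.

static void sieve(int maxLimit) {
    boolean[] composite = new boolean[maxLimit + 1];
    for (int p = 2; (long) p * p <= maxLimit; p++) {
        if (!composite[p]) {
            for (int m = p * p; m <= maxLimit; m += p) {
                composite[m] = true;    // m has p as a factor, so it is not prime
            }
        }
    }
    for (int n = 2; n <= maxLimit; n++) {
        if (!composite[n]) {
            System.out.println("Prime Number: " + n);
        }
    }
}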
Regarding the second part of your question, take a look at the Sieve of Eratosthenes.
Change
Runnable worker = new PrimeNumberGen(numToTest); to
Runnable worker = new PrimeNumberGen(i);
You can actually throw away the numToTest variable; it's not needed anymore.
The reason for the duplicate prime numbers is that the threads do not always see each other's updates, e.g.
Prime Number: 7 Thread #: 15
Prime Number: 7 Thread #: 16 (thread 16 does not see the value written by thread 15, perhaps because they are running on different cores)
This happens because numToTest++; is not thread-safe: numToTest is not volatile and the ++ operation is not atomic. I wrote a blog entry under http://blog.vmlens.com/2013/08/18/java-race-conditions-or-how-to-find-an-irreproducable-bug/ to explain this type of bug.
One solution would be to use AtomicInteger, see http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicInteger.html.
Your program seems to hang because you did not shut down the thread pool. See "How to stop the execution of Executor ThreadPool in java?" for how to do this.
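Putting the suggestions together, a sketch of a corrected submission loop (my own combination of the advice in the answers, not the original code):

ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS);
for (int i = 0; i <= MAX_LIMIT; i++) {
    executor.execute(new PrimeNumberGen(i));   // each task gets its own candidate number
}
executor.shutdown();   // accept no new tasks; queued tasks finish and the JVM can exit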
Regarding the thread pool going over its limit,
Change
System.out.println("Prime Number: "+num+" Thread #:
"+Thread.currentThread().getId());
to
System.out.println("Prime Number: "+num+" Thread #:
"+Thread.currentThread().getName());
The thread ID is a positive long number generated when the thread was created, not the thread's index in the pool; calling getName() will output something like
pool-1-thread-3
Ref: https://www.tutorialspoint.com/java/lang/thread_getid.htm

Adding numbers using Java Long wrapper versus primitive longs

I am running this code and getting unexpected results. I expected that the loop which adds the primitives would perform much faster, but the results do not agree.
import java.util.*;

public class Main {
    public static void main(String[] args) {
        StringBuilder output = new StringBuilder();
        long start = System.currentTimeMillis();
        long limit = 1000000000; // 10^9
        long value = 0;
        for (long i = 0; i < limit; ++i) {}
        long i;
        output.append("Base time\n");
        output.append(System.currentTimeMillis() - start + "ms\n");

        start = System.currentTimeMillis();
        for (long j = 0; j < limit; ++j) {
            value = value + j;
        }
        output.append("Using longs\n");
        output.append(System.currentTimeMillis() - start + "ms\n");

        start = System.currentTimeMillis();
        value = 0;
        for (long k = 0; k < limit; ++k) {
            value = value + (new Long(k));
        }
        output.append("Using Longs\n");
        output.append(System.currentTimeMillis() - start + "ms\n");

        System.out.print(output);
    }
}
Output:
Base time
359ms
Using longs
1842ms
Using Longs
614ms
I have tried running each individual test in its own Java program, but the results are the same. What could cause this?
Small detail: running java 1.6
Edit:
I asked two other people to try out this code; one gets the exact same strange results that I get. The other gets results that actually make sense! I asked the guy who got normal results to give us his class binary. We ran it and we STILL get the strange results. The problem is not at compile time (I think). I'm running 1.6.0_31, the guy who gets normal results is on 1.6.0_16, and the guy who gets strange results like I do is on 1.7.0_04.
Edit: I get the same results with a Thread.sleep(5000) at the start of the program. I also get the same results with a while loop around the whole program (to see if the times would converge to normal after Java was fully started up).
I suspect that this is a JVM warmup effect. Specifically, the code is being JIT compiled at some point, and this is distorting the times that you are seeing.
Put the whole lot in a loop, and ignore the times reported until they stabilize. (But note that they won't entirely stabilize. Garbage is being generated, and therefore the GC will need to kick in occasionally. This is liable to distort the timings, at least a bit. The best way to deal with this is to run a huge number of iterations of the outer loop and calculate / display the average times.)
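A minimal sketch of that averaging idea (my own example, assuming the timed work is factored into a test() method as in the UPDATE below): discard a fixed number of warm-up iterations, then average the rest.

long total = 0;
int warmup = 10, measured = 100;
for (int run = 0; run < warmup + measured; run++) {
    long start = System.currentTimeMillis();
    test();                                   // the benchmark body
    long elapsed = System.currentTimeMillis() - start;
    if (run >= warmup) {
        total += elapsed;                     // only count post-warm-up runs
    }
}
System.out.println("average: " + (total / measured) + " ms");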
Another problem is that the JIT compiler on some releases of Java may be able to optimize away the stuff you are trying to test:
It could figure out that the creation and immediate unboxing of the Long objects could be optimized away. (Thanks Louis!)
It could figure out that the loops are doing "busy work" ... and optimize them away entirely. (The value of value is not used once each loop ends.)
FWIW, it is generally recommended that you use Long.valueOf(long) rather than new Long(long) because the former can make use of a cached Long instance. However, in this case, we can predict that there will be a cache miss in all but the first few loop iterations, so the recommendation is not going to help. If anything, it is likely to make the loop in question slower.
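To see the cache in action, a tiny illustration (my own example): Long.valueOf is required to cache values from -128 to 127, so identical small values can come back as the same object, while values outside that range generally do not.

Long a = Long.valueOf(127);
Long b = Long.valueOf(127);
Long c = Long.valueOf(128);
Long d = Long.valueOf(128);
System.out.println(a == b);   // true  -- both references point at the cached instance
System.out.println(c == d);   // false (typically) -- 128 is outside the mandatory cache range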
UPDATE
I did some investigation of my own, and ended up with the following:
import java.util.*;

public class Main {
    public static void main(String[] args) {
        while (true) {
            test();
        }
    }

    private static void test() {
        long start = System.currentTimeMillis();
        long limit = 10000000; // 10^7
        long value = 0;
        for (long i = 0; i < limit; ++i) {}
        long t1 = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (long j = 0; j < limit; ++j) {
            value = value + j;
        }
        long t2 = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (long k = 0; k < limit; ++k) {
            value = value + (new Long(k));
        }
        long t3 = System.currentTimeMillis() - start;

        System.out.print(t1 + " " + t2 + " " + t3 + " " + value + "\n");
    }
}
which gave me the following output.
28 58 2220 99999990000000
40 58 2182 99999990000000
36 49 157 99999990000000
34 51 157 99999990000000
37 49 158 99999990000000
33 52 158 99999990000000
33 50 159 99999990000000
33 54 159 99999990000000
35 52 159 99999990000000
33 52 159 99999990000000
31 50 157 99999990000000
34 51 156 99999990000000
33 50 159 99999990000000
Note that the first two columns are pretty stable, but the third one shows a significant speedup on the 3rd iteration ... probably indicating that JIT compilation has occurred.
Interestingly, before I separated out the test into a separate method, I didn't see the speedup on the 3rd iteration. The numbers all looked like the first two rows. And that seems to be saying that the JVM (that I'm using) won't JIT compile a method that is currently executing ... or something like that.
Anyway, this demonstrates (to me) that there should be a warm-up effect. If you don't see a warm-up effect, your benchmark is doing something that is inhibiting JIT compilation ... and therefore isn't meaningful for real applications.
I'm surprised, too.
My first guess would have been inadvertent "autoboxing", but that's clearly not an issue in your example code.
This link might give a clue:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Long.html
valueOf
public static Long valueOf(long l)
Returns a Long instance representing the specified long value. If a new Long instance is not required, this method should generally be
used in preference to the constructor Long(long), as this method is
likely to yield significantly better space and time performance by
caching frequently requested values.
Parameters:
l - a long value.
Returns:
a Long instance representing l.
Since:
1.5
But yes, I would expect using a wrapper (e.g. "Long") to take MORE time, and MORE space. I would not expect using the wrapper to be three times FASTER!
================================================================================
ADDENDUM:
I got these results with your code:
Base time 6878ms
Using longs 10515ms
Using Longs 428022ms
I'm running JDK 1.6.0_16 on a pokey 32-bit, single-core CPU.
OK - here's a slightly different version, along with my results (running JDK 1.6.0_16 on a pokey 32-bit, single-core CPU):
import java.util.*;

/*
Test    Base    longs   Longs/new   Longs/valueOf
----    ----    -----   ---------   -------------
   0     343      896        3431            6025
   1     342      957        3401            5796
   2     342      881        3379            5742
*/
public class LongTest {
    private static int limit = 100000000;
    private static int ntimes = 3;

    private static final long[] base = new long[ntimes];
    private static final long[] primitives = new long[ntimes];
    private static final long[] wrappers1 = new long[ntimes];
    private static final long[] wrappers2 = new long[ntimes];

    private static void test_base(int idx) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < limit; ++i) {}
        base[idx] = System.currentTimeMillis() - start;
    }

    private static void test_primitive(int idx) {
        long value = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < limit; ++i) {
            value = value + i;
        }
        primitives[idx] = System.currentTimeMillis() - start;
    }

    private static void test_wrappers1(int idx) {
        long value = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < limit; ++i) {
            value = value + new Long(i);
        }
        wrappers1[idx] = System.currentTimeMillis() - start;
    }

    private static void test_wrappers2(int idx) {
        long value = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < limit; ++i) {
            value = value + Long.valueOf(i);
        }
        wrappers2[idx] = System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        for (int i = 0; i < ntimes; i++) {
            test_base(i);
            test_primitive(i);
            test_wrappers1(i);
            test_wrappers2(i);
        }
        System.out.println("Test    Base    longs   Longs/new   Longs/valueOf");
        System.out.println("----    ----    -----   ---------   -------------");
        for (int i = 0; i < ntimes; i++) {
            System.out.printf("  %2d  %6d  %6d  %6d  %6d\n",
                    i, base[i], primitives[i], wrappers1[i], wrappers2[i]);
        }
    }
}
=======================================================================
5.28.2012:
Here are some additional timings, from a faster (but still modest), dual-core CPU running Windows 7/64 and running the same JDK revision 1.6.0_16:
/*
PC 1: limit = 100,000,000, ntimes = 3, JDK 1.6.0_16 (32-bit):
Test Base longs Longs/new Longs/valueOf
---- ---- ----- --------- -------------
0 343 896 3431 6025
1 342 957 3401 5796
2 342 881 3379 5742
PC 2: limit = 1,000,000,000, ntimes = 5,JDK 1.6.0_16 (64-bit):
Test Base longs Longs/new Longs/valueOf
---- ---- ----- --------- -------------
0 3 2 5627 5573
1 0 0 5494 5537
2 0 0 5475 5530
3 0 0 5477 5505
4 0 0 5487 5508
PC 2: "for loop" counters => long; limit = 10,000,000,000, ntimes = 5:
Test Base longs Longs/new Longs/valueOf
---- ---- ----- --------- -------------
0 6278 6302 53713 54064
1 6273 6286 53547 53999
2 6273 6294 53606 53986
3 6274 6325 53593 53938
4 6274 6279 53566 53974
*/
You'll notice:
I'm not using StringBuilder, and I separate out all of the I/O until the end of the program.
"long" primtive is consistently equivalent to a "no-op"
"Long" wrappers are consistently much, much slower
"new Long()" is slightly faster than "Long.valueOf()"
Changing the loop counters from "int" to "long" makes the first two columns ("base" and "longs") much slower.
"JIT warmup" is negligible after the the first few iterations...
... provided I/O (like System.out) and potentially memory-intensive activities (like StringBuilder) are moved outside of the actual test sections.
