Threads in Java - Sum of N numbers - java

I tried to compute the sum of N numbers using the conventional method and also using threads, to see how threads perform. I see that the conventional method runs faster than the thread-based one.
My plan is to break the range up to the upper limit (N) into sub-ranges, run a thread for each sub-range, and finally add up the sums calculated by each thread.
stats in milliseconds :
248
500000000500000000
-----same with threads------
498
500000000500000000
Here I see that the approach using threads took ~500 milliseconds and the conventional method took only ~250 milliseconds.
I wanted to know if I am correctly implementing threads for this problem.
Thanks
code :
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
class MyThread implements Runnable {
private long from, to, sum; // long, not int: the bounds are long and the total exceeds Integer.MAX_VALUE
public MyThread(long from , long to) {
this.from = from;
this.to = to;
sum = 0;
}
public void run() {
for(long i=from;i<=to;i++) {
sum+=i;
}
}
public long getSum() {
return this.sum;
}
}
public class exercise {
public static void main(String args[]) {
long startTime = System.currentTimeMillis();
long sum = 0;
for(long i=1;i<=1000000000;i++) {
sum+=i;
}
long endTime = System.currentTimeMillis();
long duration = (endTime - startTime); //Total execution time in milli seconds
System.out.println(duration);
System.out.println(sum);
System.out.println("-----same with threads------");
ExecutorService executor = Executors.newFixedThreadPool(5);
MyThread one = new MyThread(1, 100000);
MyThread two = new MyThread(100001, 10000000);
MyThread three = new MyThread(10000001, 1000000000);
startTime = System.currentTimeMillis();
executor.execute(one);
executor.execute(two);
executor.execute(three);
executor.shutdown();
// Wait until all threads are finished
while (!executor.isTerminated()) {
}
endTime = System.currentTimeMillis();
System.out.println(endTime - startTime);
long thsum = one.getSum() + two.getSum() + three.getSum();
System.out.println(thsum);
}
}

Splitting the work across multiple threads only pays off when each thread is assigned roughly the same amount of work.
In your case, the first thread does almost nothing, the second thread does almost 1% of the work, and the third thread does 99% of the work.
Therefore, you pay the overhead for running multiple threads without benefiting from the parallel execution.
Splitting the work evenly, as follows, should yield better results:
MyThread one = new MyThread(1, 333333333);
MyThread two = new MyThread(333333334, 666666667);
MyThread three = new MyThread(666666668, 1000000000);
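As a rough sketch of the same idea with the bounds computed rather than typed by hand, the ranges can be derived from N and the number of tasks, and each task can return its partial sum through a Future instead of a getter. The class name EvenSplitSum and the method sum are made up for this illustration; they are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: sums 1..n using `parts` equally sized sub-ranges.
class EvenSplitSum {
    static long sum(long n, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            long chunk = n / parts;
            for (int p = 0; p < parts; p++) {
                final long from = p * chunk + 1;
                // The last task also takes the remainder when n is not divisible by parts.
                final long to = (p == parts - 1) ? n : (p + 1) * chunk;
                futures.add(pool.submit(() -> {
                    long s = 0;
                    for (long i = from; i <= to; i++) {
                        s += i;
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long total = sum(1_000_000_000L, 3);
        System.out.println(System.currentTimeMillis() - start);
        System.out.println(total);
    }
}
```

Whether this beats the plain loop still depends on the machine and on JIT warm-up, so treat any single timing with caution.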

The multithread part of your example includes the time for thread creation. Thread creation is an expensive operation and I presume that it is responsible for a large share of the difference between the single thread and multithread approaches.
Your question was if you are correctly implementing the threads. Did you mean implementing the runnable tasks? If so, I wonder why you have distributed the number ranges so unevenly. The task three seems to be far bigger than the others and as a result the performance will be close to a single thread version however you choose to set up the threads.

Related

Java MultiThreading not stopping

I have the following code for a kind of 'stopwatch' that extends the Thread class:
package StopWatch;
//Code taken from:
//https://stackoverflow.com/questions/9526041/how-to-program-for-a-stopwatch
public class Stopwatch extends Thread {
private long startTime;
private boolean started;
public void startTimer() {
this.startTime = System.currentTimeMillis();
this.started = true;
this.start();
}
public void run() {
while(started){/*currentTimeMillis increases on its own */}
System.out.println("timer stopped");
}
public int[] getTime() {
long milliTime = System.currentTimeMillis() - this.startTime;
int[] time = new int[]{0,0,0,0};
time[0] = (int)(milliTime / 3600000); //gives number of hours elapsed
time[1] = (int)(milliTime / 60000) % 60; //gives number of remaining minutes elapsed
time[2] = (int)(milliTime / 1000) % 60; //gives number of remaining seconds elapsed
time[3] = (int)(milliTime); //gives number of remaining milliseconds elapsed
return time;
}
public void stopTimer() {
this.started = false;
}
}
and I'm testing it in the following driver class:
import StopWatch.Stopwatch;
public class StopWatchTest {
public static void main(String[] args) {
Stopwatch stopwatch = new Stopwatch();
stopwatch.startTimer();
int sum = 0;
for (long i = 0; i < 100000; i++) {
sum++;
}
int[] time = stopwatch.getTime();
for (int i = 0; i < 4; i++) {
if (i < 3) {
System.out.print(time[i]+":");
} else {
System.out.print(time[i]);
}
}
stopwatch.stopTimer();
}
}
My intent is to use instances of class Stopwatch to measure the performance of various blocks of code (The for-loop in the driver class for instance) by having these Stopwatch objects in a main thread start a timer in separate thread before executing the blocks of code I want to evaluate, then have them (the Stopwatch objects) stop their timer once execution of said blocks in the main thread have finished. I understand that there are much simpler and easier ways to do this but I wanted to try doing it this way as sort of a "proof of concept" and to simply get better with multi-threading, but I'm encountering some problems:
1) When I run the driver class StopWatchTest I get seemingly random and arbitrary output each time (but mostly 0:0:0:0)
2) The main thread (or possibly the Stopwatch thread, I'm not even sure anymore) seems to never stop executing after I get outputs like 0:0:0:0
3) When I try debugging with breakpoints and the like I get completely unexpected behavior depending on where I put the breakpoints (The main thread does sometime finish execution but with random outputs like 0:0:13:2112 and other times I just get stuck in the Stopwatch thread)
Point 3 doesn't concern me as much as 1 and 2 as I have limited knowledge of how multi-threading behaves when one or several of the threads are paused at breakpoints for debugging (I suspect that when I break in the main thread the Stopwatch thread continues running). Points 1 and 2 bother me much more as I cannot see why they would be occurring.
To get you started, you should flag the boolean started as volatile:
private volatile boolean started;
That should work, but it leaves a busy loop, which burns CPU for as long as the timer runs.
You should look into the wait()/notify() methods next.
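A minimal sketch of what the wait()/notify() variant could look like, assuming the stopwatch keeps its own thread as in the question. The class name WaitingStopwatch and the helper measureSleep are invented for this example, and the elapsed time is taken with System.nanoTime() rather than in the question's int[] format.

```java
// Hypothetical rewrite of the question's Stopwatch: the timer thread sleeps in
// wait() until stopTimer() notifies it, instead of spinning on a flag.
class WaitingStopwatch extends Thread {
    private final Object lock = new Object();
    private boolean started;     // guarded by lock
    private long startNanos;     // guarded by lock
    private long elapsedNanos;   // guarded by lock

    void startTimer() {
        synchronized (lock) {
            started = true;
            startNanos = System.nanoTime();
        }
        start();
    }

    @Override
    public void run() {
        synchronized (lock) {
            while (started) {        // re-check: wait() may wake up spuriously
                try {
                    lock.wait();     // releases the lock and sleeps, no busy loop
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
        System.out.println("timer stopped");
    }

    void stopTimer() {
        synchronized (lock) {
            if (!started) {
                return;              // already stopped, keep the first reading
            }
            elapsedNanos = System.nanoTime() - startNanos;
            started = false;
            lock.notifyAll();        // wake the waiting timer thread
        }
    }

    long elapsedMillis() {
        synchronized (lock) {
            return elapsedNanos / 1_000_000;
        }
    }

    // Times a sleep of the given length and returns the measured milliseconds.
    static long measureSleep(long millis) {
        WaitingStopwatch watch = new WaitingStopwatch();
        watch.startTimer();
        try {
            Thread.sleep(millis);    // stand-in for the code being measured
            watch.stopTimer();
            watch.join();            // the timer thread ends promptly after notifyAll()
        } catch (InterruptedException e) {
            watch.stopTimer();
            Thread.currentThread().interrupt();
        }
        return watch.elapsedMillis();
    }

    public static void main(String[] args) {
        System.out.println(measureSleep(50) + " ms");
    }
}
```

Because every access to the shared fields happens inside synchronized (lock), the flag does not need to be volatile in this version.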

Java factorial calculation with thread pool

I managed to calculate a factorial with two threads without a pool. I have two factorial classes, named Factorial1 and Factorial2, which extend the Thread class. Let's say I want to calculate the value of 160000!. In Factorial1's run() method I do the multiplication in a for loop from i=2 to i=80000, and in Factorial2's from i=80001 to 160000. After that, I return both values and multiply them in the main method. When I compare the execution times, the threaded version (5000 milliseconds) is much better than the non-threaded calculation (15000 milliseconds), even with only two threads.
Now I want to write cleaner and better code, because I saw how effective threads are for factorial calculation. But when I use a thread pool to calculate the factorial, the parallel calculation always takes more time than the non-threaded calculation (nearly 16000 milliseconds). My code pieces look like:
for(int i=2; i<= Calculate; i++)
{
myPool.execute(new Multiplication(result, i));
}
run() method which is in Multiplication class:
public void run()
{
s1.Mltply(s2); // s1 and s2 are instances of my Number class
// their fields holds BigInteger values
}
Mltply() method which is in Number class:
public void Mltply(int number)
{
area.lock(); // result is going wrong without lock
Number temp = new Number(number);
value = value.multiply(temp.value); // value is a BigInteger
area.unlock();
}
In my opinion this lock may kill all the advantage of using threads, because multiplication is all the threads do. But without it, I can't even calculate the correct result. Let's say I want to calculate 10!, so thread1 calculates 10*9*8*7*6 and thread2 calculates 5*4*3*2*1. Is that the approach I'm looking for? Is it even possible with a thread pool? Of course the execution time must be less than that of the normal calculation...
I appreciate all your help and suggestion.
EDIT: - My own solution to the problem -
public class MyMultiplication implements Runnable
{
public static BigInteger subResult1;
public static BigInteger subResult2;
int thread1StopsAt;
int thread2StopsAt;
long threadId;
static boolean idIsSet=false;
public MyMultiplication(BigInteger n1, int n2) // First Thread
{
MyMultiplication.subResult1 = n1;
this.thread1StopsAt = n2/2;
thread2StopsAt = n2;
}
public MyMultiplication(int n2,BigInteger n1) // Second Thread
{
MyMultiplication.subResult2 = n1;
this.thread2StopsAt = n2;
thread1StopsAt = n2/2;
}
@Override
public void run()
{
if(idIsSet==false)
{
threadId = Thread.currentThread().getId();
idIsSet=true;
}
if(Thread.currentThread().getId() == threadId)
{
for(int i=2; i<=thread1StopsAt; i++)
{
subResult1 = subResult1.multiply(BigInteger.valueOf(i));
}
}
else
{
for(int i=thread1StopsAt+1; i<= thread2StopsAt; i++)
{
subResult2 = subResult2.multiply(BigInteger.valueOf(i));
}
}
}
}
public class JavaApplication3
{
public static void main(String[] args) throws InterruptedException
{
int calculate=160000;
long start = System.nanoTime();
BigInteger num = BigInteger.valueOf(1);
for (int i = 2; i <= calculate; i++)
{
num = num.multiply(BigInteger.valueOf(i));
}
long end = System.nanoTime();
double time = (end-start)/1000000.0;
System.out.println("Without threads: \t" +
String.format("%.2f",time) + " miliseconds");
System.out.println("without threads Result: " + num);
BigInteger num1 = BigInteger.valueOf(1);
BigInteger num2 = BigInteger.valueOf(1);
ExecutorService myPool = Executors.newFixedThreadPool(2);
start = System.nanoTime();
myPool.execute(new MyMultiplication(num1,calculate));
Thread.sleep(100);
myPool.execute(new MyMultiplication(calculate,num2));
myPool.shutdown();
while(!myPool.isTerminated()) {} // waiting threads to end
end = System.nanoTime();
time = (end-start)/1000000.0;
System.out.println("With threads: \t" +String.format("%.2f",time)
+ " miliseconds");
BigInteger result =
MyMultiplication.subResult1.
multiply(MyMultiplication.subResult2);
System.out.println("With threads Result: " + result);
System.out.println(MyMultiplication.subResult1);
System.out.println(MyMultiplication.subResult2);
}
}
input : 160000!
Execution time without threads : 15000 milliseconds
Execution time with 2 threads : 4500 milliseconds
Thanks for ideas and suggestions.
You can calculate 160000! concurrently without using a lock by splitting the range into disjoint chunks, as you explained, for example 2..80000 and 80001..160000.
You can also achieve this with the Java Stream API:
IntStream.rangeClosed(1, 160000).parallel()
.mapToObj(val -> BigInteger.valueOf(val))
.reduce(BigInteger.ONE, BigInteger::multiply);
It does exactly what you are trying to do. It splits the whole range into chunks, runs them on the common fork/join pool and computes the partial results. Afterwards it joins the partial results into a single result.
So why do you bother doing it by yourself? Just practicing clean coding?
On my real 4 core machine computation in a for loop took 8 times longer than using a parallel stream.
Threads have to run independently to run fast. Dependencies such as locks, synchronized sections of your code or some system calls lead to sleeping threads that are waiting for access to some resource.
In your case you should minimize the time a thread spends inside the lock. Maybe I am wrong, but it seems like you create a thread for each number. So for 1000! you spawn 1000 threads. All of them try to get the lock on area and cannot calculate anything, because one thread has acquired the lock and all other threads have to wait until it is released. So the threads effectively run serially, which is as fast as your non-threaded example plus the extra time for locking and unlocking, thread management and so on. Oh, and the CPU's context switching makes it even worse.
Your first attempt, splitting the factorial across two threads, is the better one. Each thread can calculate its own result, and only when they are done do the threads have to communicate with each other. So they are independent most of the time.
Now you have to generalize this solution. To reduce context switching you only want as many threads as your CPU has cores (maybe slightly fewer, to leave room for your OS). Every thread gets a range of numbers and calculates their product. After this it locks the overall result and multiplies its own result into it.
This should improve the performance of your problem.
Update: You ask for additional advice:
You said you have two classes, Factorial1 and Factorial2. They probably have their ranges hard-coded. You only need one class which takes the range as constructor arguments. This class implements Runnable, so it has a run method which multiplies all values in that range.
In your main method you can do something like this:
int n = 160_000;
int threads = 2;
ExecutorService executor = Executors.newFixedThreadPool(threads);
for (int i = 0; i < threads; i++) {
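int start = i * (n/threads) + 1; // first value of this thread's range
int end = (i + 1) * (n/threads) + 1; // treated as exclusive here, so ranges do not overlap; assumes n is divisible by threads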
executor.execute(new Factorial(start, end));
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.DAYS);
Now you have calculated the result of each thread but not the overall result. This can be solved with a BigInteger that is visible to the Factorial class (like a static BigInteger result; in the main class) and a lock, too. In the run method of Factorial you can calculate the overall result by acquiring the lock and multiplying in the thread's result:
Main.lock.lock();
try {
Main.result = Main.result.multiply(value);
} finally {
Main.lock.unlock(); // always release the lock, even if multiply throws
}
Some additional advice for the future: this isn't really clean, because Factorial needs to know about your main class, so it has a dependency on it. But ExecutorService.submit returns a Future<T> object which can be used to receive the result of the task. Using this Future you don't need locks. But this needs some extra work, so just try to get this running for now ;-)
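To illustrate the Future-based variant mentioned above, here is a small sketch under the assumption that each task is a Callable returning the product of its own range. FutureFactorial, product and factorial are names invented for this example; no shared result field and no lock are involved.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical lock-free variant: every task returns its partial product.
class FutureFactorial {
    // Multiplies all integers in [from, to]; an empty range yields 1.
    static BigInteger product(int from, int to) {
        BigInteger result = BigInteger.ONE;
        for (int i = from; i <= to; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    static BigInteger factorial(int n, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<BigInteger>> parts = new ArrayList<>();
            int chunk = n / threads;
            for (int i = 0; i < threads; i++) {
                final int from = i * chunk + 1;
                // The last range absorbs the remainder of n / threads.
                final int to = (i == threads - 1) ? n : (i + 1) * chunk;
                parts.add(executor.submit(() -> product(from, to)));
            }
            // Only the calling thread combines the results, so no lock is needed.
            BigInteger result = BigInteger.ONE;
            for (Future<BigInteger> part : parts) {
                result = result.multiply(part.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(factorial(20, 2));
    }
}
```

All futures are collected before the first get() call, so the ranges are computed concurrently and only the final combination is sequential.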
In addition to my Java Stream API solution, here is another solution which uses a self-managed thread pool, as you requested:
public static final int CHUNK_SIZE = 10000;
public static BigInteger fac(int max) {
ExecutorService executor = newCachedThreadPool();
try {
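// Submit every chunk first; a single lazy pipeline would call future.get()
// on each chunk before submitting the next one and so run them one at a time.
List<Future<BigInteger>> futures = rangeClosed(0, (max - 1) / CHUNK_SIZE)
.mapToObj(val -> executor.submit(() -> prod(leftBound(val), rightBound(val, max))))
.collect(Collectors.toList());
return futures.stream()
.map(future -> valueOf(future))
.reduce(BigInteger.ONE, BigInteger::multiply);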
} finally {
executor.shutdown();
}
}
private static int leftBound(int chunkNo) {
return chunkNo * CHUNK_SIZE + 1;
}
private static int rightBound(int chunkNo, int max) {
return Math.min((chunkNo + 1) * CHUNK_SIZE, max);
}
private static BigInteger valueOf(Future<BigInteger> future) {
try {
return future.get();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private static BigInteger prod(int min, int max) {
BigInteger res = BigInteger.valueOf(min);
for (int val = min + 1; val <= max; val++) {
res = res.multiply(BigInteger.valueOf(val));
}
return res;
}

java - Simple calculation takes longer in multi threads than in single thread

I'm trying to understand how to take advantage of using multi threads. I wrote a simple program that increments the value of i, let's say, 400,000 times using two ways : a single threaded way (0 to 400,000) and a multiple threaded way (in my case, 4 times : 0 to 100,000) with the number of thread equal to Runtime.getRuntime().availableProcessors().
I'm surprised with the results I measured : the single threaded way is decidedly faster, sometimes 3 times faster. Here is my code :
public class Main {
public static int LOOPS = 100000;
private static ExecutorService executor=null;
public static void main(String[] args) throws InterruptedException, ExecutionException {
int procNb = Runtime.getRuntime().availableProcessors();
long startTime;
long endTime;
executor = Executors.newFixedThreadPool(procNb);
ArrayList<Calculation> c = new ArrayList<Calculation>();
for (int i=0;i<procNb;i++){
c.add(new Calculation());
}
// Make parallel computations (4 in my case)
startTime = System.currentTimeMillis();
queryAll(c);
endTime = System.currentTimeMillis();
System.out.println("Computation time using " + procNb + " threads : " + (endTime - startTime) + "ms");
startTime = System.currentTimeMillis();
for (int i =0;i<procNb*LOOPS;i++)
{
}
endTime = System.currentTimeMillis();
System.out.println("Computation time using main thread : " + (endTime - startTime) + "ms");
}
public static List<Integer> queryAll(List<Calculation> queries) throws InterruptedException, ExecutionException {
List<Future<Integer>> futures = executor.invokeAll(queries);
List<Integer> aggregatedResults = new ArrayList<Integer>();
for (Future<Integer> future : futures) {
aggregatedResults.add(future.get());
}
return aggregatedResults;
}
}
class Calculation implements Callable<Integer> {
@Override
public Integer call() {
int i;
for (i=0;i<Main.LOOPS;i++){
}
return i;
}
}
Console :
Computation time using 4 threads : 10ms.
Computation time using main thread : 3ms.
Could anyone explain this ?
An addition probably takes one cpu cycle, so if your cpu runs at 3GHz, that's 0.3 nanoseconds. Do it 400k times and that becomes 120k nanoseconds or 0.1 milliseconds. So your measurement is more affected by the overhead of starting threads, thread switching, JIT compilation etc. than by the operation you are trying to measure.
You also need to account for compiler optimisations: if you place your empty loop in a method and run that method many times, you will notice that it runs in 0 ms after some time, because the JIT compiler determines that the loop does nothing and optimises it away completely.
I suggest you use a specialised library for micro-benchmarking, such as JMH.
See also: How do I write a correct micro-benchmark in Java?
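For a quick experiment without a benchmarking library, a hand-rolled measurement can at least warm the code up and consume the loop's result so that it is not trivially dead code. The sketch below is only that, a crude illustration with invented names (CrudeTiming, work, measureNanos); the JIT may still simplify the loop, which is why a proper harness remains the better choice.

```java
// Crude hand-rolled timing, NOT a substitute for a benchmarking harness.
class CrudeTiming {
    // The loop produces a value, so it cannot be dropped as an empty loop.
    static long work(int loops) {
        long acc = 0;
        for (int i = 0; i < loops; i++) {
            acc += i;
        }
        return acc;
    }

    // Runs `warmups` untimed rounds first, then times a single round.
    static long measureNanos(int loops, int warmups) {
        long sink = 0;
        for (int w = 0; w < warmups; w++) {
            sink += work(loops);      // gives the JIT a chance to compile work()
        }
        long start = System.nanoTime();
        sink += work(loops);
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {             // consume the result so it stays observable
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("cold: " + measureNanos(400_000, 0) + " ns");
        System.out.println("warm: " + measureNanos(400_000, 10_000) + " ns");
    }
}
```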

How to run a Thread for a user specified amount of time?

I am creating a program that is based on mixing and perturbing a population of solution Vectors.
So I created a for loop that stops after a certain time given by the user.
Inside the loop, I am going to call 5 procedures, and I thought that if I put each procedure in a thread, the program would produce more solutions in the same time than by calling normal methods.
I created the 5 threads, but when I start them they don't stop, even if I use Thread.stop, Thread.suspend, Thread.interrupt or Thread.destroy.
Here is my code; could you help me with your ideas?
I have inserted a new variable :
public volatile boolean CrossOpb = true;
Here is my code:
Thread CrossOp = new Thread(new Runnable() {
public void run() {
while(CrossOpb == true){
int rdmCross2=(int) (Math.random() * allPopulation.size()) ; // Crossover 1st vector
int rdmCross1=(int) (Math.random() * allPopulation.size()) ;
Vector muted = new Vector();
Vector copy = copi((Vector) allPopulation.get(rdmCross2));
Vector callp = copi((Vector) allPopulation.get(rdmCross1));
muted = crossover(callp, copy);
System.out.println("cross over Between two Randoms ----------->");
affiche_resultat(muted);
allPopulation.add(muted);
}
}
});
The loop :
CrossOp.setDaemon(true);
int loop = 1;
long StartTime = System.currentTimeMillis() / 1000;
for (int i = 0; i < loop; ++i) {
loop++;
if (timevalue < ((System.currentTimeMillis() / 1000) - StartTime)) {
loop = 0;
CrossOpb = false;
}
CrossOp.start();
}
I already answered a similar question. In that case it was C#, but the concept is the same.
You must not kill threads. Threads must exit of their own accord.
Just put a volatile boolean variable somewhere and set it to true or false when you want your thread to terminate; then, in the thread, replace the while (true) with a while (myVariable) or while (!myVariable) check.
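A minimal sketch of that pattern, with invented names (StoppableWorker, runFor) and a counter standing in for the real crossover work:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical worker that runs until another thread clears its flag.
class StoppableWorker implements Runnable {
    private volatile boolean running = true;   // volatile: the write is visible to the worker
    private final AtomicLong iterations = new AtomicLong();

    @Override
    public void run() {
        while (running) {
            iterations.incrementAndGet();       // stand-in for one unit of real work
        }
    }

    void stop() {
        running = false;                        // the loop ends after its current iteration
    }

    long iterations() {
        return iterations.get();
    }

    // Lets a worker run for roughly `millis` ms and returns how many iterations it did.
    static long runFor(long millis) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.start();
        try {
            Thread.sleep(millis);
            worker.stop();
            thread.join();                      // returns once run() has exited
        } catch (InterruptedException e) {
            worker.stop();
            Thread.currentThread().interrupt();
        }
        return worker.iterations();
    }

    public static void main(String[] args) {
        System.out.println(runFor(1000) + " iterations in about one second");
    }
}
```

The thread is never killed from outside; it notices the flag at the top of its loop and returns from run() normally.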
Anyway, you say:
Inside the loop, I am going to call 5 procedures, and I thought that if I put each procedure in a thread, the program would produce more solutions in the same time than by calling normal methods.
Well, that's generally false. If the procedures are data-dependent (each of them depends on the results of the previous one), putting them on threads will change nothing. It might be smarter to put iterations in a pipeline, so that you have 5 threads executing steps of successive iterations. I'm not sure if that's possible for genetic algorithms, and anyway you'll have to handle some special case (e.g. a mutation, that alters the population of partially computed iterations).
How to run a Thread for a specific amount of time:
The basic approach is to keep track of how long the Thread has run, then exit and return the result, which in our case is details on how long the Thread executed.
NOTE: prefer System.nanoTime() for measuring elapsed time. It is monotonic, whereas System.currentTimeMillis() is wall-clock time with a coarser resolution and can jump when the system clock is adjusted.
I use a random number to pick a different lifetime for each of the Callables, so that you can see that they don't execute for exactly the time specified but come very close, and the variance of the delta is pretty consistent, at least on my machine.
Here is the code:
package com.stackoverflow.Q18818482;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.*;
public class Question18818482
{
public static Random RND;
static
{
RND = new Random();
}
public static void main(final String[] args)
{
try
{
final ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
final List<Future<String>> results = new ArrayList<>(10);
for (int i = 0; i < 10; i++)
{
results.add(es.submit(new TimeSliceTask(RND.nextInt(10), TimeUnit.SECONDS)));
}
es.shutdown();
while(!results.isEmpty())
{
final Iterator<Future<String>> i = results.iterator();
while (i.hasNext())
{
final Future<String> f = i.next();
if (f.isDone())
{
System.out.println(f.get());
i.remove();
}
}
}
}
catch (InterruptedException e)
{
throw new RuntimeException(e);
}
catch (ExecutionException e)
{
throw new RuntimeException(e);
}
}
public static class TimeSliceTask implements Callable<String>
{
private final long timeToLive;
private final long duration;
public TimeSliceTask(final long timeToLive, final TimeUnit timeUnit)
{
this.timeToLive = System.nanoTime() + timeUnit.toNanos(timeToLive);
this.duration = timeUnit.toMillis(timeToLive);
}
@Override
public String call() throws Exception
{
while( timeToLive <= System.nanoTime() )
{
// simulate work here
Thread.sleep(500);
}
final long end = System.nanoTime();
return String.format("Finished Elapsed Time = %d, scheduled for %d", TimeUnit.NANOSECONDS.toMillis(timeToLive - end), this.duration );
}
}
}
Here is what the output of one run looks like
NOTE: All times are in milliseconds
Finished Elapsed Time = 999, scheduled for 1000
Finished Elapsed Time = 2998, scheduled for 3000
Finished Elapsed Time = 5999, scheduled for 6000
Finished Elapsed Time = 1994, scheduled for 2000
Finished Elapsed Time = 8994, scheduled for 9000
Finished Elapsed Time = 6993, scheduled for 7000
Finished Elapsed Time = 6993, scheduled for 7000
Finished Elapsed Time = 5993, scheduled for 6000
Finished Elapsed Time = 5998, scheduled for 6000
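In the listing above, timeToLive holds a deadline in the future, so a loop guarded by timeToLive <= System.nanoTime() does not run at all, and the printed figure is the time still remaining rather than the time spent. A minimal sketch of a loop that keeps working until its time budget is used up might look like this; DeadlineLoop and runForMillis are hypothetical names, and the 10 ms sleep only stands in for real work.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: repeats a short unit of work until a time budget is spent.
class DeadlineLoop {
    // Returns the time actually spent, in milliseconds.
    static long runForMillis(long millis) {
        final long start = System.nanoTime();
        final long budget = TimeUnit.MILLISECONDS.toNanos(millis);
        // Compare differences of nanoTime values, never the absolute values.
        while (System.nanoTime() - start < budget) {
            try {
                Thread.sleep(10);               // simulate one unit of work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;                          // stop early when asked to
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("ran for " + runForMillis(1000) + " ms, scheduled for 1000");
    }
}
```

The task can overshoot its budget by up to one unit of work, because the deadline is only checked between units.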
After spending all of last night reading about threads, I discovered that the solution to my problem was not that hard.
The idea is to change the stopping condition of the loop inside the thread, so that we control it by giving the thread a specific amount of time to run. Here is my example:
class ProcessorCordm extends Thread {
int runningtime;
public ProcessorCordm(int runningtime) {
this.runningtime = runningtime;
}
public void run() {
int loop = 1;
long StartTime = System.currentTimeMillis() / 1000;
for (int i = 0; i < loop; ++i) {
int rdmCross2 = (int) (Math.random() * allPopulation.size()); // Crossover 1st vector
int rdmCross1 = (int) (Math.random() * allPopulation.size());
Vector muted = new Vector();
Vector copy = copi((Vector) allPopulation.get(rdmCross2));
Vector callp = copi((Vector) allPopulation.get(rdmCross1));
muted = crossover(callp, copy);
System.out.println("cross over Between two Randoms ----------->");
affiche_resultat(muted);
addsolution(muted);
loop++;
if (runningtime < ((System.currentTimeMillis() / 1000) - StartTime)) { // use the limit passed to the constructor
loop = 0;
}
}
}
}
So if I want to run my Thread for 10 seconds, I only need to write:
ProcessorCoG CrossOpg = new ProcessorCoG(10);
And in my case, I have to run many Threads simultaneously for a specific time value, so I used the ExecutorService class:
ProcessorCoG CrossOpg = new ProcessorCoG(timevalue);//extends Thread class
ProcessorCordm CrossOp = new ProcessorCordm(timevalue);//extends Thread class
ProcessorCordm CrossOp2 = new ProcessorCordm(timevalue);//extends Thread class
MutateGb MutGb = new MutateGb(timevalue);//extends Thread class
MutateRdm MutRdm = new MutateRdm(timevalue);//extends Thread class
MbsRdm MbsR = new MbsRdm(timevalue);//extends Thread class
ExecutorService executor = Executors.newFixedThreadPool(6);
executor.submit(MutGb);
executor.submit(MutRdm);
executor.submit(CrossOp);
executor.submit(CrossOp2);
executor.submit(CrossOpg);
executor.submit(MbsR);

Using parallelism in Java makes program slower (four times slower!!!)

I'm writing an implementation of the conjugate gradient method.
I use Java multithreading for the matrix back-substitution.
Synchronization is done using CyclicBarrier and CountDownLatch.
Why does it take so much time to synchronize the threads?
Are there other ways to do it?
code snippet
private void syncThreads() {
// barrier.await();
try {
barrier.await();
} catch (InterruptedException e) {
} catch (BrokenBarrierException e) {
}
}
You need to ensure that each thread spends more time doing useful work than it costs in overhead to pass a task to another thread.
Here is an example of where the overhead of passing a task to another thread far outweighs the benefits of using multiple threads.
final double[] results = new double[10*1000*1000];
{
long start = System.nanoTime();
// using a plain loop.
for(int i=0;i<results.length;i++) {
results[i] = (double) i * i;
}
long time = System.nanoTime() - start;
System.out.printf("With one thread it took %.1f ns per square%n", (double) time / results.length);
}
{
ExecutorService ex = Executors.newFixedThreadPool(4);
long start = System.nanoTime();
// submitting one tiny task per array element.
for(int i=0;i<results.length;i++) {
final int i2 = i;
ex.execute(new Runnable() {
@Override
public void run() {
results[i2] = (double) i2 * i2; // cast first, as above: int * int would overflow
}
});
}
ex.shutdown();
ex.awaitTermination(1, TimeUnit.MINUTES);
long time = System.nanoTime() - start;
System.out.printf("With four threads it took %.1f ns per square%n", (double) time / results.length);
}
prints
With one thread it took 1.4 ns per square
With four threads it took 715.6 ns per square
Using multiple threads is much worse.
However, increase the amount of work each task does and the result changes:
final double[] results = new double[10 * 1000 * 1000];
{
long start = System.nanoTime();
// using a plain loop.
for (int i = 0; i < results.length; i++) {
results[i] = Math.pow(i, 1.5);
}
long time = System.nanoTime() - start;
System.out.printf("With one thread it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
{
int threads = 4;
ExecutorService ex = Executors.newFixedThreadPool(threads);
long start = System.nanoTime();
int blockSize = results.length / threads;
// one task per block of the array.
for (int i = 0; i < threads; i++) {
final int istart = i * blockSize;
final int iend = (i + 1) * blockSize;
ex.execute(new Runnable() {
@Override
public void run() {
for (int i = istart; i < iend; i++)
results[i] = Math.pow(i, 1.5);
}
});
}
ex.shutdown();
ex.awaitTermination(1, TimeUnit.MINUTES);
long time = System.nanoTime() - start;
System.out.printf("With four threads it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
prints
With one thread it took 287.6 ns per pow 1.5
With four threads it took 77.3 ns per pow 1.5
That's an almost 4x improvement.
How many threads are being used in total? That is likely the source of your problem. Using multiple threads will only really give a performance boost if:
Each task in the thread does some sort of blocking, for example waiting on I/O. Using multiple threads in this case lets other threads use that blocking time.
or you have multiple cores. If you have 4 cores or 4 CPUs, you can run 4 tasks (or 4 threads) simultaneously.
It sounds like you are not blocking in the threads, so my guess is that you are using too many threads. If, for example, you are using 10 different threads to do the work at the same time but only have 2 cores, that would likely be much slower than running all of the tasks in sequence. Generally, start with a number of threads equal to your number of cores/CPUs. Increase the number of threads slowly, gauging the performance each time. This will give you the optimal thread count to use.
Perhaps you could try to re-implement your code using fork/join from JDK 7 and see what it does?
The default creates a thread pool with exactly as many threads as you have cores in your system. If you choose the threshold for dividing your work into smaller chunks reasonably, this will probably execute much more efficiently.
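As a rough idea of what a fork/join version looks like, here is a sketch that sums a range of numbers rather than doing the back-substitution from the question, which is not shown. RangeSum, its THRESHOLD value and the sum helper are assumptions made for this example; the threshold in particular would need tuning by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical fork/join task: sums the integers in [from, to].
class RangeSum extends RecursiveTask<Long> {
    private static final long THRESHOLD = 1_000_000; // assumed cut-off, tune by measuring
    private final long from;
    private final long to;

    RangeSum(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from < THRESHOLD) {
            // Small enough: do the work directly instead of splitting further.
            long s = 0;
            for (long i = from; i <= to; i++) {
                s += i;
            }
            return s;
        }
        long mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid + 1, to);
        left.fork();                            // run the left half asynchronously
        return right.compute() + left.join();   // compute the right half in this thread
    }

    static long sum(long n) {
        // The no-argument constructor uses Runtime.availableProcessors() as parallelism.
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new RangeSum(1, n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000_000L));
    }
}
```

A threshold that is too small recreates the "one tiny task per element" problem shown earlier; one that is too large leaves cores idle.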
You are most likely aware of this, but in case you aren't, please read up on Amdahl's Law. It gives the relationship between expected speedup of a program by using parallelism and the sequential segments of the program.
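For reference, Amdahl's Law in its usual form, where p is the fraction of the running time that can be parallelized and n is the number of processors (the symbols are chosen here for illustration):

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
\qquad\text{e.g. } p = 0.9,\ n = 4:\quad
S(4) = \frac{1}{0.1 + 0.225} \approx 3.08,
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p} = 10
```

So even with 90% of the work parallelized, four cores give roughly a 3x speedup, and no number of cores can give more than 10x.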
Synchronizing across cores is much slower than in a single-core environment. See if you can limit the JVM to 1 core (see this blog post),
or you can use an ExecutorService and use invokeAll to run the parallel tasks.
