I have the code like the below. In a loop it is executing the method "process". It is running sequentially. I want to run this method parallel, but it should be finished within the loop so that I can sum in the next line. i.e even it is running parallel all functions should finish before the 2nd for loop execute.
How to solve this in Jdk1.7 not JDK1.8 version?
public static void main(String s[]){
int arrlen = 10;
int arr[] = new int[arrlen] ;
int t =0;
for(int i=0;i<arrlen;i++){
arr[i] = i;
t = process(arr[i]);
arr[i] = t;
}
int sum =0;
for(int i=0;i<arrlen;i++){
sum += arr[i];
}
System.out.println(sum);
}
public static int process(int arr){
return arr*2;
}
Below example might help you. I have used fork/join framework to do that.
For small array size like your example, conventional method might be faster and I doubt that fork/join way would take slight higher time. But for larger size or process , fork/join framework is suitable. Even java 8 parallel streams uses fork/join framework as underlying base.
public class ForkMultiplier extends RecursiveAction {
int[] array;
int threshold = 3;
int start;
int end;
public ForkMultiplier(int[] array,int start, int end) {
this.array = array;
this.start = start;
this.end = end;
}
protected void compute() {
if (end - start < threshold) {
computeDirectly();
} else {
int middle = (end + start) / 2;
ForkMultiplier f1= new ForkMultiplier(array, start, middle);
ForkMultiplier f2= new ForkMultiplier(array, middle, end);
invokeAll(f1, f2);
}
}
protected void computeDirectly() {
for (int i = start; i < end; i++) {
array[i] = array[i] * 2;
}
}
}
You main class would like this below
public static void main(String s[]){
int arrlen = 10;
int arr[] = new int[arrlen] ;
for(int i=0;i<arrlen;i++){
arr[i] = i;
}
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new ForkMultiplier(arr, 0, arr.length));
int sum =0;
for(int i=0;i<arrlen;i++){
sum += arr[i];
}
System.out.println(sum);
}
You basically need to use Executors and Futures combined that exist since Java 1.5 (see Java Documentation).
In the following example, I've created a main class that uses another helper class that acts like the processor you want to parallelize.
The main class is splitted in 3 steps:
Creates the processes pool and executes tasks in parallel.
Waits for all tasks to finish their work.
Collects the results from tasks.
For didactic reasons, I've put some logs and more important, I've put a random waiting time in each process' business logic, simulating a time-consuming algorithm ran by the Process class.
The maximum waiting time for each process is 2 seconds, which is also the highest waiting time for step 2, even if you increase the number of parallel tasks (just try changing the variable totalTasks of the following code to test it).
Here the Main class:
package com.example;
import java.util.ArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class Main
{
public static void main(String[] args) throws InterruptedException, ExecutionException
{
int totalTasks = 100;
ExecutorService newFixedThreadPool = Executors.newFixedThreadPool(totalTasks);
System.out.println("Step 1 - Starting parallel tasks");
ArrayList<Future<Integer>> tasks = new ArrayList<Future<Integer>>();
for (int i = 0; i < totalTasks; i++) {
tasks.add(newFixedThreadPool.submit(new Process(i)));
}
long ts = System.currentTimeMillis();
System.out.println("Step 2 - Wait for processes to finish...");
boolean tasksCompleted;
do {
tasksCompleted = true;
for (Future<Integer> task : tasks) {
if (!task.isDone()) {
tasksCompleted = false;
Thread.sleep(10);
break;
}
}
} while (!tasksCompleted);
System.out.println(String.format("Step 2 - End in '%.3f' seconds", (System.currentTimeMillis() - ts) / 1000.0));
System.out.println("Step 3 - All processes finished to run, let's collect results...");
Integer sum = 0;
for (Future<Integer> task : tasks) {
sum += task.get();
}
System.out.println(String.format("Total final sum is: %d", sum));
}
}
Here the Process class:
package com.example;
import java.util.concurrent.Callable;
public class Process implements Callable<Integer>
{
private Integer value;
public Process(Integer value)
{
this.value = value;
}
public Integer call() throws Exception
{
Long sleepTime = (long)(Math.random() * 2000);
System.out.println(String.format("Starting process with value %d, sleep time %d", this.value, sleepTime));
Thread.sleep(sleepTime);
System.out.println(String.format("Stopping process with value %d", this.value));
return value * 2;
}
}
Hope this helps.
Related
My program is trying to sum a range with a given number of threads in order to run it in parallel but it seems that with just one threads it runs better than 4 (I have an 8 core CPU). It is my first time working with multithreading in Java so maybe I have a problem in my code that makes it take longer?
My benchmarks(sum of range 0-10000) done for the moment are:
1 thread: 1350 microsecs (average)
2 thread: 1800 microsecs (average)
4 thread: 2400 microsecs (average)
8 thread: 3300 microsecs (average)
Thanks in advance!
/*
Compile: javac RangeSum.java
Execute: java RangeSum nThreads initRange finRange
*/
import java.util.ArrayList;
import java.util.concurrent.*;
public class RangeSum implements Runnable {
private int init;
private int end;
private int id;
static public int out = 0;
Object lock = new Object();
public synchronized static void increment(int partial) {
out = out + partial;
}
public RangeSum(int init,int end) {
this.init = init;
this.end = end;
}//parameters to pass in threads
// the function called for each thread
public void run() {
int partial = 0;
for(int k = this.init; k < this.end; k++)
{
partial = k + partial + 1;
}
increment(partial);
}//thread: sum its id to the out variable
public static void main(String args[]) throws InterruptedException {
final long startTime = System.nanoTime()/1000;//start time: microsecs
//get command line values for
int NumberOfThreads = Integer.valueOf(args[0]);
int initRange = Integer.valueOf(args[1]);
int finRange = Integer.valueOf(args[2]);
//int[] out = new int[NumberOfThreads];
// an array of threads
ArrayList<Thread> Threads = new ArrayList<Thread>(NumberOfThreads);
// spawn the threads / CREATE
for (int i = 0; i < NumberOfThreads; i++) {
int initial = i*finRange/NumberOfThreads;
int end = (i+1)*finRange/NumberOfThreads;
Threads.add(i, new Thread(new RangeSum(initial,end)));
Threads.get(i).start();
}
// wait for the threads to finish / JOIN
for (int i = 0; i < NumberOfThreads; i++) {
try {
Threads.get(i).join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
System.out.println("All threads finished!");
System.out.println("Total range sum: " + out);
final long endTime = System.nanoTime()/1000;//end time
System.out.println("Time elapsed: "+(endTime - startTime));
}
}
Your workload entirely in memory-non-blocking computation - on a general principle, in this kind of scenario, a single thread will complete the work faster than multiple threads.
Multiple threads tend to interfere with the L1/L2 CPU caching and incur additional overhead for context
switching
Specifically, wrt to your code, you initialize final long startTime = System.nanoTime()/1000; too early and measure thread setup time as well as the actual time it takes them to complete. Its probably better to setup your Threads list first and then:
final long startTime =...
for (int i = 0; i < NumberOfThreads; i++) {
Thread.get(i).start()
}
but really, in this case, the expectations that multiple threads will improve processing time is not warranted.
I am trying to learn about the ForkJoinPool framework and came across the below example:
public class ArrayCounter extends RecursiveTask<Integer> {
int[] array;
int threshold = 100_000;
int start;
int end;
public ArrayCounter(int[] array, int start, int end) {
this.array = array;
this.start = start;
this.end = end;
}
protected Integer compute() {
if (end - start < threshold) {
return computeDirectly();
} else {
int middle = (end + start) / 2;
ArrayCounter subTask1 = new ArrayCounter(array, start, middle);
ArrayCounter subTask2 = new ArrayCounter(array, middle, end);
invokeAll(subTask1, subTask2);
return subTask1.join() + subTask2.join();
}
}
protected Integer computeDirectly() {
Integer count = 0;
for (int i = start; i < end; i++) {
if (array[i] % 2 == 0) {
count++;
}
}
return count;
}
}
Main :
public class ForkJoinRecursiveTaskTest
{
static final int SIZE = 10_000_000;
static int[] array = randomArray();
public static void main(String[] args) {
ArrayCounter mainTask = new ArrayCounter(array, 0, SIZE);
ForkJoinPool pool = new ForkJoinPool();
Integer evenNumberCount = pool.invoke(mainTask);
System.out.println("Number of even numbers: " + evenNumberCount);
}
static int[] randomArray() {
int[] array = new int[SIZE];
Random random = new Random();
for (int i = 0; i < SIZE; i++) {
array[i] = random.nextInt(100);
}
return array;
}
}
According to the Java Docs,invokeAll() submits the tasks to the pool and returns the results as well.Hence no need for a separate join(). can someone please explain why a separate join is needed in this case?
in your example, you are using RecursiveTask<Integer>
so you are expecting to return a value from compute() method.
let's look at invokAll(t1,t12) signature.
static void invokeAll(ForkJoinTask<?> t1, ForkJoinTask<?> t2)
so invokeAll() doesn't have return a value.
according to the documentation :
Forks the given tasks, returning when isDone holds for each task or an (unchecked) exception is encountered, in which case the exception is rethrown.
So:
return subTask1.join() + subTask2.join(); is the key for your example.
both tasks are merged after each complete the task passing the result recursivly to the next call of compute() method.
task.join()
Returns the result of the computation when it is done.
As per javadoc, join
Returns the result of the computation when it is done. This method
differs from get() in that abnormal completion results in
RuntimeException or Error, not ExecutionException, and that interrupts
of the calling thread do not cause the method to abruptly return by
throwing InterruptedException.
So, when task is done, join helps you to get the computed value, which you are adding later together.
return subTask1.join() + subTask2.join();
I am practicing concurrent java and wrote a concurrent mergesort. The Mergesort is working well if the number of elements are less than 10,000. However, more than that it seems to take forever, I believe that the some of the threads are stuck(deadlock?). Now I don't have any shared resource as I always pass and return new copy. What are some known ways to profile such code for e.g. which threads are stuck, how many threads have been executed?
Sharing the code for reference:-
package mergesort;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
// 18s
public class Main {
public static final ExecutorService ex = new ThreadPoolExecutor(100, 100, 5, TimeUnit.SECONDS,
new ArrayBlockingQueue<>(10000), new ThreadPoolExecutor.CallerRunsPolicy());
public static void main(String[] args) throws InterruptedException, ExecutionException {
int n = 1_000_000;
Future<int[]> T1 = ex.submit(new Callable<int[]>() {
#Override
public int[] call() throws Exception {
// TODO Auto-generated method stub
return mergesort(generate(n));
}
});
int[] ret = T1.get();
for (int i : ret) {
System.out.println(i);
}
System.out.println("done");
ex.shutdownNow();
}
public static int[] generate(int n) {
int[] nums = new int[n];
for (int i = 0; i < n; i++) {
nums[i] = (int) Math.floor(Math.random() * n);
}
return nums;
}
public static int[] mergesort(int[] nums) throws InterruptedException, ExecutionException {
final int[] B;
if (nums.length < 2) {
return nums;
}
final int[] A = new int[nums.length / 2];
if (nums.length % 2 == 0) {
B = new int[nums.length / 2];
} else {
B = new int[nums.length / 2 + 1];
}
for (int i = 0; i < nums.length; i++) {
if (i < nums.length / 2) {
A[i] = nums[i];
} else {
B[i - nums.length / 2] = nums[i];
}
}
Future<int[]> T2 = ex.submit(new Callable<int[]>() {
#Override
public int[] call() throws Exception {
// TODO Auto-generated method stub
return mergesort(B);
}
});
Future<int[]> T1 = ex.submit(new Callable<int[]>() {
#Override
public int[] call() throws Exception {
// TODO Auto-generated method stub
return mergesort(A);
}
});
Future<int[]> T3 = ex.submit(new Callable<int[]>() {
#Override
public int[] call() throws Exception {
return merge(T1.get(), T2.get());
}
});
return T3.get();
}
public static int[] merge(int[] A, int[] B) {
int[] ret = new int[A.length + B.length];
int i = 0;
int j = 0;
int k = 0;
while (i < A.length && j < B.length) {
if (A[i] < B[j]) {
ret[k] = A[i];
i++;
} else {
ret[k] = B[j];
j++;
}
k++;
}
while (j < B.length) {
ret[k] = B[j];
j++;
k++;
}
while (i < A.length) {
ret[k] = A[i];
i++;
k++;
}
return ret;
}
}
Edit:
So using the tools I could analyze memory dump, see the running threads, live objects etc. But what are the strategies people follow(things they look for) when trying to understand the stack trace of a concurrent process. I.e. where do I start looking? for e.g. in my example I saw that all my tasks are waiting on FutureTask, but that's it. Why FutureTask is not returning I have no idea. How can I move further?
Your problem is that you create Futures recursively, then a huge number of threads are necessary to compute intermediate results and the pool may not have sufficient number of available threads. Mind that most of your threads are blocked waiting for others to give their results, so when the pool is exhausted you have: threads waiting for new threads to be created (while it is impossible due to pool exhaustion).
If you use a cached thread pool, it will work:
ExecutorService ex = Executors.newCachedThreadPool();
as such a pool is expandable.
----- EDIT -----
I also recommend you to use the new Java 8 functional style:
Future<int[]> f2 = ex.submit(() -> mergesort(B));
Future<int[]> f1 = ex.submit(() -> mergesort(A));
return merge(f1.get(),f2.get());
Also note, that it is not useful to use a Future to compute the merge as you are synchronizing on it.
You can use some of the profiler available in the market like YourKit or AppDynamics . You can use trial version of both. or you can simply take the thread dump and analyze yourself but it will be time-consuming.
I preferred App-dynamics , i was having very big application and using the thread-dump and analyze it manually was very time-consuming.
Refer https://docs.appdynamics.com/display/PRO14S/Trace+MultiThreaded+Transactions+for+Java , on how to trace multi-threaded app using App-dynamics.
App-dynamics also shows that which piece of code is causing the thread-contention, how much time thread is running/blocking. Also it shows whether its a CPU which is getting bottleneck or thread is waiting on some shared resource etc.
Let me know if you need more info.
so for my programming class we have to do the following:
Fill an integer array with 5 million integers ranging from 0-9.
Then find the number of times each number (0-9) occurs and display this.
We have to measure the time it takes to count the occurences for both single threaded, and multi-threaded. Currently I average 9.3ms for single threaded, and 8.9 ms multithreaded with 8 threads on my 8 core cpu, why is this?
Currently for multithreading I have one array filled with numbers and am calculating lower and upper bounds for each thread to count occurences. here is my current attempt:
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
#Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
threads[i].join();
}
}
Could anyone shed some light? Cheers.
You are essentially doing all the work sequentially because each thread you create you immediately join it.
Move the threads[i].join() outside the main construction loop into it's own loop. While you're at it you should probably also start all of the threads outside of the loop as starting them while new threads are still being created is not a good idea because creating threads takes time.
class ThreadTester {
private final int threadCount;
private final int numberCount;
int[] numbers = new int[5_000_000];
AtomicIntegerArray occurences;
Thread[] threads;
AtomicLong milliseconds = new AtomicLong();
public ThreadTester(int threadCount, int numberCount) {
this.threadCount = threadCount;
this.numberCount = numberCount;
occurences = new AtomicIntegerArray(numberCount);
threads = new Thread[threadCount];
Random r = new Random();
for (int i = 0; i < numbers.length; i++) {
numbers[i] = r.nextInt(numberCount);
}
}
public void createThreads() throws InterruptedException {
final int divisionSize = numbers.length / threadCount;
for (int i = 0; i < threads.length; i++) {
final int lower = (i * divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
#Override
public void run() {
long start = System.nanoTime();
for (int i = lower; i <= upper; i++) {
occurences.addAndGet(numbers[i], 1);
}
long end = System.nanoTime();
milliseconds.addAndGet(end - start);
}
});
}
}
private void startThreads() {
for (Thread thread : threads) {
thread.start();
}
}
private void finishThreads() throws InterruptedException {
for (Thread thread : threads) {
thread.join();
}
}
public long test() throws InterruptedException {
createThreads();
startThreads();
finishThreads();
return milliseconds.get();
}
}
public void test() throws InterruptedException {
for (int threads = 1; threads < 50; threads++) {
ThreadTester tester = new ThreadTester(threads, 10);
System.out.println("Threads=" + threads + " ns=" + tester.test());
}
}
Note that even here the fastest solution is using one thread but you can clearly see that an even number of threads does it quicker as I am using an i5 which has 2 cores but works as 4 via hyperthreading.
Interestingly though - as suggested by #biziclop - removing all contention between threads via the occurrences by giving each thread its own `occurrences array we get a more expected result:
The other answers all explored the immediate problems with your code, I'll give you a different angle: one that's about design of multi-threading in general.
The idea of parallel computing speeding up calculations depends on the assumption that the small bits you broke the problem up into can indeed be run in parallel, independently of each other.
And at first glance, your problem is exactly like that, chop the input range up into 8 equal parts, fire up 8 threads and off they go.
There is a catch though:
occurences[numbers[i]]++;
The occurences array is a resource shared by all threads, and therefore you must control access to it to ensure correctness: either by explicit synchronization (which is slow) or something like an AtomicIntegerArray. But the Atomic* classes are only really fast if access to them is rarely contested. And in your case access will be contested a lot, because most of what your inner loop does is incrementing the number of occurrences.
So what can you do?
The problem is caused partly by the fact that occurences is such a small structure (an array with 10 elements only, regardless of input size), threads will continuously try to update the same element. But you can turn that to your advantage: make all the threads keep their own separate tally, and when they all finished, just add up their results. This will add a small, constant overhead to the end of the process but will make the calculations go truly parallel.
The join method allows one thread to wait for the completion of another, so the second thread will start only after the first will finish.
Join each thread after you started all threads.
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
#Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
}
for(int i = 0; i < threads.length; i++) {
threads[i].join();
}
}
Also there seem to be a race condition in code at occurences[numbers[i]]++
So most probably if you update the code and use more threads the output wouldn't be correct. You should use an AtomicIntegerArray: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicIntegerArray.html
Use an ExecutorService with Callable and invoke all tasks then you can safely aggregate them. Also use TimeUnit for elapsing time manipulations (sleep, joining, waiting, convertion, ...)
Start by defining the task with his input/output :
class Task implements Callable<Task> {
// input
int[] source;
int sliceStart;
int sliceEnd;
// output
int[] occurences = new int[10];
String runner;
long elapsed = 0;
Task(int[] source, int sliceStart, int sliceEnd) {
this.source = source;
this.sliceStart = sliceStart;
this.sliceEnd = sliceEnd;
}
#Override
public Task call() {
runner = Thread.currentThread().getName();
long start = System.nanoTime();
try {
compute();
} finally {
elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
}
return this;
}
void compute() {
for (int i = sliceStart; i < sliceEnd; i++) {
occurences[source[i]]++;
}
}
}
Then let's define some variable to manage parameters:
// Parametters
int size = 5_000_000;
int parallel = Runtime.getRuntime().availableProcessors();
int slices = parallel;
Then generates random input:
// Generated source
int[] source = new int[size];
ThreadLocalRandom random = ThreadLocalRandom.current();
for (int i = 0; i < source.length; i++) source[i] = random.nextInt(10);
Start timing total computation and prepare tasks:
long start = System.nanoTime();
// Prepare tasks
List<Task> tasks = new ArrayList<>(slices);
int sliceSize = source.length / slices;
for (int sliceStart = 0; sliceStart < source.length;) {
int sliceEnd = Math.min(sliceStart + sliceSize, source.length);
Task task = new Task(source, sliceStart, sliceEnd);
tasks.add(task);
sliceStart = sliceEnd;
}
Executes all task on threading configuration (don't forget to shutdown it !):
// Execute tasks
ExecutorService executor = Executors.newFixedThreadPool(parallel);
try {
executor.invokeAll(tasks);
} finally {
executor.shutdown();
}
Then task have been completed, just aggregate data:
// Collect data
int[] occurences = new int[10];
for (Task task : tasks) {
for (int i = 0; i < occurences.length; i++) {
occurences[i] += task.occurences[i];
}
}
Finally you can output computation result:
// Display result
long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.printf("Computation done in %tT.%<tL%n", calendar(elapsed));
System.out.printf("Results: %s%n", Arrays.toString(occurences));
You can also output partial computations:
// Print debug output
int idxSize = (String.valueOf(size).length() * 4) / 3;
String template = "Slice[%," + idxSize + "d-%," + idxSize + "d] computed in %tT.%<tL by %s: %s%n";
for (Task task : tasks) {
System.out.printf(template, task.sliceStart, task.sliceEnd, calendar(task.elapsed), task.runner, Arrays.toString(task.occurences));
}
Which gives on my workstation:
Computation done in 00:00:00.024
Results: [500159, 500875, 500617, 499785, 500017, 500777, 498394, 498614, 499498, 501264]
Slice[ 0-1 250 000] computed in 00:00:00.013 by pool-1-thread-1: [125339, 125580, 125338, 124888, 124751, 124608, 124463, 124351, 125023, 125659]
Slice[1 250 000-2 500 000] computed in 00:00:00.014 by pool-1-thread-2: [124766, 125423, 125111, 124756, 125201, 125695, 124266, 124405, 125083, 125294]
Slice[2 500 000-3 750 000] computed in 00:00:00.013 by pool-1-thread-3: [124903, 124756, 124934, 125640, 124954, 125452, 124556, 124816, 124737, 125252]
Slice[3 750 000-5 000 000] computed in 00:00:00.014 by pool-1-thread-4: [125151, 125116, 125234, 124501, 125111, 125022, 125109, 125042, 124655, 125059]
the small trick to convert elapsed millis in a stopwatch calendar:
static final TimeZone UTC= TimeZone.getTimeZone("UTC");
public static Calendar calendar(long millis) {
Calendar calendar = Calendar.getInstance(UTC);
calendar.setTimeInMillis(millis);
return calendar;
}
I've coded a multi-threaded matrix multiplication. I believe my approach is right, but I'm not 100% sure. In respect to the threads, I don't understand why I can't just run a (new MatrixThread(...)).start() instead of using an ExecutorService.
Additionally, when I benchmark the multithreaded approach versus the classical approach, the classical is much faster...
What am I doing wrong?
Matrix Class:
import java.util.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
class Matrix
{
private int dimension;
private int[][] template;
public Matrix(int dimension)
{
this.template = new int[dimension][dimension];
this.dimension = template.length;
}
public Matrix(int[][] array)
{
this.dimension = array.length;
this.template = array;
}
public int getMatrixDimension() { return this.dimension; }
public int[][] getArray() { return this.template; }
public void fillMatrix()
{
Random randomNumber = new Random();
for(int i = 0; i < dimension; i++)
{
for(int j = 0; j < dimension; j++)
{
template[i][j] = randomNumber.nextInt(10) + 1;
}
}
}
#Override
public String toString()
{
String retString = "";
for(int i = 0; i < this.getMatrixDimension(); i++)
{
for(int j = 0; j < this.getMatrixDimension(); j++)
{
retString += " " + this.getArray()[i][j];
}
retString += "\n";
}
return retString;
}
public static Matrix classicalMultiplication(Matrix a, Matrix b)
{
int[][] result = new int[a.dimension][b.dimension];
for(int i = 0; i < a.dimension; i++)
{
for(int j = 0; j < b.dimension; j++)
{
for(int k = 0; k < b.dimension; k++)
{
result[i][j] += a.template[i][k] * b.template[k][j];
}
}
}
return new Matrix(result);
}
public Matrix multiply(Matrix multiplier) throws InterruptedException
{
Matrix result = new Matrix(dimension);
ExecutorService es = Executors.newFixedThreadPool(dimension*dimension);
for(int currRow = 0; currRow < multiplier.dimension; currRow++)
{
for(int currCol = 0; currCol < multiplier.dimension; currCol++)
{
//(new MatrixThread(this, multiplier, currRow, currCol, result)).start();
es.execute(new MatrixThread(this, multiplier, currRow, currCol, result));
}
}
es.shutdown();
es.awaitTermination(2, TimeUnit.DAYS);
return result;
}
private class MatrixThread extends Thread
{
private Matrix a, b, result;
private int row, col;
private MatrixThread(Matrix a, Matrix b, int row, int col, Matrix result)
{
this.a = a;
this.b = b;
this.row = row;
this.col = col;
this.result = result;
}
#Override
public void run()
{
int cellResult = 0;
for (int i = 0; i < a.getMatrixDimension(); i++)
cellResult += a.template[row][i] * b.template[i][col];
result.template[row][col] = cellResult;
}
}
}
Main class:
import java.util.Scanner;
public class MatrixDriver
{
private static final Scanner kb = new Scanner(System.in);
public static void main(String[] args) throws InterruptedException
{
Matrix first, second;
long timeLastChanged,timeNow;
double elapsedTime;
System.out.print("Enter value of n (must be a power of 2):");
int n = kb.nextInt();
first = new Matrix(n);
first.fillMatrix();
second = new Matrix(n);
second.fillMatrix();
timeLastChanged = System.currentTimeMillis();
//System.out.println("Product of the two using threads:\n" +
first.multiply(second);
timeNow = System.currentTimeMillis();
elapsedTime = (timeNow - timeLastChanged)/1000.0;
System.out.println("Threaded took "+elapsedTime+" seconds");
timeLastChanged = System.currentTimeMillis();
//System.out.println("Product of the two using classical:\n" +
Matrix.classicalMultiplication(first,second);
timeNow = System.currentTimeMillis();
elapsedTime = (timeNow - timeLastChanged)/1000.0;
System.out.println("Classical took "+elapsedTime+" seconds");
}
}
P.S. Please let me know if any further clarification is needed.
There is a bunch of overhead involved in creating threads, even when using an ExecutorService. I suspect the reason why you're multithreaded approach is so slow is that you're spending 99% creating a new thread and only 1%, or less, doing the actual math.
Typically, to solve this problem you'd batch a whole bunch of operations together and run those on a single thread. I'm not 100% how to do that in this case, but I suggest breaking your matrix into smaller chunks (say, 10 smaller matrices) and run those on threads, instead of running each cell in its own thread.
You're creating a lot of threads. Not only is it expensive to create threads, but for a CPU bound application, you don't want more threads than you have available processors (if you do, you have to spend processing power switching between threads, which also is likely to cause cache misses which are very expensive).
It's also unnecessary to send a thread to execute; all it needs is a Runnable. You'll get a big performance boost by applying these changes:
Make the ExecutorService a static member, size it for the current processor, and send it a ThreadFactory so it doesn't keep the program running after main has finished. (It would probably be architecturally cleaner to send it as a parameter to the method rather than keeping it as a static field; I leave that as an exercise for the reader. ☺)
private static final ExecutorService workerPool =
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors(), new ThreadFactory() {
public Thread newThread(Runnable r) {
Thread t = new Thread(r);
t.setDaemon(true);
return t;
}
});
Make MatrixThread implement Runnable rather than inherit Thread. Threads are expensive to create; POJOs are very cheap. You can also make it static which makes the instances smaller (as non-static classes get an implicit reference to the enclosing object).
private static class MatrixThread implements Runnable
From change (1), you can no longer awaitTermination to make sure all tasks are finished (as this worker pool). Instead, use the submit method which returns a Future<?>. Collect all the future objects in a list, and when you've submitted all the tasks, iterate over the list and call get for each object.
Your multiply method should now look something like this:
public Matrix multiply(Matrix multiplier) throws InterruptedException {
Matrix result = new Matrix(dimension);
List<Future<?>> futures = new ArrayList<Future<?>>();
for(int currRow = 0; currRow < multiplier.dimension; currRow++) {
for(int currCol = 0; currCol < multiplier.dimension; currCol++) {
Runnable worker = new MatrixThread(this, multiplier, currRow, currCol, result);
futures.add(workerPool.submit(worker));
}
}
for (Future<?> f : futures) {
try {
f.get();
} catch (ExecutionException e){
throw new RuntimeException(e); // shouldn't happen, but might do
}
}
return result;
}
Will it be faster than the single-threaded version? Well, on my arguably crappy box the multithreaded version is slower for values of n < 1024.
This is just scratching the surface, though. The real problem is that you create a lot of MatrixThread instances - your memory consumption is O(n²), which is a very bad sign. Moving the inner for loop into MatrixThread.run would improve performance by a factor of craploads (ideally, you don't create more tasks than you have worker threads).
Edit: As I have more pressing things to do, I couldn't resist optimizing this further. I came up with this (... horrendously ugly piece of code) that "only" creates O(n) jobs:
public Matrix multiply(Matrix multiplier) throws InterruptedException {
Matrix result = new Matrix(dimension);
List<Future<?>> futures = new ArrayList<Future<?>>();
for(int currRow = 0; currRow < multiplier.dimension; currRow++) {
Runnable worker = new MatrixThread2(this, multiplier, currRow, result);
futures.add(workerPool.submit(worker));
}
for (Future<?> f : futures) {
try {
f.get();
} catch (ExecutionException e){
throw new RuntimeException(e); // shouldn't happen, but might do
}
}
return result;
}
private static class MatrixThread2 implements Runnable
{
private Matrix self, mul, result;
private int row, col;
private MatrixThread2(Matrix a, Matrix b, int row, Matrix result)
{
this.self = a;
this.mul = b;
this.row = row;
this.result = result;
}
#Override
public void run()
{
for(int col = 0; col < mul.dimension; col++) {
int cellResult = 0;
for (int i = 0; i < self.getMatrixDimension(); i++)
cellResult += self.template[row][i] * mul.template[i][col];
result.template[row][col] = cellResult;
}
}
}
It's still not great, but basically the multi-threaded version can compute anything you'll be patient enough to wait for, and it'll do it faster than the single-threaded version.
First of all, you should use a newFixedThreadPool of the size as many cores you have, on a quadcore you use 4. Second of all, don't create a new one for each matrix.
If you make the executorservice a static member variable I get almost consistently faster execution of the threaded version at a matrix size of 512.
Also, change MatrixThread to implement Runnable instead of extending Thread also speeds up execution to where the threaded is on my machine 2x as fast on 512