Related
In the pursuit of learning I have written a multi-threaded linear search, designed to operate on an int[] array. I believe the search works as intended; however, after completing it I tested it against a standard 'for loop' and was surprised to see that the 'for loop' beat my search in terms of speed every time. I've tried tinkering with the code, but cannot get the search to beat a basic 'for loop'. At the moment I am wondering the following:
Is there an obvious flaw in my code that I am not seeing?
Is my code perhaps not well optimised for CPU caches?
Is this just the overheads of multi-threading slowing down my program and so I need a larger array to reap the benefits?
Unable to work it out myself, I am hoping someone here may be able to point me in the right direction, leading to my question:
Is there an inefficiency/flaw in my code that is making it slower than a standard loop, or is this just the overheads of threading slowing it down?
The Search:
public class MLinearSearch {
private MLinearSearch() {};
public static int[] getMultithreadingPositions(int[] data, int processors) {
int pieceSize = data.length / processors;
int remainder = data.length % processors;
int curPosition = 0;
int[] results = new int[processors + 1];
for (int i = 0; i < results.length - 1; i++) {
results[i] = curPosition;
curPosition += pieceSize;
if(i < remainder) {
curPosition++;
}
}
results[results.length - 1] = data.length;
return results;
}
public static int search(int target, int[]data) {
MLinearSearch.processors = Runtime.getRuntime().availableProcessors();
MLinearSearch.foundIndex = -1;
int[] domains = MLinearSearch.getMultithreadingPositions(data, processors);
Thread[] threads = new Thread[MLinearSearch.processors];
for(int i = 0; i < MLinearSearch.processors; i++) {
MLSThread searcher = new MLSThread(target, data, domains[i], domains[(i + 1)]);
searcher.setDaemon(true);
threads[i] = searcher;
searcher.run();
}
for(Thread thread : threads) {
try {
thread.join();
} catch (InterruptedException e) {
return MLinearSearch.foundIndex;
}
}
return MLinearSearch.foundIndex;
}
private static class MLSThread extends Thread {
private MLSThread(int target, int[] data, int start, int end) {
this.counter = start;
this.dataEnd = end;
this.target = target;
this.data = data;
}
@Override
public void run() {
while(this.counter < (this.dataEnd) && MLinearSearch.foundIndex == -1) {
if(this.target == this.data[this.counter]) {
MLinearSearch.foundIndex = this.counter;
return;
}
counter++;
}
}
private int counter;
private int dataEnd;
private int target;
private int[] data;
}
private static volatile int foundIndex = -1;
private static volatile int processors;
}
Note: "getMultithreadingPositions" is normally in a separate class. I have copied the method here for simplicity.
This is how I've been testing the code. Another test (omitted here, but in the same file and test run) runs the basic for loop, which beats my multi-threaded search every time.
public class SearchingTest {
@Test
public void multiLinearTest() {
int index = MLinearSearch.search(TARGET, arrayData);
assertEquals(TARGET, arrayData[index]);
}
private static int[] getShuffledArray(int[] array) {
// https://stackoverflow.com/questions/1519736/random-shuffling-of-an-array
Random rnd = ThreadLocalRandom.current();
for (int i = array.length - 1; i > 0; i--)
{
int index = rnd.nextInt(i + 1);
int a = array[index];
array[index] = array[i];
array[i] = a;
}
return array;
}
private static final int[] arrayData = SearchingTest.getShuffledArray(IntStream.range(0, 55_000_000).toArray());
private static final int TARGET = 7;
}
The loop beating this is literally just a for loop that iterates over the same array. I would imagine that for smaller arrays the for loop would win out, as its simplicity allows it to get going before my multi-threaded search can initiate its threads. At the array size I am trying, though, I would have expected a single thread to lose out.
Note: to avoid an out-of-memory error, I had to increase my heap size with the following JVM argument:
-Xmx4096m
Thank you for any help offered.
I have code like the below. In a loop it executes the method "process", and it currently runs sequentially. I want to run this method in parallel, but every call must be finished before the next loop so that I can sum the results there; i.e. even when running in parallel, all calls should complete before the second for loop executes.
How can I solve this in JDK 1.7 (not JDK 1.8)?
public static void main(String s[]){
int arrlen = 10;
int arr[] = new int[arrlen] ;
int t =0;
for(int i=0;i<arrlen;i++){
arr[i] = i;
t = process(arr[i]);
arr[i] = t;
}
int sum =0;
for(int i=0;i<arrlen;i++){
sum += arr[i];
}
System.out.println(sum);
}
public static int process(int arr){
return arr*2;
}
The example below might help you. I have used the fork/join framework to do that.
For a small array like the one in your example, the conventional approach might be faster, and I suspect the fork/join version would even take slightly longer. But for larger arrays or heavier processing, the fork/join framework is the right fit. Even Java 8 parallel streams use the fork/join framework as their underlying base.
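Just to illustrate that last point (Java 8 only, so not an option for your JDK 1.7 requirement; shown purely for comparison), the parallel-stream version of your loop would be roughly:
// Java 8 only: parallel streams run on the common fork/join pool under the hood
int[] arr = java.util.stream.IntStream.range(0, 10).toArray();
int[] doubled = java.util.Arrays.stream(arr).parallel().map(x -> x * 2).toArray();
System.out.println(java.util.Arrays.stream(doubled).sum());
The Java 7 fork/join version follows.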
import java.util.concurrent.RecursiveAction;
public class ForkMultiplier extends RecursiveAction {
int[] array;
int threshold = 3;
int start;
int end;
public ForkMultiplier(int[] array,int start, int end) {
this.array = array;
this.start = start;
this.end = end;
}
protected void compute() {
if (end - start < threshold) {
computeDirectly();
} else {
int middle = (end + start) / 2;
ForkMultiplier f1= new ForkMultiplier(array, start, middle);
ForkMultiplier f2= new ForkMultiplier(array, middle, end);
invokeAll(f1, f2);
}
}
protected void computeDirectly() {
for (int i = start; i < end; i++) {
array[i] = array[i] * 2;
}
}
}
Your main class would look like the below:
public static void main(String s[]){
int arrlen = 10;
int arr[] = new int[arrlen] ;
for(int i=0;i<arrlen;i++){
arr[i] = i;
}
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new ForkMultiplier(arr, 0, arr.length));
int sum =0;
for(int i=0;i<arrlen;i++){
sum += arr[i];
}
System.out.println(sum);
}
You basically need to use Executors and Futures combined, which have existed since Java 1.5 (see the Java documentation).
In the following example, I've created a main class that uses another helper class that acts like the processor you want to parallelize.
The main class is split into 3 steps:
Creates the pool of processes and executes the tasks in parallel.
Waits for all tasks to finish their work.
Collects the results from tasks.
For didactic reasons I've added some logging and, more importantly, a random waiting time in each process's business logic, simulating a time-consuming algorithm run by the Process class.
The maximum waiting time for each process is 2 seconds, which is also the highest waiting time for step 2, even if you increase the number of parallel tasks (just try changing the totalTasks variable in the code below to test it).
Here the Main class:
package com.example;
import java.util.ArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class Main
{
public static void main(String[] args) throws InterruptedException, ExecutionException
{
int totalTasks = 100;
ExecutorService newFixedThreadPool = Executors.newFixedThreadPool(totalTasks);
System.out.println("Step 1 - Starting parallel tasks");
ArrayList<Future<Integer>> tasks = new ArrayList<Future<Integer>>();
for (int i = 0; i < totalTasks; i++) {
tasks.add(newFixedThreadPool.submit(new Process(i)));
}
long ts = System.currentTimeMillis();
System.out.println("Step 2 - Wait for processes to finish...");
boolean tasksCompleted;
do {
tasksCompleted = true;
for (Future<Integer> task : tasks) {
if (!task.isDone()) {
tasksCompleted = false;
Thread.sleep(10);
break;
}
}
} while (!tasksCompleted);
System.out.println(String.format("Step 2 - End in '%.3f' seconds", (System.currentTimeMillis() - ts) / 1000.0));
System.out.println("Step 3 - All processes finished to run, let's collect results...");
Integer sum = 0;
for (Future<Integer> task : tasks) {
sum += task.get();
}
System.out.println(String.format("Total final sum is: %d", sum));
}
}
Here the Process class:
package com.example;
import java.util.concurrent.Callable;
public class Process implements Callable<Integer>
{
private Integer value;
public Process(Integer value)
{
this.value = value;
}
public Integer call() throws Exception
{
Long sleepTime = (long)(Math.random() * 2000);
System.out.println(String.format("Starting process with value %d, sleep time %d", this.value, sleepTime));
Thread.sleep(sleepTime);
System.out.println(String.format("Stopping process with value %d", this.value));
return value * 2;
}
}
Hope this helps.
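As a possible simplification (a sketch on my part, reusing the totalTasks variable and the Process class above, with exceptions propagating as in the original main): ExecutorService.invokeAll blocks until every submitted task has completed, so the polling loop in step 2 can be dropped and the pool can be sized to the CPU count instead of one thread per task.
// Hypothetical alternative to steps 1-3 above: invokeAll waits for all tasks to finish
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
ArrayList<Process> jobs = new ArrayList<Process>();
for (int i = 0; i < totalTasks; i++) {
    jobs.add(new Process(i));
}
Integer sum = 0;
for (Future<Integer> task : pool.invokeAll(jobs)) { // returns only when all tasks are done
    sum += task.get();
}
pool.shutdown();
System.out.println(String.format("Total final sum is: %d", sum));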
I am working on a homework problem where I have to create a Multithreaded version of Merge Sort. I was able to implement it, but I am not able to stop the creation of threads. I looked into using an ExecutorService to limit the creation of threads but I cannot figure out how to implement it within my current code.
Here is my current Multithreaded Merge Sort. We are required to implement a specific strategy pattern so that is where my sort() method comes from.
@Override
public int[] sort(int[] list) {
int array_size = list.length;
list = msort(list, 0, array_size-1);
return list;
}
int[] msort(int numbers[], int left, int right) {
final int mid;
final int leftRef = left;
final int rightRef = right;
final int array[] = numbers;
if (left<right) {
mid = (right + left) / 2;
//new thread
Runnable r1 = new Runnable(){
public void run(){
msort(array, leftRef, mid);
}
};
Thread t1 = new Thread(r1);
t1.start();
//new thread
Runnable r2 = new Runnable(){
public void run(){
msort(array, mid+1, rightRef);
}
};
Thread t2 = new Thread(r2);
t2.start();
//join threads back together
try {
t1.join();
t2.join();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
merge(numbers, leftRef, mid, mid+1, rightRef);
}
return numbers;
}
void merge(int numbers[], int startA, int endA, int startB, int endB) {
int finalStart = startA;
int finalEnd = endB;
int indexC = 0;
int[] listC = new int[numbers.length];
while(startA <= endA && startB <= endB){
if(numbers[startA] < numbers[startB]){
listC[indexC] = numbers[startA];
startA = startA+1;
}
else{
listC[indexC] = numbers[startB];
startB = startB +1;
}
indexC++;
}
if(startA <= endA){
for(int i = startA; i < endA; i++){
listC[indexC]= numbers[i];
indexC++;
}
}
indexC = 0;
for(int i = finalStart; i <= finalEnd; i++){
numbers[i]=listC[indexC];
indexC++;
}
}
Any pointers would be gratefully received.
Following @mcdowella's comment, I also think that the fork/join framework is your best bet if you want to limit the number of threads that run in parallel.
I know that this won't give you any help on your homework, because you are probably not allowed to use the fork/join framework in Java 7. However, it is all about learning something, isn't it? ;)
As I commented, I think your merge method is wrong. I can't pinpoint the failure, but I have rewritten it. I strongly suggest you write a test case covering all the edge cases that can occur in that merge method, and once you have verified it works, plant it back into your multithreaded code.
@lbalazscs also gave you the hint that a fork/join merge sort is mentioned in the javadocs; however, I had nothing else to do, so I will show you what the solution would look like if you implemented it with Java 7.
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
public class MultithreadedMergeSort extends RecursiveAction {
private final int[] array;
private final int begin;
private final int end;
public MultithreadedMergeSort(int[] array, int begin, int end) {
this.array = array;
this.begin = begin;
this.end = end;
}
@Override
protected void compute() {
if (end - begin < 2) {
// swap if we only have two elements
if (array[begin] > array[end]) {
int tmp = array[end];
array[end] = array[begin];
array[begin] = tmp;
}
} else {
// overflow safe method to calculate the mid
int mid = (begin + end) >>> 1;
// invoke recursive sorting action
invokeAll(new MultithreadedMergeSort(array, begin, mid),
new MultithreadedMergeSort(array, mid + 1, end));
// merge both sides
merge(array, begin, mid, end);
}
}
void merge(int[] numbers, int startA, int startB, int endB) {
int[] toReturn = new int[endB - startA + 1];
int i = 0, k = startA, j = startB + 1;
while (i < toReturn.length) {
if (numbers[k] < numbers[j]) {
toReturn[i] = numbers[k];
k++;
} else {
toReturn[i] = numbers[j];
j++;
}
i++;
// if we hit the limit of an array, copy the rest
if (j > endB) {
System.arraycopy(numbers, k, toReturn, i, startB - k + 1);
break;
}
if (k > startB) {
System.arraycopy(numbers, j, toReturn, i, endB - j + 1);
break;
}
}
System.arraycopy(toReturn, 0, numbers, startA, toReturn.length);
}
public static void main(String[] args) {
int[] toSort = { 55, 1, 12, 2, 25, 55, 56, 77 };
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new MultithreadedMergeSort(toSort, 0, toSort.length - 1));
System.out.println(Arrays.toString(toSort));
}
Note that the construction of your threadpool limits the number of active parallel threads to the number of cores of your processor.
ForkJoinPool pool = new ForkJoinPool();
According to its javadoc:
Creates a ForkJoinPool with parallelism equal to
java.lang.Runtime.availableProcessors, using the default thread
factory, no UncaughtExceptionHandler, and non-async LIFO processing
mode.
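If you need a different parallelism level, it can also be passed to the constructor explicitly (purely illustrative value below):
// Illustrative only: cap the pool at 2 worker threads instead of the default
ForkJoinPool pool = new ForkJoinPool(2);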
Also notice how my merge method differs from yours, because I think that is your main problem. At least your sorting works if I replace your merge method with mine.
As mcdowella pointed out, the Fork/Join framework in Java 7 is exactly for tasks that can be broken into smaller pieces recursively.
Actually, the Javadoc for RecursiveAction has a merge sort as the first example :)
Also note that ForkJoinPool is an ExecutorService.
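So, with the usual java.util.concurrent imports, you can also hand it ordinary Callable tasks directly (a small illustrative sketch, unrelated to the merge sort above):
// ForkJoinPool implements ExecutorService, so plain Callables can be submitted too
ForkJoinPool pool = new ForkJoinPool();
Future<Integer> result = pool.submit(new Callable<Integer>() {
    public Integer call() {
        return 42; // placeholder work
    }
});
try {
    System.out.println(result.get()); // blocks until the task completes
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
pool.shutdown();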
I am experimenting with parallelizing algorithms in Java. I began with merge sort, and posted my attempt in this question. My revised attempt is in the code below, where I now try to parallelize quick sort.
Are there any rookie mistakes in my multi-threaded implementation or approach to this problem? If not, shouldn't I expect more than a 32% speed increase between a sequential and a parallelized algorithm on a dual-core (see timings at bottom)?
Here is the multithreading algorithm:
public class ThreadedQuick extends Thread
{
final int MAX_THREADS = Runtime.getRuntime().availableProcessors();
CountDownLatch doneSignal;
static int num_threads = 1;
int[] my_array;
int start, end;
public ThreadedQuick(CountDownLatch doneSignal, int[] array, int start, int end) {
this.my_array = array;
this.start = start;
this.end = end;
this.doneSignal = doneSignal;
}
public static void reset() {
num_threads = 1;
}
public void run() {
quicksort(my_array, start, end);
doneSignal.countDown();
num_threads--;
}
public void quicksort(int[] array, int start, int end) {
int len = end-start+1;
if (len <= 1)
return;
int pivot_index = medianOfThree(array, start, end);
int pivotValue = array[pivot_index];
swap(array, pivot_index, end);
int storeIndex = start;
for (int i = start; i < end; i++) {
if (array[i] <= pivotValue) {
swap(array, i, storeIndex);
storeIndex++;
}
}
swap(array, storeIndex, end);
if (num_threads < MAX_THREADS) {
num_threads++;
CountDownLatch completionSignal = new CountDownLatch(1);
new ThreadedQuick(completionSignal, array, start, storeIndex - 1).start();
quicksort(array, storeIndex + 1, end);
try {
completionSignal.await(1000, TimeUnit.SECONDS);
} catch(Exception ex) {
ex.printStackTrace();
}
} else {
quicksort(array, start, storeIndex - 1);
quicksort(array, storeIndex + 1, end);
}
}
}
Here is how I start it off:
ThreadedQuick.reset();
CountDownLatch completionSignal = new CountDownLatch(1);
new ThreadedQuick(completionSignal, array, 0, array.length-1).start();
try {
completionSignal.await(1000, TimeUnit.SECONDS);
} catch(Exception ex){
ex.printStackTrace();
}
I tested this against Arrays.sort and a similar sequential quick sort algorithm. Here are the timing results on an Intel dual-core Dell laptop, in seconds:
Elements: 500,000,
sequential: 0.068592,
threaded: 0.046871,
Arrays.sort: 0.079677
Elements: 1,000,000,
sequential: 0.14416,
threaded: 0.095492,
Arrays.sort: 0.167155
Elements: 2,000,000,
sequential: 0.301666,
threaded: 0.205719,
Arrays.sort: 0.350982
Elements: 4,000,000,
sequential: 0.623291,
threaded: 0.424119,
Arrays.sort: 0.712698
Elements: 8,000,000,
sequential: 1.279374,
threaded: 0.859363,
Arrays.sort: 1.487671
Each number above is the average time of 100 tests, throwing out the 3 lowest and 3 highest cases. I used Random.nextInt(Integer.MAX_VALUE) to generate an array for each test, which was initialized once every 10 tests with the same seed. Each test consisted of timing the given algorithm with System.nanoTime. I rounded to six decimal places after averaging. And obviously, I did check to see if each sort worked.
As you can see, there is about a 32% increase in speed between the sequential and threaded cases in every set of tests. As I asked above, shouldn't I expect more than that?
Making num_threads static can cause problems; it is highly likely that you will end up with more than MAX_THREADS running at some point.
The reason you probably don't get a full doubling in performance is that your quick sort cannot be fully parallelised. Note that the first call to quicksort does a pass through the whole array in the initial thread before the work really starts to run in parallel. There is also an overhead to parallelising an algorithm, in the form of context switching and mode transitions when farming work off to separate threads.
Have a look at the Fork/Join framework, this problem would probably fit quite neatly there.
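For what it's worth, here is a rough, unbenchmarked sketch of what that Fork/Join route could look like for this quick sort. The class name, the THRESHOLD value, and the simplified last-element pivot (instead of your medianOfThree) are illustrative choices of mine, not taken from your code.
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ForkJoinQuick extends RecursiveAction {
    private static final int THRESHOLD = 10_000; // arbitrary cut-off: below this, don't fork
    private final int[] array;
    private final int start, end; // inclusive bounds

    public ForkJoinQuick(int[] array, int start, int end) {
        this.array = array;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end <= start) {
            return;
        }
        int storeIndex = partition(array, start, end);
        ForkJoinQuick left = new ForkJoinQuick(array, start, storeIndex - 1);
        ForkJoinQuick right = new ForkJoinQuick(array, storeIndex + 1, end);
        if (end - start + 1 <= THRESHOLD) {
            left.compute();   // small range: keep working in the current thread
            right.compute();
        } else {
            invokeAll(left, right); // forks one half and waits for both to finish
        }
    }

    // Lomuto partition around the last element; returns the pivot's final index
    private static int partition(int[] a, int start, int end) {
        int pivot = a[end];
        int store = start;
        for (int i = start; i < end; i++) {
            if (a[i] <= pivot) {
                int tmp = a[i]; a[i] = a[store]; a[store] = tmp;
                store++;
            }
        }
        int tmp = a[store]; a[store] = a[end]; a[end] = tmp;
        return store;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = rnd.nextInt();
        }
        new ForkJoinPool().invoke(new ForkJoinQuick(data, 0, data.length - 1));
        boolean sorted = true;
        for (int i = 1; i < data.length; i++) {
            if (data[i - 1] > data[i]) { sorted = false; break; }
        }
        System.out.println("sorted: " + sorted);
    }
}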
A couple of points on the implementation. Implement Runnable rather than extending Thread. Extending Thread should only be used when you are creating a new version of the Thread class itself; when you just want some job to run in parallel you are better off with Runnable. While implementing Runnable you can also still extend another class, which gives you more flexibility in your OO design. Use a thread pool that is restricted to the number of threads available on the system. Also, don't use num_threads to decide whether to fork off a new thread; you can calculate this up front. Use a minimum partition size, for example the size of the total array divided by the number of available processors. Something like:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
public class ThreadedQuick implements Runnable {
public static final int MAX_THREADS = Runtime.getRuntime().availableProcessors();
static final ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS);
final int[] my_array;
final int start, end;
private final int minParitionSize;
public ThreadedQuick(int minParitionSize, int[] array, int start, int end) {
this.minParitionSize = minParitionSize;
this.my_array = array;
this.start = start;
this.end = end;
}
public void run() {
quicksort(my_array, start, end);
}
public void quicksort(int[] array, int start, int end) {
int len = end - start + 1;
if (len <= 1)
return;
int pivot_index = medianOfThree(array, start, end);
int pivotValue = array[pivot_index];
swap(array, pivot_index, end);
int storeIndex = start;
for (int i = start; i < end; i++) {
if (array[i] <= pivotValue) {
swap(array, i, storeIndex);
storeIndex++;
}
}
swap(array, storeIndex, end);
if (len > minParitionSize) {
ThreadedQuick quick = new ThreadedQuick(minParitionSize, array, start, storeIndex - 1);
Future<?> future = executor.submit(quick);
quicksort(array, storeIndex + 1, end);
try {
future.get(1000, TimeUnit.SECONDS);
} catch (Exception ex) {
ex.printStackTrace();
}
} else {
quicksort(array, start, storeIndex - 1);
quicksort(array, storeIndex + 1, end);
}
}
}
You can kick it off by doing:
ThreadedQuick quick = new ThreadedQuick(array.length / ThreadedQuick.MAX_THREADS, array, 0, array.length - 1);
quick.run();
This will start the sort in the same thread, which avoids an unnecessary thread hop at start up.
Caveat: Not sure the above implementation will actually be faster as I haven't benchmarked it.
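One more detail, as an aside (an assumption on my part about how the sketch above would be used): the static executor's worker threads are non-daemon, so the pool should be shut down once all sorting is finished, otherwise it keeps the JVM alive.
// Hypothetical cleanup once sorting is complete; without it the fixed pool keeps the JVM running
executor.shutdown();
try {
    executor.awaitTermination(1, TimeUnit.MINUTES);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
}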
This uses a combination of quick sort and merge sort.
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ParallelSortMain {
public static void main(String... args) throws InterruptedException {
Random rand = new Random();
final int[] values = new int[100*1024*1024];
for (int i = 0; i < values.length; i++)
values[i] = rand.nextInt();
int threads = Runtime.getRuntime().availableProcessors();
ExecutorService es = Executors.newFixedThreadPool(threads);
int blockSize = (values.length + threads - 1) / threads;
for (int i = 0; i < values.length; i += blockSize) {
final int min = i;
final int max = Math.min(min + blockSize, values.length);
es.submit(new Runnable() {
@Override
public void run() {
Arrays.sort(values, min, max);
}
});
}
es.shutdown();
es.awaitTermination(10, TimeUnit.MINUTES);
for (int blockSize2 = blockSize; blockSize2 < values.length; blockSize2 *= 2) {
for (int i = 0; i < values.length; i += blockSize2) {
final int min = i;
final int mid = Math.min(min + blockSize2, values.length);
final int max = Math.min(min + blockSize2 * 2, values.length);
mergeSort(values, min, mid, max);
}
}
}
private static boolean mergeSort(int[] values, int left, int mid, int end) {
int[] results = new int[end - left];
int l = left, r = mid, m = 0;
for (; l < mid && r < end; m++) {
int lv = values[l];
int rv = values[r];
if (lv < rv) {
results[m] = lv;
l++;
} else {
results[m] = rv;
r++;
}
}
while (l < mid)
results[m++] = values[l++];
while (r < end)
results[m++] = values[r++];
System.arraycopy(results, 0, values, left, results.length);
return false;
}
}
A couple of comments, if I understand your code right:
I don't see any synchronization around the num_threads field even though it can be accessed by multiple threads. Perhaps you should make it an AtomicInteger (see the sketch after this list).
Use a thread pool and arrange the tasks, i.e. single calls to quicksort, so that they take advantage of the pool. Use Futures.
The way you currently divide the work could leave a smaller partition with its own thread while a larger partition goes without one; that is, it doesn't prioritize giving larger segments their own threads.
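Here is a minimal sketch of the AtomicInteger idea from the first point above. ThreadBudget is a hypothetical helper of mine, not part of the original code: it reserves a worker slot with compareAndSet so the running count can never exceed the limit.
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: replaces the racy num_threads++ with an atomic reservation
class ThreadBudget {
    private final AtomicInteger inUse = new AtomicInteger(0);
    private final int max;

    ThreadBudget(int max) {
        this.max = max;
    }

    boolean tryAcquire() {
        while (true) {
            int current = inUse.get();
            if (current >= max) {
                return false; // no slot free: sort this partition in the current thread
            }
            if (inUse.compareAndSet(current, current + 1)) {
                return true;  // slot reserved; call release() when the forked sort finishes
            }
        }
    }

    void release() {
        inUse.decrementAndGet();
    }
}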