I have an array of size N. I want to shuffle its elements in 2 threads (or more). Each thread should work with it's own part of the array.
Lets say, the first thread shuffles elements from 0 to K, and the second thread shuffles elements from K to N (where 0 < K < N). So, it can look like this:
//try-catch stuff is ommited
static void shuffle(int[] array) {
Thread t1 = new ShufflingThread(array, 0, array.length / 2);
Thread t2 = new ShufflingThread(array, array.length / 2, array.length);
t1.start();
t2.start();
t1.join();
t2.join();
}
public static void main(String[] args) {
int array = generateBigSortedArray();
shuffle(array);
}
Are there any guaranties from JVM that I will see changes in the array from the main method after such shuffling?
How should I implement ShufflingThread (or, how should I run it, maybe within a synchronized block or whatever else) in order to get such guaranties?
The join() calls are sufficient to ensure memory coherency: when t1.join() returns, the main thread "sees" whatever operations thread t1 did on the array.
Also, Java guarantees that there is no word-tearing on arrays: distinct threads may use distinct elements of the same array without needing synchronization.
I think this is a good exercise in thread control, where (1) a job can be broken up into several parts (2) the parts can run independently and asynchronously and (3) A master thread monitors the completion of all such jobs in their respective threads. All you need is for this master thread to wait() and be notify()-ed jobCount times, every time a thread completes execution. Here is a sample code that you can compile/run. Uncomment the println()'s to see more.
Notes: [1] JVM doesnt guarantee the order of execution of the threads [2] You need to synchronize when your master thread access the big array, to not have corrupted data....
public class ShufflingArray {
private int nPart = 4, // Count of jobs distributed, resource dependent
activeThreadCount, // Currently active, monitored with notify
iRay[]; // Array the threads will work on
public ShufflingArray (int[] a) {
iRay = a;
printArray (a);
}
private void printArray (int[] ia) {
for (int i = 0 ; i < ia.length ; i++)
System.out.print (" " + ((ia[i] < 10) ? " " : "") + ia[i]);
System.out.println();
}
public void shuffle () {
int startNext = 0, pLen = iRay.length / nPart; // make a bunch of parts
for (int i = 0 ; i < nPart ; i++, activeThreadCount++) {
int start = (i == 0) ? 0 : startNext,
stop = start + pLen;
startNext = stop;
if (i == (nPart-1))
stop = iRay.length;
new Thread (new ShuffleOnePart (start, stop, (i+1))).start();
}
waitOnShufflers (0); // returns when activeThreadCount == 0
printArray (iRay);
}
synchronized private void waitOnShufflers (int bump) {
if (bump == 0) {
while (activeThreadCount > 0) {
// System.out.println ("Waiting on " + activeThreadCount + " threads");
try {
wait();
} catch (InterruptedException intex) {
}}} else {
activeThreadCount += bump;
notify();
}}
public class ShuffleOnePart implements Runnable {
private int startIndex, stopIndex; // Operate on global array iRay
public ShuffleOnePart (int i, int j, int k) {
startIndex = i;
stopIndex = j;
// System.out.println ("Shuffler part #" + k);
}
// Suppose shuffling means interchanging the first and last pairs
public void run () {
int tmp = iRay[startIndex+1];
iRay[startIndex+1] = iRay[startIndex]; iRay[startIndex] = tmp;
tmp = iRay[stopIndex-1];
iRay[stopIndex-1] = iRay[stopIndex-2]; iRay[stopIndex-2] = tmp;
try { // Lets imagine it needs to do something else too
Thread.sleep (157);
} catch (InterruptedException iex) { }
waitOnShufflers (-1);
}}
public static void main (String[] args) {
int n = 25, ia[] = new int[n];
for (int i = 0 ; i < n ; i++)
ia[i] = i+1;
new ShufflingArray(ia).shuffle();
}}
Thread.start() and Thread.join() are enough to give you the happens-before relationships between the array initialisation, its hand-off to the threads and then the read back in the main method.
Actions that cause happens-before are documented here.
As mentioned elsewhere, ForkJoin is very well suited to this kind of divide-and-conquer algorithm and frees you from a lot of the book-keeping that you'd otherwise need to implement.
using ExecutorService from java.util.Concurrent package along with Callable Task to return the part of the array's from each thread run, once both thread are completed is another way to do for consistent behaviour.
Well, they can't BOTH be accessing the same array and if you use a lock, or a mutex or any other synchronizing mechanism, you kinda lose the power of the threads (since one will have to wait for another, either to finish the shuffling or finish a bit of the shuffling).
Why don't you just divide the array in half, give each thread its bit of the array and then merge the two arrays?
Related
I've been trying to write a program to find the max value in an array.
I know how to use threads to find it but to get the right answer I need to use .join ().
I don't get why it wouldn't find the max without using .join (); and if I use .join () then the running time of the program would be as much as the same program without threads; so if it's the only way to run my thread and it doesn't fasten the process why do we use threads?
public void run() {
for (int i = start; i < end; i++) {
if (array[i] > threadMax) {
threadMax = array[i];
}
}
}
This is my run method; I give each thread quarter of the array and then find the max between them.
this works but if only I use .join() but I don't want to slow down the program.
what else can I do?
Edit: This is my code:
ThreadMax[] findMax = new ThreadMax[4];
findMax[0] = new ThreadMax(array, 0,array.length/4);
findMax[1] = new ThreadMax(array, array.length/4, (2*array.length)/4);
findMax[2] = new ThreadMax(array, (2*array.length)/4, (3 * array.length) / 4);
findMax[3] = new ThreadMax(array, (3 * array.length) / 4, array.length);
for(int i = 0; i < 4; i++) {
findMax[i].myThread.start();
try {
findMax[i].myThread.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
int[] topFour = new int[4];
for(int i = 0; i < 4; i++)
topFour[i] = findMax[i].threadMax;
int result = 0;
for(int i = 0; i < 4; i++){
if(result < topFour[i])
result = topFour[i];
}
System.out.println("Max = " + result);
and my ThreadMax class:
int start;
int end;
int threadMax;
int[] array;
Thread myThread;
ThreadMax(int[] array, int start, int end) {
this.array = array;
this.start = start;
this.end = end;
myThread = new Thread(this);
}
#Override
public void run() {
for (int i = start; i < end; i++) {
if (array[i] > threadMax) {
threadMax = array[i];
}
}
}
Why do you use synchronized? You want to run all threads parallel, this avoids it.
You need to join the threads to see when they have finished. The code you call your threads from is another thread that doesn't know when these four threads are done with their work. join() waits for this moment.
this works but if only I use .join()
"works" means you don't have the correct result without join()? Yes because the calling code (that you unfortunately didn't show) probably runs over the thread-starting lines and comes to it's end even before the threads were started.
Edit
Now as more code is available it's clear to me. You execute a loop:
start the first thread
wait for the first thread
start the second thread...
No thread is started until the previous one has finished!
You need separate loops. FIRST start all threads in one loop, THEN wait for them to finish in a second loop.
You should use the Thread.join() method on the main thread. It ensures that wait for the thread until finish it's job. Otherwise main thread can be terminated before your threads are done.
The class could implement comparable<arraylist> and you could use the Collections.max(arrayList) method.
I'm in troubles with a multithreading java program.
The program consists of a splitted sum of an array of integers with multithreads and than the total sum of the slices.
The problem is that computing time does not decrements by incrementing number of threads (I know that there is a limit number of threads after that the computing time is slower than less threads). I expect to see a decrease of execution time before that limit number of threads (benefits of parallel execution). I use the variable fake in run method to make time "readable".
public class MainClass {
private final int MAX_THREAD = 8;
private final int ARRAY_SIZE = 1000000;
private int[] array;
private SimpleThread[] threads;
private int numThread = 1;
private int[] sum;
private int start = 0;
private int totalSum = 0;
long begin, end;
int fake;
MainClass() {
fillArray();
for(int i = 0; i < MAX_THREAD; i++) {
threads = new SimpleThread[numThread];
sum = new int[numThread];
begin = (long) System.currentTimeMillis();
for(int j = 0 ; j < numThread; j++) {
threads[j] = new SimpleThread(start, ARRAY_SIZE/numThread, j);
threads[j].start();
start+= ARRAY_SIZE/numThread;
}
for(int k = 0; k < numThread; k++) {
try {
threads[k].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
end = (long) System.currentTimeMillis();
for(int g = 0; g < numThread; g++) {
totalSum+=sum[g];
}
System.out.printf("Result with %d thread-- Sum = %d Time = %d\n", numThread, totalSum, end-begin);
numThread++;
start = 0;
totalSum = 0;
}
}
public static void main(String args[]) {
new MainClass();
}
private void fillArray() {
array = new int[ARRAY_SIZE];
for(int i = 0; i < ARRAY_SIZE; i++)
array[i] = 1;
}
private class SimpleThread extends Thread{
int start;
int size;
int index;
public SimpleThread(int start, int size, int sumIndex) {
this.start = start;
this.size = size;
this.index = sumIndex;
}
public void run() {
for(int i = start; i < start+size; i++)
sum[index]+=array[i];
for(long i = 0; i < 1000000000; i++) {
fake++;
}
}
}
Unexpected Result Screenshot
As a general rule, you won't get a speedup from multi-threading if the "work" performed by each thread is less than the overheads of using the threads.
One of the overheads is the cost of starting a new thread. This is surprisingly high. Each time you start a thread the JVM needs to perform syscalls to allocate the thread stack memory segment and the "red zone" memory segment, and initialize them. (The default thread stack size is typically 500KB or 1MB.) Then there are further syscalls to create the native thread and schedule it.
In this example, you have 1,000,000 elements to sum and you divide this work among N threads. As N increases, the amount of work performed by each thread decreases.
It is not hard to see that the time taken to sum 1,000,000 elements is going to be less than the time needed to start 4 threads ... just based on counting the memory read and write operations. Then you need to take into account that the child threads are created one at a time by the parent thread.
If you do the analysis completely, it is clear that there is a point at which adding more threads actually slows down the computation even if you have enough to cores to run all threads in parallel. And your benchmarking seems to suggest1 that that point is around about 2 threads.
By the way, there is a second reason why you may not get as much speedup as you expect for a benchmark like this one. The "work" that each thread is doing is basically scanning a large array. Reading and writing arrays will generate requests to the memory system. Ideally, these requests will be satisfied by the (fast) on-chip memory caches. However, if you try to read / write an array that is larger than the memory cache, then many / most of those requests turn into (slow) main memory requests. Worse still, if you have N cores all doing this then you can find that the number of main memory requests is too much for the memory system to keep up .... and the threads slow down.
The bottom line is that multi-threading does not automatically make an application faster, and it certainly won't if you do it the wrong way.
In your example:
the amount of work per thread is too small compared with the overheads of creating and starting threads, and
memory bandwidth effects are likely to be a problem if can "factor out" the thread creation overheads
1 - I don't understand the point of the "fake" computation. It probably invalidates the benchmark, though it is possible that the JIT compiler optimizes it away.
Why sum is wrong sometimes?
Because ARRAY_SIZE/numThread may have fractional part (e.g. 1000000/3=333333.3333333333) which gets rounded down so start variable loses some hence the sum maybe less than 1000000 depending on the value of divisor.
Why the time taken is increasing as the number of threads increases?
Because in the run function of each thread you do this:
for(long i = 0; i < 1000000000; i++) {
fake++;
}
which I do not understand from your question :
I use the variable fake in run method to make time "readable".
what that means. But every thread needs to increment your fake variable 1000000000 times.
As a side note, for what you're trying to do there is the Fork/Join-Framework. It allows you easily split tasks recursively and implements an algorithm which will distribute your workload automatically.
There is a guide available here; it's example is very similar to your case, which boils down to a RecursiveTask like this:
class Adder extends RecursiveTask<Integer>
{
private int[] toAdd;
private int from;
private int to;
/** Add the numbers in the given array */
public Adder(int[] toAdd)
{
this(toAdd, 0, toAdd.length);
}
/** Add the numbers in the given array between the given indices;
internal constructor to split work */
private Adder(int[] toAdd, int fromIndex, int upToIndex)
{
this.toAdd = toAdd;
this.from = fromIndex;
this.to = upToIndex;
}
/** This is the work method */
#Override
protected Integer compute()
{
int amount = to - from;
int result = 0;
if (amount < 500)
{
// base case: add ints and return the result
for (int i = from; i < to; i++)
{
result += toAdd[i];
}
}
else
{
// array too large: split it into two parts and distribute the actual adding
int newEndIndex = from + (amount / 2);
Collection<Adder> invokeAll = invokeAll(Arrays.asList(
new Adder(toAdd, from, newEndIndex),
new Adder(toAdd, newEndIndex, to)));
for (Adder a : invokeAll)
{
result += a.invoke();
}
}
return result;
}
}
To actually run this, you can use
RecursiveTask adder = new Adder(fillArray(ARRAY_LENGTH));
int result = ForkJoinPool.commonPool().invoke(adder);
Starting threads is heavy and you'll only see the benefit of it on large processes that don't compete for the same resources (none of it applies here).
If I have a simple program to parallelize counting the number of 1's from a random integer from 0 - 9 for a large number of iterations, how do I reduce the variable counting the 1's (numOnes) using the sum function so that I can be able to use the total sum later on in my program.
This is equivalent to the reduction directive in OpenMP.
public void run() {
long work = total_iterations / threads;
long numOnes = 0;
for (long i = 0; i < work; i++) {
int randomNum = rand.nextInt(9);
if (randomNum == 1) {
numOnes += 1;
}
}
}
Once each thread is done executing, I want to be able to use numOnes containing the aggregate result.
In Java, you would have to sit down and manage things manually. In other words: assuming that your data is partitioned, you just kick of those 10 threads, and let them do their work.
In the end, you want to join all those threads; as in: only when all threads "joined"; all of them are done; and you are ready to go forward and process their results.
Alternatively, you would look into more "abstract" things like ExecutorService and Futures; to get away from dealing with "bare metal" threads directly.
Of course that is pretty generic; but well, so is your question.
You could use streams for this.
public class ParallelInts
{
public static void main(String[] args) {
int count = new Random().ints( 1_000_000, 0, 10 ).parallel()
.reduce( 0, (sum, i) -> sum + ((i==1)?1:0) );
System.out.println( "count = " + count );
}
}
so for my programming class we have to do the following:
Fill an integer array with 5 million integers ranging from 0-9.
Then find the number of times each number (0-9) occurs and display this.
We have to measure the time it takes to count the occurences for both single threaded, and multi-threaded. Currently I average 9.3ms for single threaded, and 8.9 ms multithreaded with 8 threads on my 8 core cpu, why is this?
Currently for multithreading I have one array filled with numbers and am calculating lower and upper bounds for each thread to count occurences. here is my current attempt:
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
#Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
threads[i].join();
}
}
Could anyone shed some light? Cheers.
You are essentially doing all the work sequentially because each thread you create you immediately join it.
Move the threads[i].join() outside the main construction loop into it's own loop. While you're at it you should probably also start all of the threads outside of the loop as starting them while new threads are still being created is not a good idea because creating threads takes time.
class ThreadTester {
private final int threadCount;
private final int numberCount;
int[] numbers = new int[5_000_000];
AtomicIntegerArray occurences;
Thread[] threads;
AtomicLong milliseconds = new AtomicLong();
public ThreadTester(int threadCount, int numberCount) {
this.threadCount = threadCount;
this.numberCount = numberCount;
occurences = new AtomicIntegerArray(numberCount);
threads = new Thread[threadCount];
Random r = new Random();
for (int i = 0; i < numbers.length; i++) {
numbers[i] = r.nextInt(numberCount);
}
}
public void createThreads() throws InterruptedException {
final int divisionSize = numbers.length / threadCount;
for (int i = 0; i < threads.length; i++) {
final int lower = (i * divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
#Override
public void run() {
long start = System.nanoTime();
for (int i = lower; i <= upper; i++) {
occurences.addAndGet(numbers[i], 1);
}
long end = System.nanoTime();
milliseconds.addAndGet(end - start);
}
});
}
}
private void startThreads() {
for (Thread thread : threads) {
thread.start();
}
}
private void finishThreads() throws InterruptedException {
for (Thread thread : threads) {
thread.join();
}
}
public long test() throws InterruptedException {
createThreads();
startThreads();
finishThreads();
return milliseconds.get();
}
}
public void test() throws InterruptedException {
for (int threads = 1; threads < 50; threads++) {
ThreadTester tester = new ThreadTester(threads, 10);
System.out.println("Threads=" + threads + " ns=" + tester.test());
}
}
Note that even here the fastest solution is using one thread but you can clearly see that an even number of threads does it quicker as I am using an i5 which has 2 cores but works as 4 via hyperthreading.
Interestingly though - as suggested by #biziclop - removing all contention between threads via the occurrences by giving each thread its own `occurrences array we get a more expected result:
The other answers all explored the immediate problems with your code, I'll give you a different angle: one that's about design of multi-threading in general.
The idea of parallel computing speeding up calculations depends on the assumption that the small bits you broke the problem up into can indeed be run in parallel, independently of each other.
And at first glance, your problem is exactly like that, chop the input range up into 8 equal parts, fire up 8 threads and off they go.
There is a catch though:
occurences[numbers[i]]++;
The occurences array is a resource shared by all threads, and therefore you must control access to it to ensure correctness: either by explicit synchronization (which is slow) or something like an AtomicIntegerArray. But the Atomic* classes are only really fast if access to them is rarely contested. And in your case access will be contested a lot, because most of what your inner loop does is incrementing the number of occurrences.
So what can you do?
The problem is caused partly by the fact that occurences is such a small structure (an array with 10 elements only, regardless of input size), threads will continuously try to update the same element. But you can turn that to your advantage: make all the threads keep their own separate tally, and when they all finished, just add up their results. This will add a small, constant overhead to the end of the process but will make the calculations go truly parallel.
The join method allows one thread to wait for the completion of another, so the second thread will start only after the first will finish.
Join each thread after you started all threads.
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
#Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
}
for(int i = 0; i < threads.length; i++) {
threads[i].join();
}
}
Also there seem to be a race condition in code at occurences[numbers[i]]++
So most probably if you update the code and use more threads the output wouldn't be correct. You should use an AtomicIntegerArray: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicIntegerArray.html
Use an ExecutorService with Callable and invoke all tasks then you can safely aggregate them. Also use TimeUnit for elapsing time manipulations (sleep, joining, waiting, convertion, ...)
Start by defining the task with his input/output :
class Task implements Callable<Task> {
// input
int[] source;
int sliceStart;
int sliceEnd;
// output
int[] occurences = new int[10];
String runner;
long elapsed = 0;
Task(int[] source, int sliceStart, int sliceEnd) {
this.source = source;
this.sliceStart = sliceStart;
this.sliceEnd = sliceEnd;
}
#Override
public Task call() {
runner = Thread.currentThread().getName();
long start = System.nanoTime();
try {
compute();
} finally {
elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
}
return this;
}
void compute() {
for (int i = sliceStart; i < sliceEnd; i++) {
occurences[source[i]]++;
}
}
}
Then let's define some variable to manage parameters:
// Parametters
int size = 5_000_000;
int parallel = Runtime.getRuntime().availableProcessors();
int slices = parallel;
Then generates random input:
// Generated source
int[] source = new int[size];
ThreadLocalRandom random = ThreadLocalRandom.current();
for (int i = 0; i < source.length; i++) source[i] = random.nextInt(10);
Start timing total computation and prepare tasks:
long start = System.nanoTime();
// Prepare tasks
List<Task> tasks = new ArrayList<>(slices);
int sliceSize = source.length / slices;
for (int sliceStart = 0; sliceStart < source.length;) {
int sliceEnd = Math.min(sliceStart + sliceSize, source.length);
Task task = new Task(source, sliceStart, sliceEnd);
tasks.add(task);
sliceStart = sliceEnd;
}
Executes all task on threading configuration (don't forget to shutdown it !):
// Execute tasks
ExecutorService executor = Executors.newFixedThreadPool(parallel);
try {
executor.invokeAll(tasks);
} finally {
executor.shutdown();
}
Then task have been completed, just aggregate data:
// Collect data
int[] occurences = new int[10];
for (Task task : tasks) {
for (int i = 0; i < occurences.length; i++) {
occurences[i] += task.occurences[i];
}
}
Finally you can output computation result:
// Display result
long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.printf("Computation done in %tT.%<tL%n", calendar(elapsed));
System.out.printf("Results: %s%n", Arrays.toString(occurences));
You can also output partial computations:
// Print debug output
int idxSize = (String.valueOf(size).length() * 4) / 3;
String template = "Slice[%," + idxSize + "d-%," + idxSize + "d] computed in %tT.%<tL by %s: %s%n";
for (Task task : tasks) {
System.out.printf(template, task.sliceStart, task.sliceEnd, calendar(task.elapsed), task.runner, Arrays.toString(task.occurences));
}
Which gives on my workstation:
Computation done in 00:00:00.024
Results: [500159, 500875, 500617, 499785, 500017, 500777, 498394, 498614, 499498, 501264]
Slice[ 0-1 250 000] computed in 00:00:00.013 by pool-1-thread-1: [125339, 125580, 125338, 124888, 124751, 124608, 124463, 124351, 125023, 125659]
Slice[1 250 000-2 500 000] computed in 00:00:00.014 by pool-1-thread-2: [124766, 125423, 125111, 124756, 125201, 125695, 124266, 124405, 125083, 125294]
Slice[2 500 000-3 750 000] computed in 00:00:00.013 by pool-1-thread-3: [124903, 124756, 124934, 125640, 124954, 125452, 124556, 124816, 124737, 125252]
Slice[3 750 000-5 000 000] computed in 00:00:00.014 by pool-1-thread-4: [125151, 125116, 125234, 124501, 125111, 125022, 125109, 125042, 124655, 125059]
the small trick to convert elapsed millis in a stopwatch calendar:
static final TimeZone UTC= TimeZone.getTimeZone("UTC");
public static Calendar calendar(long millis) {
Calendar calendar = Calendar.getInstance(UTC);
calendar.setTimeInMillis(millis);
return calendar;
}
I have been working on different variations of the Radix Sort. At first I used chaining, which was really slow. Then I moved onto using a count sort while using val % (10 * pass), and most recently turning it into the respective bytes and count sorting those, which allows me to sort by negative values also.
I wanted to try it with multithreading, and can only get it to work about half the time. I was wondering if someone can help look at my code, and see where I'm going wrong with the threading. I have each thread count sort each byte. Thanks:
public class radixSort {
public int[] array;
public int arraySize, arrayRange;
public radixSort (int[] array, int size, int range) {
this.array = array;
this.arraySize = size;
this.arrayRange = range;
}
public int[] RadixSort() {
Thread[] threads = new Thread[4];
for (int i=0;i<4;i++)
threads[i] = new Thread(new Radix(arraySize, i));
for (int i=0;i<4;i++)
threads[i].start();
for (int i=0;i<4;i++)
try {
threads[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
return array;
}
class Radix implements Runnable {
private int pass, size;
private int[] tempArray, freqArray;
public Radix(int size, int pass) {
this.pass = pass;
this.size = size;
this.tempArray = new int[size];
this.freqArray = new int[256];
}
public void run() {
int temp, i, j;
synchronized(array) {
for (i=0;i<size;i++) {
if (array[i] <= 0) temp = array[i] ^ 0x80000000;
else temp = array[i] ^ ((array[i] >> 31) | 0x80000000);
j = temp >> (pass << 3) & 0xFF;
freqArray[j]++;
}
for (i=1;i<256;i++)
freqArray[i] += freqArray[i-1];
for (i=size-1;i>=0;i--) {
if (array[i] <= 0) temp = array[i] ^ 0x80000000;
else temp = array[i] ^ ((array[i] >> 31) | 0x80000000);
j = temp >> (pass << 3) & 0xFF;
tempArray[--freqArray[j]] = array[i];
}
for (i=0;i<size;i++)
array[i] = tempArray[i];
}
}
}
}
There is a basic problem with this approach. To get a benefit from multithreading, you need to give each thread a non-overlapping task compared to the other treads. By synchonizing on the array you have made it so only one thread does work at a time, meaning you get all the overhead of threads with none of the benefit.
Think of ways to partition the task so that threads work in parallel. For example, after the first pass, all the item with a 1 high bit will be in one part of the array, and those with a zero high-bit will be in the other. You could have one thread work on each part of the array without synchronizing.
Note that your runnable has to completely change so that it does one pass at a specified subset of the array then spawns threads for the next pass.
Besides wrong class and method names (class should start with capital letter, method shouldn't), I can see that you are synchronizing all thread works on the array. So it's in fact not parallel at all.
I am pretty sure that you can't really parallelize RadixSort, at least in the way you are trying to. Someone pointed out that you can do it by divide-and-conquer, as you first order by the highest bits, but in fact, RadixSort works by comparing the lower-order bits first, so you can't really divide-and-conquer. The array can basically be completely permuted after each pass.
Guys, prove me wrong, but i think it's inherently impossible to parallelize this algorithm like you try to. Maybe you can parallelize the (count) sorting that is done inside of each pass, but be aware that ++ is not an atomic operation.