Finding Maximum in an array; using four threads in Java

I've been trying to write a program to find the max value in an array.
I know how to use threads to find it, but to get the right answer I need to use .join().
I don't get why it wouldn't find the max without .join(). And if I use .join(), the running time of the program is about the same as the same program without threads. So if that's the only way to run my threads and it doesn't speed up the process, why do we use threads?
public void run() {
for (int i = start; i < end; i++) {
if (array[i] > threadMax) {
threadMax = array[i];
}
}
}
This is my run method; I give each thread a quarter of the array and then find the max among their results.
This works, but only if I use .join(), and I don't want to slow down the program.
What else can I do?
Edit: This is my code:
ThreadMax[] findMax = new ThreadMax[4];
findMax[0] = new ThreadMax(array, 0,array.length/4);
findMax[1] = new ThreadMax(array, array.length/4, (2*array.length)/4);
findMax[2] = new ThreadMax(array, (2*array.length)/4, (3 * array.length) / 4);
findMax[3] = new ThreadMax(array, (3 * array.length) / 4, array.length);
for(int i = 0; i < 4; i++) {
findMax[i].myThread.start();
try {
findMax[i].myThread.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
int[] topFour = new int[4];
for(int i = 0; i < 4; i++)
topFour[i] = findMax[i].threadMax;
int result = 0;
for(int i = 0; i < 4; i++){
if(result < topFour[i])
result = topFour[i];
}
System.out.println("Max = " + result);
and my ThreadMax class:
int start;
int end;
int threadMax;
int[] array;
Thread myThread;
ThreadMax(int[] array, int start, int end) {
this.array = array;
this.start = start;
this.end = end;
myThread = new Thread(this);
}
@Override
public void run() {
for (int i = start; i < end; i++) {
if (array[i] > threadMax) {
threadMax = array[i];
}
}
}

Why do you use synchronized? You want to run all threads parallel, this avoids it.
You need to join the threads to see when they have finished. The code you call your threads from is another thread that doesn't know when these four threads are done with their work. join() waits for this moment.
this works but if only I use .join()
"Works" means you don't get the correct result without join()? Yes, because the calling code (which you unfortunately didn't show) probably runs past the thread-starting lines and reaches its end before the threads have even started.
Edit
Now as more code is available it's clear to me. You execute a loop:
start the first thread
wait for the first thread
start the second thread...
No thread is started until the previous one has finished!
You need separate loops. FIRST start all threads in one loop, THEN wait for them to finish in a second loop.
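The two-loop structure described above can be sketched as follows. This is a minimal illustration, not the asker's exact class: the names ParallelMax and MaxWorker are hypothetical, and the per-thread maximum starts at Integer.MIN_VALUE (an assumption added here) so that arrays containing only negative values also work.

```java
class ParallelMax {
    // Hypothetical worker: scans array[start, end) and remembers the largest value seen.
    static class MaxWorker implements Runnable {
        private final int[] array;
        private final int start;
        private final int end;
        int max = Integer.MIN_VALUE; // not 0, so all-negative input still gives the right answer

        MaxWorker(int[] array, int start, int end) {
            this.array = array;
            this.start = start;
            this.end = end;
        }

        @Override
        public void run() {
            for (int i = start; i < end; i++) {
                if (array[i] > max) {
                    max = array[i];
                }
            }
        }
    }

    static int max(int[] array, int threadCount) {
        MaxWorker[] workers = new MaxWorker[threadCount];
        Thread[] threads = new Thread[threadCount];
        // FIRST loop: create and start every thread, so they all run at the same time.
        for (int t = 0; t < threadCount; t++) {
            int start = (int) ((long) array.length * t / threadCount);
            int end = (int) ((long) array.length * (t + 1) / threadCount);
            workers[t] = new MaxWorker(array, start, end);
            threads[t] = new Thread(workers[t]);
            threads[t].start();
        }
        // SECOND loop: wait for each thread, then read its result.
        int result = Integer.MIN_VALUE;
        for (int t = 0; t < threadCount; t++) {
            try {
                threads[t].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
            result = Math.max(result, workers[t].max);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {3, -7, 42, 0, 19, 8, -1, 5};
        System.out.println("Max = " + max(data, 4));
    }
}
```

Reading workers[t].max only after threads[t].join() has returned is what makes the result reliable; join() does not serialize the scans themselves as long as every thread was started first.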

You should call Thread.join() from the main thread. It makes the main thread wait until the joined thread has finished its job. Otherwise the main thread can finish before your threads are done.

The class could implement comparable<arraylist> and you could use the Collections.max(arrayList) method.

Related

Java: threading divided into blocks array - executor service

I am creating a program to calculate values of two arrays in steps of simulation (they are initialized from the beginning, I did not put it here). I would like to do it with threads and ExecutorService. I divided arrays into blocks and I want values of these blocks to be calculated by threads, one block = one thread. These two arrays - X and Y - take values from each other (as you can see in run()), I want X to be calculated first and Y after that, so I made two separate runnables:
public static class CountX implements Runnable {
private int start;
private int end;
private CountDownLatch cdl;
public CountX(int s, int e, CountDownLatch c) {
this.start = s;
this.end = e;
this.cdl = c;
}
public void run() {
for (int i = start + 1; i < end - 1; i++) {
x[i] = x[i] - (y[i-1] - 2 * y[i] + y[i+1]) + y[i];
}
cdl.countDown();
}
}
And same for CountY. I would like to give to it the information where the start and end of value for every block is.
This is, in short, what my main looks like, and this is where my main problem is:
int NN = 400; //length of X and Y
int threads = 8;
int block_size = (int) NN/threads;
final ExecutorService executor_X = Executors.newFixedThreadPool(threads);
final ExecutorService executor_Y = Executors.newFixedThreadPool(threads);
CountDownLatch cdl = new CountDownLatch(threads);
CountX[] runnables_X = new CountX[threads];
CountY[] runnables_Y = new CountY[threads];
for (int r = 0; r < threads; r++) {
runnables_X[r] = new CountX((r*block_size), ((r+1)*block_size), cdl);
}
for (int r = 0; r < threads; r++) {
runnables_Y[r] = new CountY((r*block_size), ((r+1)*block_size), cdl);
}
int sim_steps = 4000;
for(int m = 0; m < sim_steps; m++) {
for (int e = 0; e < threads; e++) {
executor_X.execute(runnables_X[e]);
}
for (int e = 0; e < threads; e++) {
executor_Y.execute(runnables_Y[e]);
}
}
executor_X.shutdown();
executor_Y.shutdown();
I get wrong values in arrays X and Y from this program; I know this because I also computed them without threads.
Is CountDownLatch necessary here? Am I supposed to do for loop of runnables_X[r] = new CountX((r*block_size), ((r+1)*block_size), cdl); in every m (sim_step) loop? Or maybe I should use ExecutorService in a different way? I tried many options but the results are still wrong.
Thank you in advance!

Your approach is one I probably wouldn't take for this task.
You can work with references and Runnables, but in your case a Callable might be the better choice. With a Callable, you just give it the array and let it calculate a partial value, if possible and await the Futures. For me, it's not really clear what you actually want to calculate though, thus I am taking a blind guess here.
You need neither a CountDownLatch nor two ExecutorServices; one ExecutorService is enough.
If you really want to use a Runnable for this, you should implement some sort of synchronization, either with a concurrent list, Atomic variables, volatile or a lock.
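One way the single-executor suggestion could look is sketched below, assuming x and y are double arrays of equal length (at least 3). The X rule is copied from the question's CountX; the question does not show CountY, so updateY here is a guess that mirrors the X rule with the roles swapped. The names PhasedSimulation, BlockUpdate and blocks are hypothetical. invokeAll returns only after every submitted task has completed, so it acts as the barrier between the X phase and the Y phase of each step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PhasedSimulation {
    interface BlockUpdate { void apply(int from, int to); }

    // Update rule taken from the question's CountX.
    static void updateX(double[] x, double[] y, int from, int to) {
        for (int i = from; i < to; i++) {
            x[i] = x[i] - (y[i - 1] - 2 * y[i] + y[i + 1]) + y[i];
        }
    }

    // ASSUMPTION: CountY is not shown in the question; this mirrors CountX.
    static void updateY(double[] x, double[] y, int from, int to) {
        for (int i = from; i < to; i++) {
            y[i] = y[i] - (x[i - 1] - 2 * x[i] + x[i + 1]) + x[i];
        }
    }

    // Splits the interior indices 1..n-2 into contiguous blocks with no gaps between them.
    static List<Callable<Void>> blocks(int n, int blockCount, BlockUpdate update) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int b = 0; b < blockCount; b++) {
            int from = 1 + (int) ((long) (n - 2) * b / blockCount);
            int to = 1 + (int) ((long) (n - 2) * (b + 1) / blockCount);
            tasks.add(() -> { update.apply(from, to); return null; });
        }
        return tasks;
    }

    static void run(double[] x, double[] y, int steps, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> xPhase = blocks(x.length, threads, (f, t) -> updateX(x, y, f, t));
            List<Callable<Void>> yPhase = blocks(y.length, threads, (f, t) -> updateY(x, y, f, t));
            for (int m = 0; m < steps; m++) {
                pool.invokeAll(xPhase); // blocks until every X block is finished
                pool.invokeAll(yPhase); // only then are the Y blocks run
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during simulation", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] y = {6, 5, 4, 3, 2, 1};
        run(x, y, 2, 2);
        System.out.println(Arrays.toString(x));
        System.out.println(Arrays.toString(y));
    }
}
```

Within one phase each task writes only its own elements of one array and reads only the other array, so the blocks do not interfere with each other. Note that the question's loop bounds (start + 1 to end - 1 per block) skip the elements at every block boundary, which by itself would make the threaded result differ from a single sequential pass.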

Not expected result with multithread programming

I'm having trouble with a multithreaded Java program.
The program splits the sum of an integer array across multiple threads and then adds up the partial sums.
The problem is that the computing time does not decrease as the number of threads increases (I know there is a thread count beyond which adding threads makes the computation slower). I expect to see the execution time decrease up to that limit (the benefit of parallel execution). I use the variable fake in the run method to make the time "readable".
public class MainClass {
private final int MAX_THREAD = 8;
private final int ARRAY_SIZE = 1000000;
private int[] array;
private SimpleThread[] threads;
private int numThread = 1;
private int[] sum;
private int start = 0;
private int totalSum = 0;
long begin, end;
int fake;
MainClass() {
fillArray();
for(int i = 0; i < MAX_THREAD; i++) {
threads = new SimpleThread[numThread];
sum = new int[numThread];
begin = (long) System.currentTimeMillis();
for(int j = 0 ; j < numThread; j++) {
threads[j] = new SimpleThread(start, ARRAY_SIZE/numThread, j);
threads[j].start();
start+= ARRAY_SIZE/numThread;
}
for(int k = 0; k < numThread; k++) {
try {
threads[k].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
end = (long) System.currentTimeMillis();
for(int g = 0; g < numThread; g++) {
totalSum+=sum[g];
}
System.out.printf("Result with %d thread-- Sum = %d Time = %d\n", numThread, totalSum, end-begin);
numThread++;
start = 0;
totalSum = 0;
}
}
public static void main(String args[]) {
new MainClass();
}
private void fillArray() {
array = new int[ARRAY_SIZE];
for(int i = 0; i < ARRAY_SIZE; i++)
array[i] = 1;
}
private class SimpleThread extends Thread{
int start;
int size;
int index;
public SimpleThread(int start, int size, int sumIndex) {
this.start = start;
this.size = size;
this.index = sumIndex;
}
public void run() {
for(int i = start; i < start+size; i++)
sum[index]+=array[i];
for(long i = 0; i < 1000000000; i++) {
fake++;
}
}
}
(Screenshot: unexpected result)

As a general rule, you won't get a speedup from multi-threading if the "work" performed by each thread is less than the overheads of using the threads.
One of the overheads is the cost of starting a new thread. This is surprisingly high. Each time you start a thread the JVM needs to perform syscalls to allocate the thread stack memory segment and the "red zone" memory segment, and initialize them. (The default thread stack size is typically 500KB or 1MB.) Then there are further syscalls to create the native thread and schedule it.
In this example, you have 1,000,000 elements to sum and you divide this work among N threads. As N increases, the amount of work performed by each thread decreases.
It is not hard to see that the time taken to sum 1,000,000 elements is going to be less than the time needed to start 4 threads ... just based on counting the memory read and write operations. Then you need to take into account that the child threads are created one at a time by the parent thread.
If you do the analysis completely, it is clear that there is a point at which adding more threads actually slows down the computation even if you have enough cores to run all threads in parallel. And your benchmarking seems to suggest [1] that that point is around 2 threads.
By the way, there is a second reason why you may not get as much speedup as you expect for a benchmark like this one. The "work" that each thread is doing is basically scanning a large array. Reading and writing arrays will generate requests to the memory system. Ideally, these requests will be satisfied by the (fast) on-chip memory caches. However, if you try to read / write an array that is larger than the memory cache, then many / most of those requests turn into (slow) main memory requests. Worse still, if you have N cores all doing this then you can find that the number of main memory requests is too much for the memory system to keep up .... and the threads slow down.
The bottom line is that multi-threading does not automatically make an application faster, and it certainly won't if you do it the wrong way.
In your example:
the amount of work per thread is too small compared with the overheads of creating and starting threads, and
memory bandwidth effects are likely to be a problem if you can "factor out" the thread creation overheads.
[1] - I don't understand the point of the "fake" computation. It probably invalidates the benchmark, though it is possible that the JIT compiler optimizes it away.

Why sum is wrong sometimes?
Because ARRAY_SIZE/numThread is an integer division, the fractional part is discarded (e.g. 1000000/3 yields 333333, not 333333.33...). The slices then no longer cover the whole array, so the sum may be less than 1000000, depending on the value of the divisor.
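One way to avoid losing the remainder is to compute each slice's boundaries from its index instead of accumulating a truncated slice size. The helper below is a hypothetical sketch (SliceBounds and bounds are illustrative names, not part of the question's code); it uses long arithmetic so the intermediate product cannot overflow.

```java
class SliceBounds {
    // Returns {start, end} (end exclusive) for slice `index` out of `parts` slices.
    // Adjacent slices share a boundary, so together they cover [0, length) exactly.
    static int[] bounds(int length, int parts, int index) {
        int start = (int) ((long) length * index / parts);
        int end = (int) ((long) length * (index + 1) / parts);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int length = 1_000_000;
        int parts = 3;
        int covered = 0;
        for (int p = 0; p < parts; p++) {
            int[] b = bounds(length, parts, p);
            covered += b[1] - b[0];
            System.out.printf("slice %d: [%d, %d)%n", p, b[0], b[1]);
        }
        System.out.println("covered = " + covered);
    }
}
```

With 1,000,000 elements and 3 slices this gives [0, 333333), [333333, 666666) and [666666, 1000000), which together cover all 1,000,000 elements; the last slice simply absorbs the extra element.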
Why the time taken is increasing as the number of threads increases?
Because in the run function of each thread you do this:
for(long i = 0; i < 1000000000; i++) {
fake++;
}
I do not understand what this statement in your question means:
I use the variable fake in run method to make time "readable".
But every thread has to increment your fake variable 1000000000 times.

As a side note, for what you're trying to do there is the Fork/Join framework. It allows you to easily split tasks recursively and implements an algorithm which will distribute your workload automatically.
There is a guide available here; its example is very similar to your case, which boils down to a RecursiveTask like this:
class Adder extends RecursiveTask<Integer>
{
private int[] toAdd;
private int from;
private int to;
/** Add the numbers in the given array */
public Adder(int[] toAdd)
{
this(toAdd, 0, toAdd.length);
}
/** Add the numbers in the given array between the given indices;
internal constructor to split work */
private Adder(int[] toAdd, int fromIndex, int upToIndex)
{
this.toAdd = toAdd;
this.from = fromIndex;
this.to = upToIndex;
}
/** This is the work method */
@Override
protected Integer compute()
{
int amount = to - from;
int result = 0;
if (amount < 500)
{
// base case: add ints and return the result
for (int i = from; i < to; i++)
{
result += toAdd[i];
}
}
else
{
// array too large: split it into two parts and distribute the actual adding
int newEndIndex = from + (amount / 2);
Collection<Adder> invokeAll = invokeAll(Arrays.asList(
new Adder(toAdd, from, newEndIndex),
new Adder(toAdd, newEndIndex, to)));
for (Adder a : invokeAll)
{
result += a.join();
}
}
return result;
}
}
To actually run this, you can use
RecursiveTask<Integer> adder = new Adder(fillArray(ARRAY_LENGTH));
int result = ForkJoinPool.commonPool().invoke(adder);

Starting threads is heavy, and you'll only see the benefit on large jobs that don't compete for the same resources (neither of which applies here).

Java Thread instantiation with incorrect variables

I'm trying to use multithreading to break down a large block of processing data into smaller chunks.
The problem I am having is that my threads aren't running against their specified portion of the data space.
For example:
thread-2 should process range of 0 - 1000
thread-3 should process range of 1001 - 2000
When I call my new threads back to back i get:
thread-2 = 0 - 1000
thread-3 = 0 - 1000
When I add a Thread sleep(3000) in between the two thread calls i get:
thread-2 = 0 - 1000
thread-3 = 0 - 2000
I'm not sure what I'm doing wrong and would really appreciate some guidance.
Note on the snippets below: I abbreviated the numbers above. The actual ranges used in the example below are
1,000,000 - 1,001,000 and 1,001,001 - 1,002,000
Snippet from main method detailing thread call:
try {
int start = 1000000;
int end = 1001000;
new Thread(new MyThread(start, end, PUBLIC_INVALID_LIST)).start();
Thread.sleep(3000);
start = 1001001;
end = 1002000;
new Thread(new MyThread(start, end, PUBLIC_INVALID_LIST)).start();
} catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
Snippet from MyThread which extends Thread
This details how I am passing params from the main method through to the run() method:
//variables to pass from constructor to run()
private int startIndex;
private int endIndex;
private ArrayList PUBLIC_INVALID_LIST;
MyThread(int startIndex, int endIndex, ArrayList PUBLIC_INVALID_LIST) {
this.startIndex = startIndex;
this.endIndex = endIndex;
this.PUBLIC_INVALID_LIST = PUBLIC_INVALID_LIST;
}//end of initializer
public void run() {

I'm still not sure what the issue was with passing the variables in, but I got an alternative way of passing them to work, using int[]:
int sizeSet = (max /100000) + 1;//max is highest path_id value
int[] start_range = new int[sizeSet];
int[] end_range = new int[sizeSet];
start_range[0] = 1;//first possible path_id
end_range[0] = 100000;//end point of first possible range
int rangeIndex = 1;//create counter for while loop
while(rangeIndex < sizeSet){
start_range[rangeIndex] = start_range[rangeIndex - 1] + 100000;
end_range[rangeIndex] = end_range[rangeIndex - 1] + 100000;
++ rangeIndex;
}//end range setting while block
rangeIndex = 0;//reset index counter
while(rangeIndex < sizeSet){
new Thread(new MyThread(start_range[rangeIndex], end_range[rangeIndex], PUBLIC_INVALID_LIST)).start();
++ rangeIndex;
}

Multithreading only .4 of a second faster?

So for my programming class we have to do the following:
Fill an integer array with 5 million integers ranging from 0-9.
Then find the number of times each number (0-9) occurs and display this.
We have to measure the time it takes to count the occurrences, both single-threaded and multi-threaded. Currently I average 9.3 ms single-threaded and 8.9 ms multi-threaded with 8 threads on my 8-core CPU. Why is this?
Currently, for multithreading, I have one array filled with numbers and I calculate lower and upper bounds for each thread to count occurrences in. Here is my current attempt:
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
@Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
threads[i].join();
}
}
Could anyone shed some light? Cheers.

You are essentially doing all the work sequentially because you join each thread immediately after creating it.
Move the threads[i].join() out of the main construction loop into its own loop. While you're at it, you should probably also start all of the threads outside of that loop, as starting them while new threads are still being created is not a good idea because creating threads takes time.
class ThreadTester {
private final int threadCount;
private final int numberCount;
int[] numbers = new int[5_000_000];
AtomicIntegerArray occurences;
Thread[] threads;
AtomicLong milliseconds = new AtomicLong();
public ThreadTester(int threadCount, int numberCount) {
this.threadCount = threadCount;
this.numberCount = numberCount;
occurences = new AtomicIntegerArray(numberCount);
threads = new Thread[threadCount];
Random r = new Random();
for (int i = 0; i < numbers.length; i++) {
numbers[i] = r.nextInt(numberCount);
}
}
public void createThreads() throws InterruptedException {
final int divisionSize = numbers.length / threadCount;
for (int i = 0; i < threads.length; i++) {
final int lower = (i * divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
@Override
public void run() {
long start = System.nanoTime();
for (int i = lower; i <= upper; i++) {
occurences.addAndGet(numbers[i], 1);
}
long end = System.nanoTime();
milliseconds.addAndGet(end - start);
}
});
}
}
private void startThreads() {
for (Thread thread : threads) {
thread.start();
}
}
private void finishThreads() throws InterruptedException {
for (Thread thread : threads) {
thread.join();
}
}
public long test() throws InterruptedException {
createThreads();
startThreads();
finishThreads();
return milliseconds.get();
}
}
public void test() throws InterruptedException {
for (int threads = 1; threads < 50; threads++) {
ThreadTester tester = new ThreadTester(threads, 10);
System.out.println("Threads=" + threads + " ns=" + tester.test());
}
}
Note that even here the fastest solution uses one thread, but you can clearly see that an even number of threads does it quicker, as I am using an i5 which has 2 cores but works as 4 via hyperthreading.
Interestingly though, as suggested by @biziclop, removing all contention between threads on the occurrences by giving each thread its own occurrences array gives a more expected result:

The other answers all explored the immediate problems with your code; I'll give you a different angle, one that's about the design of multi-threading in general.
The idea of parallel computing speeding up calculations depends on the assumption that the small bits you broke the problem up into can indeed be run in parallel, independently of each other.
And at first glance, your problem is exactly like that: chop the input range up into 8 equal parts, fire up 8 threads and off they go.
There is a catch though:
occurences[numbers[i]]++;
The occurences array is a resource shared by all threads, and therefore you must control access to it to ensure correctness: either by explicit synchronization (which is slow) or something like an AtomicIntegerArray. But the Atomic* classes are only really fast if access to them is rarely contested. And in your case access will be contested a lot, because most of what your inner loop does is incrementing the number of occurrences.
So what can you do?
The problem is caused partly by the fact that occurences is such a small structure (an array with only 10 elements, regardless of input size), so threads will continuously try to update the same elements. But you can turn that to your advantage: make all the threads keep their own separate tally, and when they have all finished, just add up their results. This will add a small, constant overhead to the end of the process but will make the calculations go truly parallel.
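The separate-tally idea can be sketched like this. It is an illustration under assumptions, not the asker's class: LocalTallyCounter, count, partial and buckets are hypothetical names, and the slice boundaries are computed so that the whole array is covered even when its length is not divisible by the thread count.

```java
import java.util.Arrays;
import java.util.Random;

class LocalTallyCounter {
    static int[] count(int[] numbers, int buckets, int threadCount) {
        int[][] partial = new int[threadCount][]; // one result slot per thread
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            final int lower = (int) ((long) numbers.length * t / threadCount);
            final int upper = (int) ((long) numbers.length * (t + 1) / threadCount);
            threads[t] = new Thread(() -> {
                int[] local = new int[buckets]; // private to this thread: no contention
                for (int i = lower; i < upper; i++) {
                    local[numbers[i]]++;
                }
                partial[id] = local; // the main thread reads this only after join()
            });
        }
        for (Thread thread : threads) {
            thread.start();
        }
        // Small, constant-size merge step once every thread has finished.
        int[] total = new int[buckets];
        for (int t = 0; t < threadCount; t++) {
            try {
                threads[t].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for counters", e);
            }
            for (int b = 0; b < buckets; b++) {
                total[b] += partial[t][b];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] numbers = new int[5_000_000];
        Random r = new Random();
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = r.nextInt(10);
        }
        System.out.println(Arrays.toString(count(numbers, 10, 8)));
    }
}
```

Because each thread increments only its own local array, no synchronization or atomic class is needed inside the hot loop; the only shared writes are the single assignment into partial per thread.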

The join method allows one thread to wait for the completion of another, so in your code the second thread starts only after the first has finished.
Join each thread after you started all threads.
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
@Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
}
for(int i = 0; i < threads.length; i++) {
threads[i].join();
}
}
Also, there seems to be a race condition in the code at occurences[numbers[i]]++.
So most probably, if you update the code and use more threads, the output won't be correct. You should use an AtomicIntegerArray: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicIntegerArray.html

Use an ExecutorService with Callable and invoke all the tasks; then you can safely aggregate their results. Also use TimeUnit for manipulating elapsed time (sleeping, joining, waiting, conversion, ...).
Start by defining the task with its input and output:
class Task implements Callable<Task> {
// input
int[] source;
int sliceStart;
int sliceEnd;
// output
int[] occurences = new int[10];
String runner;
long elapsed = 0;
Task(int[] source, int sliceStart, int sliceEnd) {
this.source = source;
this.sliceStart = sliceStart;
this.sliceEnd = sliceEnd;
}
@Override
public Task call() {
runner = Thread.currentThread().getName();
long start = System.nanoTime();
try {
compute();
} finally {
elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
}
return this;
}
void compute() {
for (int i = sliceStart; i < sliceEnd; i++) {
occurences[source[i]]++;
}
}
}
Then let's define some variables to manage the parameters:
// Parameters
int size = 5_000_000;
int parallel = Runtime.getRuntime().availableProcessors();
int slices = parallel;
Then generate random input:
// Generated source
int[] source = new int[size];
ThreadLocalRandom random = ThreadLocalRandom.current();
for (int i = 0; i < source.length; i++) source[i] = random.nextInt(10);
Start timing total computation and prepare tasks:
long start = System.nanoTime();
// Prepare tasks
List<Task> tasks = new ArrayList<>(slices);
int sliceSize = source.length / slices;
for (int sliceStart = 0; sliceStart < source.length;) {
int sliceEnd = Math.min(sliceStart + sliceSize, source.length);
Task task = new Task(source, sliceStart, sliceEnd);
tasks.add(task);
sliceStart = sliceEnd;
}
Execute all the tasks on the thread pool (don't forget to shut it down!):
// Execute tasks
ExecutorService executor = Executors.newFixedThreadPool(parallel);
try {
executor.invokeAll(tasks);
} finally {
executor.shutdown();
}
Once the tasks have completed, just aggregate the data:
// Collect data
int[] occurences = new int[10];
for (Task task : tasks) {
for (int i = 0; i < occurences.length; i++) {
occurences[i] += task.occurences[i];
}
}
Finally you can output computation result:
// Display result
long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.printf("Computation done in %tT.%<tL%n", calendar(elapsed));
System.out.printf("Results: %s%n", Arrays.toString(occurences));
You can also output partial computations:
// Print debug output
int idxSize = (String.valueOf(size).length() * 4) / 3;
String template = "Slice[%," + idxSize + "d-%," + idxSize + "d] computed in %tT.%<tL by %s: %s%n";
for (Task task : tasks) {
System.out.printf(template, task.sliceStart, task.sliceEnd, calendar(task.elapsed), task.runner, Arrays.toString(task.occurences));
}
Which gives on my workstation:
Computation done in 00:00:00.024
Results: [500159, 500875, 500617, 499785, 500017, 500777, 498394, 498614, 499498, 501264]
Slice[ 0-1 250 000] computed in 00:00:00.013 by pool-1-thread-1: [125339, 125580, 125338, 124888, 124751, 124608, 124463, 124351, 125023, 125659]
Slice[1 250 000-2 500 000] computed in 00:00:00.014 by pool-1-thread-2: [124766, 125423, 125111, 124756, 125201, 125695, 124266, 124405, 125083, 125294]
Slice[2 500 000-3 750 000] computed in 00:00:00.013 by pool-1-thread-3: [124903, 124756, 124934, 125640, 124954, 125452, 124556, 124816, 124737, 125252]
Slice[3 750 000-5 000 000] computed in 00:00:00.014 by pool-1-thread-4: [125151, 125116, 125234, 124501, 125111, 125022, 125109, 125042, 124655, 125059]
The small trick to convert elapsed millis into a stopwatch calendar:
static final TimeZone UTC= TimeZone.getTimeZone("UTC");
public static Calendar calendar(long millis) {
Calendar calendar = Calendar.getInstance(UTC);
calendar.setTimeInMillis(millis);
return calendar;
}

Shuffling array in multiple threads

I have an array of size N. I want to shuffle its elements in 2 threads (or more). Each thread should work with its own part of the array.
Let's say the first thread shuffles elements from 0 to K, and the second thread shuffles elements from K to N (where 0 < K < N). So, it can look like this:
//try-catch stuff is omitted
static void shuffle(int[] array) {
Thread t1 = new ShufflingThread(array, 0, array.length / 2);
Thread t2 = new ShufflingThread(array, array.length / 2, array.length);
t1.start();
t2.start();
t1.join();
t2.join();
}
public static void main(String[] args) {
int[] array = generateBigSortedArray();
shuffle(array);
}
Are there any guarantees from the JVM that I will see the changes in the array from the main method after such shuffling?
How should I implement ShufflingThread (or how should I run it, maybe within a synchronized block or something else) in order to get such guarantees?

The join() calls are sufficient to ensure memory coherency: when t1.join() returns, the main thread "sees" whatever operations thread t1 did on the array.
Also, Java guarantees that there is no word-tearing on arrays: distinct threads may use distinct elements of the same array without needing synchronization.
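To illustrate, here is a minimal sketch of what ShufflingThread could look like with no synchronized blocks at all. The body of the class is an assumption (the question does not show it): each thread runs a Fisher-Yates shuffle restricted to its own half-open range [from, to), and ShuffleDemo is a hypothetical wrapper around the question's shuffle method.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

class ShufflingThread extends Thread {
    private final int[] array;
    private final int from; // inclusive
    private final int to;   // exclusive

    ShufflingThread(int[] array, int from, int to) {
        this.array = array;
        this.from = from;
        this.to = to;
    }

    @Override
    public void run() {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        // Fisher-Yates shuffle that never touches elements outside [from, to).
        for (int i = to - 1; i > from; i--) {
            int j = random.nextInt(from, i + 1);
            int tmp = array[i];
            array[i] = array[j];
            array[j] = tmp;
        }
    }
}

class ShuffleDemo {
    static void shuffle(int[] array) {
        Thread t1 = new ShufflingThread(array, 0, array.length / 2);
        Thread t2 = new ShufflingThread(array, array.length / 2, array.length);
        t1.start();
        t2.start();
        try {
            // After both joins return, the caller sees every write the two threads made.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while shuffling", e);
        }
    }

    public static void main(String[] args) {
        int[] array = new int[20];
        for (int i = 0; i < array.length; i++) {
            array[i] = i;
        }
        shuffle(array);
        System.out.println(Arrays.toString(array));
    }
}
```

Since the two ranges do not overlap, the threads never write to the same element, and the start() and join() calls supply the ordering between the main thread and the workers. Note that this shuffles each half independently; elements never move from one half to the other.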

I think this is a good exercise in thread control, where (1) a job can be broken up into several parts, (2) the parts can run independently and asynchronously, and (3) a master thread monitors the completion of all such jobs in their respective threads. All you need is for this master thread to wait() and be notify()-ed jobCount times, once every time a thread completes execution. Here is some sample code that you can compile and run. Uncomment the println()s to see more.
Notes: [1] The JVM doesn't guarantee the order of execution of the threads. [2] You need to synchronize when your master thread accesses the big array, so as not to have corrupted data.
public class ShufflingArray {
private int nPart = 4, // Count of jobs distributed, resource dependent
activeThreadCount, // Currently active, monitored with notify
iRay[]; // Array the threads will work on
public ShufflingArray (int[] a) {
iRay = a;
printArray (a);
}
private void printArray (int[] ia) {
for (int i = 0 ; i < ia.length ; i++)
System.out.print (" " + ((ia[i] < 10) ? " " : "") + ia[i]);
System.out.println();
}
public void shuffle () {
int startNext = 0, pLen = iRay.length / nPart; // make a bunch of parts
for (int i = 0 ; i < nPart ; i++, activeThreadCount++) {
int start = (i == 0) ? 0 : startNext,
stop = start + pLen;
startNext = stop;
if (i == (nPart-1))
stop = iRay.length;
new Thread (new ShuffleOnePart (start, stop, (i+1))).start();
}
waitOnShufflers (0); // returns when activeThreadCount == 0
printArray (iRay);
}
synchronized private void waitOnShufflers (int bump) {
if (bump == 0) {
while (activeThreadCount > 0) {
// System.out.println ("Waiting on " + activeThreadCount + " threads");
try {
wait();
} catch (InterruptedException intex) {
}}} else {
activeThreadCount += bump;
notify();
}}
public class ShuffleOnePart implements Runnable {
private int startIndex, stopIndex; // Operate on global array iRay
public ShuffleOnePart (int i, int j, int k) {
startIndex = i;
stopIndex = j;
// System.out.println ("Shuffler part #" + k);
}
// Suppose shuffling means interchanging the first and last pairs
public void run () {
int tmp = iRay[startIndex+1];
iRay[startIndex+1] = iRay[startIndex]; iRay[startIndex] = tmp;
tmp = iRay[stopIndex-1];
iRay[stopIndex-1] = iRay[stopIndex-2]; iRay[stopIndex-2] = tmp;
try { // Lets imagine it needs to do something else too
Thread.sleep (157);
} catch (InterruptedException iex) { }
waitOnShufflers (-1);
}}
public static void main (String[] args) {
int n = 25, ia[] = new int[n];
for (int i = 0 ; i < n ; i++)
ia[i] = i+1;
new ShufflingArray(ia).shuffle();
}}

Thread.start() and Thread.join() are enough to give you the happens-before relationships between the array initialisation, its hand-off to the threads and then the read back in the main method.
Actions that cause happens-before are documented here.
As mentioned elsewhere, ForkJoin is very well suited to this kind of divide-and-conquer algorithm and frees you from a lot of the book-keeping that you'd otherwise need to implement.

Another way to get consistent behaviour is to use an ExecutorService from the java.util.concurrent package along with Callable tasks that return each thread's part of the array once both threads have completed.

Well, they can't BOTH be accessing the same array and if you use a lock, or a mutex or any other synchronizing mechanism, you kinda lose the power of the threads (since one will have to wait for another, either to finish the shuffling or finish a bit of the shuffling).
Why don't you just divide the array in half, give each thread its bit of the array and then merge the two arrays?
