How to find a missing number in an array covering a range of millions of numbers where duplicates are present? - java

I'm asking this question in relation to a question posted a few months ago.
Currently, in the app I am working on, I get a series of numbers where numbers may be missing or duplicated, but they are ordered in ascending order.
There were two problems.
If there were no duplicates, finding the missing number was pretty easy using the method suggested in the accepted answer of the mentioned question.
But if there are duplicates, that approach doesn't work anymore.
How can I solve the problem? No logic seems to work out, and even if one did (using a loop), it wouldn't be efficient.
NOTE: I also searched for some libraries but couldn't find any.

As far as I know, there is no way to detect a missing number in a list except by looping through it.
If your array is sorted, it should look something like this:
[1,2,3,3,4,6]
So this code should do the job:
int getMissingNumber(int[] numbers) {
    for (int i = 0; i < numbers.length - 1; i++) {
        int current = numbers[i];
        int next = numbers[i + 1];
        // a gap larger than 1 between neighbours means a value is missing
        if (next - current > 1) {
            return current + 1;
        }
    }
    return -1; // no gap found
}
Other than this, there is the option of converting the array to a Set and then back to an array, and then using the previous approach. Be sure to use a LinkedHashSet to preserve insertion order. But I don't know if it would be any faster.
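A minimal sketch of that idea, using IntStream.distinct() (which keeps the first occurrence, so a sorted input stays sorted) instead of an explicit LinkedHashSet; the method name is mine:

import java.util.stream.IntStream;

int getMissingNumberIgnoringDuplicates(int[] numbers) {
    // collapse duplicates, then reuse the gap scan above on the deduplicated array
    int[] unique = IntStream.of(numbers).distinct().toArray();
    return getMissingNumber(unique);
}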

I tried breaking the task down as much as I could using Fork/Join, as a fun exercise to get to know the library better (and because I thought dividing the work into smaller tasks and processing them in parallel would take less time), and contrasted it with a simple for loop.
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.stream.IntStream;

public class misc {
    public void getMissingNumbers(int[] numbers) {
        for (int i = 0; i < numbers.length - 1; i++) {
            int current = numbers[i];
            int next = numbers[i + 1];
            if (current + 1 != next) {
                System.out.println("Problem! - " + current + " " + next);
            }
        }
    }

    public static void main(String[] args) {
        int[] range = IntStream.rangeClosed(1, 50_000_000).toArray();
        int index = 50000;
        range[index] = range[index - 1]; // duplicate, which also leaves a gap
        index = 390;
        range[index] = range[index - 1];
        index = 500390;
        range[index] = range[index - 1];
        index = 2500390;
        range[index] = range[index - 1];

        ZonedDateTime now = ZonedDateTime.now();
        misc m = new misc();
        m.getMissingNumbers(range);
        System.out.printf("%s exec time: %dms\n",
                m.getClass().getSimpleName(),
                ChronoUnit.MILLIS.between(now, ZonedDateTime.now()));

        now = ZonedDateTime.now();
        ForkJoinPool forkJoinPool = ForkJoinPool.commonPool();
        breakDownRecursively bdr = new breakDownRecursively(range);
        forkJoinPool.invoke(bdr);
        System.out.printf("%s exec time: %dms\n",
                bdr.getClass().getSimpleName(),
                ChronoUnit.MILLIS.between(now, ZonedDateTime.now()));
    }
}
class breakDownRecursively extends RecursiveAction {
    private final int[] arr;

    public breakDownRecursively(int[] arr) {
        this.arr = arr;
    }

    @Override
    public void compute() {
        if (arr.length < 2) return;
        int mid = arr.length / 2;
        int[] left = new int[mid];
        System.arraycopy(arr, 0, left, 0, mid);
        int[] right = new int[arr.length - mid];
        System.arraycopy(arr, mid, right, 0, arr.length - mid);
        invokeAll(new breakDownRecursively(left), new breakDownRecursively(right));
        compare(left, right);
    }

    private void compare(int[] left, int[] right) {
        // only fires for two single-element halves at the bottom of the recursion,
        // so pairs inside each half are never compared
        if (left.length == 1 && right.length == 1) {
            if (left[0] + 1 != right[0]) {
                System.out.println("Problem! - " + left[0] + " " + right[0]);
            }
        }
    }
}
Output:
Problem! - 390 390
Problem! - 390 392
Problem! - 50000 50000
Problem! - 50000 50002
Problem! - 500390 500390
Problem! - 500390 500392
Problem! - 2500390 2500390
Problem! - 2500390 2500392
misc exec time: 60ms
Problem! - 390 392
Problem! - 500390 500392
Problem! - 2500390 2500392
breakDownRecursively exec time: 2435ms
I suppose I probably made a mistake somewhere in the Fork/Join implementation, but at the very least you can see that a for loop isn't THAT bad.
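For reference, here is a minimal sketch of what the Fork/Join version likely intended: work on index ranges of the shared array instead of copies, so every adjacent pair (including the ones straddling a split) is checked exactly once, falling back to a plain loop below a threshold. The class name and threshold are mine:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class GapFinder extends RecursiveAction {
    private static final int THRESHOLD = 1 << 16;
    private final int[] arr;
    private final int from, to; // checks pairs (i, i+1) for from <= i < to

    GapFinder(int[] arr, int from, int to) {
        this.arr = arr;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++)
                if (arr[i] + 1 != arr[i + 1])
                    System.out.println("Problem! - " + arr[i] + " " + arr[i + 1]);
        } else {
            // split the pair range; the boundary pair lands in exactly one half
            int mid = (from + to) >>> 1;
            invokeAll(new GapFinder(arr, from, mid), new GapFinder(arr, mid, to));
        }
    }
}

Invoked with ForkJoinPool.commonPool().invoke(new GapFinder(range, 0, range.length - 1)), it reports the same gaps as the plain loop without copying any arrays.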
And when I used a Runnable:
int mid = range.length / 2;
int[] half1 = new int[mid + 1];
System.arraycopy(range, 0, half1, 0, mid + 1); // overlap one element so the boundary pair is checked
int[] half2 = new int[range.length - mid];     // sized to what is actually copied
System.arraycopy(range, mid, half2, 0, range.length - mid);
RunnableTask r1 = new RunnableTask(half1);
RunnableTask r2 = new RunnableTask(half2);
now = ZonedDateTime.now();
Thread t1 = new Thread(r1);
Thread t2 = new Thread(r2);
t1.start();
t2.start();
t1.join(); // the enclosing main() needs "throws InterruptedException"
t2.join();
System.out.printf("%s exec time: %dms\n",
        r1.getClass().getSimpleName(),
        ChronoUnit.MILLIS.between(now, ZonedDateTime.now()));
class RunnableTask implements Runnable {
    private final int[] arr;

    public RunnableTask(int[] arr) {
        this.arr = arr;
    }

    @Override
    public void run() {
        for (int i = 0; i < arr.length - 1; i++) {
            int current = arr[i];
            int next = arr[i + 1];
            if (current + 1 != next) {
                System.out.println("Problem! - " + current + " " + next);
            }
        }
    }
}
Output:
Problem! - 390 390
Problem! - 390 392
Problem! - 50000 50000
Problem! - 50000 50002
Problem! - 500390 500390
Problem! - 500390 500392
Problem! - 2500390 2500390
Problem! - 2500390 2500392
RunnableTask exec time: 49ms
Only slightly better than a for loop.

A binary search benefits from being able to cut a problem space in half, and then eliminating one of the halves. In this case, any half that contains both a missing value and a duplicate is indistinguishable from one that doesn't, no matter how many additional duplicates exist, so you'd end up having to process both halves.
Millions of integer comparisons require very little compute time. A linear solution would still be very fast, and in this case is as efficient as you can be on a worst-case basis.
I ran the code below multiple times on my desktop and came up with an average of about 5 ms to process an array of 10 million elements; in all cases it found the results in under 10 ms.
public class Millions {
    public static int[] fillArray(int size) {
        int[] ar = new int[size];
        int randomPos = (int) (Math.random() * size);
        System.out.println("Placing missing value at position " + randomPos);
        int nextNum = 1;
        for (int i = 0; i < size; i++) {
            if (i == randomPos) {
                nextNum += 2; // skip one value to create the gap
            } else {
                if (Math.random() > 0.999995) {
                    nextNum++;
                    System.out.println("Placing duplicate value at position " + i);
                }
            }
            ar[i] = nextNum;
        }
        return ar;
    }

    public static int missingValue(int[] ar) {
        for (int i = 1; i < ar.length; i++) {
            if (ar[i] - ar[i - 1] == 2) return i; // a difference of 2 marks the skipped value
        }
        return -1;
    }

    public static void main(String[] args) {
        int SIZE = 10000000;
        int[] ar = fillArray(SIZE);
        long start = System.currentTimeMillis();
        int missing = missingValue(ar);
        long duration = System.currentTimeMillis() - start;
        if (missing < 0) {
            System.out.println("No missing value found.");
        } else {
            System.out.println("Missing value = " + missing);
        }
        System.out.println("Duration : " + duration + " ms");
    }
}

Related

How can I build this tree with O(n) space complexity?

The Problem
Given a set of integers, find a subset of those integers which sum to 100,000,000.
Solution
I am attempting to build a tree containing all the combinations of the given set along with the sum. For example, if the given set looked like 0,1,2, I would build the following tree, checking the sum at each node:
{}
{} {0}
{} {1} {0} {0,1}
{} {2} {1} {1,2} {0} {2} {0,1} {0,1,2}
Since I keep both the array of integers at each node and the sum, I should only need the bottom (current) level of the tree in memory.
Issues
My current implementation will maintain the entire tree in memory and therefore uses way too much heap space.
How can I change my current implementation so that the GC will take care of my upper tree levels?
(At the moment I am just throwing a RuntimeException when I have found the target sum but this is obviously just for playing around)
public class RecursiveSolver {
    static final int target = 100000000;
    static final int[] set = new int[]{98374328, 234234123, 2341234, 123412344, etc...};

    Tree initTree() {
        return nextLevel(new Tree(null), 0);
    }

    Tree nextLevel(Tree currentLocation, int current) {
        if (current == set.length) { return null; }
        else if (currentLocation.sum == target) throw new RuntimeException(currentLocation.getText());
        else {
            currentLocation.left = nextLevel(currentLocation.copy(), current + 1);
            Tree right = currentLocation.copy();
            right.value = add(currentLocation.value, set[current]);
            right.sum = currentLocation.sum + set[current];
            currentLocation.right = nextLevel(right, current + 1);
            return currentLocation;
        }
    }

    int[] add(int[] array, int digit) {
        if (array == null) {
            return new int[]{digit};
        }
        int[] newValue = new int[array.length + 1];
        for (int i = 0; i < array.length; i++) {
            newValue[i] = array[i];
        }
        newValue[array.length] = digit;
        return newValue;
    }

    public static void main(String[] args) {
        RecursiveSolver rs = new RecursiveSolver();
        Tree subsetTree = rs.initTree();
    }
}
class Tree {
    Tree left;
    Tree right;
    int[] value;
    int sum;

    Tree(int[] value) {
        left = null;
        right = null;
        sum = 0;
        this.value = value;
        if (value != null) {
            for (int i = 0; i < value.length; i++) sum += value[i];
        }
    }

    Tree copy() {
        return new Tree(this.value);
    }
}
The time and space you need for building the tree here is absolutely nothing at all.
The reason is that, if you're given
A node of the tree
The depth of the node
The ordered array of input elements
you can simply compute its parent and its left and right child nodes using O(1) operations, and you have access to each of those things while you're traversing the tree, so you don't need anything else.
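To make that concrete, here is a minimal sketch of the idea (my code, not the poster's): a "node" is just the pair (depth, sum), its two children are generated on the fly, and the only memory used is the O(n) recursion stack.

static void visit(int[] set, long target, int depth, long sum) {
    if (sum == target) System.out.println("subset found");
    if (depth == set.length) return;
    visit(set, target, depth + 1, sum);              // left child: exclude set[depth]
    visit(set, target, depth + 1, sum + set[depth]); // right child: include set[depth]
}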
The problem is NP-complete.
If you really want to improve performance, then you have to forget about your tree implementation. You either have to generate all the subsets and sum them up, or use dynamic programming.
The choice depends on the number of elements to sum and the sum you want to achieve. You know the target sum is 100,000,000. The brute-force exponential algorithm runs in O(2^n * n) time, so it makes sense for n below about 22.
In Python you can achieve this with a simple:
from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
You can significantly improve this complexity (sacrificing memory) by using the meet-in-the-middle technique (see the Wikipedia article). This decreases it to O(2^(n/2)), which means it will perform better than the DP solution for n <~ 53.
After thinking more about erip's comments, I realized he is correct - I shouldn't be using a tree to implement this algorithm.
Brute force is usually O(n*2^n) because there are n additions for each of the 2^n subsets. Because I only do one addition per node, the solution I came up with is O(2^n), where n is the size of the given set. Also, this algorithm has only O(n) space complexity. Since the number of elements in the original set in my particular problem is small (around 25), O(2^n) complexity is not too much of a problem.
The dynamic programming solution to this problem is O(t*n), where t is the target sum and n is the number of elements. Because t is very large in my problem, the dynamic solution ends up with a very long runtime and high memory usage.
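For reference, a minimal sketch of that DP (my code, assuming non-negative elements), which makes the O(t) memory cost concrete: for t = 100,000,000 the bit set alone needs roughly 12.5 MB, and the loops run t*n steps.

import java.util.BitSet;

static boolean subsetSumDP(int[] set, int target) {
    BitSet reachable = new BitSet(target + 1); // bit s set <=> some subset sums to s
    reachable.set(0);
    for (int x : set) {
        // iterate downward so each element is used at most once
        for (int s = target - x; s >= 0; s--) {
            if (reachable.get(s)) reachable.set(s + x);
        }
    }
    return reachable.get(target);
}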
This completes my particular solution in around 311 ms on my machine, which is a tremendous improvement over the dynamic programming solutions I have seen for this particular class of problem.
public class TailRecursiveSolver {
    public static void main(String[] args) {
        final long starttime = System.currentTimeMillis();
        try {
            step(new Subset(null, 0), 0);
        } catch (RuntimeException ex) {
            System.out.println(ex.getMessage());
            final long endtime = System.currentTimeMillis();
            System.out.println(endtime - starttime);
        }
    }

    static final int target = 100000000;
    static final int[] set = new int[]{ . . . };

    static void step(Subset current, int counter) {
        if (current.sum == target) throw new RuntimeException(current.getText());
        else if (counter == set.length) {} // dead end: ran out of elements
        else {
            step(new Subset(add(current.subset, set[counter]), current.sum + set[counter]), counter + 1);
            step(current, counter + 1);
        }
    }

    static int[] add(int[] array, int digit) {
        if (array == null) {
            return new int[]{digit};
        }
        int[] newValue = new int[array.length + 1];
        for (int i = 0; i < array.length; i++) {
            newValue[i] = array[i];
        }
        newValue[array.length] = digit;
        return newValue;
    }
}
class Subset {
    int[] subset;
    int sum;

    Subset(int[] subset, int sum) {
        this.subset = subset;
        this.sum = sum;
    }

    public String getText() {
        String ret = "";
        for (int i = 0; i < (subset == null ? 0 : subset.length); i++) {
            ret += " + " + subset[i];
        }
        if (ret.startsWith(" ")) {
            ret = ret.substring(3);
            ret = ret + " = " + sum;
        } else ret = "null";
        return ret;
    }
}
EDIT -
The above code still runs in O(n*2^n) time, since the add method runs in O(n) time. The following code runs in true O(2^n) time, and is MUCH more performant, completing in around 20 ms on my machine.
It is limited to sets of fewer than 64 elements, because it stores the current subset as the bits of a long.
public class SubsetSumSolver {
    static boolean found = false;
    static final int target = 100000000;
    static final int[] set = new int[]{ . . . };

    public static void main(String[] args) {
        step(0, 0, 0);
    }

    static void step(long subset, int sum, int counter) {
        if (sum == target) {
            found = true;
            System.out.println(getText(subset, sum));
        } else if (!found && counter != set.length) {
            // 1L, not 1: an int shift would overflow once counter reaches 31
            step(subset + (1L << counter), sum + set[counter], counter + 1);
            step(subset, sum, counter + 1);
        }
    }

    static String getText(long subset, int sum) {
        String ret = "";
        for (int i = 0; i < 64; i++) if ((1 & (subset >> i)) == 1) ret += " + " + set[i];
        if (ret.startsWith(" ")) ret = ret.substring(3) + " = " + sum;
        else ret = "null";
        return ret;
    }
}
EDIT 2 -
Here is another version that uses a meet-in-the-middle attack, along with a little bit-shifting, to reduce the complexity from O(2^n) to O(2^(n/2)).
If you want to use this for sets with between 32 and 64 elements, you should change the int that represents the current subset in the step function to a long, although performance will obviously decrease drastically as the set size increases. If you want to use this for a set with an odd number of elements, you should add a 0 to the set to make it even-sized.
import java.util.ArrayList;
import java.util.List;

public class SubsetSumMiddleAttack {
    static final int target = 100000000;
    static final int[] set = new int[]{ ... };

    static List<Subset> evens = new ArrayList<>();
    static List<Subset> odds = new ArrayList<>();

    static int[][] split(int[] superSet) {
        int[][] ret = new int[2][superSet.length / 2];
        for (int i = 0; i < superSet.length; i++) ret[i % 2][i / 2] = superSet[i];
        return ret;
    }

    static void step(int[] superSet, List<Subset> accumulator, int subset, int sum, int counter) {
        accumulator.add(new Subset(subset, sum));
        if (counter != superSet.length) {
            step(superSet, accumulator, subset + (1 << counter), sum + superSet[counter], counter + 1);
            step(superSet, accumulator, subset, sum, counter + 1);
        }
    }

    static void printSubset(Subset e, Subset o) {
        String ret = "";
        for (int i = 0; i < 32; i++) {
            if (i % 2 == 0) {
                if ((1 & (e.subset >> (i / 2))) == 1) ret += " + " + set[i];
            } else {
                if ((1 & (o.subset >> (i / 2))) == 1) ret += " + " + set[i];
            }
        }
        if (ret.startsWith(" ")) ret = ret.substring(3) + " = " + (e.sum + o.sum);
        System.out.println(ret);
    }

    public static void main(String[] args) {
        int[][] superSets = split(set);
        step(superSets[0], evens, 0, 0, 0);
        step(superSets[1], odds, 0, 0, 0);
        for (Subset e : evens) {
            for (Subset o : odds) {
                if (e.sum + o.sum == target) printSubset(e, o);
            }
        }
    }
}

class Subset {
    int subset;
    int sum;

    Subset(int subset, int sum) {
        this.subset = subset;
        this.sum = sum;
    }
}

Java: how to optimize sum of big array

I am trying to solve a problem on Codeforces and I get a Time Limit Exceeded judgment. The only time-consuming operation is calculating the sum of a big array, so I've tried to optimize it, but with no results.
What I want: to optimize the following function:
// array could be Integer.MAX_VALUE in length
private long canonicalSum(int[] array) {
    int sum = 0; // note: an int accumulator wraps on overflow; both versions below wrap identically
    for (int i = 0; i < array.length; i++)
        sum += array[i];
    return sum;
}
Question1 [main]: Is it possible to optimize canonicalSum?
I've tried to avoid operations with very big numbers, so I decided to use auxiliary data. For instance, I convert array1[100] to array2[10], where array2[i] = array1[10*i] + array1[10*i+1] + ... + array1[10*i+9].
private long optimizedSum(int[] array, int step) {
    do {
        array = sumItr(array, step);
    } while (array.length != 1);
    return array[0];
}

private int[] sumItr(int[] array, int step) {
    int length = array.length / step + 1;
    boolean needCompensation = array.length % step != 0;
    int[] aux = new int[length];
    for (int i = 0, auxSum = 0, auxPointer = 0; i < array.length; i++) {
        auxSum += array[i];
        if ((i + 1) % step == 0) {
            aux[auxPointer++] = auxSum;
            auxSum = 0;
        }
        if (i == array.length - 1 && needCompensation) {
            aux[auxPointer++] = auxSum;
        }
    }
    return aux;
}
Problem: But it appears that canonicalSum is ten times faster than optimizedSum. Here is my test:
@Test
public void sum_comparison() {
    final int ARRAY_SIZE = 100000000;
    final int STEP = 1000;
    int[] array = genRandomArray(ARRAY_SIZE);
    System.out.println("Start canonical Sum");
    long beg1 = System.nanoTime();
    long sum1 = canonicalSum(array);
    long end1 = System.nanoTime();
    long time1 = end1 - beg1;
    System.out.println("canon:" + TimeUnit.MILLISECONDS.convert(time1, TimeUnit.NANOSECONDS) + " milliseconds");
    System.out.println("Start optimizedSum");
    long beg2 = System.nanoTime();
    long sum2 = optimizedSum(array, STEP);
    long end2 = System.nanoTime();
    long time2 = end2 - beg2;
    System.out.println("custom:" + TimeUnit.MILLISECONDS.convert(time2, TimeUnit.NANOSECONDS) + " milliseconds");
    assertEquals(sum1, sum2);
    assertTrue(time2 <= time1);
}

private int[] genRandomArray(int size) {
    int[] array = new int[size];
    Random random = new Random();
    for (int i = 0; i < array.length; i++) {
        array[i] = random.nextInt();
    }
    return array;
}
Question2: Why does optimizedSum work slower than canonicalSum?
As of Java 9, vectorisation of this operation has been implemented but disabled, based on benchmarks measuring the all-in cost of the code plus its compilation. Depending on your processor, this leads to the relatively entertaining result that if you introduce artificial complications into your reduction loop, you can trigger autovectorisation and get a quicker result! So the fastest code, for now, assuming numbers small enough not to overflow, is:
public int sum(int[] data) {
    int value = 0;
    for (int i = 0; i < data.length; ++i) {
        value += 2 * data[i]; // the artificial complication that triggers autovectorisation
    }
    return value / 2;
}
This isn't intended as a recommendation! This is more to illustrate that the speed of your code in Java is dependent on the JIT, its trade-offs, and its bugs/features in any given release. Writing cute code to optimise problems like this is at best vain and will put a shelf life on the code you write. For instance, had you manually unrolled a loop to optimise for an older version of Java, your code would be much slower in Java 8 or 9 because this decision would completely disable autovectorisation. You'd better really need that performance to do it.
Question1 [main]: Is it possible to optimize canonicalSum?
Yes, it is. But I have no idea by what factor.
Some things you can do are:
use the parallel pipelines introduced in Java 8 (a one-line sketch appears after the loop-unrolling example below). The processor has instructions for computing the parallel sum of two arrays (and more). This can be observed in Octave: when you sum two vectors with ".+" or "+" (parallel addition), it is way faster than using a loop.
use multithreading. You could use a divide and conquer algorithm. Maybe like this:
divide the array into 2 or more
keep dividing recursively until you get an array with manageable size for a thread.
start computing the sum for the sub arrays (divided arrays) with separate threads.
finally, add the sums produced by all the threads for all sub-arrays together to produce the final result
maybe unrolling the loop would help a bit, too. By loop unrolling I mean reducing the number of iterations by manually doing more operations per pass.
An example from http://en.wikipedia.org/wiki/Loop_unwinding :
for (int x = 0; x < 100; x++)
{
    delete(x);
}

becomes

for (int x = 0; x < 100; x += 5)
{
    delete(x);
    delete(x + 1);
    delete(x + 2);
    delete(x + 3);
    delete(x + 4);
}
but as mentioned, this must be done with caution and with profiling, since the JIT can probably do this kind of optimization itself.
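For the first suggestion in the list above, a minimal sketch using Java 8 parallel streams (my code): the sum is spread across the common Fork/Join pool, and widening to long keeps the partial sums from overflowing.

import java.util.Arrays;

static long parallelSum(int[] array) {
    return Arrays.stream(array).parallel().asLongStream().sum();
}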
An implementation of mathematical operations for the multithreaded approach can be seen here.
An example implementation with the Fork/Join framework introduced in Java 7, which basically does what the divide-and-conquer algorithm above describes, is:
import java.util.concurrent.RecursiveTask;

public class ForkJoinCalculator extends RecursiveTask<Double> {
    public static final long THRESHOLD = 1_000_000;

    private final SequentialCalculator sequentialCalculator;
    private final double[] numbers;
    private final int start;
    private final int end;

    public ForkJoinCalculator(double[] numbers, SequentialCalculator sequentialCalculator) {
        this(numbers, 0, numbers.length, sequentialCalculator);
    }

    private ForkJoinCalculator(double[] numbers, int start, int end, SequentialCalculator sequentialCalculator) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
        this.sequentialCalculator = sequentialCalculator;
    }

    @Override
    protected Double compute() {
        int length = end - start;
        if (length <= THRESHOLD) {
            return sequentialCalculator.computeSequentially(numbers, start, end);
        }
        ForkJoinCalculator leftTask = new ForkJoinCalculator(numbers, start, start + length / 2, sequentialCalculator);
        leftTask.fork(); // run the left half asynchronously
        ForkJoinCalculator rightTask = new ForkJoinCalculator(numbers, start + length / 2, end, sequentialCalculator);
        Double rightResult = rightTask.compute(); // compute the right half in this thread
        Double leftResult = leftTask.join();
        return leftResult + rightResult;
    }
}
Here we develop a RecursiveTask that splits an array of doubles until the length of a subarray goes below a given threshold. At that point the subarray is processed sequentially, applying the operation defined by the following interface:
public interface SequentialCalculator {
    double computeSequentially(double[] numbers, int start, int end);
}
And the usage example:
public static double varianceForkJoin(double[] population) {
    final ForkJoinPool forkJoinPool = new ForkJoinPool();
    double total = forkJoinPool.invoke(new ForkJoinCalculator(population, new SequentialCalculator() {
        @Override
        public double computeSequentially(double[] numbers, int start, int end) {
            double total = 0;
            for (int i = start; i < end; i++) {
                total += numbers[i];
            }
            return total;
        }
    }));
    final double average = total / population.length;
    double variance = forkJoinPool.invoke(new ForkJoinCalculator(population, new SequentialCalculator() {
        @Override
        public double computeSequentially(double[] numbers, int start, int end) {
            double variance = 0;
            for (int i = start; i < end; i++) {
                variance += (numbers[i] - average) * (numbers[i] - average);
            }
            return variance;
        }
    }));
    return variance / population.length;
}
If you want to add N numbers, the runtime is O(N), so in that respect your canonicalSum cannot be "optimized".
What you can do to reduce the runtime is make the summation parallel, i.e. break the array into parts, pass them to separate threads, and at the end sum the results returned by each thread.
Update: this implies a multicore system, but there is a Java API to get the number of cores (see the sketch below).
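A minimal sketch of that idea (my code, not from the answer): one chunk per core, each summed on its own thread, partial sums added at the end.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

static long parallelSum(int[] array) throws Exception {
    int threads = Runtime.getRuntime().availableProcessors(); // the core-count API mentioned above
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    int chunk = (array.length + threads - 1) / threads;
    List<Future<Long>> parts = new ArrayList<>();
    for (int from = 0; from < array.length; from += chunk) {
        final int lo = from;
        final int hi = Math.min(from + chunk, array.length);
        parts.add(pool.submit(() -> {
            long s = 0;
            for (int i = lo; i < hi; i++) s += array[i];
            return s;
        }));
    }
    long total = 0;
    for (Future<Long> part : parts) total += part.get(); // waits for each chunk
    pool.shutdown();
    return total;
}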

Using ExecutorService with a multithreaded version of Merge Sort

I am working on a homework problem where I have to create a multithreaded version of Merge Sort. I was able to implement it, but I am not able to stop the creation of threads. I looked into using an ExecutorService to limit the creation of threads, but I cannot figure out how to implement it within my current code.
Here is my current multithreaded Merge Sort. We are required to implement a specific strategy pattern, so that is where my sort() method comes from.
@Override
public int[] sort(int[] list) {
    int array_size = list.length;
    list = msort(list, 0, array_size - 1);
    return list;
}

int[] msort(int numbers[], int left, int right) {
    final int mid;
    final int leftRef = left;
    final int rightRef = right;
    final int array[] = numbers;
    if (left < right) {
        mid = (right + left) / 2;
        // new thread
        Runnable r1 = new Runnable() {
            public void run() {
                msort(array, leftRef, mid);
            }
        };
        Thread t1 = new Thread(r1);
        t1.start();
        // new thread
        Runnable r2 = new Runnable() {
            public void run() {
                msort(array, mid + 1, rightRef);
            }
        };
        Thread t2 = new Thread(r2);
        t2.start();
        // join threads back together
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        merge(numbers, leftRef, mid, mid + 1, rightRef);
    }
    return numbers;
}
void merge(int numbers[], int startA, int endA, int startB, int endB) {
    int finalStart = startA;
    int finalEnd = endB;
    int indexC = 0;
    int[] listC = new int[numbers.length];
    while (startA <= endA && startB <= endB) {
        if (numbers[startA] < numbers[startB]) {
            listC[indexC] = numbers[startA];
            startA = startA + 1;
        } else {
            listC[indexC] = numbers[startB];
            startB = startB + 1;
        }
        indexC++;
    }
    if (startA <= endA) {
        for (int i = startA; i < endA; i++) {
            listC[indexC] = numbers[i];
            indexC++;
        }
    }
    indexC = 0;
    for (int i = finalStart; i <= finalEnd; i++) {
        numbers[i] = listC[indexC];
        indexC++;
    }
}
Any pointers would be gratefully received.
Following @mcdowella's comment, I also think that the fork/join framework is your best bet if you want to limit the number of threads that run in parallel.
I know that this won't give you any help with your homework, because you are probably not allowed to use the fork/join framework from Java 7. However, it is all about learning something, isn't it? ;)
As I commented, I think your merge method is wrong. I can't pinpoint the failure, but I have rewritten it. I strongly suggest you write a test case covering all the edge cases that can happen during that merge method, and once you have verified it works, plant it back into your multithreaded code.
@lbalazscs also gave you the hint that the fork/join sort is mentioned in the javadocs; however, I had nothing else to do, so I will show you the solution as if you'd implemented it with Java 7.
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class MultithreadedMergeSort extends RecursiveAction {
    private final int[] array;
    private final int begin;
    private final int end;

    public MultithreadedMergeSort(int[] array, int begin, int end) {
        this.array = array;
        this.begin = begin;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - begin < 2) {
            // swap if we only have two elements
            if (array[begin] > array[end]) {
                int tmp = array[end];
                array[end] = array[begin];
                array[begin] = tmp;
            }
        } else {
            // overflow-safe method to calculate the mid
            int mid = (begin + end) >>> 1;
            // invoke recursive sorting actions
            invokeAll(new MultithreadedMergeSort(array, begin, mid),
                    new MultithreadedMergeSort(array, mid + 1, end));
            // merge both sides
            merge(array, begin, mid, end);
        }
    }

    void merge(int[] numbers, int startA, int startB, int endB) {
        int[] toReturn = new int[endB - startA + 1];
        int i = 0, k = startA, j = startB + 1;
        while (i < toReturn.length) {
            if (numbers[k] < numbers[j]) {
                toReturn[i] = numbers[k];
                k++;
            } else {
                toReturn[i] = numbers[j];
                j++;
            }
            i++;
            // if we hit the limit of one run, copy the rest of the other
            if (j > endB) {
                System.arraycopy(numbers, k, toReturn, i, startB - k + 1);
                break;
            }
            if (k > startB) {
                System.arraycopy(numbers, j, toReturn, i, endB - j + 1);
                break;
            }
        }
        System.arraycopy(toReturn, 0, numbers, startA, toReturn.length);
    }

    public static void main(String[] args) {
        int[] toSort = { 55, 1, 12, 2, 25, 55, 56, 77 };
        ForkJoinPool pool = new ForkJoinPool();
        pool.invoke(new MultithreadedMergeSort(toSort, 0, toSort.length - 1));
        System.out.println(Arrays.toString(toSort));
    }
}
Note that the construction of your threadpool limits the number of active parallel threads to the number of cores of your processor.
ForkJoinPool pool = new ForkJoinPool();
According to its javadoc:
Creates a ForkJoinPool with parallelism equal to
java.lang.Runtime.availableProcessors, using the default thread
factory, no UncaughtExceptionHandler, and non-async LIFO processing
mode.
Also notice how my merge method differs from yours, because I think that is your main problem. At least your sorting works if I replace your merge method with mine.
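As an aside, here is a sketch of the edge-case test suggested earlier (JUnit 4, my code; it checks the whole sort against Arrays.sort rather than calling merge directly):

import static org.junit.Assert.assertArrayEquals;
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import org.junit.Test;

public class MergeSortTest {
    private void assertSorts(int[] input) {
        int[] expected = input.clone();
        Arrays.sort(expected); // reference result
        new ForkJoinPool().invoke(new MultithreadedMergeSort(input, 0, input.length - 1));
        assertArrayEquals(expected, input);
    }

    @Test
    public void edgeCases() {
        assertSorts(new int[] { 5, 5, 1, 1, 3 }); // duplicates
        assertSorts(new int[] { 1, 2, 3, 4 });    // already sorted
        assertSorts(new int[] { 4, 3, 2, 1 });    // reverse order
        assertSorts(new int[] { 42 });            // single element
    }
}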
As mcdowella pointed out, the Fork/Join framework in Java 7 is exactly for tasks that can be broken into smaller pieces recursively.
Actually, the Javadoc for RecursiveAction has a merge sort as the first example :)
Also note that ForkJoinPool is an ExecutorService.

Parallelism in Java. Divide and Conquer, Quick Sort [duplicate]

I am experimenting with parallelizing algorithms in Java. I began with merge sort, and posted my attempt in this question. My revised attempt is in the code below, where I now try to parallelize quick sort.
Are there any rookie mistakes in my multi-threaded implementation or approach to this problem? If not, shouldn't I expect more than a 32% speed increase between a sequential and a parallelized algorithm on a dual-core (see timings at bottom)?
Here is the multithreading algorithm:
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ThreadedQuick extends Thread {
    final int MAX_THREADS = Runtime.getRuntime().availableProcessors();
    CountDownLatch doneSignal;
    static int num_threads = 1;
    int[] my_array;
    int start, end;

    public ThreadedQuick(CountDownLatch doneSignal, int[] array, int start, int end) {
        this.my_array = array;
        this.start = start;
        this.end = end;
        this.doneSignal = doneSignal;
    }

    public static void reset() {
        num_threads = 1;
    }

    public void run() {
        quicksort(my_array, start, end);
        doneSignal.countDown();
        num_threads--;
    }

    public void quicksort(int[] array, int start, int end) {
        int len = end - start + 1;
        if (len <= 1)
            return;
        // medianOfThree and swap are helper methods not shown in the post
        int pivot_index = medianOfThree(array, start, end);
        int pivotValue = array[pivot_index];
        swap(array, pivot_index, end);
        int storeIndex = start;
        for (int i = start; i < end; i++) {
            if (array[i] <= pivotValue) {
                swap(array, i, storeIndex);
                storeIndex++;
            }
        }
        swap(array, storeIndex, end);
        if (num_threads < MAX_THREADS) {
            num_threads++;
            CountDownLatch completionSignal = new CountDownLatch(1);
            new ThreadedQuick(completionSignal, array, start, storeIndex - 1).start();
            quicksort(array, storeIndex + 1, end);
            try {
                completionSignal.await(1000, TimeUnit.SECONDS);
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        } else {
            quicksort(array, start, storeIndex - 1);
            quicksort(array, storeIndex + 1, end);
        }
    }
}
Here is how I start it off:
ThreadedQuick.reset();
CountDownLatch completionSignal = new CountDownLatch(1);
new ThreadedQuick(completionSignal, array, 0, array.length - 1).start();
try {
    completionSignal.await(1000, TimeUnit.SECONDS);
} catch (Exception ex) {
    ex.printStackTrace();
}
I tested this against Arrays.sort and a similar sequential quick sort algorithm. Here are the timing results on an Intel dual-core Dell laptop, in seconds:
Elements: 500,000,
sequential: 0.068592,
threaded: 0.046871,
Arrays.sort: 0.079677
Elements: 1,000,000,
sequential: 0.14416,
threaded: 0.095492,
Arrays.sort: 0.167155
Elements: 2,000,000,
sequential: 0.301666,
threaded: 0.205719,
Arrays.sort: 0.350982
Elements: 4,000,000,
sequential: 0.623291,
threaded: 0.424119,
Arrays.sort: 0.712698
Elements: 8,000,000,
sequential: 1.279374,
threaded: 0.859363,
Arrays.sort: 1.487671
Each number above is the average time of 100 tests, throwing out the 3 lowest and 3 highest cases. I used Random.nextInt(Integer.MAX_VALUE) to generate an array for each test, which was initialized once every 10 tests with the same seed. Each test consisted of timing the given algorithm with System.nanoTime. I rounded to six decimal places after averaging. And obviously, I did check to see if each sort worked.
As you can see, there is about a 32% increase in speed between the sequential and threaded cases in every set of tests. As I asked above, shouldn't I expect more than that?
Making num_threads static can cause problems; it is highly likely that you will end up with more than MAX_THREADS running at some point.
Probably the reason why you don't get a full doubling in performance is that your quick sort cannot be fully parallelised. Note that the first call to quicksort will do a pass through the whole array in the initial thread before it starts to really run in parallel. There is also an overhead to parallelising an algorithm, in the form of context switching and mode transitions when farming work off to separate threads.
Have a look at the Fork/Join framework; this problem would probably fit quite neatly there.
A couple of points on the implementation. Implement Runnable rather than extending Thread. Extending Thread should be used only when you create some new version of the Thread class. When you just want some job to be run in parallel, you are better off with Runnable; while implementing Runnable you can also still extend another class, which gives you more flexibility in OO design. Use a thread pool that is restricted to the number of threads you have available in the system. Also, don't use num_threads to make the decision on whether to fork off a new thread or not; you can calculate this up front. Use a minimum partition size, which is the size of the total array divided by the number of processors available. Something like:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ThreadedQuick implements Runnable {
    public static final int MAX_THREADS = Runtime.getRuntime().availableProcessors();
    static final ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS);

    final int[] my_array;
    final int start, end;
    private final int minPartitionSize;

    public ThreadedQuick(int minPartitionSize, int[] array, int start, int end) {
        this.minPartitionSize = minPartitionSize;
        this.my_array = array;
        this.start = start;
        this.end = end;
    }

    public void run() {
        quicksort(my_array, start, end);
    }

    public void quicksort(int[] array, int start, int end) {
        int len = end - start + 1;
        if (len <= 1)
            return;
        int pivot_index = medianOfThree(array, start, end);
        int pivotValue = array[pivot_index];
        swap(array, pivot_index, end);
        int storeIndex = start;
        for (int i = start; i < end; i++) {
            if (array[i] <= pivotValue) {
                swap(array, i, storeIndex);
                storeIndex++;
            }
        }
        swap(array, storeIndex, end);
        if (len > minPartitionSize) {
            ThreadedQuick quick = new ThreadedQuick(minPartitionSize, array, start, storeIndex - 1);
            Future<?> future = executor.submit(quick);
            quicksort(array, storeIndex + 1, end);
            try {
                future.get(1000, TimeUnit.SECONDS);
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        } else {
            quicksort(array, start, storeIndex - 1);
            quicksort(array, storeIndex + 1, end);
        }
    }
}
You can kick it off by doing:
ThreadedQuick quick = new ThreadedQuick(array.length / ThreadedQuick.MAX_THREADS, array, 0, array.length - 1);
quick.run();
This will start the sort in the same thread, which avoids an unnecessary thread hop at start up.
Caveat: Not sure the above implementation will actually be faster as I haven't benchmarked it.
This uses a combination of quick sort and merge sort.
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelSortMain {
    public static void main(String... args) throws InterruptedException {
        Random rand = new Random();
        final int[] values = new int[100 * 1024 * 1024];
        for (int i = 0; i < values.length; i++)
            values[i] = rand.nextInt();

        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService es = Executors.newFixedThreadPool(threads);
        int blockSize = (values.length + threads - 1) / threads;
        // sort each block on its own thread
        for (int i = 0; i < values.length; i += blockSize) {
            final int min = i;
            final int max = Math.min(min + blockSize, values.length);
            es.submit(new Runnable() {
                @Override
                public void run() {
                    Arrays.sort(values, min, max);
                }
            });
        }
        es.shutdown();
        es.awaitTermination(10, TimeUnit.MINUTES);

        // merge pairs of sorted blocks, doubling the block size each pass;
        // note: step by 2 * blockSize2 and loop while blockSize2 < length,
        // otherwise regions are re-merged and the final two halves are skipped
        for (int blockSize2 = blockSize; blockSize2 < values.length; blockSize2 *= 2) {
            for (int i = 0; i < values.length; i += blockSize2 * 2) {
                final int min = i;
                final int mid = Math.min(min + blockSize2, values.length);
                final int max = Math.min(min + blockSize2 * 2, values.length);
                mergeSort(values, min, mid, max);
            }
        }
    }

    private static void mergeSort(int[] values, int left, int mid, int end) {
        int[] results = new int[end - left];
        int l = left, r = mid, m = 0;
        // merge the two sorted runs [left, mid) and [mid, end)
        for (; l < mid && r < end; m++) {
            int lv = values[l];
            int rv = values[r];
            if (lv < rv) {
                results[m] = lv;
                l++;
            } else {
                results[m] = rv;
                r++;
            }
        }
        while (l < mid)
            results[m++] = values[l++];
        while (r < end)
            results[m++] = values[r++];
        System.arraycopy(results, 0, values, left, results.length);
    }
}
A couple of comments, if I understand your code right:
I don't see a lock around the num_threads variable even though it could be accessed via multiple threads. Perhaps you should make it an AtomicInteger (see the sketch below).
Use a thread pool and arrange the tasks, i.e. a single call to quicksort, to take advantage of the thread pool. Use Futures.
Your current method of dividing things could leave a smaller division with a thread and a larger division without one. That is to say, it doesn't prioritize larger segments with their own threads.
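On the first point, a minimal sketch of the AtomicInteger idea (names are mine): reserve a thread slot with compareAndSet, so a race can never push the count past the limit the way the unsynchronized num_threads++ can.

import java.util.concurrent.atomic.AtomicInteger;

static final AtomicInteger numThreads = new AtomicInteger(1);

static boolean tryReserveThread(int maxThreads) {
    for (int current = numThreads.get(); current < maxThreads; current = numThreads.get()) {
        if (numThreads.compareAndSet(current, current + 1)) return true; // slot reserved
    }
    return false; // at the limit; sort this partition in the current thread
}

The forking branch then becomes if (tryReserveThread(MAX_THREADS)) { ... }, and the worker calls numThreads.decrementAndGet() when it finishes.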
