I am trying to implement several sorting algorithms in Java to compare their performance. From what I've read, I was expecting quickSort to be faster than mergeSort, but in my code it is not, so I assume there must be a problem with my quickSort algorithm:
import java.util.ArrayList;
import java.util.Random;

public class quickSortExample {

    public static void main(String[] args) {
        Random gen = new Random();
        int n = 1000000;
        int max = 1500000;
        ArrayList<Integer> d = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            d.add(gen.nextInt(max));
        }
        ArrayList<Integer> r;
        long start, end;

        start = System.currentTimeMillis();
        r = quickSort(d);
        end = System.currentTimeMillis();
        System.out.println("QuickSort:");
        System.out.println("Time: " + (end - start));
        //System.out.println(display(d));
        //System.out.println(display(r));
    }

    public static ArrayList<Integer> quickSort(ArrayList<Integer> data) {
        if (data.size() > 1) {
            int pivotIndex = getPivotIndex(data);
            int pivot = data.get(pivotIndex);
            data.remove(pivotIndex);
            ArrayList<Integer> smallers = new ArrayList<Integer>();
            ArrayList<Integer> largers = new ArrayList<Integer>();
            for (int i = 0; i < data.size(); i++) {
                if (data.get(i) <= pivot) {
                    smallers.add(data.get(i));
                } else {
                    largers.add(data.get(i));
                }
            }
            smallers = quickSort(smallers);
            largers = quickSort(largers);
            return concat(smallers, pivot, largers);
        } else {
            return data;
        }
    }

    public static int getPivotIndex(ArrayList<Integer> d) {
        return (int) Math.floor(d.size() / 2.0);
    }

    public static ArrayList<Integer> concat(ArrayList<Integer> s, int p, ArrayList<Integer> l) {
        ArrayList<Integer> arr = new ArrayList<Integer>(s);
        arr.add(p);
        arr.addAll(l);
        return arr;
    }

    public static String display(ArrayList<Integer> data) {
        String s = "[";
        for (int i = 0; i < data.size(); i++) {
            s += data.get(i) + ", ";
        }
        return (s + "]");
    }
}
Results (on 1 million integers between 0 and 1500000):
mergeSort (implemented with ArrayList too): 1.3 sec (average) (0.7 sec with int[] instead)
quickSort: 3 sec (average)
Is it just my choice of pivot that is bad, or are there some flaws in the algorithm too?
Also, is there a faster way to code it with int[] instead of ArrayList? (How do you declare the size of the array for the largers/smallers arrays?)
PS: I know it is possible to implement it in an in-place manner so it uses less memory, but that is not the point of this.
EDIT 1: I gained 1 sec by changing the concat method.
Thanks!
PS: I know it is possible to implement it in an in-place manner so it uses less memory, but that is not the point of this.
It's not just about using less memory. All the extra work you do in the "concat" routine instead of doing a proper in-place QuickSort is almost certainly what's costing so much. If you can use extra space anyway, you should code up a merge sort, because it will tend to do fewer comparisons than a QuickSort will.
Think about it: in "concat()" you inevitably have to make another pass over the sub-lists, copying every element again. If you did the interchange in-place, all in a single array, then once you've made the decision to interchange two places, you don't make the decision again.
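To make that concrete, here is a minimal sketch of an in-place quicksort on the same ArrayList (Hoare-style partition, middle element as the pivot value; an illustration, not the only way to do it):

public static void quickSortInPlace(ArrayList<Integer> d, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = d.get(lo + (hi - lo) / 2);
    int i = lo, j = hi;
    while (i <= j) {
        while (d.get(i) < pivot) i++;
        while (d.get(j) > pivot) j--;
        if (i <= j) {
            int tmp = d.get(i);      // the interchange happens once, right here
            d.set(i, d.get(j));
            d.set(j, tmp);
            i++;
            j--;
        }
    }
    quickSortInPlace(d, lo, j);      // left part
    quickSortInPlace(d, i, hi);      // right part
}

No lists are allocated and nothing is concatenated; the only work per element is the comparison and (possibly) one swap.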
I think the major problem with your quicksort, like you say, is that it's not done in place.
The two main culprits are smallers and largers. The default capacity of an ArrayList is 10. In the initial call to quickSort, a good pivot will mean that smallers and largers each grow to around 500,000 elements. Since an ArrayList only grows by about half its current capacity each time it fills up, it will have to be resized roughly 27 times.
Since you make a new smallers and largers at each level of recursion, you're going to perform approximately 2*(27+26+...+2+1) resizes. That's around 750 resizes the ArrayList objects have to perform before they are even concatenated. The concatenation process will probably perform a similar number of resizes.
All in all, this is a lot of extra work.
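If you keep the copying approach, presizing the lists sidesteps most of those resizes. It over-allocates (each list gets capacity for all the elements), but a single allocation is cheap compared to repeated grow-and-copy cycles:

ArrayList<Integer> smallers = new ArrayList<Integer>(data.size());
ArrayList<Integer> largers = new ArrayList<Integer>(data.size());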
Oops, I just noticed data.remove(pivotIndex). The chosen pivot index (middle of the array) is also going to cause additional memory operations (even though the middle is usually a better choice than the beginning or end of the array). That is, ArrayList will copy the entire block of memory to the 'right' of the pivot one step to the left in the backing array.
A quick note on the chosen pivot: since the integers you are sorting are evenly distributed between 0 and max (if Random lives up to its name), you can use this to choose good pivots. That is, the first level of quicksort should choose a pivot close to max*0.5. The second level with smallers should choose max*0.25 and the second level with largers should choose max*0.75 (and so on).
I think your algorithm is quite inefficient because you're using intermediate arrays, which means more memory plus more time for allocation/copying. Here is the code in C++, but the idea is the same: you have to swap the items, not copy them to other arrays:
// Note: N is the index of the last element, not the length of the array
template<class T> void quickSortR(T* a, long N) {
    long i = 0, j = N;
    T temp, p;
    p = a[N/2];                   // middle element as the pivot
    do {
        while (a[i] < p) i++;     // find misplaced items on both sides...
        while (a[j] > p) j--;
        if (i <= j) {
            temp = a[i]; a[i] = a[j]; a[j] = temp;   // ...and swap them
            i++; j--;
        }
    } while (i <= j);
    if (j > 0) quickSortR(a, j);             // left part
    if (N > i) quickSortR(a + i, N - i);     // right part
}
Fundamentals of OOP and Data Structures in Java by Richard Wiener and Lewis J. Pinson lists quicksort as follows, which may or may not be faster (I suspect it is) than your implementation...
public static void quickSort(Comparable[] data, int low, int high) {
    int partitionIndex;
    if (high - low > 0) {
        partitionIndex = partition(data, low, high);
        quickSort(data, low, partitionIndex - 1);
        quickSort(data, partitionIndex + 1, high);
    }
}

private static int partition(Comparable[] data, int low, int high) {
    int k, j;
    Comparable temp, p;
    p = data[low]; // Partition element
    // Find partition index (j).
    k = low;
    j = high + 1;
    do {
        k++;
    } while (data[k].compareTo(p) <= 0 && k < high);
    do {
        j--;
    } while (data[j].compareTo(p) > 0);
    while (k < j) {
        temp = data[k];
        data[k] = data[j];
        data[j] = temp;
        do {
            k++;
        } while (data[k].compareTo(p) <= 0);
        do {
            j--;
        } while (data[j].compareTo(p) > 0);
    }
    // Move partition element (p) to partition index (j).
    if (low != j) {
        temp = data[low];
        data[low] = data[j];
        data[j] = temp;
    }
    return j; // Partition index
}
I agree that the reason is unnecessary copying. Some more notes follow.
The choice of pivot index is bad, but it's not an issue here, because your numbers are random.
(int)Math.floor(d.size()/2.0) is equivalent to d.size()/2.
data.remove(pivotIndex); is unnecessary copying of n/2 elements. Instead, you should check in the following loop whether i == pivotIndex and skip that element. (Well, what you really need to do is an in-place sort, but I'm just suggesting straightforward improvements.)
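For example (a sketch of the partition loop with the pivot skipped instead of removed):

for (int i = 0; i < data.size(); i++) {
    if (i == pivotIndex) continue;   // skip the pivot instead of remove()ing it
    if (data.get(i) <= pivot) {
        smallers.add(data.get(i));
    } else {
        largers.add(data.get(i));
    }
}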
Putting all elements that are equal to pivot in the same ('smaller') part is a bad idea. Imagine what happens when all elements of the array are equal. (Again, not an issue in this case.)
for(i = 0; i < s.size(); i++){
arr.add(s.get(i));
}
is equivalent to arr.addAll(s). And of course there is unnecessary copying here again. You could just add all elements from the right part to the left one instead of creating a new list.
(How do you declare the size of the array for largers/smallers arrays?)
I'm not sure if I got you right, but do you want array.length?
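If the question is how to size the int[] versions of smallers and largers up front, one option is a counting pass before allocating (a sketch, assuming the middle element is the pivot as in your code):

int pivotIndex = data.length / 2;
int pivot = data[pivotIndex];
int nSmall = 0;
for (int i = 0; i < data.length; i++) {
    if (i != pivotIndex && data[i] <= pivot) nSmall++;   // count, don't copy yet
}
int[] smallers = new int[nSmall];
int[] largers = new int[data.length - 1 - nSmall];       // everything else, minus the pivot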
So, I think that even without implementing in-place sort you can significantly improve performance.
Technically, mergesort has better time behavior (Θ(n log n) worst and average cases) than quicksort (Θ(n^2) worst case, Θ(n log n) average case), so it is quite possible to find inputs for which mergesort outperforms quicksort. Depending on how you pick your pivots, you can make the worst case rare. But for a simple version of quicksort, the "worst case" is sorted (or nearly sorted) data, which can be a rather common input.
Here's what Wikipedia says about the two:
On typical modern architectures, efficient quicksort implementations generally outperform mergesort for sorting RAM-based arrays. On the other hand, merge sort is a stable sort, parallelizes better, and is more efficient at handling slow-to-access sequential media. [citation needed]

Merge sort is often the best choice for sorting a linked list: in this situation it is relatively easy to implement a merge sort in such a way that it requires only Θ(1) extra space, and the slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and others (such as heapsort) completely impossible.
Related
I have a quicksort implementation. It works fine with small arrays and with 10,000 random numbers, but it throws a StackOverflowError when the input is 10,000 sequential numbers (from 1 to 10,000):
public class QuickSort<T extends Comparable> extends Sort<T> {

    public void sort(Comparable[] input) {
        sort(input, 0, input.length - 1);
    }

    private void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    private int partition(Comparable[] a, int lo, int hi) {
        int i = lo;
        int j = hi + 1;
        Comparable v = a[lo];
        while (true) {
            while (less(a[++i], v)) {
                if (i == hi) break;
            }
            while (less(v, a[--j])) {
                if (j == lo) break;
            }
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    public static boolean less(Comparable a, Comparable b) {
        return a.compareTo(b) < 0;
    }

    public static void exch(Comparable[] array, int i, int j) {
        Comparable temp = array[i];
        array[i] = array[j];
        array[j] = temp;
    }

    public static void main(String... args) {
        Integer[] array = new Integer[] {10, 2, 9, 11, 1, 19, 9, 4, 6, 2, 1, 4, 5, 6};
        new QuickSort<>().sort(array);
        for (int temp : array) {
            System.out.print(temp + " ");
        }
    }
}
It works for 10,000 random numbers and other inputs, but throws a StackOverflowError when executed with 10,000 sequential numbers (from 1 to 10,000).
A simple quicksort implementation has O(n^2) time complexity and O(n) additional memory requirements (the recursion depth) in the worst case. You hit this worst case on an ordered sequence because of the pivot selection method.
Wikipedia:
In the very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot (as recommended by Sedgewick).
A simple way to fix this issue is to take the middle element as the pivot. Replace
Comparable v = a[lo];
with
Comparable v = a[lo+(hi-lo)/2];
It is not that hard to create a worst-case test for this pivot selection method either, but you would have to construct it intentionally for large inputs. If you want a sorting algorithm that is similar to quicksort but without the O(n^2) worst cases, you should look at the Introsort algorithm.
The stack overflow occurs because you are indeed trying to use more stack space than is available to your Java program in the JVM.
This happens because your pivot-selection strategy simply selects the first element as the pivot, and this causes worst-case performance when the array is already sorted: the algorithm runs in O(n^2) time and O(n) stack space, which means the recursion goes 10,000 deep, using up more stack than is available.
You have a few options:
Increase stack space - the default stack size in most JVMs is 512k, but you can change it with the -Xss flag. In my environment, when I added the flag -Xss18m (18 MB) I could run the program with an array of 200k without a stack overflow. You can increase it much more than that.
Try to use less stack space - stack space is used by the recursion and by the local variables defined in the function.
For example, you could make the array a member of the QuickSort class instead of passing it as a parameter. That would put only two integers (lo and hi) on the stack in each call, saving stack space.
(The best option) Use a random pivot - select a random element between lo and hi as the pivot, not always the first one.
instead of

Comparable v = a[lo];

use

int p = lo + r.nextInt(hi - lo + 1);  // r is a java.util.Random instance
exch(a, lo, p);                       // move the random pivot into position lo
Comparable v = a[lo];

so that the rest of the partition code, including the final exch(a, lo, j), works unchanged.
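The whole partition method with the random pivot wired in might then look like this (a sketch; less and exch as in the question, r a Random field added to the class):

private int partition(Comparable[] a, int lo, int hi) {
    int p = lo + r.nextInt(hi - lo + 1);  // random pivot index in [lo, hi]
    exch(a, lo, p);                       // put the pivot at the front
    Comparable v = a[lo];
    int i = lo;
    int j = hi + 1;
    while (true) {
        while (less(a[++i], v)) {
            if (i == hi) break;
        }
        while (less(v, a[--j])) {
            if (j == lo) break;
        }
        if (i >= j) break;
        exch(a, i, j);
    }
    exch(a, lo, j);                       // pivot into its final position
    return j;
}

With a random pivot, the expected recursion depth on any input, sorted or not, is O(log n).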
I tried to implement an efficient sorting algorithm in Java. For this reason I also implemented quicksort, and I use the following code:
public class Sorting {

    private static Random prng;

    private static Random getPrng() {
        if (prng == null) {
            prng = new Random();
        }
        return prng;
    }

    public static void sort(int[] array) {
        sortInternal(array, 0, array.length - 1);
    }

    public static void sortInternal(int[] array, int start, int end) {
        if (end - start < 50) {
            insertionSortInternal(array, start, end);
        } else {
            quickSortInternal(array, start, end);
        }
    }

    private static void insertionSortInternal(int[] array, int start, int end) {
        for (int i = start; i < end - 1; ++i) {
            for (int ptr = i; ptr > 0 && array[ptr - 1] < array[ptr]; ptr--) {
                ArrayUtilities.swap(array, ptr, ptr - 1);
            }
        }
    }

    private static void quickSortInternal(int[] array, int start, int end) {
        int pivotPos = getPrng().nextInt(end - start);
        int pivot = array[start + pivotPos];
        ArrayUtilities.swap(array, start + pivotPos, end - 1);
        int left = start;
        int right = end - 2;
        while (left < right) {
            while (array[left] <= pivot && left < right) {
                ++left;
            }
            if (left == right) break;
            while (array[right] >= pivot && left < right) {
                right--;
            }
            if (left == right) break;
            ArrayUtilities.swap(array, left, right);
        }
        ArrayUtilities.swap(array, left, end - 1);
        sortInternal(array, start, left);
        sortInternal(array, left + 1, end);
    }
}
ArrayUtilities.swap just swaps the two given elements in the array. From this code I expect O(n log(n)) runtime behaviour, but sorting arrays of several different lengths gave the following results:
10000 elements: 32ms
20000 elements: 128ms
30000 elements: 296ms
The test ran 100 times in each case, and then the arithmetic mean of the running times was calculated. But clearly, as opposed to the expected behaviour, the runtime is O(n^2). What's wrong with my algorithm?
In your insertion sort implementation the array will be sorted in descending order, while your quicksort sorts in ascending order. So replace (for ascending order):
for (int ptr=i; ptr>0 && array[ptr - 1] < array[ptr]; ptr--)
with
for (int ptr=i; ptr>0 && array[ptr - 1] > array[ptr]; ptr--)
It also seems like your indexing is not correct.
Try to replace:
sortInternal(array, 0, array.length - 1);
with:
sortInternal(array, 0, array.length);
And in the insertion sort's first for loop you don't need end - 1, i.e. use:
for (int i=start; i<end; ++i)
Finally, add if (start >= end) return; at the beginning of the quick-sort method.
And as @ljeabmreosn mentioned, 50 is a little bit too large; I would have chosen something between 5 and 20.
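Putting those fixes together, the changed methods might look like this (a sketch; end is treated as exclusive throughout, quickSortInternal and ArrayUtilities.swap as in the question):

public static void sort(int[] array) {
    sortInternal(array, 0, array.length);              // end is exclusive
}

public static void sortInternal(int[] array, int start, int end) {
    if (start >= end) return;                          // nothing to sort
    if (end - start < 20) {                            // smaller cutoff than 50
        insertionSortInternal(array, start, end);
    } else {
        quickSortInternal(array, start, end);
    }
}

private static void insertionSortInternal(int[] array, int start, int end) {
    for (int i = start + 1; i < end; ++i) {
        // '>' sorts ascending, matching the quicksort's direction
        for (int ptr = i; ptr > start && array[ptr - 1] > array[ptr]; ptr--) {
            ArrayUtilities.swap(array, ptr, ptr - 1);
        }
    }
}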
Hope that helps!
The QuickSort "optimized" with Insertion Sort for arrays with length less than 50 elements seems to be a problem.
Imagine I had an array of size 65, and the pivot happened to be the median of that array. If I ran the array through your code, your code would use Insertion Sort on the two 32 length subarrays to the left and right of the pivot. This would result in ~O(2*(n/2)^2 + n) = ~O(n^2) average case. Using quick sort and implementing a pivot picking strategy for the first pivot, the time average case would be ~O((nlog(n)) + n) = ~O(n(log(n) + 1)) = ~O(n*log(n)). Don't use Insertion Sort as it is only used when the array is almost sorted. If you are using Insertion Sort solely because of the real running time of sorting small arrays might run faster than the standard quick sort algorithm (deep recursion), you can always utilize a non-recursive quick sort algorithm which runs faster than Insertion Sort.
Maybe change the "50" to "20" and observe the results.
I need a zig-zag method that takes an array as argument and returns a zig-zag array.
Example : Input 2,6,1,7,9,3
Output 9,1,7,2,6,3
The returned array must alternate the highest and lowest of the remaining numbers.
I can think of this method.
//Pseudo code
public static int[] zigZag(int arr[]) {
    Arrays.sort(arr);                    // ascending; needs java.util.Arrays
    int returnArr[] = new int[arr.length];
    int begindex = 0, endindex = arr.length - 1;
    int idx = 0;
    while (begindex < endindex) {
        returnArr[idx++] = arr[endindex--];   // take the largest remaining...
        returnArr[idx++] = arr[begindex++];   // ...then the smallest remaining
    }
    if (arr.length % 2 == 1)
        returnArr[idx] = arr[begindex];       // middle element of an odd-length array
    return returnArr;
}
This method has a time complexity of O(n log n) (because of the sort) and a space complexity of O(n).
Is there any other way/algorithm that does better than O(n log n)? Or one with O(n log n) time and O(1) space complexity?
There's one more method, with O(n^2) time complexity and O(1) space complexity, but I'm not interested in O(n^2) time.
Here is an algorithm that can do it with time complexity O(n log n) and space complexity O(1), using a linked list.
The method works for lists with duplicate values.
It is as follows:
First, get your list, l, sorted in descending order, with the second half reversed. (Note that your sorting algorithm must work in place on a linked list, such as an in-place merge sort.)
For example, with l = 2, 6, 1, 7, 9, 3, this form would be l = 9, 7, 6, 1, 2, 3. If your list is of odd length, the first half will be one element longer than the second.
An easy way to do this is to sort l in descending order and then reverse the elements in the second half.
Next, we create some temporary variables:

Node upper = list.head;                // pointer to the upper half of the list
Node lower = list.get(l.length/2);     // pointer to the lower half of the list
Node temp = null;                      // utility pointer to hold whatever we need

// Let's set up our initial state
list.get(l.length/2 - 1).next = null;  // disconnect the two halves of the list
temp = upper.next;                     // hold the upper half minus its head
upper.next = lower;                    // stitch the first element of the upper half to the bottom half

// Note that lower would need to be at l.length/2 + 1 for an odd-length list.
// This also applies to list.get in the set-up.
// The code could be generalized to both even and odd lengths by using Math.ceil
// or a similar function to round up instead of Java's default of rounding down.

zigZag(upper, lower, temp);            // call to the algorithm
Finally, the algorithm:
public static void zigZag(Node upper, Node lower, Node temp) {
    int i = 0;                       // controls alternation
    while (temp != null) {           // until temp gets set to null by lower.next or upper.next
        if (i % 2 == 0) {            // on even iterations
            upper = temp;
            temp = lower.next;
            lower.next = upper;
        } else {                     // on odd iterations
            lower = temp;
            temp = upper.next;
            upper.next = lower;
        }
        i++;
    }
}
Alternatively, here's the recursive version:
public static void zigZag(Node upper, Node lower, Node temp) {
    if (temp == null)                // temp got set to null by lower.next or upper.next
        return;                      // we're done
    upper = temp;
    temp = lower.next;
    lower.next = upper;
    zigZag(lower, upper, temp);      // swap upper/lower for the next cycle
}
You now have a zig-zagged linked list, stored in l.
Finding time and space complexity:
Sorting: time O(nlogn), space O(1)
Sorting takes your original time complexity and, as it sorts in place, constant space
Reversing: time O(n), space O(1)
Reversing the second half of your list is O(n/2) => O(n)
Temporaries: time O(1), space O(1)
Simple variable assignments of constant number and size take both constant time and space
Algorithm: time O(n), space O(1)
The algorithm simply changes the next pointer of each node once, so it runs in O(n) time. It doesn't create any new variables, and thus has constant space complexity, O(1).
The recursive version is tail recursive, which means it can only use a single stack frame, giving it theoretically constant space complexity, O(1). (Though not in Java, as it does not support tail-recursion optimization.)
Adding it all up:
As you can see, space complexity is constant throughout, giving our overall program O(1) space usage.
Time complexity is O(nlogn)+O(n)+O(1)+O(n), which is clearly dominated by O(nlogn).
Extra reversing of your linked list (if you used an ascending-order sort) would slow the program, but won't change the overall time complexity.
Similarly, you could come up with a sort that gives the desired form of half descending, half ascending to save some time, but it too would not change overall time complexity.
Potential for Speedup:
As mentioned by @flkes in his answer, you can reduce the time complexity of the whole program by reducing the time complexity of the sort, as it produces the dominating term.
If you found an implementation that sorted in place in O(n) time (such as this linked-list radix sort algorithm or a similar bucket sort algorithm), you could achieve total time complexity of O(n) with constant, O(1), space complexity, which is really incredibly good.
I would recommend implementing a radix sort first, which has a complexity of O(n). An example of that can be found here.
Once you radix sort the list, you can easily map it to the zig-zag pattern with a simple for loop. This pushes the complexity to O(n + kn), which still resolves to O(n).
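A sketch of that combination (an LSD radix sort over 8-bit digits, assuming non-negative ints, followed by the zig-zag mapping loop):

import java.util.Arrays;

public class ZigZagRadix {
    // LSD radix sort over four 8-bit digits; assumes non-negative ints.
    static void radixSort(int[] a) {
        int[] buf = new int[a.length];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int v : a) count[((v >>> shift) & 0xFF) + 1]++;      // histogram
            for (int i = 0; i < 256; i++) count[i + 1] += count[i];   // prefix sums = start offsets
            for (int v : a) buf[count[(v >>> shift) & 0xFF]++] = v;   // stable scatter
            System.arraycopy(buf, 0, a, 0, a.length);
        }
    }

    // Map the ascending result to the zig-zag pattern:
    // alternately take from the top and from the bottom.
    static int[] zigZag(int[] sorted) {
        int[] out = new int[sorted.length];
        int lo = 0, hi = sorted.length - 1;
        for (int i = 0; i < out.length; i++) {
            out[i] = (i % 2 == 0) ? sorted[hi--] : sorted[lo++];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {2, 6, 1, 7, 9, 3};
        radixSort(a);
        System.out.println(Arrays.toString(zigZag(a)));  // [9, 1, 7, 2, 6, 3]
    }
}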
After sorting, invert the second half of the array:
now the rest of the problem is to do a perfect shuffle of the array elements - a problem that comes up time and again.
If you want to apply a permutation in place and know how to transform indices, you can keep a "scoreboard" of the indices already handled - though even a single bit per item is O(n) storage. (Find the next index still needing handling and perform the cycle containing it, keeping score, until all indices are handled.)
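A generic sketch of that scoreboard idea (perm is whatever index mapping your shuffle needs; the names here are illustrative):

import java.util.function.IntUnaryOperator;

public class CycleShuffle {
    // Moves a[i] to position perm(i) for every i, following cycles.
    // The boolean[] is the scoreboard - O(n) bits of extra storage.
    static void applyPermutationInPlace(int[] a, IntUnaryOperator perm) {
        boolean[] done = new boolean[a.length];
        for (int start = 0; start < a.length; start++) {
            if (done[start]) continue;
            int i = start;
            int val = a[i];
            while (true) {                        // walk the cycle containing start
                int j = perm.applyAsInt(i);       // where a[i] must go
                int displaced = a[j];
                a[j] = val;
                done[j] = true;
                if (j == start) break;            // cycle closed
                val = displaced;
                i = j;
            }
        }
    }
}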
A pretty nice rendition of an in-place perfect shuffle in linear time and constant space in addition to the array is Aryabhata's over at CS. The method has been placed at arxiv.org by Peiyush Jain.
(The complexity of the sort as a first step may dominate the permutation/shuffle step(s).)
There is another interpretation of this task, or the sort step: sort into a folded array.
The sort that lends itself most readily to this task has got to be the double-ended selection sort:
In each pass over the data not yet placed, determine the min and the max in about 3n/2 comparisons and swap them into their positions, until one value or none at all is left.
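A plain (unfolded) sketch of that pass structure; note that this naive version spends about 2n comparisons per pass, while the 3n/2 figure requires comparing elements pairwise before comparing against the running min/max:

static void doubleEndedSelectionSort(int[] a) {
    for (int lo = 0, hi = a.length - 1; lo < hi; lo++, hi--) {
        int minIdx = lo, maxIdx = lo;
        for (int i = lo + 1; i <= hi; i++) {       // one pass over the unplaced middle
            if (a[i] < a[minIdx]) minIdx = i;
            if (a[i] > a[maxIdx]) maxIdx = i;
        }
        int t = a[lo]; a[lo] = a[minIdx]; a[minIdx] = t;   // min into place
        if (maxIdx == lo) maxIdx = minIdx;                 // the max was just displaced
        t = a[hi]; a[hi] = a[maxIdx]; a[maxIdx] = t;       // max into place
    }
}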
Or take a standard sort method, and have the indexes mapped. For the hell of it:
/** Anything with accessors with int parameter */
interface Indexable<T> {
    T get(int index);
    T set(int index, T value);
    // int size(); // YAGNI?
}

/** The accessors have this folded in half,
 *  while iterator() is not overridden */
@SuppressWarnings("serial")
class FoldedList<T> extends ArrayList<T>
        implements Indexable<T> {

    public FoldedList(@SuppressWarnings("unchecked") T...elements) {
        super(Arrays.asList(elements));
    }

    int map(int index) {
        final int last = size() - 1;
        index = 2*index;
        return last <= index ? 2*last - index : index + 1;
    }

    @Override
    public T get(int index) { return super.get(map(index)); }

    @Override
    public T set(int index, T element) {
        return super.set(map(index), element);
    }
}

/** Sort an Indexable<T> */
public class Sort {
    // Hoare/Sedgewick using middle index for pivot
    private static <T extends Comparable<T>>
    int split(Indexable<T> ixable, int lo, int hi) {
        int
            mid = lo + (hi-lo)/2,
            left = lo+1,
            right = hi-1;
        T pivot = ixable.get(mid),
            l = null, r = null;
        ixable.set(mid, ixable.get(lo));
    scan:
        while (true) {
            while ((l = ixable.get(left)).compareTo(pivot) < 0)
                if (right < ++left) {
                    left--;
                    break scan;
                }
            while (pivot.compareTo(r = ixable.get(right)) < 0)
                if (--right <= left) {
                    left -= 1;
                    l = ixable.get(left);
                    break scan;
                }
            ixable.set(left, r); // place misplaced items
            ixable.set(right, l);
            if (--right < ++left) {
                left = right;
                l = r;
                break;
            }
        }
        ixable.set(lo, l);       // put last left value into first position
        ixable.set(left, pivot); // place pivot at split index
        return left;
    }

    private static <T extends Comparable<T>>
    void sort(Indexable<T> ixable, int lo, int hi) {
        while (lo+2 < hi) { // more than 2 Ts
            int split = split(ixable, lo, hi);
            if (split - lo < hi - split) {
                sort(ixable, lo, split); // left part smaller
                lo = split + 1;
            } else {
                sort(ixable, split+1, hi); // right part smaller
                hi = split;
            }
        }
        T l, h;
        if (lo < --hi // 2 Ts
                && (l = ixable.get(lo)).compareTo(h = ixable.get(hi)) > 0) {
            ixable.set(lo, h); // exchange
            ixable.set(hi, l);
        }
    }

    public static <T extends Comparable<T>>
    void main(String[] args) {
        Indexable<Number> nums = new FoldedList<>( //2,6,1,7,9,3);
            7, 3, 9, 3, 0, 6, 1, 2, 8, 6, 5, 4, 7);
        sort((Indexable<T>) nums);
        System.out.println(nums);
    }
}
I have a task to write a quicksort (on only positive numbers) algorithm in Java (I can't use any imports but Scanner), but without recursion and without a stack.
I have two questions about it:
I understand iterative quicksort with a stack, and the recursive version, but I cannot imagine how to do it without either.
I have heard about some 'in place' implementations, but I don't really get it - is that the solution to my problem?
I would appreciate it if anyone could show me a way to do it (don't post an implementation if you can avoid it; I just want to understand it, not copy someone's code) or recommend some book where I can find it (or some similar problem).
Is implementing sort by insertion for small arrays a good idea? If so, how big should N be in this code:
if (arraySize < N)
insertionSort
else
quickSort
fi
Apparently my task was to sort only positive numbers; here is my solution:
// operates on the class's static array field, e.g. static long[] arr;
public static void quickSort(final int size) {
    int l = 0;
    int r = size - 1;
    int q, i = 0;
    int tmpr = r;
    while (true) {
        i--;
        while (l < tmpr) {
            q = partition(l, tmpr);
            arr[tmpr] = -arr[tmpr];   // negate to mark a pending right boundary
            tmpr = q - 1;
            ++i;
        }
        if (i < 0)
            break;
        l++;
        tmpr = findNextR(l, size);
        arr[tmpr] = -arr[tmpr];       // restore the marked value
    }
}

private static int findNextR(final int l, final int size) {
    for (int i = l; i < size; ++i) {
        if (arr[i] < 0)
            return i;
    }
    return size - 1;
}

private static int partition(int l, int r) {
    long pivot = arr[(l + r) / 2];
    while (l <= r) {
        while (arr[r] > pivot)
            r--;
        while (arr[l] < pivot)
            l++;
        if (l <= r) {
            long tmp = arr[r];
            arr[r] = arr[l];
            arr[l] = tmp;
            l++;
            r--;
        }
    }
    return l;
}
My array to sort is a static array in my class.
It is based on finding and creating negative numbers: the sign marks the right boundaries of partitions still to be processed, which is what replaces the explicit stack.
The partition is created using the middle element of the array, but using the median is also good (it depends on the array).
I hope someone will find this useful.
Just as a reference, the Java 8 implementation of Arrays.sort(int[]) uses a threshold of 47; anything less than that is sorted using insertion sort. Their quicksort implementation is, however, very complex, with some initial overhead, so look upon 47 as an upper limit.
A Google search for "non-recursive quicksort" produces a slew of answers... including this one: Non recursive QuickSort. "Your language may vary," but the basic principle won't.
I personally think that, if you're going to sort something, you might as well use Quicksort in all cases . . .
Unless, of course, you can simply use a sort() function in your favorite target-language and leave it to the language implementors to have chosen a clever algorithm (uhhhh, it's probably Quicksort...) for you. If you don't have to specify an algorithm to do such a common task, "don't!" :-)
I am working on a project for a class. We are to write a quicksort that transitions to an insertion sort at the specified value. That's no problem; where I am now having difficulty is figuring out why I am not getting the performance I expect.
One of the requirements is that it must sort an array of 5,000,000 ints in under 1,300 ms (this is on standard machines, so CPU speed is not an issue). First of all, I can't get it to work on 5,000,000 at all because of a StackOverflowError (too many recursive calls...). If I increase the stack size, I am still getting a lot slower than that.
Below is the code. Any hints anyone?
Thanks in advance
public class MyQuickSort {

    public static void sort(int[] toSort, int moveToInsertion) {
        sort(toSort, 0, toSort.length - 1, moveToInsertion);
    }

    private static void sort(int[] toSort, int first, int last, int moveToInsertion) {
        if (first < last) {
            if ((last - first) < moveToInsertion) {
                insertionSort(toSort, first, last);
            } else {
                int split = quickHelper(toSort, first, last);
                sort(toSort, first, split - 1, moveToInsertion);
                sort(toSort, split + 1, last, moveToInsertion);
            }
        }
    }

    private static int quickHelper(int[] toSort, int first, int last) {
        sortPivot(toSort, first, last);
        swap(toSort, first, first + (last - first) / 2);
        int left = first;
        int right = last;
        int pivotVal = toSort[first];
        do {
            while ((left < last) && (toSort[left] <= pivotVal)) {
                left++;
            }
            while (toSort[right] > pivotVal) {
                right--;
            }
            if (left < right) {
                swap(toSort, left, right);
            }
        } while (left < right);
        swap(toSort, first, right);
        return right;
    }

    private static void sortPivot(int[] toSort, int first, int last) {
        int middle = first + (last - first) / 2;
        if (toSort[middle] < toSort[first]) swap(toSort, first, middle);
        if (toSort[last] < toSort[middle]) swap(toSort, middle, last);
        if (toSort[middle] < toSort[first]) swap(toSort, first, middle);
    }

    private static void insertionSort(int[] toSort, int first, int last) {
        for (int nextVal = first + 1; nextVal <= last; nextVal++) {
            int toInsert = toSort[nextVal];
            int j = nextVal - 1;
            while (j >= 0 && toInsert < toSort[j]) {
                toSort[j + 1] = toSort[j];
                j--;
            }
            toSort[j + 1] = toInsert;
        }
    }

    private static void swap(int[] toSort, int i, int j) {
        int temp = toSort[i];
        toSort[i] = toSort[j];
        toSort[j] = temp;
    }
}
I haven't tested this with your algorithm, and I don't know what kind of data set you're running with, but consider choosing a better pivot than the leftmost element. From Wikipedia on Quicksort:
Choice of pivot: In very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behavior on already sorted arrays, which is a rather common use-case. The problem was easily solved by choosing either a random index for the pivot, choosing the middle index of the partition or (especially for longer partitions) choosing the median of the first, middle and last element of the partition for the pivot.
Figured it out.
Actually, not my sort's fault at all. I was generating numbers in the range 0-100 (for testing, to make sure the result was sorted). This resulted in tons of duplicates, which meant way too many partitions. Changing the range to min int and max int made it go a lot quicker.
Thanks for your help though :D
When the input array is large, it's natural to expect that recursive functions run into stack overflow issues, which is what is happening here when you try the above code. I would recommend you write an iterative quicksort using your own stack. It should be fast because there are no stack frame allocations/deallocations done at run time, and you won't run into stack overflow issues either. Performance also depends on the point at which you switch to insertion sort. I don't have a particular input size where insertion sort starts performing badly compared to quicksort; I would suggest you try different sizes, and I'm sure you will notice a difference.
You might also want to use binary search in the insertion sort to improve performance. I don't know how much it improves the run time on smaller inputs, but it's a nice trick to play.
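A sketch of that idea - binary search finds the insertion point, cutting comparisons to O(log k) per element, although the element moves stay the same:

// Insertion sort over the inclusive range [first, last],
// with binary search for each insertion point.
static void binaryInsertionSort(int[] a, int first, int last) {
    for (int i = first + 1; i <= last; i++) {
        int key = a[i];
        int lo = first, hi = i;
        while (lo < hi) {                  // find the first position whose value is > key
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1;
            else hi = mid;
        }
        System.arraycopy(a, lo, a, lo + 1, i - lo);  // shift the tail right by one
        a[lo] = key;
    }
}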
I don't want to share code for the iterative version, because that wouldn't teach you how to convert a recursive quicksort to an iterative one. If you have problems converting it, let me know.