Choosing the pivot for quickselect using median-of-three, implemented in Java?

I found this code on GitHub for the quickselect algorithm, which finds order statistics. The code works fine.
I do not understand the medianOf3 method, which is supposed to arrange the first, middle and last elements in sorted order. It does not actually do so: when I output the array after calling medianOf3, those three elements are not in sorted order.
I can follow what this method is doing except for the last call, swap(list, centerIndex, rightIndex - 1);. Can someone explain why it is called?
import java.util.Arrays;
/**
* This program determines the kth order statistic (the kth smallest number in a
* list) in O(n) time in the average case and O(n^2) time in the worst case. It
* achieves this through the Quickselect algorithm.
*
* @author John Kurlak <john@kurlak.com>
* @date 1/17/2013
*/
public class Quickselect {
/**
* Runs the program with an example list.
*
* @param args The command-line arguments.
*/
public static void main(String[] args) {
int[] list = { 3, 5, 9, 10, 7, 40, 23, 45, 21, 2 };
int k = 6;
int median = medianOf3(list, 0, list.length-1);
System.out.println(median);
System.out.println("list is "+ Arrays.toString(list));
Integer kthSmallest = quickselect(list, k);
if (kthSmallest != null) {
System.out.println("The kth smallest element in the list where k=" + k + " is " + kthSmallest + ".");
} else {
System.out.println("There is no kth smallest element in the list where k=" + k + ".");
}
System.out.println(Arrays.toString(list));
}
/**
* Determines the kth order statistic for the given list.
*
* @param list The list.
* @param k The k value to use.
* @return The kth order statistic for the list.
*/
public static Integer quickselect(int[] list, int k) {
return quickselect(list, 0, list.length - 1, k);
}
/**
* Recursively determines the kth order statistic for the given list.
*
* @param list The list.
* @param leftIndex The left index of the current sublist.
* @param rightIndex The right index of the current sublist.
* @param k The k value to use.
* @return The kth order statistic for the list.
*/
public static Integer quickselect(int[] list, int leftIndex, int rightIndex, int k) {
// Edge case
if (k < 1 || k > list.length) {
return null;
}
// Base case
if (leftIndex == rightIndex) {
return list[leftIndex];
}
// Partition the sublist into two halves
int pivotIndex = randomPartition(list, leftIndex, rightIndex);
int sizeLeft = pivotIndex - leftIndex + 1;
// Perform comparisons and recurse in binary search / quicksort fashion
if (sizeLeft == k) {
return list[pivotIndex];
} else if (sizeLeft > k) {
return quickselect(list, leftIndex, pivotIndex - 1, k);
} else {
return quickselect(list, pivotIndex + 1, rightIndex, k - sizeLeft);
}
}
/**
* Randomly partitions a set about a pivot such that the values to the left
* of the pivot are less than or equal to the pivot and the values to the
* right of the pivot are greater than the pivot.
*
* @param list The list.
* @param leftIndex The left index of the current sublist.
* @param rightIndex The right index of the current sublist.
* @return The index of the pivot.
*/
public static int randomPartition(int[] list, int leftIndex, int rightIndex) {
int pivotIndex = medianOf3(list, leftIndex, rightIndex);
int pivotValue = list[pivotIndex];
int storeIndex = leftIndex;
swap(list, pivotIndex, rightIndex);
for (int i = leftIndex; i < rightIndex; i++) {
if (list[i] <= pivotValue) {
swap(list, storeIndex, i);
storeIndex++;
}
}
swap(list, rightIndex, storeIndex);
return storeIndex;
}
/**
* Computes the median of the first value, middle value, and last value
* of a list. Also rearranges the first, middle, and last values of the
* list to be in sorted order.
*
* @param list The list.
* @param leftIndex The left index of the current sublist.
* @param rightIndex The right index of the current sublist.
* @return The index of the median value.
*/
public static int medianOf3(int[] list, int leftIndex, int rightIndex) {
int centerIndex = (leftIndex + rightIndex) / 2;
if (list[leftIndex] > list[rightIndex]) {
swap(list, leftIndex, centerIndex);
}
if (list[leftIndex] > list[rightIndex]) {
swap(list, leftIndex, rightIndex);
}
if (list[centerIndex] > list[rightIndex]) {
swap(list, centerIndex, rightIndex);
}
swap(list, centerIndex, rightIndex - 1);
return rightIndex - 1;
}
/**
* Swaps two elements in a list.
*
* @param list The list.
* @param index1 The index of the first element to swap.
* @param index2 The index of the second element to swap.
*/
public static void swap(int[] list, int index1, int index2) {
int temp = list[index1];
list[index1] = list[index2];
list[index2] = temp;
}
}

I wrote the original code, but I did a poor job of making it readable.
Looking back at it, I don't think that line of code is necessary; it is a small optimization. If we remove the line and return centerIndex instead, the code seems to work without any issues.
Unfortunately, the optimization it performs is in the wrong place: it should be refactored out of medianOf3() and moved into randomPartition().
Essentially, the optimization is that we want to "partially sort" our subarray as much as possible before partitioning it. The reason is this: the more sorted our data is, the better our future pivot choices will be, which means our run time will hopefully be closer to O(n) than O(n^2). In the randomPartition() method, we move the pivot value to the far right of the subarray we're looking at, which moves the far-right value into the middle of the subarray. This is undesirable, since the far-right value is supposed to be a "larger value". My code tries to prevent this by placing the pivot right next to the rightmost index. Then, when the pivot is swapped with the rightmost element in randomPartition(), the "larger" rightmost value doesn't move into the middle of the subarray but stays near the right.
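The simplification described above (drop the final swap and return centerIndex) might look like the following minimal sketch. The class name MedianOf3Sketch is hypothetical. One assumption is worth flagging: the first comparison here tests left against center, whereas the question's code tests left against right before swapping left with center, which looks like a typo and can leave the three sampled positions unsorted.

```java
import java.util.Arrays;

public class MedianOf3Sketch {
    // Sorts the values at leftIndex, centerIndex and rightIndex in place
    // and returns the index that now holds the median of the three.
    public static int medianOf3(int[] list, int leftIndex, int rightIndex) {
        int centerIndex = leftIndex + (rightIndex - leftIndex) / 2;
        if (list[leftIndex] > list[centerIndex]) {
            swap(list, leftIndex, centerIndex);
        }
        if (list[leftIndex] > list[rightIndex]) {
            swap(list, leftIndex, rightIndex);
        }
        if (list[centerIndex] > list[rightIndex]) {
            swap(list, centerIndex, rightIndex);
        }
        return centerIndex;
    }

    public static void swap(int[] list, int index1, int index2) {
        int temp = list[index1];
        list[index1] = list[index2];
        list[index2] = temp;
    }

    public static void main(String[] args) {
        int[] list = { 3, 5, 9, 10, 7, 40, 23, 45, 21, 2 };
        int medianIndex = medianOf3(list, 0, list.length - 1);
        // prints: 4 [2, 5, 9, 10, 3, 40, 23, 45, 21, 7]
        System.out.println(medianIndex + " " + Arrays.toString(list));
    }
}
```

With this version, positions 0, 4 and 9 of the example array end up holding 2, 3 and 7, so printing the array right after the call shows the three sampled elements in order.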

The medianOf3 function puts the left, middle and right elements in order. The last statement,
swap(list, centerIndex, rightIndex - 1)
sets up the precondition for the partition step that follows. Quickselect partitions the same way quicksort does. However,
instead of recursing into both sides, as in quicksort, quickselect
only recurses into one side – the side with the element it is
searching for. This reduces the average complexity from O(n log n) (in
quicksort) to O(n) (in quickselect).
The algorithm then continues with:
for (int i = leftIndex; i < rightIndex; i++) {
if (list[i] <= pivotValue) {
swap(list, storeIndex, i);
storeIndex++;
}
}
in order to ensure
that the values to the left of the pivot are less than or equal to
the pivot and the values to the right of the pivot are greater than
the pivot.

Related

Implementing a custom quick sort algorithm in Java

My professor has assigned us the task of writing a custom quicksort algorithm that we must implement using his outline (I can't write my own from scratch; I must use his). He calls it smartQuickSort, and what makes this algorithm "custom" is that we have to calculate the averages on each side of the pivot point, which are then used to sort the array. The algorithm uses a class called SmartQuickSortPivot, which has int values left and right to hold the averages on the left and right sides respectively.
I've written numerous quicksort algorithms in several languages, but I cannot, for the life of me, get this one to sort correctly. I've spent 3 days rewriting and debugging this thing with no success, so I'm really hoping someone can help me out, as I'm about to pull all of my hair out. Starting from the "skeleton code" he gave us (which includes commented instructions), this is my latest attempt:
/**
* split4SmartQuickSort splits the array (from first to last) into two subarrays, left and right, using the
* provided splitVal. It needs to calculate on the fly the average of all the elements of the left subarray
* and average of all elements of the right subarray, and store the two averages in the #pivot object.
* The following implementation is only copy of the code from
* the split function (from line 247) and you should enhance the function to implement what we need to calculate the averages
* as the pivot for the left and right subarray.
*
* Please be noted that splitVal may not even exist in the array since we choose the average.
* But this should not impact the correctness algorithm of splitting and sorting.
* @param first
* @param last
* @param splitVal
* @param leftRightAverages
* @return
*/
static int split4SmartQuickSort(int first, int last, int splitVal, SmartQuickSortPivot leftRightAverages)
{
int saveF = first;
int leftAvg = 0;
int leftCount = 0;
int rightAvg = 0;
int rightCount = 0;
boolean onCorrectSide;
first++;
do
{
onCorrectSide = true;
while (onCorrectSide) // move first toward last
if (values[first] > splitVal)
onCorrectSide = false;
else
{
//I think my average calculations here are wrong,
//but nothing I have tried works correctly
leftAvg += first;
leftCount++;
first++;
leftRightAverages.left = leftAvg / leftCount;
onCorrectSide = (first <= last);
}
onCorrectSide = (first <= last);
while (onCorrectSide) // move last toward first
if (values[last] <= splitVal)
onCorrectSide = false;
else
{
//I think my average calculations here are wrong,
//but nothing I have tried works correctly
rightAvg += last;
rightCount++;
last--;
leftRightAverages.right = rightAvg / rightCount;
onCorrectSide = (first <= last);
}
if (first < last)
{
swap(first, last);
first++;
last--;
}
} while (first <= last);
swap(saveF, last);
//I think this is one of my problems. Not sure
//what I should be returning here
return last;
}
/**
* Smart quick sort allows the use of a better splitting value (the pivot value) when to split the array
* into two. In this algorithm, we will use the average of the array (subarray) of all elements as the pivot.
*
* Each call to split (split4SmartQuickSort method), the splitValue will be passed and also the split4SmartQuickSort
* will return the averages of left subarray and right subarray. The two averages, each will be used for the
* following calls to smartQuickSort.
*
* @param first the first element
* @param last the last element
* @param splitVal the pivot value for splitting the array
*/
static void smartQuickSort(int first, int last, int splitVal)
{
if (first < last)
{
int splitPoint;
SmartQuickSortPivot leftRightAverages = new SmartQuickSortPivot();
splitPoint = split4SmartQuickSort(first, last, splitVal, leftRightAverages);
if (first <= splitPoint)
{
smartQuickSort(first, splitPoint - 1, leftRightAverages.left);
}
if (last >= splitPoint)
{
smartQuickSort(splitPoint + 1, last, leftRightAverages.right);
}
}
}
Here is the class used to store the averages to the left/right of the pivot point:
public class SmartQuickSortPivot {
public int left;
public int right;
}
And finally the main method used for testing:
public static void main(String[] args)
{
//initValues();
printValues();
System.out.println("values is sorted: " + isSorted());
System.out.println();
//quickSort(0, values.length - 1);
/** you can either compute the average first as the first pivot or simplify choose the first one as the pivot */
smartQuickSort(0, values.length - 1, values[4]);
printValues();
System.out.println("values is sorted: " + isSorted());
System.out.println();
}
}
The line I commented out, //quickSort(0, values.length - 1);, is the algorithm I wrote that does not include the leftRightAverages object argument but is otherwise essentially the same, and it works perfectly, so I'm very confused about why I can't get the "custom" smartQuickSort to work. For simplicity, I commented out the initValues() method and instead used a preset array that looks like this:
static int[] values = {2,5,1,66,89,44,32,51,8,6}; // values to be sorted
Things I've tried (and failed at):
1.) Moving the lines leftRightAverages.left = leftAvg / leftCount; and leftRightAverages.right = rightAvg / rightCount; outside of the do-while loop, which (I think, due to the recursive nature of the function) eventually gives me a divide-by-zero runtime error.
2.) Changing the return value of split4SmartQuickSort() from last to different combinations of leftRightAverages.left and leftRightAverages.right, which causes a stack overflow from the recursion. This is where I am really confused, as I'm not sure what this method should return in this particular implementation of quicksort (and, more importantly, how to calculate it properly).
I think my issue here is twofold: I'm not correctly calculating the averages on each side of the pivot (I've used numerous pivot points and none of them seem to make a difference), and I'm not returning the proper value from the split4SmartQuickSort() method itself. If I remove the leftRightAverages object from the method arguments and use a more traditional approach to quicksort, the algorithm works fine, which is why I think those two issues are the reason it doesn't function correctly. The return value of split4SmartQuickSort() (I think) acts as the new pivot point for sorting, with the splitVal argument as the original pivot point.
Yes this is my homework, but I've put hours of genuine effort into this thing, with no luck. My prof doesn't answer emails over the weekend and his office hours are during one of my other classes, so I have nowhere else to turn.
I think you are having problems because, in this case, it is hard to use a single integer split point. Here is why:
Imagine that at some point in the algorithm you have 44, 51, 89, 66 to partition, with an average of 62.5, truncated to 62. If you use 62 as the pivot value, it is unclear what to return as the split point: you could return index 1 or index 2 (values 51 or 89 respectively).
Suppose you pick 2. This makes the algorithm invalid. Recall that the split point (pivot) a_j is the element that divides the array into two subarrays such that a_i < a_j for each i < j and a_j < a_k for each k > j. Since 89 is not less than 66, 89 cannot be a split point.
What you need to do is return something in between as the split point. To do this, return a SmartQuickSortPivot object instead of an int and use its left/right values as the ending/starting indexes of your left/right subarrays.
import java.util.Arrays;
public class Temp {
public static class SmartQuickSortPivot {
public int left;
public int right;
}
static int[] values = {2,5,1,66,89,44,32,51,8,6}; // values to be sorted
/**
* split4SmartQuickSort splits the array (from first to last) into two subarrays, left and right, using the
* provided splitVal. It needs to calculate on the fly the average of all the elements of the left subarray
* and average of all elements of the right subarray, and store the two averages in the #pivot object.
* The following implementation is only copy of the code from
* the split function (from line 247) and you should enhance the function to implement what we need to calculate the averages
* as the pivot for the left and right subarray.
*
* Please be noted that splitVal may not even exist in the array since we choose the average.
* But this should not impact the correctness algorithm of splitting and sorting.
* @param first
* @param last
* @param splitVal
* @param leftRightAverages
* @return
*/
static SmartQuickSortPivot split4SmartQuickSort(int first, int last, int splitVal, SmartQuickSortPivot leftRightAverages)
{
int i = first,j = last;
int sumLeft = 0;
int sumRight = 0;
while (i < j) {
while (values[i] < splitVal){
sumLeft += values[i];
i++;
}
while (values[j] > splitVal){
sumRight += values[j];
j--;
}
if (i < j) {
swap(i, j);
}
}
leftRightAverages.left = (i - first == 0) ? values[first] : sumLeft / (i - first);
leftRightAverages.right = (last - j == 0) ? values[last] : sumRight / (last - j);
SmartQuickSortPivot smartQuickSortPivot = new SmartQuickSortPivot();
smartQuickSortPivot.left = i;
smartQuickSortPivot.right = j;
return smartQuickSortPivot;
}
private static void swap(int i, int j) {
int temp = values[i];
values[i] = values[j];
values[j] = temp;
}
/**
* Smart quick sort allows the use of a better splitting value (the pivot value) when to split the array
* into two. In this algorithm, we will use the average of the array (subarray) of all elements as the pivot.
*
* Each call to split (split4SmartQuickSort method), the splitValue will be passed and also the split4SmartQuickSort
* will return the averages of left subarray and right subarray. The two averages, each will be used for the
* following calls to smartQuickSort.
*
* @param first the first element
* @param last the last element
* @param splitVal the pivot value for splitting the array
*/
static void smartQuickSort(int first, int last, int splitVal)
{
if (first < last)
{
SmartQuickSortPivot splitPoint;
SmartQuickSortPivot leftRightAverages = new SmartQuickSortPivot();
splitPoint = split4SmartQuickSort(first, last, splitVal, leftRightAverages);
if (first < splitPoint.left)
{
smartQuickSort(first, splitPoint.left - 1, leftRightAverages.left);
}
if (last > splitPoint.right)
{
smartQuickSort(splitPoint.right + 1, last, leftRightAverages.right);
}
}
}
public static void main(String[] args)
{
/** you can either compute the average first as the first pivot or simplify choose the first one as the pivot */
smartQuickSort(0, values.length - 1, values[5]);
System.out.println(Arrays.toString(values));
}
}
Thanks to the great advice in the answer, I got the algorithm working, but it still was not sorting duplicates correctly (it went into an infinite loop when duplicates were encountered). After playing with the code, I now have a complete working algorithm. The change was in split4SmartQuickSort() only, so here is that method, updated:
static SmartQuickSortPivot split4SmartQuickSort
(int first, int last, int splitVal, SmartQuickSortPivot leftRightAverages)
{
int f = first;
int l = last;
int sumLeft = 0;
int sumRight = 0;
while (f < l)
{
while (values[f] < splitVal)
{
sumLeft += values[f];
f++;
}
while (values[l] > splitVal)
{
sumRight += values[l];
l--;
}
if (f <= l)
{
swap(f, l);
//handling duplicates in the list
if (values[f] == values[l])
{
f++;
}
}
}
if (f - first == 0)
{
leftRightAverages.left = values[first];
}
else
{
leftRightAverages.left = sumLeft / (f - first);
}
if (last - l == 0)
{
leftRightAverages.right = values[last];
}
else
{
leftRightAverages.right = sumRight / (last - l);
}
//create SmartQuickSortPivot object to be returned. Used in
//smartQuickSort as the split point for sorting
SmartQuickSortPivot sqsp = new SmartQuickSortPivot();
sqsp.left = f;
sqsp.right = l;
return sqsp;
}
And finally, the smartQuickSort() algorithm:
static void smartQuickSort(int first, int last, int splitVal)
{
if (first < last)
{
SmartQuickSortPivot splitPoint;
SmartQuickSortPivot leftRightAverages = new SmartQuickSortPivot();
splitPoint = split4SmartQuickSort(first, last, splitVal, leftRightAverages);
if (first <= splitPoint.left)
{
smartQuickSort(first, splitPoint.left - 1, leftRightAverages.left);
}
if (last >= splitPoint.right)
{
smartQuickSort(splitPoint.right + 1, last, leftRightAverages.right);
}
}
}
Thanks again to @shyyko-serhiy, as they deserve most of the credit for getting this thing working :)
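For comparison, here is a minimal sketch of an average-as-pivot quicksort that does not need special handling for duplicates. It is not the professor's outline: it recomputes the average at the start of each call instead of accumulating it inside the split, and the class name AveragePivotSortSketch is hypothetical. The idea is that the floor of the average of a range that is not all-equal always satisfies min <= pivot < max, so splitting into "<= pivot" and "> pivot" leaves both sides non-empty and the recursion always shrinks.

```java
import java.util.Arrays;

public class AveragePivotSortSketch {
    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        long sum = 0;
        int min = a[lo];
        int max = a[lo];
        for (int i = lo; i <= hi; i++) {
            sum += a[i];
            min = Math.min(min, a[i]);
            max = Math.max(max, a[i]);
        }
        if (min == max) {
            return; // every element is equal, so the range is already sorted
        }
        // Floor of the average: guarantees min <= pivot < max.
        int pivot = (int) Math.floorDiv(sum, (long) (hi - lo + 1));
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (i <= j && a[i] <= pivot) i++;  // skip values already on the left side
            while (i <= j && a[j] > pivot) j--;   // skip values already on the right side
            if (i < j) {
                int temp = a[i];
                a[i] = a[j];
                a[j] = temp;
                i++;
                j--;
            }
        }
        // Now a[lo..j] <= pivot and a[i..hi] > pivot, with i == j + 1.
        sort(a, lo, j);
        sort(a, i, hi);
    }

    public static void main(String[] args) {
        int[] values = {2, 5, 1, 66, 89, 44, 32, 51, 8, 6};
        sort(values);
        System.out.println(Arrays.toString(values));
    }
}
```

Because the pivot value need not occur in the array, nothing is "placed" at a final position; the two recursive calls together cover the whole range.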

Quick Sort Comparison Count

I'm trying to use these quicksort methods to figure out how many comparisons are happening. We are given a global variable that does the counting, but we aren't allowed to use the global variable when we hand it in. Instead we need to count the comparisons recursively. I am trying to figure out how to do that. I'm not looking for the answer; I'm trying to find the right steps toward solving this problem. I've been trying things for a couple of hours now with no luck.
static int qSortCompares = 0; // GLOBAL var declaration
/**
* The swap method swaps the contents of two elements in an int array.
*
* @param array The array containing the two elements.
* @param a The subscript of the first element.
* @param b The subscript of the second element.
*/
private static void swap(int[] array, int a, int b) {
int temp;
temp = array[a];
array[a] = array[b];
array[b] = temp;
}
public static void quickSort(int array[]) {
qSortCompares = 0;
int qSCount = 0;
doQuickSort(array, 0, array.length - 1);
}
/**
* The doQuickSort method uses the QuickSort algorithm to sort an int array.
*
* @param array The array to sort.
* @param start The starting subscript of the list to sort
* @param end The ending subscript of the list to sort
*/
private static int doQuickSort(int array[], int start, int end) {
int pivotPoint;
int qSTotal = 0;
if (start < end) {
// Get the pivot point.
pivotPoint = partition(array, start, end);
// Note - only one +/=
// Sort the first sub list.
doQuickSort(array, start, pivotPoint - 1);
// Sort the second sub list.
doQuickSort(array, pivotPoint + 1, end);
}
return qSTotal;
}
/**
* The partition method selects a pivot value in an array and arranges the
* array into two sub lists. All the values less than the pivot will be
* stored in the left sub list and all the values greater than or equal to
* the pivot will be stored in the right sub list.
*
* @param array The array to partition.
* @param start The starting subscript of the area to partition.
* @param end The ending subscript of the area to partition.
* @return The subscript of the pivot value.
*/
private static int partition(int array[], int start, int end) {
int pivotValue; // To hold the pivot value
int endOfLeftList; // Last element in the left sub list.
int mid; // To hold the mid-point subscript
int qSCount = 0;
// see http://www.cs.cmu.edu/~fp/courses/15122-s11/lectures/08-qsort.pdf
// for discussion of middle point - This improves the almost sorted cases
// of using quicksort
// Find the subscript of the middle element.
// This will be our pivot value.
mid = (start + end) / 2;
// Swap the middle element with the first element.
// This moves the pivot value to the start of
// the list.
swap(array, start, mid);
// Save the pivot value for comparisons.
pivotValue = array[start];
// For now, the end of the left sub list is
// the first element.
endOfLeftList = start;
// Scan the entire list and move any values that
// are less than the pivot value to the left
// sub list.
for (int scan = start + 1; scan <= end; scan++) {
qSortCompares++;
qSCount++;
if (array[scan] < pivotValue) {
endOfLeftList++;
// System.out.println("Pivot=" + pivotValue + "=" + endOfLeftList + ":" + scan);
swap(array, endOfLeftList, scan);
}
}
// Move the pivot value to end of the
// left sub list.
swap(array, start, endOfLeftList);
// Return the subscript of the pivot value.
return endOfLeftList;
}
/**
* Print an array to the Console
*
* @param A
*/
public static void printArray(int[] A) {
for (int i = 0; i < A.length; i++) {
System.out.printf("%5d ", A[i]);
}
System.out.println();
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
final int SIZE = 10;
int[] A = new int[SIZE];
// Create random array with elements in the range of 0 to SIZE - 1;
System.out.printf("Lab#2 Sorting Algorithm Performance Analysis\n\n");
for (int i = 0; i < SIZE; i++) {
A[i] = (int) (Math.random() * SIZE);
}
System.out.printf("Unsorted Data = %s\n", Arrays.toString(A));
int[] B;
// Measure comparisons and time each of the 4 sorts
B = Arrays.copyOf(A, A.length); // Need to do this before each sort
long startTime = System.nanoTime();
quickSort(B);
long timeRequired = (System.nanoTime() - startTime) / 1000;
System.out.printf("Sorted Data = %s\n", Arrays.toString(B));
System.out.printf("Number of compares for quicksort = %8d time = %8d us Ratio = %6.1f compares/us\n", qSortCompares, timeRequired, qSortCompares / (double) timeRequired);
// Add code for the other sorts here ...
}
The instructions give some hints but I am still lost:
The quicksort method currently counts the # of comparisons by using a global variable. This is not a good programming technique. Modify the quicksort method to count comparisons by passing a parameter. This is a little trickier as the comparisons are done in the partition method. You should be able to see that the number of comparisons can be determined before the call to the partition method. You will need to return this value from the Quicksort method and modify the quickSort header to pass this value into each recursive call. You will need to add the counts recursively.
As an alternative to the recursive counting, you can leave the code as is and complete the lab without the modification.
The way I have been approaching this assignment, I added a variable called qSCount to the partition method that counts how many comparisons are made each time it is called. However, I can't use that variable because I am not returning it, and I'm not sure how I would use recursion in that case. My idea was that each time qSCount had a value, I could somehow add it to qSTotal in the doQuickSort method. But then again, the hint says I need to add a parameter to quickSort, so I am thoroughly confused.
In order to count something with a recursive method (without a global variable) we need to return it. You have:
private static int doQuickSort(int array[], int start, int end)
This is the right idea. But since the comparisons actually happen within
private static int partition(int array[], int start, int end)
you need to have partition return how many comparisons were made.
This leaves us with two options:
We can either create or use an existing Pair class to have this method return a pair of integers instead of just one (the pivot).
We can create a counter class and pass a counter object around and have the counting done there. This eliminates the need to return another value since the parameter could be used to increase the count.
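A third possibility follows the hint in the instructions that the number of comparisons "can be determined before the call to the partition method". The partition loop in the question scans from start + 1 to end, so it always makes exactly end - start comparisons. Under that assumption, doQuickSort can add up the counts itself and return the total, with no Pair or counter object. The sketch below is one way this might look; the class name CountingQuickSort is hypothetical, and it returns the count rather than passing it as a parameter.

```java
import java.util.Arrays;

public class CountingQuickSort {
    // Sorts the array and returns the number of element comparisons made.
    public static int quickSort(int[] array) {
        return doQuickSort(array, 0, array.length - 1);
    }

    private static int doQuickSort(int[] array, int start, int end) {
        if (start >= end) {
            return 0; // zero or one element: nothing to compare
        }
        // partition compares every element except the pivot exactly once
        int compares = end - start;
        int pivotPoint = partition(array, start, end);
        compares += doQuickSort(array, start, pivotPoint - 1);
        compares += doQuickSort(array, pivotPoint + 1, end);
        return compares;
    }

    // Same partition scheme as in the question, without the counters.
    private static int partition(int[] array, int start, int end) {
        int mid = (start + end) / 2;
        swap(array, start, mid);
        int pivotValue = array[start];
        int endOfLeftList = start;
        for (int scan = start + 1; scan <= end; scan++) {
            if (array[scan] < pivotValue) {
                endOfLeftList++;
                swap(array, endOfLeftList, scan);
            }
        }
        swap(array, start, endOfLeftList);
        return endOfLeftList;
    }

    private static void swap(int[] array, int a, int b) {
        int temp = array[a];
        array[a] = array[b];
        array[b] = temp;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 2};
        int compares = quickSort(data);
        // prints: [1, 2, 3] sorted with 3 comparisons
        System.out.println(Arrays.toString(data) + " sorted with " + compares + " comparisons");
    }
}
```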

How to implement a recursive mergeSort with generics in java?

I'm implementing a mergeSort function. I understand the logic of divide and conquer, but the actual merging part is confusing me. This is a past homework problem, but I'm trying to understand it.
/**
* Implement merge sort.
*
* It should be:
* stable
* Have a worst case running time of:
* O(n log n)
*
* And a best case running time of:
* O(n log n)
*
* You can create more arrays to run mergesort, but at the end,
* everything should be merged back into the original T[]
* which was passed in.
*
* ********************* IMPORTANT ************************
* FAILURE TO DO SO MAY CAUSE ClassCastException AND CAUSE
* YOUR METHOD TO FAIL ALL THE TESTS FOR MERGE SORT
* ********************************************************
*
* Any duplicates in the array should be in the same relative position
* after sorting as they were before sorting.
*
* @throws IllegalArgumentException if the array or comparator is null
* @param <T> data type to sort
* @param arr the array to be sorted
* @param comparator the Comparator used to compare the data in arr
*/
For this method, the signature (the parameters, the public and static modifiers, and the generics) can't be changed. I don't know how to write the recursive merge function.
public static <T> void mergesort(T[] arr, Comparator<T> comparator) {
if (arr == null || comparator == null) {
throw new IllegalArgumentException("Null arguments were passed.");
}
if (arr.length >= 2) {
//Midpoint from which we will split the array.
int middle = arr.length / 2;
//Each half of the split array
T[] left = (T[]) new Object[middle];
T[] right = (T[]) new Object[arr.length - middle];
//Copy items from original into each half
for (int i = 0; i < middle; i++) {
left[i] = arr[i];
}
for (int i = middle; i < length; i++) {
right[i] = arr[i];
}
//Keep splitting until length is 1
mergesort(left, comparator);
mergesort(right, comparator);
//merge each array back into original which would now be sorted.
merge(left, right, middle, arr, comparator);
merge(right, middle, arr, comparator);
}
}
private static <T> T[] merge(T[] left, T[] right, int middle, T[] arr,
Comparator<T>
comparator) {
int i = 1, j = middle + 1, k = 1;
while (i <= middle && j <= arr.length) {
arr[k++] = (comparator.compare(arr[k], partioned[i]) < 0)
? arr[j++] : partioned[i++];
}
while (i <= middle) {
arr[k++] = partioned[k++];
}
}
Here's one possible solution. I don't remember if there's a better way to do this, since I generally just use Collections.sort().
Note that there's no need to return a value, since the contents of the original array are going to be modified.
There's also no need to pass in the middle index.
private static <T> void merge(T[] left, T[] right, T[] dest, Comparator<T> comparator) {
    if (left.length + right.length != dest.length)
        throw new IllegalArgumentException(
            "left length + right length must equal destination length");
    int leftIndex = 0;
    int rightIndex = 0;
    int destIndex = 0;
    while (destIndex < dest.length) {
        if (leftIndex >= left.length) // no more entries in left array, use right
            dest[destIndex++] = right[rightIndex++];
        else if (rightIndex >= right.length) // no more entries in right array, use left
            dest[destIndex++] = left[leftIndex++];
        // a generic T has no < operator, so compare through the comparator;
        // <= 0 takes from the left on ties, which keeps the sort stable
        else if (comparator.compare(left[leftIndex], right[rightIndex]) <= 0)
            dest[destIndex++] = left[leftIndex++];
        else
            dest[destIndex++] = right[rightIndex++];
    }
}
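To show how the merge step fits into the required mergesort signature, here is a minimal self-contained sketch. The class name MergeSortSketch is hypothetical, and the merge is inlined rather than split into a helper. It uses Arrays.copyOfRange to create the two halves, which returns arrays of the same runtime type as arr and so avoids the unchecked (T[]) new Object[...] cast from the question.

```java
import java.util.Arrays;
import java.util.Comparator;

public class MergeSortSketch {
    public static <T> void mergesort(T[] arr, Comparator<T> comparator) {
        if (arr == null || comparator == null) {
            throw new IllegalArgumentException("Null arguments were passed.");
        }
        if (arr.length < 2) {
            return; // zero or one element is already sorted
        }
        int middle = arr.length / 2;
        // copyOfRange keeps the runtime element type of arr
        T[] left = Arrays.copyOfRange(arr, 0, middle);
        T[] right = Arrays.copyOfRange(arr, middle, arr.length);
        mergesort(left, comparator);
        mergesort(right, comparator);
        // Merge the two sorted halves back into the original array.
        int i = 0;
        int j = 0;
        int k = 0;
        while (i < left.length && j < right.length) {
            // <= 0 prefers the left half on ties, which preserves stability
            if (comparator.compare(left[i], right[j]) <= 0) {
                arr[k++] = left[i++];
            } else {
                arr[k++] = right[j++];
            }
        }
        while (i < left.length) {
            arr[k++] = left[i++];
        }
        while (j < right.length) {
            arr[k++] = right[j++];
        }
    }

    public static void main(String[] args) {
        String[] words = {"pear", "fig", "apple", "kiwi", "plum"};
        mergesort(words, Comparator.comparingInt(String::length));
        // prints: [fig, pear, kiwi, plum, apple]
        System.out.println(Arrays.toString(words));
    }
}
```

Sorting by length only makes the stability visible: the three four-letter words keep their original relative order.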

QuickSort (select) method, ArrayIndex out of bounds error

I am working on a coms assignment and have hit a huge wall. We are working with quicksort / partition.
/**
* An implementation of the SELECT algorithm of Figure 1 of the project
* specification. Returns the ith order statistic in the subarray
* arr[first], ..., arr[last]. The method must run in O(n) expected time,
* where n = (last - first + 1).
*
* @param arr
* - The data to search in
* @param first
* - The leftmost boundary of the subarray (inclusive)
* @param last
* - The rightmost boundary of the subarray (inclusive)
* @param i
* - The requested order statistic to find
* @return - The ith order statistic in the subarray
*
* @throws IllegalArgumentException
* - If i < 1 or i > n
*/
public static int select(int[] arr, int first, int last, int i) {
if(first == last){
return first;
}
int pivot = (arr.length / 2);
pivot = partition(arr, first, last, pivot-1);
if(i == pivot){
return arr[i];
} else if(i < pivot){
return select(arr, first, pivot-1, i);
} else {
return select(arr, pivot+1, last, i);
}
}
The above is a 'select' method; I believe it is based on quicksort. When I run it with my partition method, I keep getting ArrayIndexOutOfBounds errors, and I am wondering whether the way I choose my pivot is causing them.
Below are a partition method and a swap method I have written. The partition method works from what I can tell: I made an int[] of 10 elements and ran it multiple times, using different pivot points. Each time it produced the array arranged the way it should be.
public static int partition(int[] arr, int first, int last, int pivot){
int pValue = arr[pivot];
swap(arr, last, pivot);
int temp = first;
for (int i = first; i < last; i++) {
if(arr[i] < pValue){
swap(arr, temp, i);
temp++;
}
}
swap(arr, last, temp);
return arr[pivot];
}
public static void swap(int[] arr, int a, int b){
int temp = arr[a];
arr[a] = arr[b];
arr[b] = temp;
}
The rest of the assignment builds off of the select method, and I have been slamming my head into a wall for two days trying to get it to work correctly, to the point where I am second-guessing my choice of degree. I guess, as a secondary question, how many of you hit these walls and lost all confidence in yourselves? The last few assignments, with a bit of help, made sense, but now I am here and just lost in the dark...
PS. Sorry if I sound all sappy, it's been a rough weekend and the above has been a huge pain.
First off, the best way to debug this at this point is to put a print statement before every array access that prints which index is being looked at. This will tell you right away where the error is happening. The following is a guess at what might be happening. There may be other culprits.
What happens if the pivot is the first or last element (which will always happen on an array of 2 elements)? Then when you select on pivot + 1 or pivot - 1, as you do at the end of the select function, you'll go off the end of the array. The same applies when you partition on pivot - 1. It seems you need to flesh out the base case of your recursive function. This is the most likely culprit.
You're picking your pivot based on the length of the entire array rather than the length of the subarray, which runs from first to last instead of from 0 to arr.length. If first and last have always been the first and last element of the entire array, then I doubt this is the problem (though it will still be a problem when tested more thoroughly).
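The points above can be sketched as follows. This is only a sketch under assumptions: the original select method is not reproduced here, so it is assumed that select uses the value returned by partition as an index. Note that the posted partition returns arr[pivot], which is an element value rather than the pivot's final position, and that alone could produce out-of-bounds indices. The class name QuickselectSketch and the middle-of-subarray pivot choice are illustrative, not taken from the assignment.

```java
final class QuickselectSketch {
    /** Lomuto partition of arr[first..last]; returns the final INDEX of the pivot. */
    static int partition(int[] arr, int first, int last, int pivot) {
        int pValue = arr[pivot];
        swap(arr, last, pivot);            // park the pivot at the end
        int store = first;
        for (int i = first; i < last; i++) {
            if (arr[i] < pValue) {
                swap(arr, store, i);
                store++;
            }
        }
        swap(arr, last, store);            // move the pivot into its final slot
        return store;                      // an index, not arr[pivot]
    }

    /** Returns the k-th smallest element of arr[first..last]; k is a 0-based index into arr. */
    static int select(int[] arr, int first, int last, int k) {
        if (first == last) {               // base case: one element left
            return arr[first];
        }
        // The pivot is chosen inside the current subarray, not from arr.length.
        int pivot = first + (last - first) / 2;
        int p = partition(arr, first, last, pivot);
        if (k == p) {
            return arr[p];
        } else if (k < p) {
            return select(arr, first, p - 1, k);   // p > first here, so the range is non-empty
        } else {
            return select(arr, p + 1, last, k);    // p < last here, so the range is non-empty
        }
    }

    static void swap(int[] arr, int a, int b) {
        int t = arr[a];
        arr[a] = arr[b];
        arr[b] = t;
    }
}
```

Calling QuickselectSketch.select(arr, 0, arr.length - 1, k) with 0 <= k < arr.length keeps every recursive call inside a non-empty range, which is what the base case and the subarray-relative pivot are meant to guarantee.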

Median of Medians in Java

I am trying to implement Median of Medians in Java for a method like this:
Select(Comparable[] list, int pos, int colSize, int colMed)
list is a list of values of which to find a specified position
pos is the specified position
colSize is the size of the columns that I create in the first stage
colMed is the position in those columns that I use as the medX
I am not sure which sorting algorithm would be the best to use or how to implement this exactly..
I don't know if you still need this problem solved, but http://www.ics.uci.edu/~eppstein/161/960130.html has an algorithm:
select(L,k)
{
if (L has 10 or fewer elements)
{
sort L
return the element in the kth position
}
partition L into subsets S[i] of five elements each
(there will be n/5 subsets total).
for (i = 1 to n/5) do
x[i] = select(S[i],3)
M = select({x[i]}, n/10)
partition L into L1<M, L2=M, L3>M
if (k <= length(L1))
return select(L1,k)
else if (k > length(L1)+length(L2))
return select(L3,k-length(L1)-length(L2))
else return M
}
Good luck!
The question asked for Java, so here it is:
import java.util.*;
public class MedianOfMedians {
private MedianOfMedians() {
}
/**
* Returns median of list in linear time.
*
* @param list list to search, which may be reordered on return
* @return median of array in linear time.
*/
public static Comparable getMedian(ArrayList<Comparable> list) {
int s = list.size();
if (s < 1)
throw new IllegalArgumentException();
int pos = select(list, 0, s, s / 2);
return list.get(pos);
}
/**
* Returns position of k'th smallest element (k is 0-based) of sub-list.
*
* @param list list to search, whose sub-list may be shuffled before
* returning
* @param lo first element of sub-list in list
* @param hi just after last element of sub-list in list
* @param k 0-based rank within the sub-list
* @return position of k'th smallest element of (possibly shuffled) sub-list.
*/
public static int select(ArrayList<Comparable> list, int lo, int hi, int k) {
if (lo >= hi || k < 0 || lo + k >= hi)
throw new IllegalArgumentException();
if (hi - lo < 10) {
Collections.sort(list.subList(lo, hi));
return lo + k;
}
int s = hi - lo;
int np = s / 5; // Number of partitions
for (int i = 0; i < np; i++) {
// For each partition, move its median to front of our sublist
int lo2 = lo + i * 5;
int hi2 = (i + 1 == np) ? hi : (lo2 + 5);
int pos = select(list, lo2, hi2, 2);
Collections.swap(list, pos, lo + i);
}
// Partition medians were moved to front, so we can recurse without making another list.
int pos = select(list, lo, lo + np, np / 2);
// Re-partition list to [<pivot][pivot][>pivot]
int m = triage(list, lo, hi, pos);
int cmp = lo + k - m;
if (cmp > 0)
return select(list, m + 1, hi, k - (m - lo) - 1);
else if (cmp < 0)
return select(list, lo, m, k);
return lo + k;
}
/**
* Partition sub-list into 3 parts [<pivot][pivot][>pivot].
*
* @param list
* @param lo
* @param hi
* @param pos input position of pivot value
* @return output position of pivot value
*/
private static int triage(ArrayList<Comparable> list, int lo, int hi,
int pos) {
Comparable pivot = list.get(pos);
int lo3 = lo;
int hi3 = hi;
while (lo3 < hi3) {
Comparable e = list.get(lo3);
int cmp = e.compareTo(pivot);
if (cmp < 0)
lo3++;
else if (cmp > 0)
Collections.swap(list, lo3, --hi3);
else {
while (hi3 > lo3 + 1) {
assert (list.get(lo3).compareTo(pivot) == 0);
e = list.get(--hi3);
cmp = e.compareTo(pivot);
if (cmp <= 0) {
if (lo3 + 1 == hi3) {
Collections.swap(list, lo3, lo3 + 1);
lo3++;
break;
}
Collections.swap(list, lo3, lo3 + 1);
assert (list.get(lo3 + 1).compareTo(pivot) == 0);
Collections.swap(list, lo3, hi3);
lo3++;
hi3++;
}
}
break;
}
}
assert (list.get(lo3).compareTo(pivot) == 0);
return lo3;
}
}
Here is a Unit test to check it works...
import java.util.*;
import junit.framework.TestCase;
public class MedianOfMedianTest extends TestCase {
public void testMedianOfMedianTest() {
Random r = new Random(1);
int n = 87;
for (int trial = 0; trial < 1000; trial++) {
ArrayList list = new ArrayList();
int[] a = new int[n];
for (int i = 0; i < n; i++) {
int v = r.nextInt(256);
a[i] = v;
list.add(v);
}
int m1 = (Integer)MedianOfMedians.getMedian(list);
Arrays.sort(a);
int m2 = a[n/2];
assertEquals(m1, m2);
}
}
}
However, the above code is too slow for practical use.
Here is a simpler way to get the k'th element that does not guarantee performance, but is much faster in practice:
/**
* Returns position of k'th smallest element (k is 0-based) of sub-list.
*
* @param list list to search, whose sub-list may be shuffled before
* returning
* @param lo first element of sub-list in list
* @param hi just after last element of sub-list in list
* @param k 0-based rank within the sub-list
* @return position of k'th smallest element of (possibly shuffled) sub-list.
*/
static int select(double[] list, int lo, int hi, int k) {
int n = hi - lo;
if (n < 2)
return lo;
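// Pick an arbitrary pivot derived from k (deterministic, not truly random).
// The multiplication is done in long so it cannot overflow int for large k.
double pivot = list[lo + (int) ((k * 7919L) % n)];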
// Triage list to [<pivot][=pivot][>pivot]
int nLess = 0, nSame = 0, nMore = 0;
int lo3 = lo;
int hi3 = hi;
while (lo3 < hi3) {
double e = list[lo3];
int cmp = compare(e, pivot);
if (cmp < 0) {
nLess++;
lo3++;
} else if (cmp > 0) {
swap(list, lo3, --hi3);
if (nSame > 0)
swap(list, hi3, hi3 + nSame);
nMore++;
} else {
nSame++;
swap(list, lo3, --hi3);
}
}
assert (nSame > 0);
assert (nLess + nSame + nMore == n);
assert (list[lo + nLess] == pivot);
assert (list[hi - nMore - 1] == pivot);
if (k >= n - nMore)
return select(list, hi - nMore, hi, k - nLess - nSame);
else if (k < nLess)
return select(list, lo, lo + nLess, k);
return lo + k;
}
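The method above calls compare and swap helpers that are not shown in the post. A minimal sketch of what they might look like, assuming compare follows the usual negative/zero/positive contract; the class name SelectHelpers is hypothetical and the author's real helpers may differ:

```java
final class SelectHelpers {
    /** Three-way comparison of two doubles: negative, zero, or positive. */
    static int compare(double a, double b) {
        // Double.compare gives a total order; note it treats NaN as equal to NaN,
        // whereas the == used in the assertions above does not.
        return Double.compare(a, b);
    }

    /** Exchanges list[i] and list[j]. */
    static void swap(double[] list, int i, int j) {
        double t = list[i];
        list[i] = list[j];
        list[j] = t;
    }
}
```

With these in the same class as select (or statically imported), the snippet above should compile as posted; inputs containing NaN would still trip its assertions when assertions are enabled.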
I agree with the answer/solution from Chip Uni. I will just comment on the sorting part and provide some further explanation:
You do not need any sorting algorithm. The algorithm is similar to quicksort, with the difference that only one partition (left or right) is processed further. We just need to find a good pivot so that the left and right parts are as equal as possible, which would mean N + N/2 + N/4 + N/8 ... = 2N iterations, and thus a time complexity of O(N). The above algorithm, called median of medians, uses the median of the medians of 5-element groups as the pivot, which turns out to yield linear time complexity.
However, a sorting algorithm is used when the range being searched for the nth smallest/greatest element (which I suppose you are implementing with this algorithm) is small, in order to speed up the algorithm. Insertion sort is particularly fast on small arrays of up to 7 to 10 elements.
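The linear-time claims above can be made slightly more precise. The following is the standard textbook argument in simplified form: rounding and small additive constants are ignored, and c is an assumed constant for the per-level partitioning cost. The 7n/10 term comes from the fact that the median of medians is guaranteed to have roughly 3n/10 elements on each side of it, so the recursion continues on at most about 7n/10 elements.

```latex
% Ideal halving: the work shrinks geometrically.
N + \frac{N}{2} + \frac{N}{4} + \dots = N \sum_{i \ge 0} 2^{-i} = 2N = O(N)

% Median of medians with groups of 5, ignoring rounding:
T(n) \le T\!\left(\frac{n}{5}\right) + T\!\left(\frac{7n}{10}\right) + cn

% Assume T(m) \le 10cm for all m < n and substitute:
T(n) \le 10c \cdot \frac{n}{5} + 10c \cdot \frac{7n}{10} + cn = 2cn + 7cn + cn = 10cn
```

The induction closes because 1/5 + 7/10 = 9/10 < 1, which is why groups of 5 give a linear bound.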
Implementation note:
M = select({x[i]}, n/10)
actually means taking the median of all those medians of 5-element groups. You can accomplish that by creating another array of size (n - 1)/5 + 1 and calling the same algorithm recursively to find the n/10-th element (which is the median of the newly created array).
@android developer:
for (i = 1 to n/5) do
x[i] = select(S[i],3)
is really
for (i = 1 to ceiling(n/5)) do
x[i] = select(S[i],3)
with a ceiling function appropriate for your data (e.g. in Java, dividing as doubles).
This also affects the median step compared with simply taking n/10, but we only need a value that occurs in the array and is close to the median, not the true median.
Another note is that S[i] may have fewer than 3 elements, so we want to find the median with respect to its length; passing it into select with k=3 won't always work (e.g. for n = 11 we have 3 subgroups: 2 with 5 elements and 1 with 1 element).
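A minimal sketch of the group-median step with both adjustments, assuming int data; the class and method names are made up for illustration. The number of groups is computed with integer arithmetic, (n + 4) / 5, which equals ceiling(n/5) and matches (n - 1)/5 + 1 for n >= 1, and each group's median is taken relative to its actual length.

```java
import java.util.Arrays;

final class GroupMedians {
    /** Returns the medians of consecutive groups of up to five elements of a. */
    static int[] mediansOfFive(int[] a) {
        int groups = (a.length + 4) / 5;            // ceiling(n / 5) without floating point
        int[] medians = new int[groups];
        for (int g = 0; g < groups; g++) {
            int lo = g * 5;
            int hi = Math.min(lo + 5, a.length);    // the last group may be shorter
            int[] group = Arrays.copyOfRange(a, lo, hi);
            Arrays.sort(group);                     // at most 5 elements, so constant time
            medians[g] = group[(group.length - 1) / 2]; // lower median w.r.t. actual length
        }
        return medians;
    }
}
```

For n = 11 this yields three medians, the last one being the single leftover element; the resulting array is what the recursive select call for the median of medians would then operate on.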
I know it's a very old post and you might not remember it any more, but I wonder: did you measure the running time of your implementation when you wrote it?
I tried this algorithm and compared it with the simple approach of sorting with Java's Arrays.sort() and then picking the kth element from the sorted array. The result I got is that this algorithm only outperformed the Java sort when the array had about a hundred thousand elements or more. Even then it was only about 2 or 3 times faster, which is obviously not a factor of log(n) faster.
Do you have any comment on that?
