Tukey's ninther for different shufflings of the same data

While implementing improvements to quicksort partitioning, I tried to use Tukey's ninther to find the pivot (borrowing almost everything from Sedgewick's implementation in QuickX.java).
My code below gives a different result each time the array of integers is shuffled.
import java.util.Random;
public class TukeysNintherDemo{
public static int tukeysNinther(Comparable[] a,int lo,int hi){
int N = hi - lo + 1;
int mid = lo + N/2;
int delta = N/8;
int m1 = median3a(a,lo,lo+delta,lo+2*delta);
int m2 = median3a(a,mid-delta,mid,mid+delta);
int m3 = median3a(a,hi-2*delta,hi-delta,hi);
int tn = median3a(a,m1,m2,m3);
return tn;
}
// return the index of the median element among a[i], a[j], and a[k]
private static int median3a(Comparable[] a, int i, int j, int k) {
return (less(a[i], a[j]) ?
(less(a[j], a[k]) ? j : less(a[i], a[k]) ? k : i) :
(less(a[k], a[j]) ? j : less(a[k], a[i]) ? k : i));
}
private static boolean less(Comparable x,Comparable y){
return x.compareTo(y) < 0;
}
public static void shuffle(Object[] a) {
Random random = new Random(System.currentTimeMillis());
int N = a.length;
for (int i = 0; i < N; i++) {
int r = i + random.nextInt(N-i); // between i and N-1
Object temp = a[i];
a[i] = a[r];
a[r] = temp;
}
}
public static void show(Comparable[] a){
int N = a.length;
if(N > 20){
System.out.format("a[0]= %d\n", a[0]);
System.out.format("a[%d]= %d\n",N-1, a[N-1]);
}else{
for(int i=0;i<N;i++){
System.out.print(a[i]+",");
}
}
System.out.println();
}
public static void main(String[] args) {
Integer[] a = new Integer[]{17,15,14,13,19,12,11,16,18};
shuffle(a); // reorder the same nine values on every run
System.out.print("data= ");
show(a);
int tn = tukeysNinther(a,0,a.length-1);
System.out.println("ninther="+a[tn]);
}
}
Running this a couple of times gives
data= 11,14,12,16,18,19,17,15,13,
ninther=15
data= 14,13,17,16,18,19,11,15,12,
ninther=14
data= 16,17,12,19,18,13,14,11,15,
ninther=16
Will Tukey's ninther give different values for different shufflings of the same dataset? When I worked out the median of medians by hand, I found that the values computed by the code are correct, which means that the same dataset yields different results, unlike the true median of the dataset. Is this the proper behaviour? Can someone with more knowledge of statistics comment?

Tukey's ninther examines only 9 items: it takes the median of each of three groups of three, then the median of those three medians.
For different random shuffles, you may very well get a different Tukey's ninther, because different items may be examined. After all, you always examine the same array slots, but a different shuffle may have put different items in those slots.
The key here is that Tukey's ninther is not the median of the given array. It is an approximation of the median obtained with very little effort: we only have to read 9 items and make at most 12 comparisons to get it. This is much faster than finding the actual median, and it is less likely to produce an undesirable pivot than the 'median of three'. Note that the chance still exists.
Does this answer your question?
On a side note, does anybody know if quicksort using Tukey's ninther still requires shuffling? I'm assuming yes, but I'm not certain.
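To make the difference concrete, here is a minimal sketch (not the asker's exact program) that runs the ninther on the three shuffles printed in the question and compares it with the true median. It uses int[] instead of Comparable[] for brevity; the class name NintherVersusMedian and the helper trueMedian are made up for this illustration.

```java
import java.util.Arrays;

final class NintherVersusMedian {
    // Index of the median of a[i], a[j], a[k].
    static int median3(int[] a, int i, int j, int k) {
        return a[i] < a[j]
            ? (a[j] < a[k] ? j : (a[i] < a[k] ? k : i))
            : (a[k] < a[j] ? j : (a[k] < a[i] ? k : i));
    }

    // Ninther over a[lo..hi], sampling the same slots as the code in the question.
    static int ninther(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        int mid = lo + n / 2;
        int delta = n / 8;
        int m1 = median3(a, lo, lo + delta, lo + 2 * delta);
        int m2 = median3(a, mid - delta, mid, mid + delta);
        int m3 = median3(a, hi - 2 * delta, hi - delta, hi);
        return median3(a, m1, m2, m3);
    }

    // True median, found by sorting a copy (hypothetical helper for comparison only).
    static int trueMedian(int[] a) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        return copy[copy.length / 2];
    }

    public static void main(String[] args) {
        // The three shuffles shown in the question's output.
        int[][] shuffles = {
            {11, 14, 12, 16, 18, 19, 17, 15, 13},
            {14, 13, 17, 16, 18, 19, 11, 15, 12},
            {16, 17, 12, 19, 18, 13, 14, 11, 15},
        };
        for (int[] a : shuffles) {
            int tn = ninther(a, 0, a.length - 1);
            System.out.println("ninther=" + a[tn] + " median=" + trueMedian(a));
        }
    }
}
```

The ninther comes out as 15, 14 and 16 for the three shuffles, while the median of the sorted copy is 15 every time.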

Related

Why sentinel search slower than linear?

I decided to reduce the number of comparisons required to find an element in an array. Here we replace the last element of the list with the search element itself and run a while loop to see if there exists any copy of the search element in the list and quit the loop as soon as we find the search element. See the code snippet for clarification.
import java.util.Random;
public class Search {
public static void main(String[] args) {
int n = 10000000;
int key = 10000;
int[] arr = generateRandomSize(n);
long start = System.nanoTime();
int find = sentinels(arr, key);
long end = System.nanoTime();
System.out.println(find);
System.out.println(end - start);
arr = generateRandomSize(n);
start = System.nanoTime();
find = linear(arr, key);
end = System.nanoTime();
System.out.println(find);
System.out.println(end - start);
}
public static int[] generateRandomSize(int n) {
int[] arr = new int[n];
Random rand = new Random();
for (int i = 0; i < n; ++i) {
arr[i] = rand.nextInt(5000);
}
return arr;
}
public static int linear(int[] a, int key) {
for(int i = 0; i < a.length; ++i) {
if (a[i] == key) {
return i;
}
}
return -1;
}
public static int sentinels(int[] a, int key) {
int n = a.length;
int last = a[n-1];
a[n-1] = key;
int i = 0;
while (a[i] != key) {
++i;
}
a[n-1] = last;
if ((i < n - 1) || a[n-1] == key ) {
return i;
}
return -1;
}
}
So with sentinel search we avoid the 10000000 comparisons of the form i < arr.length. Why, then, does linear search always perform better?

You'd have to look at the bytecode, and even deeper, to see what HotSpot makes of this. But I am quite sure that this statement is not true:
using sentinel search we are not doing 10000000 comparisons like i < arr.length
Why? Because when you access a[i], i has to be bounds-checked. In the linear case, on the other hand, the optimiser can deduce that it can omit the bounds check, since it "knows" that i>=0 (because of the loop structure) and also that i<arr.length, because that has already been tested in the loop condition.
So the sentinel approach just adds overhead.
This reminds me of a clever C++ optimisation (using "Template Meta Programming" and "Expression Templates") that I did about 20 years ago. It led to faster execution times (at the cost of a much higher compilation time). After the next compiler version was released, I discovered that it could optimise the original source to produce exactly the same assembly. In short, I should have spent my time differently and stayed with the more readable (= easier to maintain) version of the code.
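If you want to compare the two searches more fairly than the question does, a rough sketch like the following may help. It is only an assumption about what a reasonable measurement could look like: both methods search the same array, a warm-up phase gives the JIT a chance to compile them, and the results are summed so the calls cannot be optimised away. The class name SearchTiming, the fixed seed and the round counts are arbitrary choices; for serious measurements a benchmarking harness would be more trustworthy.

```java
import java.util.Random;

final class SearchTiming {
    static int linear(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) return i;
        }
        return -1;
    }

    static int sentinel(int[] a, int key) {
        int n = a.length;
        if (n == 0) return -1;
        int last = a[n - 1];
        a[n - 1] = key;            // plant the sentinel in the last slot
        int i = 0;
        while (a[i] != key) i++;   // no explicit i < n test in the source
        a[n - 1] = last;           // restore the original last element
        return (i < n - 1 || last == key) ? i : -1;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int key = 10_000;          // never generated below, so both searches scan the whole array
        int[] arr = new Random(42).ints(n, 0, 5000).toArray();
        int runs = 50;
        long sink = 0;             // consumes the results so the calls are not dead code

        // Warm-up: let the JIT compile both methods before timing anything.
        for (int r = 0; r < runs; r++) {
            sink += linear(arr, key);
            sink += sentinel(arr, key);
        }

        long t0 = System.nanoTime();
        for (int r = 0; r < runs; r++) sink += linear(arr, key);
        long t1 = System.nanoTime();
        for (int r = 0; r < runs; r++) sink += sentinel(arr, key);
        long t2 = System.nanoTime();

        System.out.println("linear   avg ns: " + (t1 - t0) / runs);
        System.out.println("sentinel avg ns: " + (t2 - t1) / runs);
        System.out.println("checksum: " + sink);
    }
}
```

The absolute numbers will vary between machines and JVM versions, so treat them as indicative only.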

How to improve efficiency

Write a function:
class Solution{
public int solution(int[] A);
}
that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
For example, given A = [1,3,6,4,1,2], the function should return 5.
Given A = [1,2,3], the function should return 4.
Given A = [-1, -3], the function should return 1.
Write an efficient algorithm for the following assumptions.
N is an integer within the range [1..100,000];
each element of array A is an integer within the range [-1,000,000..1,000,000].
I wrote the following algorithm in Java:
public class TestCodility {
public static void main(String args[]){
int a[] = {1,3,6,4,1,2};
//int a[] = {1,2,3};
//int b[] = {-1,-3};
int element = 0;
//checks if the array "a" was traversed until the last position
int countArrayLenght = 0;
loopExtern:
for(int i = 0; i < 1_000_000; i++){
element = i + 1;
countArrayLenght = 0;
loopIntern:
for(int j = 0; j < a.length; j++){
if(element == a[j]){
break loopIntern;
}
countArrayLenght++;
}
if(countArrayLenght == a.length && element > 0){
System.out.println("Smallest possible " + element);
break loopExtern;
}
}
}
}
It does the job but I am pretty sure that it is not efficient. So my question is, how to improve this algorithm so that it becomes efficient?

You should get a grasp of Big O notation and runtime complexity.
It's a universal way to reason about how efficient an implementation is.
Check this website out; it shows graphs of runtime complexities in terms of Big O, which can aid you in your search for more efficient code.
http://bigocheatsheet.com/
However, long story short...
The most efficient way to achieve what you set out to do is the one that uses the fewest operations and the least memory.
You can make something more efficient by reducing redundancy in your algorithm and removing any operation that is not needed to achieve the result. In your code, for example, the inner loop rescans the whole array for every candidate value.

The point is to sort your array and then iterate over it. With a sorted array you can simply skip all negative numbers and then find the smallest possible element that you need.
Here is a more general solution for your task:
import java.util.Arrays;
public class Main {
public static int solution(int[] A) {
int result = 1;
Arrays.sort(A);
for(int a: A) {
if(a > 0) {
if(result == a) {
result++;
} else if (result < a){
return result;
}
}
}
return result;
}
public static void main(String args[]){
int a[] = {1,3,6,4,1,2};
int b[] = {1,2,3};
int c[] = {-1,-3};
System.out.println("a) Smallest possible " + solution(a)); //prints 5
System.out.println("b) Smallest possible " + solution(b)); //prints 4
System.out.println("c) Smallest possible " + solution(c)); //prints 1
}
}
The complexity of this algorithm should be O(n*log(n)).

The main idea is the same as in Denis's answer.
First sort, then process, but using Java streams (note that takeWhile requires Java 9 or later).
A few of these methods may add to the running time (I'm not sure how efficiently the stream implementation handles filter, distinct and takeWhile). In the worst case you have something similar to 3 full loops, plus one additional loop for transforming the array into a stream. Overall you should get the same run-time complexity.
One advantage could be reduced verbosity, but it also requires some additional knowledge compared with Denis's solution.
import java.util.function.Supplier;
import java.util.stream.IntStream;
public class AMin
{
public static void main(String args[])
{
int a[] = {-2,-3,1,2,3,-7,5,6};
int[] i = {1} ;
// get next integer starting from 1
Supplier<Integer> supplier = () -> i[0]++;
//1. transform array into specialized int-stream
//2. keep only positive numbers : filter
//3. keep no duplicates : distinct
//4. sort by natural order (ascending)
//5. get the maximum stream based on criteria(predicate) : longest consecutive numbers starting from 1
//6. get the number of elements from the longest "sub-stream" : count
long count = IntStream.of(a).filter(t->t>0).distinct().sorted().takeWhile(t->t== supplier.get()).count();
count = (count==0) ? 1 : ++count;
//print 4
System.out.println(count);
}
}

There are many solutions with O(n) space complexity and O(n) time complexity. You can convert the array to a:
set: convert the array to a set, then loop over 1...N and check whether the set contains each number. If not, return that number.
hashmap: convert the array to a map, then loop over 1...N and check whether the map contains each number. If not, return that number.
count array: build a count array from the positive values of the given array, e.g. if arr[i] == 5, countArr[5]++; if arr[i] == 1, countArr[1]++. Then loop over 1...N and check whether each count is at least 1. If not, return that number.
In all three cases, if every number from 1 to N is present, return N + 1.
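As an illustration of the count-array variant, here is a minimal sketch. It uses a boolean array instead of counters, since only presence matters; the class name SmallestMissingPositive and the array name seen are made up for this example.

```java
final class SmallestMissingPositive {
    static int solution(int[] a) {
        int n = a.length;
        // seen[v] becomes true when the value v (1..n) occurs in a; index 0 is unused.
        boolean[] seen = new boolean[n + 1];
        for (int v : a) {
            // Values outside 1..n cannot affect the answer, which is at most n + 1.
            if (v > 0 && v <= n) {
                seen[v] = true;
            }
        }
        for (int v = 1; v <= n; v++) {
            if (!seen[v]) {
                return v;
            }
        }
        return n + 1; // every value 1..n is present
    }

    public static void main(String[] args) {
        System.out.println(solution(new int[] {1, 3, 6, 4, 1, 2})); // prints 5
        System.out.println(solution(new int[] {1, 2, 3}));          // prints 4
        System.out.println(solution(new int[] {-1, -3}));           // prints 1
    }
}
```

This makes two passes over the data and uses O(n) extra space, and it leaves the input array unmodified.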
If you are looking for a more efficient algorithm, like the one @Ricola mentioned, here is a Java solution with O(n) time complexity and O(1) space complexity:
static void swap(final int arr[], final int i,final int j){
final int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
static boolean isIndexInSafeArea(final int arr[], final int i){
return arr[i] > 0 && arr[i] - 1 < arr.length && arr[i] != i + 1 ;
}
static int solution(final int arr[]){
for (int i = 0; i < arr.length; i++) {
while (isIndexInSafeArea(arr,i) && arr[i] != arr[arr[i] - 1]) {
swap(arr, i, arr[i] - 1);
}
}
for (int i = 0; i < arr.length; i++) {
if (arr[i] != i + 1) {
return i+1;
}
}
return arr.length + 1;
}

Why my java program is showing StackOverflowError?

I have written a program to sort the elements of an array based on the principle of quicksort. The program accepts an array, takes the first element as the pivot and compares it with the rest of the elements of the array. If an element is greater than the pivot, it is stored at the end of another array of the same size (say b); if it is smaller than the pivot, it is put at the beginning of b. In this way the pivot finds its way to the middle of the array, with the smaller elements on its left-hand side and the greater elements on its right-hand side. Then the elements of b are copied back to the main array, and the whole function is called recursively. This is the code.
package sorting;
import java.util.*;
public class AshishSort_Splitting {
private static Scanner dogra;
public static void main(String[] args)
{
dogra=new Scanner(System.in);
System.out.print("Enter the number of elements ");
int n=dogra.nextInt();
int[] a=new int[n];
for(int i=n-1;i>=0;i--)
{
a[i]=i;
}
int start=0;
int end=n-1;
ashishSort(a,start,end);
for(int i=0;i<n;i++)
{
System.out.print(+a[i]+"\n");
}
}
static void ashishSort(int[]a,int start,int end)
{
int p;
if(start<end)
{
p=ashishPartion(a,start,end);
ashishSort(a,start,p-1);
ashishSort(a,p+1,end);
}
}
public static int ashishPartion(int[] a,int start,int end)
{
int n=start+end+1;
int[] b=new int[n];
int j=start;
int k=end;
int equal=a[start];
for(int i=start+1;i<=end;i++)
{
if(a[i]<equal)
{
b[j]=a[i];
j++;
}
else if(a[i]>equal)
{
b[k]=a[i];
k--;
}
}
b[j]=equal;
for(int l=0;l<=end;l++)
{
a[l]=b[l];
}
return j;
}
}
This code works fine when I enter a value of n up to 13930, but after that it shows
Exception in thread "main" java.lang.StackOverflowError
at sorting.AshishSort_Splitting.ashishSort(AshishSort_Splitting.java:28)
at sorting.AshishSort_Splitting.ashishSort(AshishSort_Splitting.java:29)
I know that the error is caused by bad recursion, but I tested my code multiple times and didn't find any better alternative. Please help. Thanks in advance.
EDIT: Can someone suggest a way to overcome this?

I see performance issues first. I see in your partition method:
int n = start+end+1
Right there, if the method were called on an int[1000] with start=900 and end=999, you would be allocating an int[1900]... Not intended, I think...!
If you are really going to use scratch memory instead of partitioning in place, use
int n = end-start+1
instead for a much smaller allocation. The j and k indexes into b[] would then start at j=0 and k=n-1, and the method would return start + j.
Second, your
else if(a[i]>equal)
is not necessary and causes a bug. A simple else suffices. With the extra test, elements equal to the pivot (other than the pivot itself) are never written to b[], so those slots in b[j..k] keep their default 0's and you'll be in trouble when you refill a[].
Finally, your final copy is bogus: copying [0 to end] goes beyond the bounds of the invocation [start..end], AND most importantly, there is usually nothing of interest near b[0] with your b[] as it is. The useful zone of b[] (in your version) is [start..end] (in my suggested version it would be [0..n-1]).
Here is my version, but it still has the O(n) stack problem that was mentioned in the comments.
public static int ashishPartion(int[] a, int start, int end) {
int n = end-start + 1;
int[] b = new int[n];
int bj = 0;
int bk = n-1;
int pivot = a[start];
for (int i = start + 1; i <= end; i++) {
if (a[i] < pivot) {
b[bj++] = a[i];
} else {
b[bk--] = a[i];
}
}
b[bj] = pivot;
System.arraycopy(b, 0, a, start, n);
return start+bj;
}
If you are free to choose a sorting algorithm, then a mergesort would have more uniform performance, with log N stack depth, and it is easy to implement.
Otherwise, you will have to de-recurse your algorithm using a manual stack, and that is a nice piece of homework that I won't do for you... LOL
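For readers who just want to see the shape of the de-recursed version, here is one possible sketch. It is not the asker's code: it swaps in an in-place Lomuto-style partition (still using the first element as the pivot) and an explicit ArrayDeque of [lo, hi] ranges. The class name IterativeQuickSort is made up. The larger part is always pushed and the smaller part is processed immediately, which keeps the explicit stack at O(log n) entries even on already-sorted input, although the running time on such input is still quadratic with this pivot choice.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IterativeQuickSort {
    static void sort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();   // pending [lo, hi] ranges
        stack.push(new int[] {0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0];
            int hi = range[1];
            while (lo < hi) {
                int p = partition(a, lo, hi);
                // Defer the larger side, keep working on the smaller one.
                if (p - lo < hi - p) {
                    stack.push(new int[] {p + 1, hi});
                    hi = p - 1;
                } else {
                    stack.push(new int[] {lo, p - 1});
                    lo = p + 1;
                }
            }
        }
    }

    // In-place partition around a[lo]; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int j = lo;                    // a[lo+1..j] holds elements smaller than the pivot
        for (int i = lo + 1; i <= hi; i++) {
            if (a[i] < pivot) {
                j++;
                swap(a, i, j);
            }
        }
        swap(a, lo, j);
        return j;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int n = 20_000;                // beyond the size that overflowed the stack in the question
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        sort(a);
        System.out.println(a[0] + " .. " + a[n - 1]);
    }
}
```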

Timing quick sort algorithm

Hey, I seem to be having a problem trying to implement some Java quick sort code over an array of 10,000 random numbers. I have a text file containing the numbers, which are read into an array that is then passed to the sorting algorithm. My aim is to time how long the sorting takes, increasing the count of numbers sorted each time, using the timing loop I have. But for some reason this code gives me a curved graph instead of a straight line. I know the timing loop and array code work fine, so there seems to be a problem with the sorting code, but I can't find anything! Any help is greatly appreciated, thanks!
import java.io.*;
import java.util.*;
public class Quicksort {
public static void main(String args[]) throws IOException {
//Import the random integer text file into an integer array
File fil = new File("randomASC.txt");
FileReader inputFil = new FileReader(fil);
int [] myarray = new int [10000];
Scanner in = new Scanner(inputFil);
for(int q = 0; q < myarray.length; q++)
{
myarray[q] = in.nextInt();
}
in.close();
for (int n = 100; n < 10000; n += 100) {
long total = 0;
for (int r = 0; r < 10; ++r) {
long start = System.nanoTime ();
quickSort(myarray,0,n-1);
total += System.nanoTime() - start;
}
System.out.println (n + "," + (double)total / 10.0);
}
}
public static void quickSort(int[] a, int p, int r)
{
if(p<r)
{
int q=partition(a,p,r);
quickSort(a,p,q);
quickSort(a,q+1,r);
}
}
private static int partition(int[] a, int p, int r) {
int x = a[p];
int i = p-1 ;
int j = r+1 ;
while (true) {
i++;
while ( i< r && a[i] < x)
i++;
j--;
while (j>p && a[j] > x)
j--;
if (i < j)
swap(a, i, j);
else
return j;
}
}
private static void swap(int[] a, int i, int j) {
// TODO Auto-generated method stub
int temp = a[i];
a[i] = a[j];
a[j] = temp;
}
}

Only the first iteration of the inner loop actually sorts the array that you've read from the file. All the subsequent iterations are applied to the already-sorted array.
But for some reason using this code gives me a curved graph instead of a straight linear line.
If you mean that the run time grows non-linearly in n, that's to be expected since quicksort is not a linear-time algorithm (no comparison sort is).
Your performance graph looks like a nice quadratic function. You're getting quadratic rather than O(n log(n)) time due to your choice of pivot: since most of the time you're calling your function on a sorted array, taking the first element as the pivot means you're hitting the worst case every single time.
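One way to address both points is sketched below, under a few assumptions of mine: each timed run sorts a fresh copy of the unsorted data (the copy itself is not timed), and the partition uses the middle element as the pivot instead of the first, so already-sorted input no longer triggers the worst case. The class name FairSortTiming, the method averageNanos and the seeded random data (standing in for the text file) are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

final class FairSortTiming {
    static void quickSort(int[] a, int p, int r) {
        if (p < r) {
            int q = partition(a, p, r);
            quickSort(a, p, q);
            quickSort(a, q + 1, r);
        }
    }

    // Hoare-style partition around the middle element of a[p..r].
    static int partition(int[] a, int p, int r) {
        int x = a[p + (r - p) / 2];
        int i = p - 1;
        int j = r + 1;
        while (true) {
            do { i++; } while (a[i] < x);
            do { j--; } while (a[j] > x);
            if (i >= j) return j;
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    // Average time in nanoseconds to sort the first n elements of data.
    static double averageNanos(int[] data, int n, int reps) {
        long total = 0;
        for (int rep = 0; rep < reps; rep++) {
            int[] copy = Arrays.copyOf(data, n);   // fresh unsorted input for every run
            long start = System.nanoTime();
            quickSort(copy, 0, n - 1);
            total += System.nanoTime() - start;
        }
        return (double) total / reps;
    }

    public static void main(String[] args) {
        int[] data = new Random(1).ints(10_000, 0, 1_000_000).toArray();
        for (int n = 100; n <= data.length; n += 100) {
            System.out.println(n + "," + averageNanos(data, n, 10));
        }
    }
}
```

Even with these changes the curve should not be a straight line, only closer to n log(n); short runs are also noisy because of JIT warm-up.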

Tips implementing permutation algorithm in Java

As part of a school project, I need to write a function that will take an integer N and return a two-dimensional array of every permutation of the array {0, 1, ..., N-1}. The declaration would look like public static int[][] permutations(int N).
The algorithm described at http://www.usna.edu/Users/math/wdj/book/node156.html is how I've decided to implement this.
I wrestled for quite a while with arrays and arrays of ArrayLists and ArrayLists of ArrayLists, but so far I've been frustrated, especially trying to convert a 2d ArrayList to a 2d array.
So I wrote it in javascript. This works:
function allPermutations(N) {
// base case
if (N == 2) return [[0,1], [1,0]];
else {
// start with all permutations of previous degree
var permutations = allPermutations(N-1);
// copy each permutation N times
for (var i = permutations.length*N-1; i >= 0; i--) {
if (i % N == 0) continue;
permutations.splice(Math.floor(i/N), 0, permutations[Math.floor(i/N)].slice(0));
}
// "weave" next number in
for (var i = 0, j = N-1, d = -1; i < permutations.length; i++) {
// insert number N-1 at index j
permutations[i].splice(j, 0, N-1);
// index j is N-1, N-2, N-3, ... , 1, 0; then 0, 1, 2, ... N-1; then N-1, N-2, etc.
j += d;
// at beginning or end of the row, switch weave direction
if (j < 0 || j >= N) {
d *= -1;
j += d;
}
}
return permutations;
}
}
So what's the best strategy to port that to Java? Can I do it with just primitive arrays? Do I need an array of ArrayLists? Or an ArrayList of ArrayLists? Or is there some other data type that's better? Whatever I use, I need to be able to convert it back into an array of primitive arrays.
Maybe there's a better algorithm that would simplify this for me...
Thank you in advance for your advice!

As you know the number of permutations beforehand (it's N!) and also you want/have to return an int[][] I would go for an array directly. You can declare it right at the beginning with correct dimensions and return it at the end. Thus you don't have to worry about converting it afterwards at all.
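A minimal sketch of that up-front allocation might look like this. The class and method names (PermutationTable, allocate) are invented for the example, and the use of Math.multiplyExact is my own addition so that an N whose factorial does not fit in an int fails loudly instead of silently wrapping around.

```java
final class PermutationTable {
    // N! as an int; throws ArithmeticException once the product overflows (N >= 13).
    static int factorial(int n) {
        int result = 1;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, i);
        }
        return result;
    }

    // One row per permutation, N columns each; the rows still have to be filled in.
    static int[][] allocate(int n) {
        return new int[factorial(n)][n];
    }

    public static void main(String[] args) {
        int[][] table = allocate(4);
        System.out.println(table.length + " rows of " + table[0].length + " columns");
    }
}
```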

Since you pretty much had it completed on your own in javascript, I'll go ahead and give you the Java code for implementing Steinhaus' permutation algorithm. I basically just ported your code to Java, leaving as much of it the same as I could, including comments.
I tested it up to N = 7. I tried to have it calculate N = 8, but it's been running for almost 10 minutes already on a 2 GHz Intel Core 2 Duo processor, and still going, lol.
I'm sure if you really worked at it you could speed this up significantly, but even then you're probably only going to be able to squeeze maybe a couple more N-values out of it, unless of course you have access to a supercomputer ;-).
Warning - this code is correct, NOT robust. If you need it robust, which you usually don't for homework assignments, then that would be an exercise left to you. I would also recommend implementing it using Java Collections, simply because it would be a great way to learn the ins and outs of the Collections API.
There are several "helper" methods included, including one to print a 2d array. Enjoy!
Update: N = 8 took 25 minutes, 38 seconds.
Edit: Fixed N == 1 and N == 2.
public class Test
{
public static void main (String[] args)
{
printArray (allPermutations (8));
}
public static int[][] allPermutations (int N)
{
// base case
if (N == 2)
{
return new int[][] {{1, 2}, {2, 1}};
}
else if (N > 2)
{
// start with all permutations of previous degree
int[][] permutations = allPermutations (N - 1);
for (int i = 0; i < factorial (N); i += N)
{
// copy each permutation N - 1 times
for (int j = 0; j < N - 1; ++j)
{
// similar to javascript's array.splice
permutations = insertRow (permutations, i, permutations [i]);
}
}
// "weave" next number in
for (int i = 0, j = N - 1, d = -1; i < permutations.length; ++i)
{
// insert number N at index j
// similar to javascript's array.splice
permutations = insertColumn (permutations, i, j, N);
// index j is N-1, N-2, N-3, ... , 1, 0; then 0, 1, 2, ... N-1; then N-1, N-2, etc.
j += d;
// at beginning or end of the row, switch weave direction
if (j < 0 || j > N - 1)
{
d *= -1;
j += d;
}
}
return permutations;
}
else
{
throw new IllegalArgumentException ("N must be >= 2");
}
}
private static void arrayDeepCopy (int[][] src, int srcRow, int[][] dest,
int destRow, int numOfRows)
{
for (int row = 0; row < numOfRows; ++row)
{
System.arraycopy (src [srcRow + row], 0, dest [destRow + row], 0,
src [srcRow + row].length); // use the length of the row actually being copied
}
}
public static int factorial (int n)
{
return n == 1 ? 1 : n * factorial (n - 1);
}
private static int[][] insertColumn (int[][] src, int rowIndex,
int columnIndex, int columnValue)
{
int[][] dest = new int[src.length][0];
for (int i = 0; i < dest.length; ++i)
{
dest [i] = new int [src[i].length];
}
arrayDeepCopy (src, 0, dest, 0, src.length);
int numOfColumns = src[rowIndex].length;
int[] rowWithExtraColumn = new int [numOfColumns + 1];
System.arraycopy (src [rowIndex], 0, rowWithExtraColumn, 0, columnIndex);
System.arraycopy (src [rowIndex], columnIndex, rowWithExtraColumn,
columnIndex + 1, numOfColumns - columnIndex);
rowWithExtraColumn [columnIndex] = columnValue;
dest [rowIndex] = rowWithExtraColumn;
return dest;
}
private static int[][] insertRow (int[][] src, int rowIndex,
int[] rowElements)
{
int srcRows = src.length;
int srcCols = rowElements.length;
int[][] dest = new int [srcRows + 1][srcCols];
arrayDeepCopy (src, 0, dest, 0, rowIndex);
arrayDeepCopy (src, rowIndex, dest, rowIndex + 1, src.length - rowIndex);
System.arraycopy (rowElements, 0, dest [rowIndex], 0, rowElements.length);
return dest;
}
public static void printArray (int[][] array)
{
for (int row = 0; row < array.length; ++row)
{
for (int col = 0; col < array[row].length; ++col)
{
System.out.print (array [row][col] + " ");
}
System.out.print ("\n");
}
System.out.print ("\n");
}
}

Java arrays have a fixed length (you cannot grow or shrink them once created). For a direct translation of this recursive algorithm you probably want to use the List interface (and probably the LinkedList implementation, since you want to insert numbers in the middle). That is, List<List<Integer>>.
Beware that the factorial grows rapidly: for N = 13 there are 13! permutations, that is 6 227 020 800, which is more than a single Java array can hold. But I guess you only need to run it for small values.
The algorithm above is quite complex; my solution would be:
1. Create a List<int[]> to hold all permutations.
2. Create one array of size N and fill it with the identity ({1,2,3,...,N}).
3. Write a function that rearranges the array in place into the next permutation in lexicographic order.
4. Repeat until you get the identity again:
   put a copy of the array at the end of the list;
   call the function to get the next permutation.
If your program just needs to output all permutations, I would avoid storing them and just print them right away.
The algorithm to compute the next permutation can be found on the internet.
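The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions: it numbers the elements from 0 (as the question asks) rather than from 1, and instead of looping until the identity reappears, nextPermutation returns false once the array is in descending order, which is the last permutation. The class name LexicographicPermutations is invented.

```java
import java.util.ArrayList;
import java.util.List;

final class LexicographicPermutations {
    // Rearranges p into its lexicographic successor; returns false if p was the last one.
    static boolean nextPermutation(int[] p) {
        int k = p.length - 2;
        while (k >= 0 && p[k] >= p[k + 1]) k--;   // rightmost ascent p[k] < p[k+1]
        if (k < 0) return false;                  // fully descending: no successor
        int l = p.length - 1;
        while (p[l] <= p[k]) l--;                 // rightmost element larger than p[k]
        swap(p, k, l);
        for (int i = k + 1, j = p.length - 1; i < j; i++, j--) {
            swap(p, i, j);                        // reverse the tail after position k
        }
        return true;
    }

    static void swap(int[] p, int i, int j) {
        int t = p[i];
        p[i] = p[j];
        p[j] = t;
    }

    static int[][] permutations(int n) {
        int[] p = new int[n];
        for (int i = 0; i < n; i++) p[i] = i;     // identity {0, 1, ..., n-1}
        List<int[]> all = new ArrayList<>();
        do {
            all.add(p.clone());                   // store a copy, p keeps changing
        } while (nextPermutation(p));
        return all.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        for (int[] row : permutations(3)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Because the list grows as needed, nothing has to be converted at the end except the single toArray call.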

Use whatever you want, arrays or lists, but don't convert them - it just makes it harder. I can't tell what's better, probably I'd go for ArrayList<int[]>, since the outer List allows me to add the permutation easily and the inner array is good enough. That's just a matter of taste (but normally prefer lists, since they're much more flexible).

As per Howard's advice, I decided I didn't want to use anything but the primitive array type. The algorithm I initially picked was a pain to implement in Java, so thanks to stalker's advice, I went with the lexicographic-ordered algorithm described at Wikipedia. Here's what I ended up with:
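// Needs: import java.util.Arrays; and a factorial(int) helper such as the one in the answer above.
public static int[][] generatePermutations(int N) {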
int[][] a = new int[factorial(N)][N];
for (int i = 0; i < N; i++) a[0][i] = i;
for (int i = 1; i < a.length; i++) {
a[i] = Arrays.copyOf(a[i-1], N);
int k, l;
for (k = N - 2; a[i][k] >= a[i][k+1]; k--);
for (l = N - 1; a[i][k] >= a[i][l]; l--);
swap(a[i], k, l);
for (int j = 1; k+j < N-j; j++) swap(a[i], k+j, N-j);
}
return a;
}
private static void swap(int[] is, int k, int l) {
int tmp_k = is[k];
int tmp_l = is[l];
is[k] = tmp_l;
is[l] = tmp_k;
}
