Hey, I seem to be having a problem implementing some Java quicksort code over an array of 10,000 random numbers. I have a text file containing the numbers, which are read into an array that is then passed to the sorting algorithm. My aim is to time how long the sort takes, increasing the number of elements sorted each time, using the timing loop I have. But for some reason this code gives me a curved graph instead of a straight line. I know the timing loop and array code work fine, so there seems to be a problem with the sorting code, but I can't find anything! Any help is greatly appreciated, thanks!
import java.io.*;
import java.util.*;
public class Quicksort {
public static void main(String args[]) throws IOException {
//Import the random integer text file into an integer array
File fil = new File("randomASC.txt");
FileReader inputFil = new FileReader(fil);
int [] myarray = new int [10000];
Scanner in = new Scanner(inputFil);
for(int q = 0; q < myarray.length; q++)
{
myarray[q] = in.nextInt();
}
in.close();
for (int n = 100; n < 10000; n += 100) {
long total = 0;
for (int r = 0; r < 10; ++r) {
long start = System.nanoTime ();
quickSort(myarray,0,n-1);
total += System.nanoTime() - start;
}
System.out.println (n + "," + (double)total / 10.0);
}
}
public static void quickSort(int[] a, int p, int r)
{
if(p<r)
{
int q=partition(a,p,r);
quickSort(a,p,q);
quickSort(a,q+1,r);
}
}
private static int partition(int[] a, int p, int r) {
int x = a[p];
int i = p-1 ;
int j = r+1 ;
while (true) {
i++;
while ( i< r && a[i] < x)
i++;
j--;
while (j>p && a[j] > x)
j--;
if (i < j)
swap(a, i, j);
else
return j;
}
}
private static void swap(int[] a, int i, int j) {
// TODO Auto-generated method stub
int temp = a[i];
a[i] = a[j];
a[j] = temp;
}
}
Only the first iteration of the inner loop actually sorts the array that you've read from the file. All the subsequent iterations are applied to the already-sorted array.
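One way to make sure every timed run sees unsorted data is to sort a fresh copy each time. The following is only a sketch: the class name FreshCopyTiming and the helper averageNanos are made up for illustration, the data is generated randomly instead of being read from randomASC.txt, and Arrays::sort is just a stand-in for your own quickSort.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

class FreshCopyTiming {
    // Average time in nanoseconds to sort the first n values of source.
    // Each run sorts a fresh copy, so source itself is never modified.
    static double averageNanos(int[] source, int n, int runs, Consumer<int[]> sorter) {
        long total = 0;
        for (int r = 0; r < runs; r++) {
            int[] work = Arrays.copyOf(source, n); // copy is made outside the timed region
            long start = System.nanoTime();
            sorter.accept(work);
            total += System.nanoTime() - start;
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        // Assumption: random data stands in for the contents of the text file.
        int[] data = new Random(42).ints(10_000, 0, 1_000_000).toArray();
        for (int n = 100; n <= data.length; n += 100) {
            // Arrays::sort is a placeholder; pass a lambda calling your quickSort instead.
            System.out.println(n + "," + averageNanos(data, n, 10, Arrays::sort));
        }
    }
}
```

To time the quicksort from the question you could pass something like work -> quickSort(work, 0, work.length - 1) as the sorter.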
But for some reason using this code gives me a curved graph instead of a straight linear line.
If you mean that the run time grows non-linearly in n, that's to be expected since quicksort is not a linear-time algorithm (no comparison sort is).
Your performance graph looks like a nice quadratic function.
You're getting quadratic rather than O(n log(n)) time due to your choice of pivot: since most of the time you're calling your function on an already-sorted array, always taking the first element (a[p]) as the pivot means you're hitting the worst case every single time.
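As a rough illustration of a different pivot choice, here is a sketch that partitions around the middle element instead of the first one. The class name MiddlePivotQuickSort is made up, and the middle-element pivot is my assumption, not something from the question. It splits sorted input evenly, but it does not remove the worst case altogether: inputs that defeat it still exist, and a random pivot is another common option.

```java
import java.util.Arrays;
import java.util.Random;

class MiddlePivotQuickSort {
    static void quickSort(int[] a, int lo, int hi) {
        if (lo < hi) {
            int q = partition(a, lo, hi);
            quickSort(a, lo, q);
            quickSort(a, q + 1, hi);
        }
    }

    // Hoare-style partition around the middle element (an assumption of this
    // sketch; the question uses the first element, a[p]).
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot); // first element from the left that is >= pivot
            do { j--; } while (a[j] > pivot); // first element from the right that is <= pivot
            if (i >= j) {
                return j;
            }
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        // Already-sorted input: the case that is quadratic with a first-element pivot.
        int[] sorted = new int[100_000];
        for (int k = 0; k < sorted.length; k++) {
            sorted[k] = k;
        }
        quickSort(sorted, 0, sorted.length - 1);

        int[] random = new Random(1).ints(100_000, 0, 1_000_000).toArray();
        int[] expected = random.clone();
        Arrays.sort(expected);
        quickSort(random, 0, random.length - 1);
        System.out.println(Arrays.equals(random, expected));
    }
}
```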
I decided to reduce the number of comparisons required to find an element in an array. Here we replace the last element of the list with the search element itself and run a while loop to see if there exists any copy of the search element in the list and quit the loop as soon as we find the search element. See the code snippet for clarification.
import java.util.Random;
public class Search {
public static void main(String[] args) {
int n = 10000000;
int key = 10000;
int[] arr = generateRandomSize(n);
long start = System.nanoTime();
int find = sentinels(arr, key);
long end = System.nanoTime();
System.out.println(find);
System.out.println(end - start);
arr = generateRandomSize(n);
start = System.nanoTime();
find = linear(arr, key);
end = System.nanoTime();
System.out.println(find);
System.out.println(end - start);
}
public static int[] generateRandomSize(int n) {
int[] arr = new int[n];
Random rand = new Random();
for (int i = 0; i < n; ++i) {
arr[i] = rand.nextInt(5000);
}
return arr;
}
public static int linear(int[] a, int key) {
for(int i = 0; i < a.length; ++i) {
if (a[i] == key) {
return i;
}
}
return -1;
}
public static int sentinels(int[] a, int key) {
int n = a.length;
int last = a[n-1];
a[n-1] = key;
int i = 0;
while (a[i] != key) {
++i;
}
a[n-1] = last;
if ((i < n - 1) || a[n-1] == key ) {
return i;
}
return -1;
}
}
So using sentinel search we are not doing 10000000 comparisons like i < arr.length. But why does linear search always show better performance?
You'd have to look at the bytecode, and even deeper, to see what HotSpot makes of this. But I am quite sure that this statement is not true:
using sentinel search we are not doing 10000000 comparisons like i < arr.length
Why? Because when you access a[i], i has to be bounds checked. In the linear case on the other hand, the optimiser can deduce that it can omit the bounds check since it "knows" that i>=0 (because of the loop structure) and also i<arr.length because it has already been tested in the loop condition.
So the sentinel approach just adds overhead.
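To make the hidden comparison visible, here is a sketch of the sentinel loop with the bounds check that Java performs on every a[i] written out by hand. The class and method names (BoundsCheckSketch, sentinelExplicit) are made up for illustration. Whether the JIT can remove the check in a given loop depends on the JVM, so treat this as a picture of the language semantics, not of the generated machine code.

```java
class BoundsCheckSketch {
    // Same logic as sentinels() in the question, with the implicit array
    // bounds check spelled out as an explicit comparison.
    static int sentinelExplicit(int[] a, int key) {
        int n = a.length;
        int last = a[n - 1];
        a[n - 1] = key; // plant the sentinel
        int i = 0;
        while (true) {
            // This is what every a[i] access has to guarantee.
            if (i < 0 || i >= a.length) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            if (a[i] == key) {
                break;
            }
            ++i;
        }
        a[n - 1] = last; // restore the original last element
        return (i < n - 1 || last == key) ? i : -1;
    }

    public static void main(String[] args) {
        int[] data = {4, 7, 9};
        System.out.println(sentinelExplicit(data, 7));
        System.out.println(sentinelExplicit(data, 5));
    }
}
```

Written this way, the loop still compares i against the array length on every iteration, which is exactly the comparison the sentinel was supposed to save.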
This makes me think of a smart C++ optimisation (using "Template Meta Programming" and "Expression Templates") I did about 20 years ago that led to faster execution times (at the cost of a much higher compilation time). After the next compiler version was released, I discovered that the new version was able to optimise the original source to produce the exact same assembly. In short, I should have used my time differently and stayed with the more readable (= easier to maintain) version of the code.
Write a function:
class Solution{
public int solution(int[] A);
}
that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.
For example, given A = [1,3,6,4,1,2], the function should return 5.
Given A = [1,2,3], the function should return 4.
Given A = [-1, -3], the function should return 1.
Write an efficient algorithm for the following assumptions.
N is an integer within the range [1..100,000];
each element of array A is an integer within the range [-1,000,000..1,000,000].
I wrote the following algorithm in Java:
public class TestCodility {
public static void main(String args[]){
int a[] = {1,3,6,4,1,2};
//int a[] = {1,2,3};
//int b[] = {-1,-3};
int element = 0;
//checks if the array "a" was traversed until the last position
int countArrayLenght = 0;
loopExtern:
for(int i = 0; i < 1_000_000; i++){
element = i + 1;
countArrayLenght = 0;
loopIntern:
for(int j = 0; j < a.length; j++){
if(element == a[j]){
break loopIntern;
}
countArrayLenght++;
}
if(countArrayLenght == a.length && element > 0){
System.out.println("Smallest possible " + element);
break loopExtern;
}
}
}
}
It does the job but I am pretty sure that it is not efficient. So my question is, how to improve this algorithm so that it becomes efficient?
You should get a grasp of Big O and runtime complexity.
It's a universal tool for reasoning about the efficiency of code.
Check this website out, it shows the graph for runtime complexities in terms of Big O which can aid you in your search for more efficient programming.
http://bigocheatsheet.com/
However, long story short...
The most efficient way to achieve what you set out to do with your code is the one that uses the fewest operations and the least memory.
You can make something more efficient by reducing redundancy in your algorithms and getting rid of any operation that does not need to occur to achieve what you are trying to do.
The point is to sort your array and then iterate over it. With a sorted array you can simply skip all negative numbers and then find the minimal possible element that you need.
Here is a more general solution for your task:
import java.util.Arrays;
public class Main {
public static int solution(int[] A) {
int result = 1;
Arrays.sort(A);
for(int a: A) {
if(a > 0) {
if(result == a) {
result++;
} else if (result < a){
return result;
}
}
}
return result;
}
public static void main(String args[]){
int a[] = {1,3,6,4,1,2};
int b[] = {1,2,3};
int c[] = {-1,-3};
System.out.println("a) Smallest possible " + solution(a)); //prints 5
System.out.println("b) Smallest possible " + solution(b)); //prints 4
System.out.println("c) Smallest possible " + solution(c)); //prints 1
}
}
The complexity of that algorithm should be O(n*log(n)), dominated by the sort.
The main idea is the same as Denis's.
First sort, then process, but using Java 8 stream features (note that takeWhile needs Java 9 or later).
A few of the methods may increase the running time. I'm not sure how efficiently Java processes filter, distinct and takeWhile; in the worst case you have something similar to 3 full loops, plus one additional loop for transforming the array into a stream. Overall you should get the same run-time complexity.
One advantage could be reduced verbosity, but it also needs some additional knowledge compared with Denis's solution.
import java.util.function.Supplier;
import java.util.stream.IntStream;
public class AMin
{
public static void main(String args[])
{
int a[] = {-2,-3,1,2,3,-7,5,6};
int[] i = {1} ;
// get next integer starting from 1
Supplier<Integer> supplier = () -> i[0]++;
//1. transform array into specialized int-stream
//2. keep only positive numbers : filter
//3. keep no duplicates : distinct
//4. sort by natural order (ascending)
//5. get the maximum stream based on criteria(predicate) : longest consecutive numbers starting from 1
//6. get the number of elements from the longest "sub-stream" : count
long count = IntStream.of(a).filter(t->t>0).distinct().sorted().takeWhile(t->t== supplier.get()).count();
count = (count==0) ? 1 : ++count;
//print 4
System.out.println(count);
}
}
There are many solutions with O(n) space complexity and O(n) time complexity. You can convert the array to a:
set: convert the array to a set, then loop over 1...N and check whether the set contains each number. If not, return that number.
hashmap: convert the array to a map, then loop over 1...N and check whether the map contains each number. If not, return that number.
count array: build a count array from the positive values of the given array, e.g. if arr[i] == 5, countArr[5]++; if arr[i] == 1, countArr[1]++. Then loop over 1...N and check whether each countArr entry is greater than 0. If not, return that index. In all three variants, if every number in 1...N is present, the answer is N + 1.
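A minimal sketch of the count-array idea, using a boolean "seen" array since only presence matters. The class name SmallestMissingPositive is made up; the method signature follows the task statement. This is one possible way to write it, not the only one.

```java
class SmallestMissingPositive {
    // O(n) time, O(n) extra space.
    static int solution(int[] A) {
        int n = A.length;
        // The answer is always in 1..n+1, so values outside 1..n can be ignored.
        boolean[] seen = new boolean[n + 1];
        for (int value : A) {
            if (value >= 1 && value <= n) {
                seen[value] = true;
            }
        }
        for (int candidate = 1; candidate <= n; candidate++) {
            if (!seen[candidate]) {
                return candidate;
            }
        }
        return n + 1; // every value in 1..n occurs
    }

    public static void main(String[] args) {
        System.out.println(solution(new int[] {1, 3, 6, 4, 1, 2})); // prints 5
        System.out.println(solution(new int[] {1, 2, 3}));          // prints 4
        System.out.println(solution(new int[] {-1, -3}));           // prints 1
    }
}
```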
For now, looking for a more efficient algorithm, like @Ricola mentioned, here is a Java solution with O(n) time complexity and O(1) space complexity:
static void swap(final int arr[], final int i,final int j){
final int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
static boolean isIndexInSafeArea(final int arr[], final int i){
return arr[i] > 0 && arr[i] - 1 < arr.length && arr[i] != i + 1 ;
}
static int solution(final int arr[]){
for (int i = 0; i < arr.length; i++) {
while (isIndexInSafeArea(arr,i) && arr[i] != arr[arr[i] - 1]) {
swap(arr, i, arr[i] - 1);
}
}
for (int i = 0; i < arr.length; i++) {
if (arr[i] != i + 1) {
return i+1;
}
}
return arr.length + 1;
}
I have written a program to sort the elements of an array based on the principle of quicksort. The program accepts an array, takes the first element as the pivot and compares it with the rest of the elements. If an element is greater than the pivot, it is stored at the end of a second array of the same size (say b); if it is smaller than the pivot, it is put at the beginning of b. In this way the pivot finds its way to the middle of the array, with the smaller elements on its left-hand side and the greater ones on its right-hand side. Then the elements of b are copied back to the main array, and the whole function is called recursively. This is the code:
package sorting;
import java.util.*;
public class AshishSort_Splitting {
private static Scanner dogra;
public static void main(String[] args)
{
dogra=new Scanner(System.in);
System.out.print("Enter the number of elements ");
int n=dogra.nextInt();
int[] a=new int[n];
for(int i=n-1;i>=0;i--)
{
a[i]=i;
}
int start=0;
int end=n-1;
ashishSort(a,start,end);
for(int i=0;i<n;i++)
{
System.out.print(+a[i]+"\n");
}
}
static void ashishSort(int[]a,int start,int end)
{
int p;
if(start<end)
{
p=ashishPartion(a,start,end);
ashishSort(a,start,p-1);
ashishSort(a,p+1,end);
}
}
public static int ashishPartion(int[] a,int start,int end)
{
int n=start+end+1;
int[] b=new int[n];
int j=start;
int k=end;
int equal=a[start];
for(int i=start+1;i<=end;i++)
{
if(a[i]<equal)
{
b[j]=a[i];
j++;
}
else if(a[i]>equal)
{
b[k]=a[i];
k--;
}
}
b[j]=equal;
for(int l=0;l<=end;l++)
{
a[l]=b[l];
}
return j;
}
}
This code works fine when I enter a value of n up to 13930, but after that it shows
Exception in thread "main" java.lang.StackOverflowError
at sorting.AshishSort_Splitting.ashishSort(AshishSort_Splitting.java:28)
at sorting.AshishSort_Splitting.ashishSort(AshishSort_Splitting.java:29)
I know the error is caused by bad recursion, but I tested my code multiple times and didn't find any better alternative. Please help. Thanks in advance.
EDIT: Can someone suggest a way to overcome this?
I see performance issues first. I see in your partition method:
int n = start+end+1
Right there, if the method was called on an int[1000] with start=900 and end=999, you are allocating an int[1900]... Not intended, I think!
If you are really going to trash memory instead of partitioning in place, use
int n = end-start+1
instead for a much smaller allocation. The indexes j and k into b[] would then start at j=0 and k=n-1, and the method would return start + j.
Second, your
else if(a[i]>equal)
is not necessary and causes a bug. A simple else suffices. Elements equal to the pivot are otherwise never written to b[], and if you don't replace the 0's left in b[j..k] you'll be in trouble when you refill a[].
Finally, your final copy is bogus: copying [0..end] goes beyond the bounds of the invocation [start..end], and, most importantly, there is usually nothing of interest near b[0] with your b[] as it is. The zone of b[] that you fill (in your version) is [start..end]; in my suggested version it would be [0..n-1].
Here is my version, but it still has the O(n) stack problem that was mentioned in the comments.
public static int ashishPartion(int[] a, int start, int end) {
int n = end-start + 1;
int[] b = new int[n];
int bj = 0;
int bk = n-1;
int pivot = a[start];
for (int i = start + 1; i <= end; i++) {
if (a[i] < pivot) {
b[bj++] = a[i];
} else {
b[bk--] = a[i];
}
}
b[bj] = pivot;
System.arraycopy(b, 0, a, start, n);
return start+bj;
}
If you are free to choose a sorting algorithm, then a mergesort would have more uniform performance, with log N stack depth. It's easy to implement.
Otherwise, you will have to de-recurse your algorithm using a manual stack, and that is a nice homework that I won't do for you... LOL
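For readers who want to see roughly what de-recursing could look like, here is a sketch with an explicit stack of pending ranges. Everything in it is my own assumption rather than the asker's or the answerer's code: the class name IterativeQuickSort, the in-place partition, and the choice to push the larger range first so the smaller one is handled next. The pivot is still the first element, so sorted input stays O(n^2) in time, but the pending ranges live on the heap and the call stack no longer grows with n.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IterativeQuickSort {
    static void sort(int[] a) {
        Deque<int[]> pending = new ArrayDeque<>(); // each entry is {start, end}
        pending.push(new int[] {0, a.length - 1});
        while (!pending.isEmpty()) {
            int[] range = pending.pop();
            int start = range[0];
            int end = range[1];
            if (start >= end) {
                continue; // zero or one element: nothing to do
            }
            int p = partition(a, start, end);
            int[] left = {start, p - 1};
            int[] right = {p + 1, end};
            // Push the larger range first so the smaller one is popped next;
            // this keeps the number of pending ranges small.
            if (p - start < end - p) {
                pending.push(right);
                pending.push(left);
            } else {
                pending.push(left);
                pending.push(right);
            }
        }
    }

    // In-place partition with the first element as pivot (as in the question).
    static int partition(int[] a, int start, int end) {
        int pivot = a[start];
        int boundary = start; // a[start+1..boundary] holds elements < pivot
        for (int i = start + 1; i <= end; i++) {
            if (a[i] < pivot) {
                boundary++;
                swap(a, boundary, i);
            }
        }
        swap(a, start, boundary); // move the pivot to its final position
        return boundary;
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int n = 20_000; // larger than the 13930 that overflowed the stack
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        sort(a);
        System.out.println(a[0] + " " + a[n - 1]);
    }
}
```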
Exception in thread "main" java.lang.StackOverflowError
at Search.mergeSort(Search.java:41)
at Search.mergeSort(Search.java:43)
at Search.mergeSort(Search.java:43)
at Search.mergeSort(Search.java:43)
at Search.mergeSort(Search.java:43)
I keep getting this error when I try to run my program. My program is supposed to take string input from a file and sort it using this algorithm. Any ideas? Problematic lines from code:
public static void mergeSort(String[] word, int p, int r){
int q;
if(p<r){
q=p+r/2;
mergeSort(word,p,q);
mergeSort(word, q+1,r);
merge(word, p, q, r);
}
}
EDIT
These two functions sort the String array by dividing the array in half, sorting each half separately, and merging them together. The int q is the halfway point, and the arrays being evaluated are from word[p] to word[q] and word[q+1] to word[r]. Here's the merge function:
public static void merge(String[] word, int p, int q, int r){
int n1 = q-p+1;
int n2 = r-q;
String[] L = new String[n1];
String[] R = new String[n2];
int i, j, k;
for(i=0; i<n1; i++) L[i] = word[p+i];
for(j=0; j<n2; j++) R[j] = word[q+j+1]; // index with j, not r, to copy the right half
i=0; j=0;
for(k=p; k<=r; k++){
if(i<n1 && j<n2){
if(L[i].compareTo(R[j])<0){
word[k] = L[i];
i++;
}else{
word[k] = R[j];
j++;
}
}else if(i<n1){
word[k] = L[i];
i++;
}else if(j<n2){
word[k] = R[j];
j++;
}
}
}
Walk through with a debugger. You'll see exactly how it's leading to an infinite recursion. IDEs (Eclipse, IntelliJ) have them built in.
The problem is that your calculation of q is incorrect. It is supposed to be the halfway point between p and r, and the way to calculate that is:
q = p + (r - p) / 2;
or
q = (p + r) / 2;
But you've written:
q = p + r / 2;
which is equivalent to
q = p + (r / 2);
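A small worked example of the difference, as a sketch (the class and method names MidpointDemo, buggyMid and correctMid are made up). With p = 2 and r = 3 the expression from the question gives q = 3, which equals r, so mergeSort(word, p, q) is called with exactly the same range as its caller and never terminates.

```java
class MidpointDemo {
    // Midpoint as written in the question: division binds tighter than addition.
    static int buggyMid(int p, int r) {
        return p + r / 2;
    }

    // Midpoint of the range [p..r].
    static int correctMid(int p, int r) {
        return p + (r - p) / 2;
    }

    public static void main(String[] args) {
        int p = 2;
        int r = 3;
        System.out.println("buggy q = " + buggyMid(p, r));     // prints "buggy q = 3"
        System.out.println("correct q = " + correctMid(p, r)); // prints "correct q = 2"
    }
}
```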
When writing recursive methods, you need a base case that is actually reached, otherwise you'll get infinite recursion like you have right now.
In your mergeSort method, the if(p<r) check is the base case, but it only helps if every recursive call works on a smaller range than its caller. Check that word[p..q] and word[q+1..r] are both strictly smaller than word[p..r]; with your current q they are not.
While implementing improvements to quicksort partitioning, I tried to use Tukey's ninther to find the pivot (borrowing almost everything from Sedgewick's implementation in QuickX.java).
My code below gives different results each time the array of integers is shuffled.
import java.util.Random;
public class TukeysNintherDemo{
public static int tukeysNinther(Comparable[] a,int lo,int hi){
int N = hi - lo + 1;
int mid = lo + N/2;
int delta = N/8;
int m1 = median3a(a,lo,lo+delta,lo+2*delta);
int m2 = median3a(a,mid-delta,mid,mid+delta);
int m3 = median3a(a,hi-2*delta,hi-delta,hi);
int tn = median3a(a,m1,m2,m3);
return tn;
}
// return the index of the median element among a[i], a[j], and a[k]
private static int median3a(Comparable[] a, int i, int j, int k) {
return (less(a[i], a[j]) ?
(less(a[j], a[k]) ? j : less(a[i], a[k]) ? k : i) :
(less(a[k], a[j]) ? j : less(a[k], a[i]) ? k : i));
}
private static boolean less(Comparable x,Comparable y){
return x.compareTo(y) < 0;
}
public static void shuffle(Object[] a) {
Random random = new Random(System.currentTimeMillis());
int N = a.length;
for (int i = 0; i < N; i++) {
int r = i + random.nextInt(N-i); // between i and N-1
Object temp = a[i];
a[i] = a[r];
a[r] = temp;
}
}
public static void show(Comparable[] a){
int N = a.length;
if(N > 20){
System.out.format("a[0]= %d\n", a[0]);
System.out.format("a[%d]= %d\n",N-1, a[N-1]);
}else{
for(int i=0;i<N;i++){
System.out.print(a[i]+",");
}
}
System.out.println();
}
public static void main(String[] args) {
Integer[] a = new Integer[]{17,15,14,13,19,12,11,16,18};
System.out.print("data= ");
show(a);
int tn = tukeysNinther(a,0,a.length-1);
System.out.println("ninther="+a[tn]);
}
}
Running this a couple of times gives
data= 11,14,12,16,18,19,17,15,13,
ninther=15
data= 14,13,17,16,18,19,11,15,12,
ninther=14
data= 16,17,12,19,18,13,14,11,15,
ninther=16
Will Tukey's ninther give different values for different shufflings of the same dataset? When I worked out the median of medians by hand, I found that the calculations in the code above are correct, which means that the same dataset yields different results, unlike the median of the dataset. Is this the proper behaviour? Can someone with more knowledge of statistics comment?
Tukey's ninther examines 9 items and calculates a median of three medians using only those.
For different random shuffles, you may very well get a different Tukey's ninther, because different items may be examined. After all, you always examine the same array slots, but a different shuffle may have put different items in those slots.
The key here is that Tukey's ninther is not the median of the given array. It is an attempted approximation of the median, made with very little effort: we only have to read 9 items and make at most 12 comparisons to get it. This is much faster than getting the actual median, and has a smaller chance of resulting in an undesirable pivot compared to the 'median of three'. Note that the chance still exists.
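To see this on the data from the question, here is a sketch that computes the ninther and the true median for the first two shuffles shown above. It uses int[] instead of Comparable[] to stay short, and the class name NintherDemo and the helper trueMedian are made up for illustration.

```java
import java.util.Arrays;

class NintherDemo {
    // Index of the median of a[i], a[j] and a[k].
    static int median3(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) {
                return j;
            }
            return a[i] < a[k] ? k : i;
        }
        if (a[k] < a[j]) {
            return j;
        }
        return a[k] < a[i] ? k : i;
    }

    // Index of Tukey's ninther of a[lo..hi], same slot choice as in the question.
    static int ninther(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        int mid = lo + n / 2;
        int delta = n / 8;
        int m1 = median3(a, lo, lo + delta, lo + 2 * delta);
        int m2 = median3(a, mid - delta, mid, mid + delta);
        int m3 = median3(a, hi - 2 * delta, hi - delta, hi);
        return median3(a, m1, m2, m3);
    }

    // True median, found by sorting a copy (for comparison only).
    static int trueMedian(int[] a) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        return copy[copy.length / 2];
    }

    public static void main(String[] args) {
        int[] first = {11, 14, 12, 16, 18, 19, 17, 15, 13};
        int[] second = {14, 13, 17, 16, 18, 19, 11, 15, 12};
        System.out.println("ninther=" + first[ninther(first, 0, 8)]);   // prints "ninther=15"
        System.out.println("ninther=" + second[ninther(second, 0, 8)]); // prints "ninther=14"
        System.out.println("median=" + trueMedian(first));              // prints "median=15"
        System.out.println("median=" + trueMedian(second));             // prints "median=15"
    }
}
```

Both arrays hold the same nine values, so the true median is 15 in both cases, while the ninther depends on which values ended up in the sampled slots.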
Does this answer your question?
On a side note, does anybody know if quicksort using Tukey's ninther still requires shuffling? I'm assuming yes, but I'm not certain.