So I have written the insertion sort code so that it successfully creates arrays of 10; 1,000; 100,000; and 1,000,000 integers between 1,000 and 9,999 and completes the insertion sort algorithm just fine. However, when I attempt the last step of 10,000,000 integers, the array is created, but the code never completes. I have given it plenty of time, upwards of 4 or 5 hours, to no avail. Does anybody have any idea what the issue may be here? Is the executor having trouble handling that many integers, or what else could the issue stem from? I have included a copy of the insertion algorithm that I have written.
public static void insertion(int[] a) {
    int n = a.length;
    for (int i = 1; i < n; i++) {
        int j = i - 1;
        int temp = a[i];
        // shift elements larger than temp one position to the right
        while (j >= 0 && temp < a[j]) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = temp;
    }
}
Does anybody have any idea what the issue may be here?
When you make the array 10x larger, you have to wait about 100x longer, because this is an O(n^2) algorithm.
Is the executor having trouble handling that many integers, or what else could the issue stem from?
No. The maximum length of a Java array is 2^31 - 1 elements, and you are a long way from that limit; 10,000,000 ints is only about 40 MB.
Running
import java.util.Random;

interface A {
    static void main(String[] a) {
        for (int i = 25_000; i <= 10_000_000; i *= 2) {
            Random r = new Random();
            int[] arr = new int[i];
            for (int j = 0; j < i; j++)
                arr[j] = r.nextInt();
            long start = System.currentTimeMillis();
            insertion(arr);
            long time = System.currentTimeMillis() - start;
            System.out.printf("Insertion sort of %,d elements took %.3f seconds%n",
                    i, time / 1e3);
        }
    }

    public static void insertion(int[] a) {
        int n = a.length;
        for (int i = 1; i < n; i++) {
            int j = i - 1;
            int temp = a[i];
            while (j >= 0 && temp < a[j]) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = temp;
        }
    }
}
prints
Insertion sort of 25,000 elements took 0.049 seconds
Insertion sort of 50,000 elements took 0.245 seconds
Insertion sort of 100,000 elements took 1.198 seconds
Insertion sort of 200,000 elements took 4.343 seconds
Insertion sort of 400,000 elements took 19.212 seconds
Insertion sort of 800,000 elements took 71.297 seconds
So extrapolating, the full 10,000,000-element sort could take on the order of 4 hours on my machine, and it could take even longer, because a data set that big no longer fits in the L3 cache and has to come from main memory, which is slower.
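As a back-of-the-envelope check, assuming pure n^2 scaling from the last measurement above:
10,000,000 / 800,000 = 12.5, and 12.5^2 ≈ 156
156 × 71.3 s ≈ 11,000 s ≈ 3 hours
which is in line with the rough 4-hour figure once the slower memory access is factored in.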
Related
So I'm preparing for a technical interview, and one of my practice questions is the Kth smallest number.
I know that I can do a sort for O(n * log(n)) time, or use a heap for O(n * log(k)). However, I also know I can partition it (similar to quicksort) for an average case of O(n).
The actual calculated average time complexity should be:
2n - 2 - log2(n)
I've double-checked this math using WolframAlpha, and it agrees.
So I've coded my solution, and then I measured the actual average number of loop iterations on random data sets. For small values of n, it's pretty close. For example, n=5 might give me an actual of around 6.2 when I expect around 5.7. This slight excess is consistent.
This only gets worse as I increase the value of n. For example, for n=5000, I get around 15,000 for my actual average, when it should be slightly less than 10,000.
So basically, my question is where are these extra iterations coming from? Is my code wrong, or is it my math? My code is below:
import java.util.Arrays;
import java.util.Random;

public class Solution {
    static long tc = 0;

    static void swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    static int kMin(int[] arr, int k) {
        arr = arr.clone();
        int pivot = pivot(arr);
        if (pivot > k) {
            return kMin(Arrays.copyOfRange(arr, 0, pivot), k);
        } else if (pivot < k) {
            return kMin(Arrays.copyOfRange(arr, pivot + 1, arr.length), k - pivot - 1);
        }
        return arr[k];
    }

    static int pivot(int[] arr) {
        Random rand = new Random();
        int pivot = rand.nextInt(arr.length);
        swap(arr, pivot, arr.length - 1);
        int i = 0;
        for (int j = 0; j < arr.length - 1; j++) {
            tc++;
            if (arr[j] < arr[arr.length - 1]) {
                swap(arr, i, j);
                i++;
            }
        }
        swap(arr, i, arr.length - 1);
        return i;
    }

    public static void main(String[] args) {
        int iterations = 10000;
        int n = 5000;
        for (int j = 0; j < iterations; j++) {
            Random rd = new Random();
            int[] arr = new int[n];
            for (int i = 0; i < arr.length; i++) {
                arr[i] = rd.nextInt();
            }
            int k = rd.nextInt(arr.length - 1);
            kMin(arr, k);
        }
        System.out.println("Actual: " + tc / (double) iterations);
        double expected = 2.0 * n - 2.0 - (Math.log(n) / Math.log(2));
        System.out.println("Expected: " + expected);
    }
}
As you and others have pointed out in the comments, your calculation assumed that the array was split in half on each iteration by the random pivot, which is incorrect. This uneven splitting has a significant impact: when the element you're trying to select is the actual median, for instance, the expected size of the array after one random pivot choice is 75% of the original, since you'll always choose the larger of the two arrays.
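A quick way to see that 75% figure: if the pivot index is uniformly random and you always keep the larger of the two sides, the expected remaining fraction of the array is the average of max(x, 1 - x) for x uniform in [0, 1]. On [1/2, 1] that is x, which averages 3/4, and on [0, 1/2] it is 1 - x, which also averages 3/4, so the expected remaining size is about 3n/4.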
For an accurate estimate of the expected number of comparisons for each value of n and k, David Eppstein published an accessible analysis here and derives (asymptotically) this formula:
C(n, k) ≈ 2 * (n + k * ln(n / k) + (n - k) * ln(n / (n - k)))
This is a very close estimate for your values, even though it assumes no duplicates in the array.
Calculating the expected number of comparisons for k from 1 to n-1, as you do, gives ~7.499 * 10^7 total comparisons when n=5000, or almost exactly 15,000 comparisons per call of Quickselect as you observed.
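As a sanity check, a small sketch (the class name ExpectedQuickselect is just illustrative) that sums the approximation above over k reproduces both numbers:

class ExpectedQuickselect {
    public static void main(String[] args) {
        int n = 5000;
        double total = 0;
        for (int k = 1; k < n; k++) {
            // approximate expected comparisons to select the k-th smallest of n
            total += 2.0 * (n
                    + k * Math.log((double) n / k)
                    + (n - k) * Math.log((double) n / (n - k)));
        }
        System.out.printf("total over k = 1..n-1: %.4g%n", total);           // ~7.499e7
        System.out.printf("average per call:      %.1f%n", total / (n - 1)); // ~15,000
    }
}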
I have a basic insertion sort function which takes a 2D String array and an index value as arguments, then sorts the rows according to the values at that index. It is below:
public static void insertionSort(String[][] arr, int featureindex) {
    int j;
    double key;
    int n = arr.length;
    for (int i = 2; i < n; i++) { // row 0 is the header row, row 1 starts out "sorted"
        key = Double.parseDouble(arr[i][featureindex]);
        String[] keyRow = arr[i];
        j = i - 1;
        while ((j > 0) && (Double.parseDouble(arr[j][featureindex]) > key)) {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = keyRow;
    }
}
I know that insertion sort is slower than merge sort or quicksort, but I have to compare different sorting algorithms (I already did mergesort and quicksort) and thought that insertion sort would be better than bubble and selection sort. However, it came out slower than I expected: it sorts 50,000 rows in ~140 seconds and 100,000 rows in more than 700 seconds. I am trying to optimize it as much as possible, because it also has to process 250,000 rows. My data looks like this (4 rows, for example):
String[][] testArr = {
{"Flow ID"," Source IP","Source Port","Destination IP","Destination Port","Protocol","Timestamp","Flow Duration"," Total Fwd Packets","Total Backward Packets"}
,{"192.168.1.101-67.212.184.66-2156-80-6","192.168.1.101","2156","67.212.184.66","80","6","13/06/2010 06:01:11","2328040","2","0","12"}
,{"192.168.1.101-67.212.184.66-2159-80-6","192.168.1.101","2159","67.212.184.66","80","6","13/06/2010 06:01:11","2328006","2","0","12"}
,{"192.168.2.106-192.168.2.113-3709-139-6","192.168.2.106","3709","192.168.2.113","139","6","13/06/2010 06:01:16","7917","10","9","987"}
,{"192.168.5.122-64.12.90.98-59707-25-6","192.168.5.122","59707","64.12.90.98","25","6","13/06/2010 06:01:25","113992","6","3","6"}
};
Here I have 10 elements in each subarray; in the actual data, however, it is roughly a 250,000 x 84 array.
Is there any way to improve this?
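For what it's worth, a common way to speed this kind of sort up without changing the algorithm is to parse the key column once per row instead of calling Double.parseDouble inside the inner comparison loop. A rough sketch, where insertionSortParsedKeys is just an illustrative name and row 0 is assumed to be the header:

public static void insertionSortParsedKeys(String[][] arr, int featureindex) {
    int n = arr.length;
    double[] keys = new double[n];
    for (int i = 1; i < n; i++) {            // parse each data row's key exactly once
        keys[i] = Double.parseDouble(arr[i][featureindex]);
    }
    for (int i = 2; i < n; i++) {
        String[] keyRow = arr[i];
        double key = keys[i];
        int j = i - 1;
        while (j > 0 && keys[j] > key) {     // compare the pre-parsed keys
            arr[j + 1] = arr[j];
            keys[j + 1] = keys[j];
            j--;
        }
        arr[j + 1] = keyRow;
        keys[j + 1] = key;
    }
}

This keeps the same O(n^2) comparisons but removes the repeated string parsing, which tends to dominate the cost for data like this.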
Can anyone help analyze the time complexity of this code and explain why?
I'm comparing the array elements with each other, keeping track of the maximum so far.
I'm not sure how to calculate the time complexity for this.
Can anyone help me with this?
class Largest
{
    public static void main(String[] args)
    {
        int array[] = {33, 55, 13, 46, 87, 42, 10, 34};
        int max = array[0]; // Assume array[0] to be the max for time-being
        for (int i = 1; i < array.length; i++) // Iterate through the First Index and compare with max
        {
            if (max < array[i])
            {
                max = array[i];
            }
        }
        System.out.println("Largest is: " + max);
    }
}
It is O(n)
// loop n-1 times, O(n)
for (int i = 1; i < array.length; i++) // Iterate through the First Index and compare with max
{
    // one comparison operation, O(1)
    if (max < array[i])
    {
        // one assignment operation, O(1)
        max = array[i];
    }
}
You do two constant-time operations, n - 1 times:
O(n) * [O(1) + O(1)] = O(n)
Your loop goes around n - 1 times, so the complexity is O(n).
I am working on a benchmarking assignment for various algorithms. The requirements are to run the programs with successive data sets of random integers of size 10,000; 20,000; 100,000; 200,000; and 1,000,000 respectively. I have written the programs so that I can manually input the data set sizes, but I would prefer to use a loop so the program runs once and automatically works through the different data set sizes. I'm not really sure how to go about doing this; any advice is appreciated. Thanks in advance.
package bubble.sort;

public class BubbleSort {

    public static void main(String[] arg) throws Exception {
        int array[] = new int[1000];
        int i;
        for (int k = 1; k <= 100; k++) {
            // for loop that will populate array with random numbers
            for (i = 0; i < array.length; i++) {
                array[i] = 1 + (int) (Math.random() * 10);
            } // end for loop
            // get the start time in nanoseconds
            long startTime = System.nanoTime();
            // call bubbleSort to sort the entire array
            bubbleSort(array);
            // get the end time in nanoseconds
            long endTime = System.nanoTime();
            // calculate elapsed time in nanoseconds
            long duration = endTime - startTime;
            // print the elapsed time in seconds (nanoseconds / 1 billion)
            System.out.printf("%12.8f %n", (double) duration / 1_000_000_000.0);
        }
    }

    public static void bubbleSort(int[] array) {
        int temp = 0;
        for (int i = 0; i < array.length; i++) {
            for (int j = 1; j < (array.length - i); j++) {
                if (array[j - 1] > array[j]) {
                    temp = array[j - 1];
                    array[j - 1] = array[j];
                    array[j] = temp;
                }
            }
        }
    }
}
The simplest way is to wrap your code in another loop:
int[] dataSetSizes = {10000, 20000, 100000, 200000, 1000000};
for (int dataSetSize : dataSetSizes) {
    int[] array = new int[dataSetSize];
    // rest of your code
}
Extract what is inside of your main method into a method that takes the size as a parameter and uses that size to create the array. Then, in main, call that method with each of the required sizes.
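A minimal sketch of that refactoring (one timed run per size; runBenchmark is just an illustrative name, and bubbleSort is the method from the question):

public static void main(String[] args) throws Exception {
    int[] dataSetSizes = {10_000, 20_000, 100_000, 200_000, 1_000_000};
    for (int size : dataSetSizes) {
        runBenchmark(size);
    }
}

public static void runBenchmark(int size) {
    int[] array = new int[size];
    // populate the array with random numbers, as before
    for (int i = 0; i < array.length; i++) {
        array[i] = 1 + (int) (Math.random() * 10);
    }
    long startTime = System.nanoTime();
    bubbleSort(array);
    long duration = System.nanoTime() - startTime;
    // elapsed time in seconds (nanoseconds / 1 billion)
    System.out.printf("%,9d elements: %12.8f s%n", size, duration / 1_000_000_000.0);
}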
I asked this question before, but my post was cluttered with a whole bunch of other code and wasn't clearly presented, so I'm going to try again. Sorry, I'm new here.
Shell sort, as I wrote it, only works sometimes. Array a is an array of 100 unsorted integers; inc is an array of 4 integers whose values are the intervals (gaps) that shell sort should use (they descend, and the final value is always 1); count is an array which stores the counts for different runs of shell sort; and cnt is the index of the count value which should be updated for this run of shell sort.
When I run shell sort multiple times with different sets of 4 intervals, the sort only sometimes works fully. Half the time the array is fully sorted; the other half of the time the array is only partially sorted.
Can anyone help? Thanks in advance!
public static void shellSort(int[] a, int[] inc, int[] count, int cnt) {
    for (int k = 0; k < inc.length; k++) {
        for (int i = inc[k], j; i < a.length; i += inc[k]) {
            int tmp = a[i];
            count[cnt] += 1;
            for (j = i - inc[k]; j >= 0; j -= inc[k]) {
                if (a[j] <= tmp)
                    break;
                a[j + inc[k]] = a[j];
                count[cnt] += 1;
            }
            a[j + inc[k]] = tmp;
            count[cnt] += 1;
        }
    }
}
One problem is that you're only sorting one inc[k]-step sequence for each k, while you should sort them all (writing s for inc[k], you're only sorting {a[0], a[s], a[2*s], ... , a[m*s]}, leaving out {a[1], a[s+1], ... , a[m*s+1]} etc.). However, that should only influence performance (number of operations), not the outcome, since the last pass is a classical insertion sort (inc[inc.length-1] == 1), so that should sort the array no matter what happened before.
I don't see anything in the code that would cause failure. Maybe the inc array doesn't contain what it should? If you print out inc[k] in each iteration of the outer loop, do you get the expected output?
There is an error in your i loop control:
for (int i = inc[k], j; i < a.length; i += inc[k]) {
Should be:
for (int i = inc[k], j; i < a.length; i++) {
The inner j loop handles the comparison of elements that are inc[k] apart. The outer i loop should simply increment by 1, the same as the outer loop of a standard Insertion sort.
In fact, the final pass of Shellsort with an increment of 1 is identical to a standard Insertion sort.
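Putting that fix into the original method (only the i loop's increment changes):

public static void shellSort(int[] a, int[] inc, int[] count, int cnt) {
    for (int k = 0; k < inc.length; k++) {
        // advance one element at a time so every gap sequence gets sorted
        for (int i = inc[k], j; i < a.length; i++) {
            int tmp = a[i];
            count[cnt] += 1;
            for (j = i - inc[k]; j >= 0; j -= inc[k]) {
                if (a[j] <= tmp)
                    break;
                a[j + inc[k]] = a[j];
                count[cnt] += 1;
            }
            a[j + inc[k]] = tmp;
            count[cnt] += 1;
        }
    }
}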