I have a basic insertion sort function that takes a 2D String array and an index as arguments, then sorts the rows of that array by the value at that index. It is shown below:
public static void insertionSort(String[][] arr,int featureindex) {
int j;
double key;
int n = arr.length;
for (int i = 2; i < n; i++){
key = Double.parseDouble(arr[i][featureindex]);
String[] keyRow = arr[i];
j = i - 1;
while ((j > 0) && (Double.parseDouble(arr[j][featureindex]) > key)){
arr[j+1] = arr[j];
j = j - 1;
}
arr[j+1] = keyRow;
}
}
I know that insertion sort is slower than merge sort or quicksort, but I have to compare different sorting algorithms (I have already done merge sort and quicksort) and thought that insertion sort would be better than bubble and selection sort. However, it came out slower than I expected. It sorts 50 000 arrays in ~140 seconds and 100 000 arrays in more than 700 seconds. I am trying to optimize it as much as possible because it also has to process 250 000 arrays. My data looks like this (4 arrays as an example):
String[][] testArr = {
{"Flow ID"," Source IP","Source Port","Destination IP","Destination Port","Protocol","Timestamp","Flow Duration"," Total Fwd Packets","Total Backward Packets"}
,{"192.168.1.101-67.212.184.66-2156-80-6","192.168.1.101","2156","67.212.184.66","80","6","13/06/2010 06:01:11","2328040","2","0","12"}
,{"192.168.1.101-67.212.184.66-2159-80-6","192.168.1.101","2159","67.212.184.66","80","6","13/06/2010 06:01:11","2328006","2","0","12"}
,{"192.168.2.106-192.168.2.113-3709-139-6","192.168.2.106","3709","192.168.2.113","139","6","13/06/2010 06:01:16","7917","10","9","987"}
,{"192.168.5.122-64.12.90.98-59707-25-6","192.168.5.122","59707","64.12.90.98","25","6","13/06/2010 06:01:25","113992","6","3","6"}
};
Here I have 10 elements in each subarray; however, the actual data is, let's say, a 250000 x 84 array.
Is there any way to improve this?
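One possible direction, offered as a sketch rather than as an answer from the thread: the posted loop calls Double.parseDouble inside the inner while loop, so the same strings are parsed again on every comparison. The version below parses the key column once into a double[] and shifts keys and rows together. The class name CachedKeyInsertionSort is made up for illustration, and row 0 is assumed to be a header that stays in place, as in the sample data. The algorithm is still O(n^2); only the per-comparison cost drops.

```java
final class CachedKeyInsertionSort {
    // Sorts rows 1..n-1 by the numeric value in column featureIndex.
    // Assumption: row 0 is a header row and must stay in place.
    static void insertionSort(String[][] arr, int featureIndex) {
        int n = arr.length;
        double[] keys = new double[n]; // keys[0] is unused (header row)
        for (int i = 1; i < n; i++) {
            keys[i] = Double.parseDouble(arr[i][featureIndex]); // parse each key once
        }
        for (int i = 2; i < n; i++) {
            double key = keys[i];
            String[] keyRow = arr[i];
            int j = i - 1;
            // Shift larger keys (and their rows) one slot to the right.
            while (j > 0 && keys[j] > key) {
                keys[j + 1] = keys[j];
                arr[j + 1] = arr[j];
                j--;
            }
            keys[j + 1] = key;
            arr[j + 1] = keyRow;
        }
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"id", "Flow Duration"},
            {"a", "2328040"},
            {"b", "7917"},
            {"c", "113992"}
        };
        insertionSort(rows, 1);
        for (String[] row : rows) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Only row references are moved, so the cost of a shift does not depend on whether a row has 10 or 84 columns.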
I am benchmarking sorting algorithms using Java.
When I compare average-case bubble sort with selection sort, using randomised arrays with values ranging from 0 to 99, bubble sort performs noticeably better. Most references on performance I have read state that selection sort is the better of the two.
This is my selection sort implementation:
public static void selectionSort(int[] arr) {
/*
* Selection sort sorting algorithm. Cycle: The minimum element from the
* unsorted sub-array on the right is picked and moved to the sorted sub-array on
* the left.
*
* The outer loop runs n-1 times. Inner loop n/2 on average. This results in
* (n-1) * n/2 comparisons, which is O(n^2), in the best, worst and average cases.
*
*/
// Count outer
for (int i = 0; i < arr.length; i++) {
// Assign the default min index
int min = i;
// Find the index with smallest value
for (int j = i + 1; j < arr.length; j++) {
if (arr[j] < arr[min]) {
min = j;
}
}
// Swap arr[min] with arr[i]
int temp = arr[min];
arr[min] = arr[i];
arr[i] = temp;
}
}
My Bubble Sort:
public static void optimalBubbleSort(int[] arr) {
/**
* The optimized version will check whether the list
* is sorted at each iteration. If the list is sorted the
* program will exit.
* Thus the best case for the optimized bubble sort
* is O(n). Conversely, for the unoptimized algorithm
* the best case will always be the same as the average case.
*
*/
boolean sorted = false;
int n = arr.length;
while (!sorted) {
sorted = true;
for (int i = 0; i < n - 1; i++) {
if (arr[i] > arr[i + 1]) {
int temp = arr[i + 1];
arr[i + 1] = arr[i];
arr[i] = temp;
sorted = false;
}
}
n--;
}
}
My implementation of bubble sort that is not optimised to exit early if the list is sorted:
for (int i = 0; i < arr.length - 1; i++) {
/*
* Each iteration of the outer loop ensures
* that at the end the largest remaining element is at
* index arr.length - (i + 1).
* In other words, the loop invariant is that
* the subsection bounded by the indices
* [arr.length - i, arr.length) is sorted and
* contains the i biggest elements in the array.
*/
for (int j = 0; j < arr.length - i - 1; j++) {
if (arr[j] > arr[j + 1]) {
/*
* In the case where an inversion exists in
* arr[j] > arr[j + 1],
* arr[j] and arr[j + 1] are
* thus swapped.
*/
int temp = arr[j + 1];
arr[j + 1] = arr[j];
arr[j] = temp;
}
}
}
This is how I generate the randomised array for input:
int[] random = new int[n];
for (int i = 0; i < n; i++) {
random[i] = randomInteger.nextInt(100);
}
Any input as to why the bubble sort is faster would be appreciated.
So if we compare the number of comparisons, both algorithms will need something around n*(n-1)/2 (the sum of all numbers from n-1 down to 1), which is on the order of n^2 as you stated.
Selection sort certainly does fewer swaps than bubble sort, but it will scan the whole array on every pass even if the array is already sorted. Bubble sort, with the early exit, does only around n comparisons when the array is sorted, which gives it O(n) time complexity in the best case. It will also be faster when the array is nearly sorted, which can make bubble sort faster than selection sort on average for such inputs.
You can find the big O of each case on this site
And on this site you can see what actually happens if the array is already or nearly sorted.
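To make the best-case difference concrete, here is a small sketch that counts comparisons instead of measuring time. It reuses the loop structure of the two implementations from the question; the class and method names (ComparisonCount, bubbleComparisons, selectionComparisons) are made up for illustration.

```java
final class ComparisonCount {
    // Bubble sort with early exit; returns how many element comparisons were made.
    static long bubbleComparisons(int[] arr) {
        long comparisons = 0;
        boolean sorted = false;
        int n = arr.length;
        while (!sorted) {
            sorted = true;
            for (int i = 0; i < n - 1; i++) {
                comparisons++;
                if (arr[i] > arr[i + 1]) {
                    int temp = arr[i + 1];
                    arr[i + 1] = arr[i];
                    arr[i] = temp;
                    sorted = false;
                }
            }
            n--;
        }
        return comparisons;
    }

    // Selection sort; returns how many element comparisons were made.
    static long selectionComparisons(int[] arr) {
        long comparisons = 0;
        for (int i = 0; i < arr.length; i++) {
            int min = i;
            for (int j = i + 1; j < arr.length; j++) {
                comparisons++;
                if (arr[j] < arr[min]) {
                    min = j;
                }
            }
            int temp = arr[min];
            arr[min] = arr[i];
            arr[i] = temp;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] a = new int[n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i; // already sorted input
            b[i] = i;
        }
        System.out.println("bubble:    " + bubbleComparisons(a));
        System.out.println("selection: " + selectionComparisons(b));
    }
}
```

On already sorted input of length n, the bubble sort count is n-1 while the selection sort count is n*(n-1)/2, regardless of the data.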
If I am right, you are sorting values in the range [0, 99] and arrays of length up to 100000, which means that every value is repeated up to 1000 times on average.
IMO you can discard all your results, as you are testing very special cases where a large number of keys are equal and the algorithms can show non-standard behavior. Their performance will be more sensitive to implementation details that are usually unimportant. The classical sorting algorithms were not designed for such extreme situations.
I would like to add that comparing slow sorts to fast sorts on big data sets does not make much sense (the curves for the fast sorts are so closely packed that you can't compare any of them).
As you've found, the main difference between bubble sort and selection sort is that bubble sort can run much faster than selection sort when the array is (nearly) sorted. In your case, since you draw your random numbers from the range 0 to 99, if the chosen array length n is much less than 100, the probability of a (nearly) sorted sample is higher. In this sense, you can find that the average complexity of bubble sort is better than that of selection sort.
Therefore, you should be aware that this comparison depends on n: as you increase n, the probability of a sorted array decreases, and you will find the two curves getting closer.
So I have written the insertion sort code properly to where it will successfully create arrays of 10, 1,000, 100,000 and 1,000,000 integers between 1,000 and 9,999 and complete the insertion sort algorithm just fine. However, when I attempt the last step of 10,000,000 integers, the array is created, but the code never fully completes. I have allowed it plenty of time to complete, upwards of 4 or 5 hours, to no avail. Anybody have any ideas of what the issue may be here? Is the executer having issues comprehending that many integers or what could the issue stem from? I have included a copy of the insertion algorithm that I have written.
public static void insertion(int[] a) {
int n = a.length;
for(int i = 1; i < n; i++) {
int j = i -1;
int temp = a[i];
while(j >= 0 && temp < a[j]) { // j >= 0 so that a[0] is compared as well
a[j+1] = a[j];
j--;
}
a[j+1] = temp;
}
}
Anybody have any ideas of what the issue may be here?
When you make the array 10x larger you have to wait 100x longer as this is an O(n^2) algorithm.
Is the executer having issues comprehending that many integers or what could the issue stem from?
No, the maximum array length in Java is about 2^31-1 elements and you are a long way from that limit.
Running
import java.util.Random;

interface A {
static void main(String[] a) {
for (int i = 25_000; i <= 10_000_000; i *= 2) {
Random r = new Random();
int[] arr = new int[i];
for (int j = 0; j < i; j++)
arr[j] = r.nextInt();
long start = System.currentTimeMillis();
insertion(arr);
long time = System.currentTimeMillis() - start;
System.out.printf("Insertion sort of %,d elements took %.3f seconds%n",
i, time / 1e3);
}
}
public static void insertion(int[] a) {
int n = a.length;
for (int i = 1; i < n; i++) {
int j = i - 1;
int temp = a[i];
while (j >= 0 && temp < a[j]) { // j >= 0 so that a[0] is compared as well
a[j + 1] = a[j];
j--;
}
a[j + 1] = temp;
}
}
}
prints
Insertion sort of 25,000 elements took 0.049 seconds
Insertion sort of 50,000 elements took 0.245 seconds
Insertion sort of 100,000 elements took 1.198 seconds
Insertion sort of 200,000 elements took 4.343 seconds
Insertion sort of 400,000 elements took 19.212 seconds
Insertion sort of 800,000 elements took 71.297 seconds
So, extrapolating with O(n^2) growth, sorting 10,000,000 elements could take in the order of 4 hours on my machine. It could take longer still, because a bigger data set no longer fits in the L3 cache and has to be read from main memory, which is slower.
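The extrapolation can be written out as a short calculation. This is only a sketch: it assumes the running time grows exactly with n^2 and ignores the cache effects mentioned above. The class and method names (QuadraticExtrapolation, extrapolateSeconds) are hypothetical.

```java
final class QuadraticExtrapolation {
    // Scales a measured time to a different input size, assuming time grows with n^2.
    static double extrapolateSeconds(double measuredSeconds, long measuredN, long targetN) {
        double ratio = (double) targetN / measuredN;
        return measuredSeconds * ratio * ratio;
    }

    public static void main(String[] args) {
        // Last measurement from the output above: 800,000 elements in 71.297 seconds.
        double seconds = extrapolateSeconds(71.297, 800_000, 10_000_000);
        System.out.printf("Estimated: %.0f seconds (about %.1f hours)%n", seconds, seconds / 3600);
    }
}
```

Going from 800,000 to 10,000,000 elements is a factor of 12.5 in size and therefore about 156 in time, which gives roughly 11,000 seconds, or about 3 hours. That is the same order of magnitude as the estimate above, before any slowdown from main-memory access.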
I was given a task to sort an array that is filled with (non negative) integers.
They should be sorted such that the output is in the following order:
Numbers where the remainder of them divided by 4 is 0
Numbers where the remainder of them divided by 4 is 1
Numbers where the remainder of them divided by 4 is 2
Numbers where the remainder of them divided by 4 is 3
I tried to write a simple algorithm that should work at O(n) (the task is to write the code efficiently).
But I think it's a mess, a few cases didn't work (for example when I tried an array where the first few numbers have remainders of 3).
Any suggestion on how to fix it or a better way of doing this?
public static void sortByFour(int[] arr)
{
int zeroR = -1, oneR = 0, twoR = arr.length-1, threeR = arr.length;
do
{
if (arr[oneR]%4==1)
oneR++;
else if (arr[oneR]%4==0)
{
zeroR++;
int temp = arr[oneR];
arr[oneR] = arr[zeroR];
arr[zeroR] = temp;
oneR++;
}
else if (arr[oneR]%4==2)
{
twoR--;
int temp = arr[oneR];
arr[oneR] = arr[twoR];
arr[twoR] = temp;
}
else if (arr[oneR]%4==3)
{
threeR--;
int temp = arr[oneR];
arr[oneR] = arr[threeR];
arr[threeR] = temp;
}
} while (oneR < threeR && oneR < twoR);
}
A bucket sort can do the trick for you. Note that you can overcome the O(n) extra space factor of bucket sort by looping 4 times (once per remainder), something like (Java-like pseudocode):
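final int REMAINDER = 4; // 4 because you use % 4
int curr = -1; // index of the last element already placed in its final region
for (int r = 0; r < REMAINDER; r++) {
    // Everything up to curr is already in place, so start right after it.
    for (int i = curr + 1; i < arr.length; i++) {
        if (arr[i] % REMAINDER == r) {
            // Swap the matching element into the next free slot.
            int temp = arr[i];
            arr[i] = arr[++curr];
            arr[curr] = temp;
        }
    }
}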
The idea is to 'remember' where you last placed an element, iterate over the array 4 times, and swap each element with a matching remainder into the desired location (the one you remembered).
Complexity is still O(n) with O(1) extra space.
An alternative, with O(n) time (but much better constants) and O(n) space, is to use a classic bucket sort: in a single pass, store all elements in 4 different lists according to their remainder, and in a second pass over those 4 lists, fill the original array with the elements in the desired order.
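A minimal sketch of the two-pass bucket variant described above. It assumes non-negative integers, as stated in the question (in Java, % can return a negative value for negative operands). The names FourBucketSort and sortByRemainder are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FourBucketSort {
    // Reorders arr so that values with remainder 0 (mod 4) come first, then 1, 2, 3.
    // Assumption: all values are non-negative.
    static void sortByRemainder(int[] arr) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int r = 0; r < 4; r++) {
            buckets.add(new ArrayList<>());
        }
        // First pass: distribute every element into the bucket of its remainder.
        for (int value : arr) {
            buckets.get(value % 4).add(value);
        }
        // Second pass: write the buckets back in remainder order.
        int pos = 0;
        for (List<Integer> bucket : buckets) {
            for (int value : bucket) {
                arr[pos++] = value;
            }
        }
    }

    public static void main(String[] args) {
        int[] arr = {7, 4, 1, 2, 8, 3, 5};
        sortByRemainder(arr);
        System.out.println(java.util.Arrays.toString(arr));
    }
}
```

Unlike the in-place swapping version, this one keeps elements with the same remainder in their original relative order.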
I have an array A of n integers. I also have an array B of k (k < n) integers. What I need is for any integer from array A that appears in array B to be increased by 3.
If I go with the most obvious way, I get O(n*k) complexity.
Array A cannot (must not) be sorted.
Is there a more efficient way of achieving this?
Is there a more efficient way of achieving this?
Yes: put the elements of B into a HashSet. Loop over A and, if the element you're on is contained in the set, increase it by 3. This will have O(n + k) complexity.
For instance:
// Requires: import java.util.HashSet; import java.util.Set;
Set<Integer> bSet = new HashSet<>(B.length);
for (int b : B) // O(k)
    bSet.add(b);
for (int i = 0; i < A.length; i++) { // O(n)
    if (bSet.contains(A[i]))
        A[i] += 3;
}
If your integers are in a range small enough that you can create an array whose length covers the greatest value (for instance 0 <= A[i] and B[i] <= 65535), then you can do this:
boolean[] contains = new boolean[65536]; // one slot per value in 0..65535
for (int i = 0; i < k; i++) {
    contains[B[i]] = true;
}
for (int i = 0; i < n; i++) {
    if (contains[A[i]]) {
        A[i] += 3;
    }
}
This is O(n + k).
If array B can be sorted, the solution is straightforward: sort it (O(K*log2(K))), then use binary search so that each "contains" check costs log2(K), and the lookups cost N*log2(K) in total.
If you cannot sort array B, then the only option is the straightforward N*K approach.
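A sketch of the sort-then-binary-search idea, using Arrays.sort and Arrays.binarySearch from the standard library. It sorts a copy of B, which is an assumption added here so that the caller's B is left untouched. The names SortedLookup and addThreeIfInB are hypothetical.

```java
import java.util.Arrays;

final class SortedLookup {
    // Adds 3 to every element of a that also occurs in b.
    // A sorted copy of b is used so the caller's array is not reordered.
    static void addThreeIfInB(int[] a, int[] b) {
        int[] sortedB = b.clone();
        Arrays.sort(sortedB); // O(k log k)
        for (int i = 0; i < a.length; i++) {
            // binarySearch returns a non-negative index only when the key is found.
            if (Arrays.binarySearch(sortedB, a[i]) >= 0) {
                a[i] += 3;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 5, 3, 7, 3};
        int[] b = {3, 1};
        addThreeIfInB(a, b);
        System.out.println(Arrays.toString(a));
    }
}
```

Array A keeps its original order; only B (or its copy) is sorted.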
UPDATE
I forgot about bitmasks: if you know that you only have 32-bit integers and you have enough memory, you can store a huge bitmask array, where "add" and "contains" are always O(1). Of course, this is only worthwhile for very special performance optimizations.
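One way to sketch the bitmask idea in Java is with java.util.BitSet. This version assumes all values are non-negative, since BitSet indices cannot be negative; covering negative values as well would need an offset or a second bit set. The names BitSetLookup and addThreeIfInB are made up for illustration.

```java
import java.util.BitSet;

final class BitSetLookup {
    // Adds 3 to every element of a that also occurs in b.
    // Assumption: all values in a and b are non-negative.
    static void addThreeIfInB(int[] a, int[] b) {
        BitSet present = new BitSet();
        for (int value : b) {
            present.set(value); // "add" is O(1) per value
        }
        for (int i = 0; i < a.length; i++) {
            if (present.get(a[i])) { // "contains" is O(1) per lookup
                a[i] += 3;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 5, 3, 7, 3};
        int[] b = {3, 1};
        addThreeIfInB(a, b);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

The BitSet grows to cover the largest value in B, so in the worst case (values close to 2^31-1) it needs about 256 MiB.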
I asked this question before, but my post was cluttered with a whole bunch of other code and wasn't clearly presented, so I'm going to try again. Sorry, I'm new here
Shell sort, how I wrote it, only works sometimes. Array a is an array of 100 integers unsorted, inc is an array of 4 integers whose values are the intervals that shell sort should use (they descend and the final value is always 1), count is an array which stores the counts for different runs of shell sort, cnt represents the count value which should be updated for this run of shell sort.
When I run shell sort multiple times, with different sets of 4 intervals, only sometimes does the sort fully work. Half the time the array is fully sorted, the other half of the time the array is partially sorted.
Can anyone help? Thanks in advance!
public static void shellSort(int[] a, int[] inc, int[] count, int cnt) {
for (int k = 0; k < inc.length; k++) {
for (int i = inc[k], j; i < a.length; i += inc[k]) {
int tmp = a[i];
count[cnt] += 1;
for (j = i - inc[k]; j >= 0; j -= inc[k]) {
if (a[j] <= tmp)
break;
a[j + inc[k]] = a[j];
count[cnt] += 1;
}
a[j + inc[k]] = tmp;
count[cnt] += 1;
}
}
}
One problem is that you're only sorting one inc[k]-step sequence for each k, while you should sort them all (you're only sorting {a[0], a[s], a[2*s], ... , a[m*s]}, leaving out {a[1], a[s+1], ... , a[m*s+1]} etc.). However, that should only influence performance (number of operations), not the outcome, since the last pass is a classical insertion sort (inc[inc.length-1] == 1), so that should sort the array no matter what happened before.
I don't see anything in the code that would cause failure. Maybe the inc array doesn't contain what it should? If you print out inc[k] in each iteration of the outer loop, do you get the expected output?
There is an error in your i loop control:
for (int i = inc[k], j; i < a.length; i += inc[k]) {
Should be:
for (int i = inc[k], j; i < a.length; i++) {
The inner j loop handles the comparison of elements that are inc[k] apart. The outer i loop should simply increment by 1, the same as the outer loop of a standard Insertion sort.
In fact, the final pass of Shellsort with an increment of 1 is identical to a standard Insertion sort.
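Putting the suggested fix together, a complete version could look like the sketch below. The count bookkeeping from the question is left out to keep the example short, and the class name ShellSortFixed is hypothetical. The inc array is assumed to hold descending gaps ending in 1, as described in the question.

```java
final class ShellSortFixed {
    // Shellsort with the gaps given in inc (assumed descending, last gap 1).
    static void shellSort(int[] a, int[] inc) {
        for (int gap : inc) {
            // i advances by 1 so that every gap-separated subsequence is sorted,
            // not only the one starting at index 0.
            for (int i = gap; i < a.length; i++) {
                int tmp = a[i];
                int j = i - gap;
                // Gapped insertion: shift larger elements gap positions to the right.
                while (j >= 0 && a[j] > tmp) {
                    a[j + gap] = a[j];
                    j -= gap;
                }
                a[j + gap] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 3, 7, 1, 8, 2, 6, 0, 5, 4};
        shellSort(a, new int[] {7, 5, 3, 1});
        System.out.println(java.util.Arrays.toString(a));
    }
}
```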