While learning algorithms, I wrote code to compare the running time of two algorithms. Both have to find all the pairs of numbers in an array that add up to a specific number.
First approach - Brute force.
Two nested for loops find the pairs of numbers that add up to the given number. The time complexity is O(n*n).
Second approach - Efficient
First sort the array, then keep start and end indices at the beginning and end of the array and, depending on the sum of the elements at those positions, move start to the right or end to the left to find the pairs of numbers.
My question is -
I am printing the running time of each approach, but the Brute force approach seems to run faster than the Efficient one. Why is this happening?
See the code here -
import java.util.Random;
public class MainRunner {
final private static int numberRange = 100000;
public static void generateRandomNumbers(int[] array, int[] dupArray) {
System.out.println("Generated Array: ");
Random random = new Random();
for (int i = 0; i < array.length; i++) {
int generatedRandomInt = random.nextInt(array.length) + 1;
array[i] = dupArray[i] = generatedRandomInt;
}
}
public static void main(String[] args) {
int[] array = new int[numberRange];
int[] dupArray = new int[numberRange];
generateRandomNumbers(array, dupArray);
Random random = new Random();
int sumToFind = random.nextInt(numberRange) + 1;
System.out.println("\n\nSum to find: " + sumToFind);
// Starting Sort and Find Pairs
final long startTimeSortAndFindPairs = System.currentTimeMillis();
new SortAndFindPairs().sortAndFindPairsOfNumbers(sumToFind, array);
final long durationSortAndFind = System.currentTimeMillis() - startTimeSortAndFindPairs;
// Starting Find Pairs
final long startTimeFindPairs = System.currentTimeMillis();
new FindPairs().findPairs(sumToFind, dupArray);
final long durationFindPairs = System.currentTimeMillis() - startTimeFindPairs;
System.out.println("Sort and Find Pairs: " + durationSortAndFind);
System.out.println("Find Pairs: " + durationFindPairs);
}
}
SortAndFindPairs.java
import java.util.Arrays;
public class SortAndFindPairs {
public void sortAndFindPairsOfNumbers(int argNumberToFind, int[] array) {
Arrays.sort(array);
System.out.println("\n\nResults of Sort and Find Pairs: \n");
int startIndex = 0;
int endIndex = array.length - 1;
while (startIndex < endIndex) {
int sum = array[startIndex] + array[endIndex];
if (argNumberToFind == sum) {
//System.out.println(array[startIndex] + ", " + array[endIndex]);
startIndex++;
endIndex--;
} else if (argNumberToFind > sum) {
startIndex++;
} else {
endIndex--;
}
}
}
}
And the FindPairs.java
public class FindPairs {
public void findPairs(int argNumberToFind, int[] array) {
System.out.println("\nResults of Find Pairs: \n");
int randomInt1 = 0;
int randomInt2 = 0;
for (int i = 0; i < array.length - 1; i++) {
for (int j = i + 1; j < array.length; j++) {
int sum = array[i] + array[j];
if (argNumberToFind == sum) {
//System.out.println(array[i] + ", " + array[j]);
//randomInt1++;
//randomInt2--;
}
}
}
}}
Only when I add the operations on the two variables randomInt1 and randomInt2 in FindPairs.java do I see the expected running time difference. Otherwise, the running time of FindPairs.java is much less than that of SortAndFindPairs.java. So why does adding just two variable operations increase the time by so much? Simple operations should take negligible time. Am I missing something here?
Results for numberRange = 1000000
Results of Find Pairs:
Sort and Find Pairs: 641
Find Pairs: 57
I think the problem is compiler optimization playing tricks on you (in Java this is most likely the JIT compiler at run time). I tried different permutations of your code and noticed that the double for loop in FindPairs does almost nothing, so the compiler may be stripping some of the code.
I got these numbers with an exact copy of your code:
Sort and Find Pairs: 43
Find Pairs: 13
Consistently (I ran it several times to double check), sort and find was slower every time.
But then I changed the inner for loop to do nothing:
for (int j = i + 1; j < array.length; j++) {
//int sum = array[i] + array[j];
//if (argNumberToFind == sum) {
//System.out.println(array[i] + ", " + array[j]);
//randomInt1++;
//randomInt2--;
//}
And guess what? I got:
Sort and Find Pairs: 20
Find Pairs: 11
I tried several times and the numbers were pretty similar. With both loops removed, the runtime for find pairs went down to 1. So my guess is that the optimization step of the compiler assumes that the code inside the inner loop has no effect and removes it. The code in sort and find is a little smarter, so it is kept.
Next I tried something different: I uncommented the increment of randomInt1 but left the sum and the if commented out,
for (int j = i + 1; j < array.length; j++) {
//int sum = array[i] + array[j];
//if (argNumberToFind == sum) {
//System.out.println(array[i] + ", " + array[j]);
randomInt1++;
//randomInt2--;
//}
and then I got:
Sort and Find Pairs: 42
Find Pairs: 5
Wow, suddenly it got faster! (Maybe the compiler replaced the for loop with an arithmetic calculation of randomInt1 from the loop bounds?)
My last attempt. You may have noticed that this is not a fair comparison: sort and find has a lot of logic involved, while find doesn't. It actually does nothing when it finds a pair. So to compare apples to apples we want to make sure find pairs actually does something, and that sort and find does the same extra amount of work (like adding the same number to both sides of an equation). So I changed both methods to count the matching pairs instead, like this:
System.out.println("\nResults of Find Pairs: \n");
long randomInt1 = 0;
int randomInt2 = 0;
int count = 0;
for (int i = 0; i < array.length - 1; i++) {
for (int j = i + 1; j < array.length; j++) {
int sum = array[i] + array[j];
if (argNumberToFind == sum) {
count++;
}
}
}
System.out.println("\nPairs found: " + count + "\n");
and
public void sortAndFindPairsOfNumbers(int argNumberToFind, int[] array) {
Arrays.sort(array);
System.out.println("\n\nResults of Sort and Find Pairs: \n");
int startIndex = 0;
int endIndex = array.length - 1;
int count = 0;
while (startIndex < endIndex) {
int sum = array[startIndex] + array[endIndex];
if (argNumberToFind == sum) {
//System.out.println(array[startIndex] + ", " + array[endIndex]);
startIndex++;
endIndex--;
count++;
} else if (argNumberToFind > sum) {
startIndex++;
} else {
endIndex--;
}
}
System.out.println("\nPairs found: " + count + "\n");
}
And then got:
Sort and Find Pairs: 38
Find Pairs: 4405
The time for find pairs blew up! And sort and find stayed in line with what we were seeing before.
So the most likely answer to your problem is that the compiler is optimizing something, and an almost empty for loop is exactly the kind of thing a compiler can optimize away, whilst for sort and find the more complex logic may cause the optimizer to step back. Your algorithm lessons are fine. Here Java is playing a trick on you.
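A rough sketch of a fairer timing harness, assuming the goal is simply to stop the brute-force loop from being optimized away: both methods return a count that the caller actually uses, each is run for a few rounds so the early (warm-up) rounds can be ignored, and System.nanoTime() replaces currentTimeMillis(). The names PairTiming, countBruteForce and countSorted are made up for this illustration, and the array size is an arbitrary assumption. For serious measurements a dedicated harness such as JMH is the usual choice.

```java
import java.util.Arrays;
import java.util.Random;

class PairTiming {
    // Brute force: count every index pair (i, j), i < j, whose values add up to target.
    static long countBruteForce(int[] a, int target) {
        long count = 0;
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] + a[j] == target) {
                    count++;
                }
            }
        }
        return count;
    }

    // Sort a copy, then walk inwards from both ends. As in the question, both
    // indices move on a match, so each element is used in at most one pair.
    static long countSorted(int[] a, int target) {
        int[] sorted = a.clone();
        Arrays.sort(sorted);
        long count = 0;
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo < hi) {
            int sum = sorted[lo] + sorted[hi];
            if (sum == target) {
                count++;
                lo++;
                hi--;
            } else if (sum < target) {
                lo++;
            } else {
                hi--;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 10_000;                       // assumed size, small enough for the quadratic loop
        int[] data = new Random(42).ints(n, 1, n + 1).toArray();
        int target = n / 2;
        long sink = 0;                        // accumulating the results keeps them "live"
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            sink += countSorted(data, target);
            long t1 = System.nanoTime();
            sink += countBruteForce(data, target);
            long t2 = System.nanoTime();
            System.out.printf("round %d: sort and find %d us, find pairs %d us%n",
                    round, (t1 - t0) / 1_000, (t2 - t1) / 1_000);
        }
        System.out.println("checksum: " + sink);
    }
}
```

Note that the two methods do not count the same thing when values repeat (the brute force counts every matching index pair), so only the timings are comparable, not the counts.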
One more thing you can try is use different languages. I'm pretty sure you will find interesting stuff by doing so!
As stated by LIuxed, the sort operation takes some time. If you invest time in sorting, why not take advantage of the fact that the list items are then sorted?
If list elements are sorted, you could use a binary search algorithm... start in the middle of the array, and check if you go 1/2 up, or 1/2 down. As a result, you can get faster performance with sorted array for seeking a value. Such an algorithm is already implemented in the Arrays.binarySearch methods.
See https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#binarySearch(int[],%20int)
You will notice the difference when you sort just once, but seek many times.
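A minimal sketch of "sort once, seek many times", assuming the lookups are plain membership tests. The class and method names (SortOnceSeekMany, countHits) are invented for the example; Arrays.sort and Arrays.binarySearch are the standard JDK methods.

```java
import java.util.Arrays;

class SortOnceSeekMany {
    // Counts how many of the queries occur in data.
    // The sort is paid once; each lookup afterwards is O(log n).
    static int countHits(int[] data, int[] queries) {
        int[] sorted = data.clone();
        Arrays.sort(sorted);
        int hits = 0;
        for (int q : queries) {
            // binarySearch returns a non-negative index only when the key is present
            if (Arrays.binarySearch(sorted, q) >= 0) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 25};
        int[] queries = {19, 8, 3, 100};
        System.out.println(countHits(data, queries)); // 19 and 3 are present, so this prints 2
    }
}
```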
Calling Arrays.sort(array) takes time because the whole array has to be reordered. For an int[] the JDK uses a dual-pivot quicksort rather than a selection sort, so it needs on the order of n log n comparisons (n = array.length).
That is why the time this method takes depends on the array size.
I removed everything from your sortAndFindPairsOfNumbers method, just kept
Arrays.sort(array);
But the time difference is still large.
This means most of the time is taken by the sort method.
So your assumption that the second approach is the efficient one is not correct. It is all about input size.
If you keep numberRange at, let's say, 1000, then SortAndFindPairs will be faster.
Referencing https://www.geeksforgeeks.org/find-subarray-with-given-sum-in-array-of-integers/ for some reason I'm feeling a little thick about this and not totally grasping the reason that once you find the highest index for (current_sum - target_sum) in the map, that you know if you start at the index immediately following that in the array and include the values up to the current index where you encounter this in the array, that you have your subarray solution.
I pretty much get it, that it's because if we've reached a point in our iterating of the array that we've seen the difference between our current sum and the target number, then if we remove that difference from the sum we have found the subarray for the solution, but I can't quite grasp why exactly that is. For example, what if the difference is "2" but the index we have stored in our map where we last saw the sum was "2" is not immediately before the subarray leading up to where we are now and provides the solution. Again, I kind of get it but would appreciate a clear and precise explanation so I have that "aha" moment and more solidly grasp it.
Also wondering the logic that might lead me to this solution after solving this in a different way for positive integers only, namely the efficient solution covered here https://www.geeksforgeeks.org/find-subarray-with-given-sum/.
Thanks.
public static void subArraySum(int[] arr, int n, int sum) {
//cur_sum to keep track of cumulative sum till that point
int cur_sum = 0;
int start = 0;
int end = -1;
HashMap<Integer, Integer> hashMap = new HashMap<>();
for (int i = 0; i < n; i++) {
cur_sum = cur_sum + arr[i];
//check whether cur_sum - sum = 0, if 0 it means
//the sub array is starting from index 0- so stop
if (cur_sum - sum == 0) {
start = 0;
end = i;
break;
}
//if hashMap already has the value, means we already
// have subarray with the sum - so stop
if (hashMap.containsKey(cur_sum - sum)) {
start = hashMap.get(cur_sum - sum) + 1;
end = i;
break;
}
//if value is not present then add to hashmap
hashMap.put(cur_sum, i);
}
// if end is -1 : means we have reached end without the sum
if (end == -1) {
System.out.println("No subarray with given sum exists");
} else {
System.out.println("Sum found between indexes "
+ start + " to " + end);
}
}
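One way to look at the reasoning, as a sketch rather than a definitive explanation: write prefix(i) for the sum of arr[0..i]. The sum of the slice arr[j+1..i] is then prefix(i) - prefix(j) by definition, wherever j happens to be. So if prefix(i) - sum was seen earlier as some prefix(j), the slice from j+1 to i adds up to sum, and it does not matter which of several matching j values the map holds. The helper below (hypothetical name findSubarray) follows the same idea as the code above but returns the indexes instead of printing them, and seeds the map with the empty prefix so the "starts at index 0" case needs no special branch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PrefixSumDemo {
    // Returns {start, end} of a subarray whose elements add up to target, or null if none exists.
    static int[] findSubarray(int[] arr, int target) {
        Map<Integer, Integer> indexOfPrefix = new HashMap<>();
        indexOfPrefix.put(0, -1);            // empty prefix: sum 0 "ends" just before index 0
        int prefix = 0;
        for (int i = 0; i < arr.length; i++) {
            prefix += arr[i];                // prefix == arr[0] + ... + arr[i]
            Integer j = indexOfPrefix.get(prefix - target);
            if (j != null) {
                // prefix(i) - prefix(j) == target, so arr[j+1..i] is a solution
                return new int[] {j + 1, i};
            }
            indexOfPrefix.put(prefix, i);
        }
        return null;
    }

    public static void main(String[] args) {
        // prefix sums are 1, 5, 25, 28, 38; at i = 4, 38 - 33 = 5 was seen at j = 1
        System.out.println(Arrays.toString(findSubarray(new int[] {1, 4, 20, 3, 10, 5}, 33))); // [2, 4]
    }
}
```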
I solved a variation of the knapsack problem by backtracking through all of the possible solutions. Basically 0 means that the item is not in the backpack, 1 means that the item is in the backpack. Cost is the value of all items in the backpack; we are trying to achieve the lowest value possible while having items of every "class". Each time a combination covering all classes is found, I calculate the value of all items and, if it's lower than globalBestValue, I save the value. I do this in verify().
Now I'm trying to optimize my recursive backtrack. My idea was to iterate over my array as it's being generated and return from the generator if the "cost" of my generated numbers is already higher than my current best value; the combination currently being generated then can't be the new best value and can be skipped.
However with my optimization, my backtrack is not generating all the values and it actually skips the "best" value I'm trying to find. Could you tell me where the problem is?
private int globalBestValue = Integer.MAX_VALUE;
private int[] arr;
public KnapSack(int numberOfItems) {
arr = new int[numberOfItems];
}
private void generate(int fromIndex) {
int currentCost = 0; // my optimisation starts here
for (int i = 0; i < arr.length; i++) {
if (currentCost > globalBestValue) {
return;
}
if (arr[i] == 1) {
currentCost += allCosts.get(i);
}
} // ends here
if (fromIndex == arr.length) {
verify();
return;
}
for (int i = 0; i <= 1; i++) {
arr[fromIndex] = i;
generate(fromIndex + 1);
}
}
public void verify() {
// skipped the code verifying the arr if it's correct, it's long and not relevant
if (isCorrect == true && currentValue < globalBestValue) {
globalBestValue = currentValue;
}else{
return;
}
}
Pardon my bluntness, but your efforts at optimizing an inefficient algorithm can only be described as polishing the turd. You will not solve a knapsack problem of any decent size by brute force, and early return isn't enough. I have mentioned one approach to writing an efficient program on CodeReview SE; it requires a considerable effort, but you gotta do what you gotta do.
Having said that, I'd recommend you write the arr to console in order to troubleshoot the sequence. It looks like when you go back to the index i-1, the element at i remains set to 1, and you estimate the upper bound instead of the lower one. The following change might work: replace your code
for (int i = 0; i <= 1; i++) {
arr[fromIndex] = i;
generate(fromIndex + 1);
}
with
arr[fromIndex] = 1;
generate(fromIndex + 1);
arr[fromIndex] = 0;
generate(fromIndex + 1);
This turns it into a sort of greedy algorithm: instead of starting with 0000000, you effectively start with 1111111. And obviously, when you store the globalBestValue, you should store the actual data which gives it. But the main advice is: when your algorithm behaves weirdly, tracing is your friend.
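A sketch of how the pruning could be made safe, under assumptions the question does not confirm: every item has a non-negative cost and a class id, and a combination is valid once it contains at least one item of every class. The running cost is passed down as a parameter, so it only ever reflects the decisions made for items before the current index and stale values further right in the array cannot leak into the bound. All names here (MinCostCover, cost, clazz, solve) are hypothetical and stand in for the verify() logic that was left out.

```java
class MinCostCover {
    private final int[] cost;        // assumed non-negative cost per item
    private final int[] clazz;       // assumed class id per item, in 0..numClasses-1
    private final int numClasses;
    private int best = Integer.MAX_VALUE;

    MinCostCover(int[] cost, int[] clazz, int numClasses) {
        this.cost = cost;
        this.clazz = clazz;
        this.numClasses = numClasses;
    }

    // Returns the lowest total cost covering every class, or Integer.MAX_VALUE if impossible.
    int solve() {
        best = Integer.MAX_VALUE;
        generate(0, 0, new int[numClasses], 0);
        return best;
    }

    // costSoFar and classesCovered describe only the choices made for items [0, index).
    private void generate(int index, int costSoFar, int[] picksPerClass, int classesCovered) {
        if (costSoFar >= best) {
            return;                  // lower bound already no better than the best known
        }
        if (classesCovered == numClasses) {
            best = costSoFar;        // valid; adding further items could only cost more
            return;
        }
        if (index == cost.length) {
            return;                  // ran out of items without covering every class
        }
        int c = clazz[index];
        // Branch 1: take the item, then undo the bookkeeping when backtracking.
        picksPerClass[c]++;
        int covered = classesCovered + (picksPerClass[c] == 1 ? 1 : 0);
        generate(index + 1, costSoFar + cost[index], picksPerClass, covered);
        picksPerClass[c]--;
        // Branch 2: skip the item.
        generate(index + 1, costSoFar, picksPerClass, classesCovered);
    }

    public static void main(String[] args) {
        MinCostCover solver = new MinCostCover(new int[] {5, 3, 4, 2}, new int[] {0, 0, 1, 1}, 2);
        System.out.println(solver.solve()); // cheapest cover is 3 (class 0) + 2 (class 1) = 5
    }
}
```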
For example, if you were given {1,2} as the small array and {1,2,3,4,1,2,1,3} as the big one, then it would return 2.
This is probably horribly incorrect:
public static int timesOccur(int[] small, int big[]) {
int sum= 0;
for (int i=0; i<small.length; i++){
int currentSum = 0;
for (int j=0; j<big.length; j++){
if (small[i] == big[j]){
currentSum ++;
}
sum= currentSum ;
}
}
return sum;
}
As @AndyTurner mentioned, your task can be reduced to a set of well-known string matching algorithms.
As I understand it, you want a solution faster than O(n * m).
There are two main approaches. First involves preprocessing text (long array), second involves preprocessing search pattern (small array).
Preprocessing text. By this I mean creating a suffix array or LCP from your longer array. With this data structure constructed, you can perform a binary search to find your substring. The most efficient time you can achieve is O(n) to build the LCP and O(m + log n) to perform the search. So the overall time is O(n + m).
Preprocessing pattern. This means constructing a DFA from the pattern. With the DFA constructed, one traversal of the string (long array) finds all occurrences of the substring (linear time). The hardest part here is constructing the DFA. Knuth-Morris-Pratt does this in O(m) time, so the overall running time is O(m + n). The KMP algorithm is most probably the best available solution for this task in terms of efficiency and implementation complexity. Check @JuanLopes's answer for a concrete implementation.
You can also consider an optimized brute force such as Boyer-Moore; it is good for practical cases, but it has O(n * m) running time in the worst case.
UPD:
In case you don't need fast approaches, I corrected your code from description:
public static int timesOccur(int[] small, int big[]) {
int sum = 0;
for (int i = 0; i < big.length - small.length + 1; i++) {
int j = 0;
while (j < small.length && small[j] == big[i + j]) {
j++;
}
if (j == small.length) {
sum++;
}
}
return sum;
}
Pay attention to the inner while loop. It stops as soon as elements don't match. This is an important optimization, as it makes the running time almost linear in the best cases.
upd2: inner loop explanation.
The purpose of inner loop is to find out if smaller array matches bigger array starting from position i. To perform that check index j is iterated from 0 to length of smaller array, comparing the element j of the smaller array with the corresponding element i + j of the bigger array. Loop proceeds when both conditions are true at the same time: j < small.length and corresponding elements of two arrays match.
So loop stops in two situations:
1. j < small.length is false. This means that j==small.length. It also means that for all j=0..small.length-1 the elements of the two arrays matched (otherwise the loop would have stopped earlier, see (2) below).
2. small[j] == big[i + j] is false. This means that a match was not found. In this case the loop stops before j reaches small.length.
After the loop it's sufficient to check whether j==small.length to know which condition made the loop stop, and hence whether a match was found for the current position i.
This is a simple subarray matching problem. In Java you can use Collections.indexOfSubList, but you would have to box all the integers in your array. An alternative is to implement your own array matching algorithm. There are several options; most string searching algorithms can be adapted to this task.
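For illustration, a sketch of the boxing route, assuming overlapping occurrences should be counted and the small array is non-empty. Collections.indexOfSubList only reports the first match, so the search is repeated on the remaining tail of the list. The class name SubListCount and the helper box are made up for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class SubListCount {
    // Boxes an int[] into a List<Integer> so the Collections API can be used.
    static List<Integer> box(int[] a) {
        return Arrays.stream(a).boxed().collect(Collectors.toList());
    }

    // Counts (possibly overlapping) occurrences of small inside big.
    static int timesOccur(int[] small, int[] big) {
        List<Integer> needle = box(small);
        List<Integer> haystack = box(big);
        int count = 0;
        int from = 0;
        while (from + needle.size() <= haystack.size()) {
            // Index of the first match, relative to the tail that starts at 'from'.
            int idx = Collections.indexOfSubList(haystack.subList(from, haystack.size()), needle);
            if (idx < 0) {
                break;
            }
            count++;
            from += idx + 1;   // resume one position past the start of this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(timesOccur(new int[] {1, 2}, new int[] {1, 2, 3, 4, 1, 2, 1, 3})); // 2
        System.out.println(timesOccur(new int[] {1, 1}, new int[] {1, 1, 1}));                // 2
    }
}
```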
Here is an optimized version based on the KMP algorithm. In the worst case it will be O(n + m), which is better than the trivial algorithm. But it has the downside of requiring extra space to compute the failure function (F).
public class Main {
public static class KMP {
private final int F[];
private final int[] needle;
public KMP(int[] needle) {
this.needle = needle;
this.F = new int[needle.length + 1];
F[0] = 0;
F[1] = 0;
int i = 1, j = 0;
while (i < needle.length) {
if (needle[i] == needle[j])
F[++i] = ++j;
else if (j == 0)
F[++i] = 0;
else
j = F[j];
}
}
public int countAt(int[] haystack) {
int count = 0;
int i = 0, j = 0;
int n = haystack.length, m = needle.length;
while (i - j <= n - m) {
while (j < m) {
if (needle[j] == haystack[i]) {
i++;
j++;
} else break;
}
if (j == m) count++;
else if (j == 0) i++;
j = F[j];
}
return count;
}
}
public static void main(String[] args) {
System.out.println(new KMP(new int[]{1, 2}).countAt(new int[]{1, 2, 3, 4, 1, 2, 1, 3}));
System.out.println(new KMP(new int[]{1, 1}).countAt(new int[]{1, 1, 1}));
}
}
Rather than posting a solution I'll provide some hints to get you moving.
It's worth breaking the problem down into smaller pieces. In general your algorithm should look like:
for each position in the big array
    check if the small array matches that position
    if it does, increment your counter
The smaller piece is then checking if the small array matches a given position:
first check if there's enough room to fit the smaller array
    if not then the arrays don't match
otherwise for each position in the smaller array
    check if the values in the arrays match
    if not then the arrays don't match
if you get to the end of the smaller array and they have all matched
    then the arrays match
Though not thoroughly tested, I believe this is a solution to your problem. I would highly recommend using Sprinter's pseudocode to try to figure this out yourself before using this.
public static void main(String[] args)
{
int[] smallArray = {1,1};
int[] bigArray = {1,1,1};
int sum = 0;
for(int i = 0; i < bigArray.length; i++)
{
boolean flag = true;
if(bigArray[i] == smallArray[0])
{
for(int x = 0; x < smallArray.length; x++)
{
if(i + x >= bigArray.length)
flag = false;
else if(bigArray[i + x] != smallArray[x])
flag = false;
}
if(flag)
sum += 1;
}
}
System.out.println(sum);
}
}
I tried to find the smallest element in an integer array using what I understood about the divide and conquer algorithm.
I am getting correct results.
But I am not sure if it is a conventional way of using divide and conquer.
If there is a smarter way of implementing divide and conquer than what I have tried, please let me know.
public static int smallest(int[] array){
int i = 0;
int array1[] = new int[array.length/2];
int array2[] = new int[array.length - (array.length/2)];
for(int index = 0; index < array.length/2 ; index++){
array1[index] = array[index];
}
for(int index = array.length/2; index < array.length; index++){
array2[i] = array[index];
i++;
}
if(array.length > 1){
if(smallest(array1) < smallest(array2)){
return smallest(array1);
}else{
return smallest(array2);
}
}
return array[0];
}
Your code is correct, but you can write less code by using existing functions like Arrays.copyOfRange and Math.min.
public static int smallest(int[] array) {
if (array.length == 1) {
return array[0];
}
int array1[] = Arrays.copyOfRange(array, 0, array.length / 2);
int array2[] = Arrays.copyOfRange(array, array.length / 2, array.length);
return Math.min(smallest(array1), smallest(array2));
}
Another point: testing for length == 1 at the beginning is more readable. Functionally it is identical. From a performance point of view it creates fewer arrays, because it exits the smallest function as soon as possible.
It is also possible to use a different form of recursion where it is not necessary to create new arrays.
private static int smallest(int[] array, int from, int to) {
if (from == to) {
return array[from];
}
int middle = from + (to - from) / 2;
return Math.min(smallest(array, from, middle), smallest(array, middle + 1, to));
}
public static int smallest(int[] array){
return smallest(array, 0, array.length - 1);
}
This second version is more efficient because it doesn't create new arrays.
I don't see any use in applying divide and conquer in this particular program.
Anyhow you search for the whole array from 1 to N, but in two steps
1. 1 to N / 2
2. N / 2 + 1 to N
This is equivalent to 1 to N.
Also, your program performs a few additional checks after the loops which aren't required when you do it directly.
int min = a[0];
for (int i = 1; i < a.length; i++)
    if (a[i] < min)
        min = a[i];
This is considered most efficient in finding out the minimum value.
When do I use divide and conquer
A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems, until these become simple enough to be solved directly.
Consider the Merge Sort Algorithm.
Here, we divide the problem step by step until we get smaller problems, and then we combine the sorted pieces. In this case divide and conquer is considered optimal: a simple sort runs in O(n * n), while merge sort runs in O(n log n).
But for finding the minimum, the plain loop is already O(n), so it is good enough as it is.
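To make the merge sort case concrete, here is a minimal top-down sketch. It assumes returning a new sorted array is acceptable (the input is left untouched); the class name MergeSortSketch is invented for the example.

```java
import java.util.Arrays;

class MergeSortSketch {
    // Divide: split in half and sort each half recursively.
    // Conquer: merge the two sorted halves in linear time.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) {
            return a.clone();                 // zero or one element is already sorted
        }
        int mid = a.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));

        int[] out = new int[a.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            // taking from the left on ties keeps the sort stable
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) {
            out[k++] = left[i++];
        }
        while (j < right.length) {
            out[k++] = right[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mergeSort(new int[] {5, 2, 9, 1, 5, 6}))); // [1, 2, 5, 5, 6, 9]
    }
}
```

The merge step is what pays off here: each level of recursion does O(n) work and there are about log n levels. Finding a minimum has no comparable combine step to exploit, which is why the plain loop is enough.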
Divide And Conquer
The book
Data Structures and Algorithm Analysis in Java, 2nd edition, Mark Allen Weiss
says that a D&C algorithm should have two disjoint recursive calls, i.e. like QuickSort. The above algorithm does not have this, even if it can be implemented recursively.
What you did here is correct. But there are more efficient ways of solving this problem, which I'm sure you're aware of.
Although the divide and conquer approach can be applied to this problem, it is better suited to complex data problems, or to understanding a difficult problem by dividing it into smaller fragments. One prime example would be 'Tower of Hanoi'.
As far as your code is concerned, it is correct. Here's another copy of the same code:
public class SmallestInteger {
public static void main(String[] args) {
int small ;
int array[] = {4,-2,8,3,56,34,67,84} ;
small = smallest(array) ;
System.out.println("The smallest integers is = " + small) ;
}
public static int smallest(int[] array) {
int array1[] = new int[array.length/2];
int array2[] = new int[array.length - (array.length/2)];
for (int index = 0; index < array.length/2 ; index++) {
array1[index] = array[index];
}
for (int index = array.length/2; index < array.length; index++) {
array2[index - array.length/2] = array[index] ;
}
if (array.length > 1) {
if(smallest(array1) < smallest(array2)) {
return smallest(array1) ;
}
else {
return smallest(array2) ;
}
}
return array[0] ;
}
}
Result came out to be-
The smallest integers is = -2
I asked this question before, but my post was cluttered with a whole bunch of other code and wasn't clearly presented, so I'm going to try again. Sorry, I'm new here
Shell sort, how I wrote it, only works sometimes. Array a is an array of 100 integers unsorted, inc is an array of 4 integers whose values are the intervals that shell sort should use (they descend and the final value is always 1), count is an array which stores the counts for different runs of shell sort, cnt represents the count value which should be updated for this run of shell sort.
When I run shell sort multiple times, with different sets of 4 intervals, only sometimes does the sort fully work. Half the time the array is fully sorted, the other half of the time the array is partially sorted.
Can anyone help? Thanks in advance!
public static void shellSort(int[] a, int[] inc, int[] count, int cnt) {
for (int k = 0; k < inc.length; k++) {
for (int i = inc[k], j; i < a.length; i += inc[k]) {
int tmp = a[i];
count[cnt] += 1;
for (j = i - inc[k]; j >= 0; j -= inc[k]) {
if (a[j] <= tmp)
break;
a[j + inc[k]] = a[j];
count[cnt] += 1;
}
a[j + inc[k]] = tmp;
count[cnt] += 1;
}
}
}
One problem is that you're only sorting one inc[k]-step sequence for each k, while you should sort them all (you're only sorting {a[0], a[s], a[2*s], ... , a[m*s]}, leaving out {a[1], a[s+1], ... , a[m*s+1]} etc.). However, that should only influence performance (number of operations), not the outcome, since the last pass is a classical insertion sort (inc[inc.length-1] == 1), so that should sort the array no matter what happened before.
I don't see anything in the code that would cause failure. Maybe the inc array doesn't contain what it should? If you print out inc[k] in each iteration of the outer loop, do you get the expected output?
There is an error in your i loop control:
for (int i = inc[k], j; i < a.length; i += inc[k]) {
Should be:
for (int i = inc[k], j; i < a.length; i++) {
The inner j loop handles the comparison of elements that are inc[k] apart. The outer i loop should simply increment by 1, the same as the outer loop of a standard Insertion sort.
In fact, the final pass of Shellsort with an increment of 1 is identical to a standard Insertion sort.
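Putting the fix together, a sketch of the corrected sort. To keep the example short it assumes a simplified signature: the count/cnt bookkeeping from the question is replaced by a returned operation counter, and the names ShellSortSketch and gaps are made up for the illustration.

```java
import java.util.Arrays;

class ShellSortSketch {
    // Sorts a in place using the given descending gaps (the last gap should be 1).
    // Returns a rough count of element moves, standing in for count[cnt] in the question.
    static int shellSort(int[] a, int[] gaps) {
        int ops = 0;
        for (int gap : gaps) {
            // i advances by 1, so every gap-separated subsequence gets sorted,
            // not only the one that starts at index 0.
            for (int i = gap; i < a.length; i++) {
                int tmp = a[i];
                int j = i - gap;
                while (j >= 0 && a[j] > tmp) {
                    a[j + gap] = a[j];    // shift the larger element one gap to the right
                    j -= gap;
                    ops++;
                }
                a[j + gap] = tmp;
                ops++;
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0};
        shellSort(a, new int[] {5, 3, 1});
        System.out.println(Arrays.toString(a)); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```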