Finding number of subarrays whose sum equals `k`

Finding number of subarrays whose sum equals `k` - java

We would be given an array of integers and a value k. We need to find the total number of sub-arrays whose sum equals k.
I found some interesting code online (on Leetcode) which is as follows:
public class Solution {
public int subarraySum(int[] nums, int k) {
int sum = 0, result = 0;
Map<Integer, Integer> preSum = new HashMap<>();
preSum.put(0, 1);
for (int i = 0; i < nums.length; i++) {
sum += nums[i];
if (preSum.containsKey(sum - k)) {
result += preSum.get(sum - k);
}
preSum.put(sum, preSum.getOrDefault(sum, 0) + 1);
}
return result;
}
}
To understand it, I walked through some specific examples like [1,1,1,1,1] with k=3 and [1,2,3,0,3,2,6] with k=6. While the code works perfectly in both the cases, I fail to follow how it actually computes the output.
I have two specific points of confusion:
1) Why does the code continuously add the values in the array, without ever zeroing it out? For example, in case of [1,1,1,1,1] with k=3, once sum=3, don't we need to reset sum to zero? Doesn't not resetting sum interfere with finding later subarrays?
2) Shouldn't we simply do result++ when we find a subarray of sum k? Why do we add preSum.get(sum-k) instead?

Let's handle your first point of confusion first:
The reason the code keeps summing the array and doesn't reset sum is because we are saving the sum in preSum (previous sums) as we go. Then, any time we get to a point where sum-k is a previous sum (say at index i), we know that the sum between index i and our current index is exactly k.
For example, in the image below with i=2, and our current index equal to 4, we can see that since 9, the sum at our current index, minus 3, the sum at index i, is 6, the sum between indexes 2 and 4 (inclusive) is 6.
Another way to think about this is to see that discarding [1,2] from the array (at our current index of 4) gives us a subarray of sum 6, for similar reasons as above (see image for details).
Using this method of thinking, we can say we want to discard from the front of the array until we are left with a subarray of sum k. We could do this by saying, for each index, "discard just 1, then discard 1+2, then discard 1+2+3, etc" (these numbers are from our example) until we found a subarray of sum k (k=6 in our example).
That gives a perfectly valid solution, but notice we would be doing this at every index of our array, and thus summing the same numbers over and over. A way to save computation would be to save these sums for later use. Even better, we already sum these same numbers to get our current sum, so we can just save that total as we go.
To find a subarray, we can just look through our saved sums, subtracting them and testing if what we are left with is k. It is a bit annoying to have to subtract every saved sum, so we can use the commutativity of subtraction to see that if sum-x=k is true, sum-k=x is also true. This way we can just see if x is a saved sum, and, if it is, know we have found a subarray of size k. A hash map makes this lookup efficient.
Now for your second point of confusion:
Most of the time you are right, upon finding an appropriate subarray we could just do result++. Almost always, the values in preSum will be 1, so result+=preSum.get(sum-k) will be equivalent to result+=1, or result++.
The only time it isn't is when preSum.put is called on a sum that has been reached before. How can we get back to a sum we already had? The only way is with either negative numbers, which cancel out previous numbers, or with zero, which doesn't affect the sum at all.
Basically, we get back to a previous sum when a subarray's sum is equal to 0. Two examples of such subarrays are [2,-2] or the trivial [0]. With such a subarray, when we find a later, adjoining subarray with sum k, we need to add more than 1 to result as we have found more than one new subarray, one with the zero-sum subarray (sum=k+0) and one without it (sum=k).
This is the reason for that +1 in the preSum.put as well. Every time we reach the same sum again, we have found another zero-sum subarray. With two zero-sum subarrays, finding a new adjoining subarray with sum=k actually gives 3 subarrays: the new subarray (sum=k), the new subarray plus the first zero-sum (sum=k+0), and the original with both zero-sums (sum=k+0+0). This logic holds for higher numbers of zero-sum subarrays as well.

Related

Find two numbers in array x,y where x<y, x repeats at least n/3 times and y at least n/4 times

I have been struggling to solve an array problem with linear time,
The problem is:
Assuming we are given an array A [1...n] write an algorithm that return true if:
There are two numbers in the array x,y that have the following:
x < y
x repeats more than n/3 times
y repeats more than n/4 times
I have tried to write the following java program to do so assuming we have a sorted array but I don't think it is the best implementation.
public static boolean solutionManma(){
int [] arr = {2,2,2,3,3,3};
int n = arr.length;
int xCount = 1;
int yCount = 1;
int maxXcount= xCount,maxYCount = yCount;
int currX = arr[0];
int currY = arr[n-1];
for(int i = 1; i < n-2;i++){
int right = arr[n-2-i+1];
int left = arr[i];
if(currX == left){
xCount++;
}
else{
maxXcount = Math.max(xCount,maxXcount);
xCount = 1;
currX = left;
}
if(currY == right){
yCount++;
}
else {
maxYCount = Math.max(yCount,maxYCount);
yCount = 1;
currY = right;
}
}
return (maxXcount > n/3 && maxYCount > n/4);
}
If anyone has an algorithm idea for this kind of issue (preferably O(n)) I would much appreciate it because I got stuck with this one.

The key part of this problem is to find in linear time and constant space the values which occur more than n/4 times. (Note: the text of your question says "more than" and the title says "at least". Those are not the same condition. This answer is based on the text of your question.)
There are at most three values which occur more than n/4 times, and a list of such values must also include any value which occurs more than n/3 times.
The algorithm we'll use returns a list of up to three values. It only guarantees that all values which satisfy the condition are in the list it returns. The list might include other values, and it does not provide any information about the precise frequencies.
So a second pass is necessary, which scans the vector a second time counting the occurrences of each of the three values returned. Once you have the three counts, it's simple to check whether the smallest value which occurs more than n/3 times (if any) is less than the largest value which occurs more than n/4 times.
To construct the list of candidates, we use a generalisation of the Boyer-Moore majority vote algorithm, which finds a value which occurs more than n/2 times. The generalisation, published in 1982 by J. Misra and D. Gries, uses k-1 counters, each possibly associated with a value, to identify values which might occur more than 1/k times. In this case, k is 4 and so we need three counters.
Initially, all of the counters are 0 and are not associated with any value. Then for each value in the array, we do the following:
If there is a counter associated with that value, we increment it.
If no counter is associated with that value but some counter is at 0, we associate that counter with the value and increment its count to 1.
Otherwise, we decrement every counter's count.
Once all the values have been processed, the values associated with counters with positive counts are the candidate values.
For a general implementation where k is not known in advance, it would be possible to use a hash-table or other key-value map to identify values with counts. But in this case, since it is known that k is a small constant, we can just use a simple vector of three value-count pairs, making this algorithm O(n) time and O(1) space.

I will suggest the following solution, using the following assumption:
In an array of length n there will be at most n different numbers
The key feature will be to count the frequency of occurance for each different input using a histogram with n bins, meaning O(n) space. The algorithm will be as follows:
create a histogram vector with n bins, initialized to zeros
for index ii in the length of the input array a
2.1. Increase the value: hist[a[ii]] +=1
set found_x and found_y to False
for the iith bin in the histogram, check:
4.1. if found_x == False
4.1.1. if hist[ii] > n/3, set found_x = True and set x = ii
4.2. else if found_y == False
4.2.1. if hist[ii] > n/4, set y = ii and return x, y
Explanation
In the first run over the array you document the occurance frequency of all the numbers. In the run over the histogram array, which also has a length of n, you check the occurrence. First you check if there is a number that occurred more than n/3 times and if there is, for the rest of the numbers (by default larger than x due to the documentation in the histogram) you check if there is another number which occurred more than n/4 times. if there is, you return the found x and y and if there isn't you simply return not found after covering all the bins in the histogram.
As far as time complexity, you goover the input array once and you go over the histogram with the same length once, therefore the time complexity is O(n) is requested.

Java Recursion to find MAX sum in array

I am trying to figure out a solution to calculate the highest sum of numbers in an array. However, my limitation is that I cannot use adjacent values in the array.
If I am given the array int [] blocks = new int[] {15, 3, 6, 17, 2, 1, 20}; the highest sum calculated is 52 (15+17+20).
My goal is to go from a recursive solution to a solution that uses dynamic programming, however, I am having trouble with the recursive solution.
The base cases that I have initialized:
if(array.length == 0)
return 0;
if(array.length == 1)
return array[0];
After creating the base cases, I am unsure of how to continue the recursive process.
I initially tried to say that if the array was of certain length, then I can calculate the max(Math.max) of the calculations:
e.g. if array.length = 3
return Math.max(array[0], array[1], array[2], array[0]+ array[2])
The problem I then run into is that I could be given an array of length 100.
How can I use recursion in this problem?

I think recursive solution to your problem is (in pseudocode):
maxsum(A,0) = 0
maxsum(A,1) = A[0]
maxsum(A,k) = max(maxsum(A,k-2)+A[k-1], maxsum(A,k-1)), for k >= 2
Where maxsum(A,k) means the maximal sum for a subarray of array A starting from 0 and having length k. I'm sure you'll easily translate that into Java, it translates almost literally.

Longest sequence of numbers

I was recently asked this question in an interview for which i could give an O(nlogn) solution, but couldn't find a logic for O(n) . Can someone help me with O(n) solution?
In an array find the length of longest sequence of numbers
Example :
Input : 2 4 6 7 3 1
Output: 4 (because 1,2,3,4 is a sequence even though they are not in consecutive positions)
The solution should also be realistic in terms of space consumed . i.e the solution should be realistic even with an array of 1 billion numbers

For non-consecutive numbers you needs a means of sorting them in O(n). In this case you can use BitSet.
int[] ints = {2, 4, 6, 7, 3, 1};
BitSet bs = new BitSet();
IntStream.of(ints).forEach(bs::set);
// you can search for the longer consecutive sequence.
int last = 0, max = 0;
do {
int set = bs.nextSetBit(last);
int clear = bs.nextClearBit(set + 1);
int len = clear - set;
if (len > max)
max = len;
last = clear;
} while (last > 0);
System.out.println(max);

Traverse the array once and build the hash map whose key is a number from the input array and value is a boolean variable indicating whether the element has been processed or not (initially all are false). Traverse once more and do the following: when you check number a, put value true for that element in the hash map and immediately check the hash map for the existence of the elements a-1 and a+1. If found, denote their values in the hash map as true and proceed checking their neighbors, incrementing the length of the current contigous subsequence. Stop when there are no neighbors, and update longest length. Move forward in the array and continue checking unprocessed elements. It is not obvious at the first glance that this solution is O(n), but there are only two array traversals and hash map ensures that every element of the input is processed only once.
Main lesson - if you have to reduce time complexity, it is often neccesary to use additional space.

Checking whether there is a subset of size k of an array which has a sum multiple of n

Good evening, I have an array in java with n integer numbers. I want to check if there is a subset of size k of the entries that satisfies the condition:
The sum of those k entries is a multiple of m.
How may I do this as efficiently as possible? There are n!/k!(n-k)! subsets that I need to check.

You can use dynamic programming. The state is (prefix length, sum modulo m, number of elements in a subset). Transitions are obvious: we either add one more number(increasing the number of elements in a subset and computing new sum modulo m), or we just increase prefix lenght(not adding the current number). If you just need a yes/no answer, you can store only the last layer of values and apply bit optimizations to compute transitions faster. The time complexity is O(n * m * k), or about n * m * k / 64 operations with bit optimizations. The space complexity is O(m * k). It looks feasible for a few thousands of elements. By bit optimizations I mean using things like bitset in C++ that can perform an operation on a group of bits at the same time using bitwise operations.

I don't like this solution, but it may work for your needs
public boolean containsSubset( int[] a , int currentIndex, int currentSum, int depth, int divsor, int maxDepth){
//you could make a, maxDepth, and divisor static as well
//If maxDepthis equal to depth, then our subset has k elements, in addition the sum of
//elements must be divisible by out divsor, m
//If this condition is satisafied, then there exists a subset of size k whose sum is divisible by m
if(depth==maxDepth&&currentSum%divsor==0)
return true;
//If the depth is greater than or equal maxDepth, our subset has more than k elements, thus
//adding more elements can not satisfy the necessary conditions
//additionally we know that if it contains k elements and is divisible by m, it would've satisafied the above condition.
if(depth>=maxdepth)
return false;
//boolean to be returned, initialized to false because we have not found any sets yet
boolean ret = false;
//iterate through all remaining elements of our array
for (int i = currentIndex+1; i < a.length; i++){
//this may be an optimization or this line
//for (int i = currentIndex+1; i < a.length-maxDepth+depth; i++){
//by recursing, we add a[i] to our set we then use an or operation on all our subsets that could
//be constructed from the numbers we have so far so that if any of them satisfy our condition (return true)
//then the value of the variable ret will be true
ret |= containsSubset(a,i,currentSum+a[i],depth+1,divisor, maxDepth);
} //end for
//return the variable storing whether any sets of numbers that could be constructed from the numbers so far.
return ret;
}
Then invoke this method as such
//this invokes our method with "no numbers added to our subset so far" so it will try adding
// all combinations of other elements to determine if the condition is satisfied.
boolean answer = containsSubset(myArray,-1,0,0,m,k);
EDIT:
You could probably optimize this by taking everything modulo (%) m and deleting repeats. For examples with large values of n and/or k, but small values of m, this could be a pretty big optimization.
EDIT 2:
The above optimization I listed isn't helpful. You may need the repeats to get the correct information. My bad.
Happy Coding! Let me know if you have any questions!

If numbers have lower and upper bounds, it might be better to:
Iterate all multiples of n where lower_bound * k < multiple < upper_bound * k
Check if there is a subset with sum multiple in the array (see Subset Sum problem) using dynamic programming.
Complexity is O(k^2 * (lower_bound + upper_bound)^2). This approach can be optimized further, I believe with careful thinking.
Otherwise you can find all subsets of size k. Complexity is O(n!). Using backtracking (pseudocode-ish):
function find_subsets(array, k, index, current_subset):
if current_subset.size = k:
add current_subset to your solutions list
return
if index = array.size:
return
number := array[index]
add number to current_subset
find_subsets(array, k, index + 1, current_subset)
remove number from current_subset
find_subsets(array, k, index + 1, current_subset)

Get confused with nested loops

I know the rationale behind nested loops, but this one just make me confused about the reason it wants to reveal:
public static LinkedList LinkedSort(LinkedList list)
{
for(int k = 1; k < list.size(); k++)
for(int i = 0; i < list.size() - k; i++)
{
if(((Birth)list.get(i)).compareTo(((Birth)list.get(i + 1)))>0)
{
Birth birth = (Birth)list.get(i);
list.set( i, (Birth)list.get( i + 1));
list.set(i + 1, birth);
}
}
return list;
}
Why if i is bigger then i + 1, then swap i and i + 1? I know for this coding, i + 1 equals to k, but then from my view, it is impossible for i greater then k, am i right? And what the run result will be looking like? I'm quite confused what this coding wants to tell me， hope you guys can help me clarify my doubts, thank you.

This method implements a bubble sort. It reorders the elements in the list in ascending order. The exact data to be ordered by is not revealed in this code, the actual comparison is done in Birth#compare.
Lets have a look at the inner loop first. It does the actual sorting. The inner loop iterates over the list, and compares the element at position 0 to the element at position 1, then the element at position 1 to the element at position 2 etc. Each time, if the lower element is larger than the higher one, they are swapped.
After the first full run of the inner loop the largest value in the list now sits at the end of the list, since it was always larger than the the value it was compared to, and was always swapped. (try it with some numbers on paper to see what happens)
The inner loop now has to run again. It can ignore the last element, since we already know it contains the largest value. After the second run the second largest value is sitting the the second-to-last position.
This has to be repeated until the whole list is sorted.
This is what the outer loop is doing. It runs the inner loop for the exact number of times to make sure the list is sorted. It also gives the inner loop the last position it has to compare to ignore the part already sorted. This is just an optimization, the inner loop could just ignore k like this:
for(int i = 0; i < list.size() - 1; i++)
This would give the same result, but would take longer since the inner loop would needlessly compare the already sorted values at the end of the list every time.

Example: you have a list of numbers which you want to sort ascendingly:
4 2 3 1
The first iteration do these swap operations: swap(4, 2), swap(4, 3), swap(4, 1). The intermediate result after the 1st iteration is 2 3 1 4. In other words, we were able to determine which number is the greatest one and we don't need to iterate over the last item of the intermediate result.
In the second iteration, we determine the 2nd greatest number with operations: swap(3, 1). The intermediate result looks then 2 1 3 4.
And the end of the 3rd iteration, we have a sorted list.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.