We are given an array of integers and a value k. We need to find the total number of sub-arrays whose sum equals k.
I found some interesting code online (on Leetcode) which is as follows:
import java.util.HashMap;
import java.util.Map;

public class Solution {
    public int subarraySum(int[] nums, int k) {
        int sum = 0, result = 0;
        // maps each running sum seen so far to the number of times it occurred
        Map<Integer, Integer> preSum = new HashMap<>();
        preSum.put(0, 1);
        for (int i = 0; i < nums.length; i++) {
            sum += nums[i];
            if (preSum.containsKey(sum - k)) {
                result += preSum.get(sum - k);
            }
            preSum.put(sum, preSum.getOrDefault(sum, 0) + 1);
        }
        return result;
    }
}
To understand it, I walked through some specific examples like [1,1,1,1,1] with k=3 and [1,2,3,0,3,2,6] with k=6. While the code works perfectly in both the cases, I fail to follow how it actually computes the output.
I have two specific points of confusion:
1) Why does the code continuously add the values in the array, without ever zeroing it out? For example, in case of [1,1,1,1,1] with k=3, once sum=3, don't we need to reset sum to zero? Doesn't not resetting sum interfere with finding later subarrays?
2) Shouldn't we simply do result++ when we find a subarray of sum k? Why do we add preSum.get(sum-k) instead?
Let's handle your first point of confusion first:
The reason the code keeps summing the array and doesn't reset sum is that we are saving each running sum in preSum (previous sums) as we go. Then, any time we get to a point where sum-k is a previous sum (say the sum of everything before index i), we know that the elements from index i through our current index add up to exactly k.
For example, in the image below with i=2 and our current index equal to 4, we can see that since 9, the sum at our current index, minus 3, the sum of the elements before index i, is 6, the sum between indexes 2 and 4 (inclusive) is 6.
Another way to think about this is to see that discarding [1,2] from the array (at our current index of 4) gives us a subarray of sum 6, for similar reasons as above (see image for details).
Using this method of thinking, we can say we want to discard from the front of the array until we are left with a subarray of sum k. We could do this by saying, for each index, "discard just 1, then discard 1+2, then discard 1+2+3, etc" (these numbers are from our example) until we found a subarray of sum k (k=6 in our example).
That gives a perfectly valid solution, but notice we would be doing this at every index of our array, and thus summing the same numbers over and over. A way to save computation would be to save these sums for later use. Even better, we already sum these same numbers to get our current sum, so we can just save that total as we go.
To find a subarray, we can just look through our saved sums, subtracting them and testing if what we are left with is k. It is a bit annoying to have to subtract every saved sum, so we can rearrange the equation: if sum-x=k is true, then sum-k=x is also true. This way we can just check whether x=sum-k is a saved sum and, if it is, know we have found a subarray of sum k. A hash map makes this lookup efficient.
Now for your second point of confusion:
Most of the time you are right, upon finding an appropriate subarray we could just do result++. Almost always, the values in preSum will be 1, so result+=preSum.get(sum-k) will be equivalent to result+=1, or result++.
The only time it isn't is when preSum.put is called on a sum that has been reached before. How can we get back to a sum we already had? The only way is with either negative numbers, which cancel out previous numbers, or with zero, which doesn't affect the sum at all.
Basically, we get back to a previous sum when a subarray's sum is equal to 0. Two examples of such subarrays are [2,-2] or the trivial [0]. With such a subarray, when we find a later, adjoining subarray with sum k, we need to add more than 1 to result as we have found more than one new subarray, one with the zero-sum subarray (sum=k+0) and one without it (sum=k).
This is the reason for that +1 in the preSum.put as well. Every time we reach the same sum again, we have found another zero-sum subarray. With two zero-sum subarrays, finding a new adjoining subarray with sum=k actually gives 3 subarrays: the new subarray (sum=k), the new subarray plus the first zero-sum (sum=k+0), and the original with both zero-sums (sum=k+0+0). This logic holds for higher numbers of zero-sum subarrays as well.
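To see the zero-sum effect in isolation, here is a minimal sketch of the same counting loop. The class name PrefixSumDemo and the method name countSubarrays are made up for illustration; the logic is meant to mirror the Leetcode code above, not replace it.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixSumDemo {
    // Counts contiguous subarrays of nums whose elements add up to k.
    static int countSubarrays(int[] nums, int k) {
        int sum = 0, result = 0;
        // preSum maps a running sum to how many prefixes produced it
        Map<Integer, Integer> preSum = new HashMap<>();
        preSum.put(0, 1); // the empty prefix has sum 0
        for (int x : nums) {
            sum += x;
            // every earlier prefix with sum (sum - k) ends one matching subarray
            result += preSum.getOrDefault(sum - k, 0);
            preSum.merge(sum, 1, Integer::sum);
        }
        return result;
    }
}
```

For {0, 0, 3} with k=3, the running sum 0 has been seen three times (the empty prefix plus two zeros) by the time sum reaches 3, so a single lookup adds 3 to result: [3], [0,3] and [0,0,3]. For {2, -2, 3} with k=3 the lookup adds 2: [3] and [2,-2,3].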
Consider this method:
public static int[] countPairs(int min, int max) {
    int lastIndex = primes.size() - 1;
    int i = 0;
    int howManyPairs[] = new int[(max-min)+1];
    for(int outer : primes) {
        for(int inner : primes.subList(i, lastIndex)) {
            int sum = outer + inner;
            if(sum > max)
                break;
            if(sum >= min && sum <= max)
                howManyPairs[sum - min]++;
        }
        i++;
    }
    return howManyPairs;
}
As you can see, I have to count how many times each number between min and max can be expressed as a sum of two primes.
primes is an ArrayList with all primes between 2 and 2000000. In this case, min is 1000000 and max is 2000000, that's why primes goes until 2000000.
My method works fine, but the goal here is to do something faster.
My method takes two loops, one inside the other, and it makes my algorithm an O(n²). It sucks like bubblesort.
How can I rewrite my code to accomplish the same result with a better complexity, like O(nlogn)?
One last thing: I'm coding in Java, but your reply can also be in Python, VB.Net, C#, Ruby, C or even just an explanation in English.
For each number x between min and max, we want to compute the number of ways x can be written as the sum of two primes. This number can also be expressed as
sum(prime(n)*prime(x-n) for n in xrange(x+1))
where prime(x) is 1 if x is prime and 0 otherwise. Instead of counting the number of ways that two primes add up to x, we consider all ways two nonnegative integers add up to x, and add 1 to the sum if the two integers are prime.
This isn't a more efficient way to do the computation. However, putting it in this form helps us recognize that the output we want is the discrete convolution of two sequences. Specifically, if p is the infinite sequence such that p[x] == prime(x), then the convolution of p with itself is the sequence such that
convolve(p, p)[x] == sum(p[n]*p[x-n] for n in xrange(x+1))
or, substituting the definition of p,
convolve(p, p)[x] == sum(prime(n)*prime(x-n) for n in xrange(x+1))
In other words, convolving p with itself produces the sequence of numbers we want to compute.
The straightforward way to compute a convolution is pretty much what you were doing, but there are much faster ways. For n-element sequences, a fast Fourier transform-based algorithm can compute the convolution in O(n*log(n)) time instead of O(n**2) time. Unfortunately, this is where my explanation ends. Fast Fourier transforms are kind of hard to explain even when you have proper mathematical notation available, and as my memory of the Cooley-Tukey algorithm isn't as precise as I'd like it to be, I can't really do it justice.
If you want to read more about convolution and Fourier transforms, particularly the Cooley-Tukey FFT algorithm, the Wikipedia articles I've just linked would be a decent start. If you just want to use a faster algorithm, your best bet would be to get a library that does it. In Python, I know scipy.signal.fftconvolve would do the job; in other languages, you could probably find a library pretty quickly through your search engine of choice.
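To make the convolution formula concrete, here is a small Java sketch that evaluates convolve(p, p) directly. It is not the FFT-based algorithm: it is still O(n**2) and only illustrates what a fast convolution would have to compute. The class and method names are invented for this example.

```java
import java.util.Arrays;

class PrimeConvolutionDemo {
    // Sieve of Eratosthenes: p[x] == 1 if x is prime, 0 otherwise.
    static int[] primeIndicator(int limit) {
        int[] p = new int[limit + 1];
        Arrays.fill(p, 1);
        p[0] = 0;
        if (limit >= 1) p[1] = 0;
        for (int i = 2; (long) i * i <= limit; i++) {
            if (p[i] == 1) {
                for (int j = i * i; j <= limit; j += i) p[j] = 0;
            }
        }
        return p;
    }

    // Direct evaluation of convolve(p, p)[x] = sum of p[n] * p[x - n] for n in 0..x.
    static long[] selfConvolve(int[] p) {
        long[] out = new long[p.length];
        for (int x = 0; x < p.length; x++) {
            for (int n = 0; n <= x; n++) {
                out[x] += (long) p[n] * p[x - n];
            }
        }
        return out;
    }
}
```

Note that the convolution counts ordered pairs: selfConvolve(primeIndicator(20))[10] is 3, from 3+7, 7+3 and 5+5. If you want each unordered pair once, as the loop in the question does, add p[x/2] for even x and then halve the result.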
What you're searching for is the count of Goldbach partitions for each number
in your range, and imho there is no efficient algorithm for it.
Odd numbers have at most 1 (only when the number minus 2 is prime), even numbers from 4 up to 4*10^18 are guaranteed to have more than 0,
but other than that... to start with, whether even numbers (bigger than 4*10^18) with 0 partitions exist
has been an unsolved problem since 1700-something, and such things as exact numbers are even more complicated.
There are some asymptotic and heuristic solutions, but if you want the exact number,
other than getting more CPU and RAM, there isn't much you can do.
The other answers have an outer loop that goes from N to M. It's more efficient, however, for the outer loop (or loops) to be pairs of primes, used to build a list of numbers between N and M that equal their sums.
Since I don't know Java I'll give a solution in Ruby for a specific example. That should allow anyone interested to implement the algorithm in Java, regardless of whether they know Ruby.
I initially assume that the two primes whose sum equals a number between M and N must be distinct. In other words, 4 cannot be expressed as 4 = 2+2.
Use Ruby's prime number library.
require 'prime'
Assume M and N are 5 and 50.
lower = 5
upper = 50
Compute the prime numbers up to upper-2 #=> 48, 2 being the smallest prime that can be the other member of a pair.
primes = Prime.each.take_while { |p| p <= upper-2 }
#=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
Construct an enumerator of all combinations of two primes.
enum = primes.combination(2)
#=> #<Enumerator: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]:combination(2)>
We can see the elements that will be generated by this enumerator by converting it to an array.
enum.to_a
#=> [[2, 3], [2, 5],..., [2, 47], [3, 5],..., [43, 47]] (105 elements)
Just think of enum as an array.
Now construct a counting hash whose keys are numbers between lower and upper for which there is at least one pair of primes that sum to that number and whose values are the numbers of pairs of primes that sum to the value of the associated key.
enum.each_with_object(Hash.new(0)) do |(x,y),h|
  sum = x+y
  h[sum] += 1 if (lower..upper).cover?(sum)
end
#=> {5=>1, 7=>1, 9=>1, 13=>1, 15=>1, 19=>1, 21=>1, 25=>1, 31=>1, 33=>1,
# 39=>1, 43=>1, 45=>1, 49=>1, 8=>1, 10=>1, 14=>1, 16=>2, 20=>2, 22=>2,
# 26=>2, 32=>2, 34=>3, 40=>3, 44=>3, 46=>3, 50=>4, 12=>1, 18=>2, 24=>3,
# 28=>2, 36=>4, 42=>4, 48=>5, 30=>3, 38=>1}
This shows, for example, that there are two ways that 16 can be expressed as the sum of two primes (3+13 and 5+11), three ways for 34 (3+31, 5+29 and 11+23) and no way for 6.
If the two primes being summed need not be distinct (e.g., 4=2+2 is to be included), only a slight change is needed.
arr = primes.combination(2).to_a.concat(primes.zip(primes))
whose sorted values are
a = arr.sort
#=> [[2, 2], [2, 3], [2, 5], [2, 7],..., [3, 3],..., [5, 5],.., [47, 47]] (120 elements)
then
a.each_with_object(Hash.new(0)) do |(x,y),h|
  sum = x+y
  h[sum] += 1 if (lower..upper).cover?(sum)
end
#=> {5=>1, 7=>1, 9=>1, 13=>1, 15=>1, 19=>1, 21=>1, 25=>1, 31=>1, 33=>1,
# 39=>1, 43=>1, 45=>1, 49=>1, 6=>1, 8=>1, 10=>2, 14=>2, 16=>2, 20=>2,
# 22=>3, 26=>3, 32=>2, 34=>4, 40=>3, 44=>3, 46=>4, 50=>4, 12=>1, 18=>2,
# 24=>3, 28=>2, 36=>4, 42=>4, 48=>5, 30=>3, 38=>2}
In practice, a should be replaced by arr. I sorted arr into a here merely to order the elements of the resulting hash so that it would be easier to read.
Since I just wanted to describe the approach, I used a brute force method to enumerate the pairs of elements of primes, throwing away 44 of the 120 pairs of primes because their sums fall outside the range 5..50 (a.count { |x,y| !(lower..upper).cover?(x+y) } #=> 44). Clearly, there's considerable room for improvement.
A sum of two primes means N = A + B, where A and B are primes, and A < B, which means A < N / 2 and B > N / 2. Note that they can't be equal to N / 2.
So, your outer loop should only loop from 1 to floor((N - 1) / 2). In integer math, the floor is automatic.
Your inner loop can be eliminated if the primes are stored in a Set. Assuming your array is sorted (fair assumption), use a LinkedHashSet, such that iterating the set in the outer loop can stop at (N - 1) / 2.
I'll leave it up to you to code this.
Update
Sorry, the above is an answer to the problem of finding A and B for a particular N. Your question was to find all N between min and max (inclusive).
If you follow the logic of the above, you should be able to apply it to your problem.
Outer loop should be from 1 to max / 2.
Inner loop should be from min - outer to max - outer.
To find the starting point of the inner loop, you can keep some extra index variables around, or you can rely on your primes being sorted and use a binary search for min - outer (Arrays.binarySearch for an array, Collections.binarySearch for a List such as your ArrayList). The first option is likely a little bit faster, but the second option is definitely simpler.
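The loop bounds described above could be sketched in Java roughly as follows. This is only an illustration under a few assumptions: the class name PrimePairCounts and the sieve helper are invented here, primes is passed in as a sorted List, and the inner loop starts at the outer prime itself so that pairs such as 2+2 are counted, as the loop in the question does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PrimePairCounts {
    // Hypothetical helper: all primes up to limit, in ascending order.
    static List<Integer> sieve(int limit) {
        boolean[] composite = new boolean[limit + 1];
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i <= limit; i++) {
            if (!composite[i]) {
                primes.add(i);
                for (long j = (long) i * i; j <= limit; j += i) composite[(int) j] = true;
            }
        }
        return primes;
    }

    // counts[s - min] = number of pairs a <= b of primes with a + b == s.
    static int[] countPairs(List<Integer> primes, int min, int max) {
        int[] counts = new int[max - min + 1];
        for (int i = 0; i < primes.size(); i++) {
            int a = primes.get(i);
            if (a > max / 2) break; // with a <= b, a + b would exceed max
            // binary search for the first prime b with a + b >= min
            int pos = Collections.binarySearch(primes, min - a);
            int j = Math.max(pos >= 0 ? pos : -pos - 1, i);
            for (; j < primes.size(); j++) {
                int sum = a + primes.get(j);
                if (sum > max) break;
                counts[sum - min]++;
            }
        }
        return counts;
    }
}
```

With countPairs(sieve(50), 5, 50), the entry for 16 is 2 (3+13 and 5+11) and the entry for 38 is 2 (7+31 and 19+19). This trims the work per outer prime but does not by itself change the worst-case complexity.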
The problem: given an unsorted array, find the subsets of the array that add up to a target sum.
For example:
target = 15
data = {3,4,5,7,1,2,9};
Expected results (note the results are sorted for simplicity. not a requirement) :
[1, 2, 3, 4, 5]
[1, 2, 3, 9]
[1, 2, 5, 7]
[1, 3, 4, 7]
[1, 5, 9]
[2, 4, 9]
[3, 5, 7]
Here is my naive approach to this problem - simple and brute force.
public static void naiveSubset(int[] arr, int target){
    int sum=0;
    List<Integer> result = new ArrayList<>();
    for (int i=0; i< arr.length;i++){
        sum =arr[i];
        result.add(arr[i]);
        for (int j=0;j<arr.length;i++){
            if (sum==target){
                System.out.println(result);
                result.clear();
                break;
            }
            else if (i!=j && sum+arr[j] <= target){
                sum+=arr[j];
                result.add(arr[j]);
            }
        }
    }
}
For some reason, I am not getting the expected results. I tried going through the code to dig out any issues, but I could not find any. Please, algo experts, point me in the correct direction!!
The results I get (for same input as above)
[3, 3, 3, 3, 3]
[9, 3, 3]
Your solution is wrong because it's a greedy approach. It decides if you should add a number or not based on the fact that adding it does not violate the sum, at the moment.
However, this greedy approach does not work, as a simple example shows: the array [1,9,6,5] with sum=11.
Note that for any element you choose in the outer loop, the next thing you do is add 1 to the current set. But that denies you the possibility of getting the sum from 5+6.
Once you choose 5, you start adding numbers, starting with 1. Once it is added, you will never get the correct solution.
Also note: Your double loop approach can generate at most O(n^2) different subsets, but there could be exponential number of subsets - so something must be wrong.
If you want to get all possible subsets that sum to the given sum, you can use a recursive solution.
At each step "guess" if the current element is in the set or not, and recurse for both options for the smaller problem - if the data is in the set, or if it's not.
Here is some simple Java code that does it (it needs java.util.Stack):
public static void getAllSubsets(int[] elements, int sum) {
    getAllSubsets(elements, 0, sum, new Stack<Integer>());
}
private static void getAllSubsets(int[] elements, int i, int sum, Stack<Integer> currentSol) {
    //stop clauses:
    if (sum == 0 && i == elements.length) System.out.println(currentSol);
    //if elements must be positive, you can trim search here if sum became negative
    if (i == elements.length) return;
    //"guess" the current element in the list:
    currentSol.add(elements[i]);
    getAllSubsets(elements, i+1, sum-elements[i], currentSol);
    //"guess" the current element is not in the list:
    currentSol.pop();
    getAllSubsets(elements, i+1, sum, currentSol);
}
Note that if you are looking for all subsets, there could be exponential number of those - so an inefficient and exponential time solution is expected.
If you are looking for finding if such a set exist, or finding only one such set, this can be done much more efficiently using Dynamic Programming. This thread explains the logic of how it can be done.
Note that the problem is still NP-Hard, and the "efficient" solution is actually only pseudo-polynomial.
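For the existence question only, the dynamic-programming idea could look like the sketch below. It assumes every element and the target are non-negative, and the class and method names are invented for illustration. The table has target + 1 entries, which is exactly why the running time is pseudo-polynomial: it grows with the value of the target, not just with the length of the input.

```java
class SubsetSumExists {
    // Returns true if some subset of elements adds up to target.
    // Assumption: all elements and target are non-negative.
    static boolean exists(int[] elements, int target) {
        // reachable[s] is true if some subset of the elements seen so far sums to s
        boolean[] reachable = new boolean[target + 1];
        reachable[0] = true; // the empty subset
        for (int e : elements) {
            // go downwards so each element is used at most once
            for (int s = target; s >= e; s--) {
                if (reachable[s - e]) reachable[s] = true;
            }
        }
        return reachable[target];
    }
}
```

For example, exists(new int[] {1, 9, 6, 5}, 11) is true because of 5+6, the combination the greedy loop misses, while exists(new int[] {2, 4, 6}, 5) is false.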
I think the major issue in your previous approach is that simply looping over the input array will not cover all the combinations of numbers matching the target value. For example, if your outer loop is at the ith element, then once you have passed the jth element in your inner loop, any future combination built on what you have collected from the ith element can never include the jth one. Intuitively speaking, this algorithm will collect the combinations formed by numbers near each other, but not those formed by numbers far apart.
I wrote an iterative approach to this subset sum problem in C++ (sorry, I don't have a Java environment at hand :P). The idea is basically the same as the recursive approach: you record all the existing number combinations during each iteration of the loop. I have one vector<vector<int> > intermediate used to record all the encountered combinations whose sum is smaller than target, and a vector<vector<int> > final used to record all the combinations whose sum is equal to target.
The detailed explanation is recorded inline:
#include <algorithm>
#include <functional>
#include <vector>
using namespace std;

/* sum the vector elements */
int sum_vec(vector<int> tmp){
    int sum = 0;
    for(int i = 0; i < tmp.size(); i++)
        sum += tmp[i];
    return sum;
}
static void naiveSubset(vector<int> arr, int target){
    /* sort the array from big to small, easier for us to
     * discard combinations bigger than target */
    sort(arr.begin(), arr.end(), greater<int>());
    vector<vector<int> > intermediate;
    vector<vector<int> > final;
    for (int i=0; i< arr.size();i++){
        int curr_intermediate_size = intermediate.size();
        for(int j = 0; j < curr_intermediate_size; j++){
            int tmpsum = sum_vec(intermediate[j]);
            /* For each selected array element, loop through all
             * the combinations at hand which are smaller than target,
             * dup the combination, put it into either intermediate or
             * final based on the sum */
            vector<int> new_comb(intermediate[j]);
            if(tmpsum + arr[i] <= target){
                new_comb.push_back(arr[i]);
                if(tmpsum + arr[i] == target)
                    final.push_back(new_comb);
                else
                    intermediate.push_back(new_comb);
            }
        }
        /* finally make the new selected element a separate entry
         * and based on its value, to insert it into either intermediate
         * or final */
        if(arr[i] <= target){
            vector<int> tmp;
            tmp.push_back(arr[i]);
            if(arr[i] == target)
                final.push_back(tmp);
            else
                intermediate.push_back(tmp);
        }
    }
    /* we could print the final here */
}
Just wrote it so please bear with me if there is any corner case that I did not consider well. Hope this helps:)
I have been given 3 algorithms to reverse engineer and explain how they work. So far I have worked out that I have been given a quick sort algorithm and a bubble sort algorithm; however, I'm not sure what algorithm this is. I understand how quick sort and bubble sort work, but I just can't get my head around this algorithm. I'm unsure what the variables are and was hoping someone out there would be able to tell me what's going on here:
public static ArrayList<Integer> SortB(ArrayList<Integer> a)
{
    ArrayList<Integer> array = CopyArray(a);
    Integer[] zero = new Integer[a.size()];
    Integer[] one = new Integer[a.size()];
    int i,b;
    Integer x,p;
    //Change from 8 to 32 for whole integers - will run 4 times slower
    for(b=0;b<8;++b)
    {
        int zc = 0;
        int oc = 0;
        for(i=0;i<array.size();++i)
        {
            x = array.get(i);
            p = 1 << b;
            if ((x & p) == 0)
            {
                zero[zc++] = array.get(i);
            }
            else
            {
                one[oc++] = array.get(i);
            }
        }
        for(i=0;i<oc;++i) array.set(i,one[i]);
        for(i=0;i<zc;++i) array.set(i+oc,zero[i]);
    }
    return(array);
}
This is a Radix Sort, limited to the least significant eight bits. It does not complete the sort unless you change the loop to go 32 times instead of 8.
Each iteration processes a single bit b. It prepares a mask called p by shifting 1 left b times. This produces a power of two - 1, 2, 4, 8, ..., or 1, 10, 100, 1000, 10000, ... in binary.
For each bit, the elements of the array are separated into two buckets called one and zero, according to whether bit b is set to 1 or to 0. Once the separation is over, the elements are placed back into the array (the one bucket first, then the zero bucket), and the algorithm proceeds to the next iteration.
This implementation uses two times more storage than the size of the original array, and goes through the array a total of 16 times (64 times in the full version - once for reading and once for writing of data for each bit). The asymptotic complexity of the algorithm is linear.
Looks like a bit-by-bit radix sort to me, but it seems to be sorting backwards.
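Both observations can be checked with a stripped-down version of the loop. The sketch below is not the original SortB: it works on a plain int[] and takes the number of bits as a parameter, and the names are invented for illustration. It assumes non-negative inputs.

```java
class BitRadixSortDemo {
    // Bit-by-bit radix sort on the lowest `bits` bits, least significant bit first.
    // Assumption: inputs are non-negative.
    static int[] sortByLowBits(int[] input, int bits) {
        int[] array = input.clone();
        int[] zero = new int[array.length];
        int[] one = new int[array.length];
        for (int b = 0; b < bits; b++) {
            int zc = 0, oc = 0;
            int mask = 1 << b;
            // stable partition by the value of bit b
            for (int x : array) {
                if ((x & mask) == 0) zero[zc++] = x;
                else one[oc++] = x;
            }
            // ones are written back before zeros, so larger values end up first
            System.arraycopy(one, 0, array, 0, oc);
            System.arraycopy(zero, 0, array, oc, zc);
        }
        return array;
    }
}
```

With 8 bits, {5, 3, 200, 7, 1} comes back as {200, 7, 5, 3, 1}, i.e. in descending order. The pair {44, 300} comes back unchanged, because 300 and 44 share the same low eight bits; with 16 bits it becomes {300, 44}.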
I am trying to count the occurrences of integers in an array. I was able to get it to work by piecing together some code I found online, but I don't really understand why it's working. What I have is:
int[] hand = {2, 4, 3, 2, 4};
int[] numOccurence = new int[hand.length];
for (int i = 0; i < hand.length; i++)
    numOccurence[hand[i]]++;
for (int i = 1; i < numOccurence.length; i++)
    if (numOccurence[i] > 0)
        System.out.println("The number " + i + " occurs " + numOccurence[i] + " times.");
The output is:
The number 2 occurs 2 times.
The number 3 occurs 1 times.
The number 4 occurs 2 times.
How is this code counting the number of occurrences properly? I don't see how it's accomplishing this. Thank you in advance!
This only works because you got lucky. Try changing the second element of the hand array to 5 and see what happens. That's because the number at the current index of hand is used as the index into the numOccurence array. If a number is greater than or equal to the length of numOccurence, you'll get an ArrayIndexOutOfBoundsException.
Therefore, you are better off using a Map for this, where the key is the number and the value is its count.
Something like this:
Map<Integer, Integer> numOccurence = new HashMap<Integer, Integer>();
for (int i = 0; i < hand.length; i++) {
    int cnt = 1;
    if (numOccurence.containsKey(hand[i])) {
        cnt = numOccurence.get(hand[i]);
        cnt++;
    }
    numOccurence.put(hand[i], cnt);
}
This code does not really work. At least it does for the author's use case but probably not for yours.
Try with {2, 4, 99, 2, 4}; as hand and it will fail.
The author takes the number found in hand as the index of array numOccurence.
numOccurence has the following structure : {nb occ of 0; nb occs of 1;...; nb occs of 4}. Here 99 will be out of bounds.
When you create an array
int[] numOccurence = new int[hand.length];
it is populated with default values. For primitive int this value is 0.
This will of course only work if hand contains numbers less than or equal to the max index (length - 1) of the array; otherwise it's ArrayIndexOutOfBoundsException for you, mister!
Actually it's the same method as for creating the histogram of a picture ;)
You create a table where you will gather the occurrences.
numOccurence[0] will store the number of 0s
numOccurence[1] will store the number of 1s
etc.
That's what is done by this
for (int i = 0; i < hand.length; i++)
    numOccurence[hand[i]]++;
It adds 1 to the value in the cell corresponding to the number hand[i].
So if you look at this step by step:
first it will take hand[0] = 2
so it will do
numOccurence[2] = numOccurence[2] + 1;
which is the same as the following, which is just faster to write:
numOccurence[2]++;
This way of counting is the basis of counting sort.
The advantage of counting sort is its speed. The disadvantage is the memory requirement when sorting big numbers.
There is a bug in the code:
int[] numOccurence = new int[hand.length];
numOccurence needs to be one longer than the highest number in the list (not as long as the number of numbers in the list). Try changing one of the numbers to 15 and you will get an exception.
The code iterates through the given array hand. It takes each value encountered as an index into the array numOccurence. For each number n in hand, this will happen exactly as often as n occurs in hand, and each time this happens, the nth element of numOccurence will be incremented.
Thus numOccurence is effectively an array of counters (assuming that the array elements are initialized with 0).
Drawbacks of this approach:
The number of counters allocated depends on the magnitude of the numbers in your hand array.
If the numbers in your hand array are distributed sparsely, most of the allocated space is never used.
Alternative:
You could improve the code by sorting hand first. In the sorted array the indexes of all occurrences of a given number are contiguous, so you scan the sorted array once, needing only a single counter to compile the frequencies.
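A rough Java sketch of the sort-then-scan alternative, with invented names and working on a copy so that hand itself is left untouched:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedRunCounter {
    // Returns one line per distinct value, in ascending order of value.
    static List<String> countRuns(int[] hand) {
        int[] sorted = hand.clone();
        Arrays.sort(sorted);
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            // advance j to the end of the run of values equal to sorted[i]
            int j = i;
            while (j < sorted.length && sorted[j] == sorted[i]) j++;
            lines.add("The number " + sorted[i] + " occurs " + (j - i) + " times.");
            i = j;
        }
        return lines;
    }
}
```

This needs no array sized by the largest value, so sparse or large numbers (and negative ones) are not a problem. The price is the O(n log n) sort.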
First of all, the code is wrong. You should set the size of the numOccurence array to the max value in the hand array + 1. For example:
int[] hand = {2, 100};
int[] numOccurence = new int[101];
(you should obviously find the max number programmatically)
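Finding the max programmatically might look like this sketch. The class and method names are made up, and it assumes hand holds no negative numbers, since those cannot be used as array indexes.

```java
class CountingArrayDemo {
    // Returns counts where counts[n] is how often n occurs in hand.
    // Assumption: hand contains only non-negative values.
    static int[] countOccurrences(int[] hand) {
        int max = 0;
        for (int v : hand) {
            if (v < 0) throw new IllegalArgumentException("negative value: " + v);
            max = Math.max(max, v);
        }
        int[] counts = new int[max + 1]; // valid indexes 0..max, all initially 0
        for (int v : hand) counts[v]++;
        return counts;
    }
}
```

For {2, 100} this allocates 101 counters, of which only two are ever used.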
Now let's take a look at the algorithm.
It takes each number from the hand array, treats it as a numOccurence index value and increments the number at that index in the numOccurence array by 1. Note that all elements of the numOccurence array are 0 by default at the beginning.
int[] hand = {2, 4, 3, 2, 4};
int[] numOccurence = new int[5];
Steps:
i = 0 (nothing happens, because there is no 0 in hand array)
i = 1 (same situation as for 0)
i = 2 (there are two 2 numbers in hand array, so we do operation numOccurence[2] += 1 twice, which in result gives 0 + 1 + 1 = 2. So we got numOccurence[2] = 2)
It continues for all numbers from 0 to the max number in the hand array (here: 4).