Efficient Minkowski sum calculation - java

I wonder whether there is an algorithm to efficiently calculate
a discrete 1-dimensional Minkowski sum. The Minkowski sum is defined as:
S + T = { x + y | x in S, y in T }
Could we represent the sets as lists, sort S and T, and then do something
similar to computing the union of two sorted sets, i.e. walk along both lists
in parallel and generate the result?
Are there known algorithms of this kind where I don't have to sort the
result afterwards to remove duplicate cases x1+y1 = x2+y2? Preferably formulated in Java.

First, the size of the output can be O(nm) if there are no collisions (e.g., A = {0, 1, 2, ..., n-1}, B = {n, 2*n, 3*n, ..., n*n}), so if the running time may depend only on n and m, we have no hope of finding a sub-quadratic algorithm.
A straightforward algorithm computes all pairs (O(nm)), then sorts and de-duplicates the result (O(nm log nm) in total).
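For reference, a minimal Java sketch of this straightforward approach (the method name is illustrative):

import java.util.TreeSet;

// Naive Minkowski sum: form all pairwise sums and let a TreeSet sort and
// de-duplicate them. O(nm log nm) time, O(nm) space in the worst case.
static TreeSet<Integer> minkowskiNaive(int[] s, int[] t) {
    TreeSet<Integer> result = new TreeSet<>();
    for (int x : s) {
        for (int y : t) {
            result.add(x + y);
        }
    }
    return result;
}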
If you have an upper bound M such that x <= M for all x in A union B, we can compute the sum in O(M log M) in the following way.
Generate the characteristic vectors: A[i] = 1 iff i is in A, 0 otherwise, and similarly for B. Each such vector is of size M.
Compute the convolution of A and B using FFT (time: O(M log M)). Output size is O(M).
Scan output O - at each cell, O[i] is nonzero iff i is an element of the Minkowski sum.
Proof: O[i] != 0 iff there exists k such that A[k] != 0 and B[i-k] != 0, iff k is in A and i-k is in B, iff k + (i-k), that is i, is in the Minkowski sum.
(Taken from this paper)
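A Java sketch of the characteristic-vector idea; for brevity the convolution below is the naive O(M^2) one, so a real implementation would swap in an FFT-based convolution (e.g. from a numerics library) to reach the O(M log M) bound. It assumes all elements are non-negative and at most maxValue:

import java.util.ArrayList;
import java.util.List;

// The characteristic vectors of S and T are convolved; index i belongs to the
// Minkowski sum iff the convolution is nonzero at i.
static List<Integer> minkowskiByConvolution(int[] s, int[] t, int maxValue) {
    int m = maxValue + 1;
    int[] a = new int[m], b = new int[m];
    for (int x : s) a[x] = 1;              // characteristic vector of S
    for (int y : t) b[y] = 1;              // characteristic vector of T
    long[] conv = new long[2 * m - 1];     // conv[i] = number of pairs (x, y) with x + y = i
    for (int i = 0; i < m; i++)            // naive convolution; replace with FFT for O(M log M)
        if (a[i] == 1)
            for (int j = 0; j < m; j++)
                conv[i + j] += b[j];
    List<Integer> sum = new ArrayList<>();
    for (int i = 0; i < conv.length; i++)
        if (conv[i] != 0) sum.add(i);      // scan: i is in S + T iff conv[i] != 0
    return sum;
}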

Sort S and T, then iterate over S searching for matching elements in T; each time you find a match, remove the element from S and T and put it in a new set U. Because they are sorted, once you find a match in T, further comparisons in T can start from the last match.
Now S, T and U are all pairwise disjoint. So form all pairwise sums between S and T, between S and U, and between T and U. Finally iterate over U, adding every element of U to every element of U whose index is equal to or greater than the current index.
Sadly the algorithm is still O(n^2) with this optimization. If T and S are identical it will be 2x faster than the naive solution. You also don't have to search the output set for duplicates.

Related

Find two numbers in array x,y where x<y, x repeats at least n/3 times and y at least n/4 times

I have been struggling to solve an array problem in linear time.
The problem is:
Assuming we are given an array A[1...n], write an algorithm that returns true if:
There are two numbers x, y in the array such that:
x < y
x repeats more than n/3 times
y repeats more than n/4 times
I have tried to write the following Java program to do so, assuming we have a sorted array, but I don't think it is the best implementation.
public static boolean solutionManma() {
    int[] arr = {2, 2, 2, 3, 3, 3};
    int n = arr.length;
    int xCount = 1;
    int yCount = 1;
    int maxXcount = xCount, maxYCount = yCount;
    int currX = arr[0];
    int currY = arr[n - 1];
    for (int i = 1; i < n - 2; i++) {
        int right = arr[n - 2 - i + 1];
        int left = arr[i];
        if (currX == left) {
            xCount++;
        } else {
            maxXcount = Math.max(xCount, maxXcount);
            xCount = 1;
            currX = left;
        }
        if (currY == right) {
            yCount++;
        } else {
            maxYCount = Math.max(yCount, maxYCount);
            yCount = 1;
            currY = right;
        }
    }
    return (maxXcount > n / 3 && maxYCount > n / 4);
}
If anyone has an algorithm idea for this kind of issue (preferably O(n)) I would much appreciate it because I got stuck with this one.
The key part of this problem is to find in linear time and constant space the values which occur more than n/4 times. (Note: the text of your question says "more than" and the title says "at least". Those are not the same condition. This answer is based on the text of your question.)
There are at most three values which occur more than n/4 times, and a list of such values must also include any value which occurs more than n/3 times.
The algorithm we'll use returns a list of up to three values. It only guarantees that all values which satisfy the condition are in the list it returns. The list might include other values, and it does not provide any information about the precise frequencies.
So a second pass is necessary, which scans the vector a second time counting the occurrences of each of the three values returned. Once you have the three counts, it's simple to check whether the smallest value which occurs more than n/3 times (if any) is less than the largest value which occurs more than n/4 times.
To construct the list of candidates, we use a generalisation of the Boyer-Moore majority vote algorithm, which finds a value which occurs more than n/2 times. The generalisation, published in 1982 by J. Misra and D. Gries, uses k-1 counters, each possibly associated with a value, to identify values which might occur more than n/k times. In this case, k is 4 and so we need three counters.
Initially, all of the counters are 0 and are not associated with any value. Then for each value in the array, we do the following:
If there is a counter associated with that value, we increment it.
If no counter is associated with that value but some counter is at 0, we associate that counter with the value and increment its count to 1.
Otherwise, we decrement every counter's count.
Once all the values have been processed, the values associated with counters with positive counts are the candidate values.
For a general implementation where k is not known in advance, it would be possible to use a hash-table or other key-value map to identify values with counts. But in this case, since it is known that k is a small constant, we can just use a simple vector of three value-count pairs, making this algorithm O(n) time and O(1) space.
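A compact Java sketch of this approach (three counters for candidate selection, then a verification pass; the method name is illustrative):

// Misra-Gries candidate selection with k = 4 (three counters), followed by a
// verification pass. Returns true iff some pair x < y exists with
// count(x) > n/3 and count(y) > n/4.
static boolean hasHeavyPair(int[] a) {
    int n = a.length;
    int[] cand = new int[3];
    int[] cnt = new int[3];
    for (int v : a) {
        boolean placed = false;
        for (int i = 0; i < 3 && !placed; i++) {    // a counter is already associated with v?
            if (cnt[i] > 0 && cand[i] == v) {
                cnt[i]++;
                placed = true;
            }
        }
        for (int i = 0; i < 3 && !placed; i++) {    // otherwise take a free counter
            if (cnt[i] == 0) {
                cand[i] = v;
                cnt[i] = 1;
                placed = true;
            }
        }
        if (!placed) {                               // no match, no free counter: decrement all
            for (int i = 0; i < 3; i++) cnt[i]--;
        }
    }
    int[] exact = new int[3];                        // second pass: exact counts of the candidates
    for (int v : a) {
        for (int i = 0; i < 3; i++) {
            if (cnt[i] > 0 && cand[i] == v) exact[i]++;
        }
    }
    for (int i = 0; i < 3; i++) {                    // check every candidate pair for x < y
        for (int j = 0; j < 3; j++) {
            if (i != j && cnt[i] > 0 && cnt[j] > 0 && cand[i] < cand[j]
                    && exact[i] * 3 > n && exact[j] * 4 > n) {
                return true;
            }
        }
    }
    return false;
}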
I will suggest the following solution, based on this assumption:
In an array of length n there will be at most n different numbers
The key feature will be to count the frequency of occurrence of each distinct input value using a histogram with n bins, meaning O(n) space. The algorithm will be as follows:
1. Create a histogram vector with n bins, initialized to zeros.
2. For each index ii over the length of the input array a:
2.1. Increase the count: hist[a[ii]] += 1.
3. Set found_x and found_y to False.
4. For the ii-th bin in the histogram, check:
4.1. If found_x == False:
4.1.1. If hist[ii] > n/3, set found_x = True and set x = ii.
4.2. Else if found_y == False:
4.2.1. If hist[ii] > n/4, set y = ii and return x, y.
Explanation
In the first pass over the array you record the occurrence frequency of each number. In the pass over the histogram array, which also has length n, you check those frequencies. First you check if there is a number that occurred more than n/3 times; if there is, then for the remaining numbers (which are larger than x by construction, since the histogram is scanned in increasing value order) you check if there is another number which occurred more than n/4 times. If there is, you return the found x and y, and if there isn't you simply report that no such pair exists after covering all the bins in the histogram.
As for time complexity, you go over the input array once and you go over the histogram, which has the same length, once; therefore the time complexity is O(n), as requested.
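A short Java sketch of these steps, assuming every input value can be used directly as a bin index, i.e. lies in the range [0, n):

// O(n) time / O(n) space sketch of the histogram approach above.
// Returns {x, y} if such a pair exists, otherwise null.
static int[] findPair(int[] a) {
    int n = a.length;
    int[] hist = new int[n];
    for (int v : a) hist[v]++;                 // first pass: occurrence counts
    boolean foundX = false;
    int x = -1;
    for (int i = 0; i < n; i++) {              // second pass: bins in increasing value order
        if (!foundX) {
            if (hist[i] * 3 > n) {             // more than n/3 occurrences
                foundX = true;
                x = i;
            }
        } else if (hist[i] * 4 > n) {          // more than n/4 occurrences, and i > x
            return new int[] { x, i };
        }
    }
    return null;                               // no such pair
}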

Find m elements in n arrays by picking at most 1 element out of each of the n arrays

I have n arrays, each of them contain an arbitrary amount of integers. No duplicates are possible within an array ([1,1,2] can't be one of the n arrays).
I also have an integer array of size m which is filled with integers from 1 to m (value_of_an_array_entry = array_index+1). Example: m = 4, the appropriate array is [1,2,3,4].
My question:
For given n and m, is it possible to find every element in the m array by picking at most 1 element out of each of the n arrays?
An Example:
n = 3, m = 3,
The n arrays: [1], [1, 2], [2, 3]
Output should be: Yes
(because we can find every element in the m array by picking at most 1 element out of each of the n arrays. Look at the n arrays and pick 1 from the first array, 2 from the second and 3 from the third array.)
This is an interview question. I received a hint to think about the max-flow problem (I don't see how this helps me).
You can build a graph like this: The graph is divided into the left part and the right part. The left part contains n vertices which represent the n arrays. The right part contains m vertices which represent the m numbers.
Then we consider these n arrays. If element k is contained in the i-th array, we draw an edge between the i-th vertex on the left and the k-th vertex on the right. Our goal is to choose m edges so that each of the m vertices on the right is covered by the chosen edges exactly once, and each vertex on the left is covered at most once. This is a bipartite maximum matching problem, which can be solved by many algorithms, including max flow.
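For the matching itself you don't need a full max-flow implementation; a simple augmenting-path (Kuhn's) bipartite matching is enough. A sketch, with illustrative names:

import java.util.Arrays;
import java.util.List;

// Kuhn's augmenting-path bipartite matching: left side = the n arrays,
// right side = the numbers 1..m. Returns true iff every number 1..m can be
// matched to a distinct array that contains it.
static boolean coversAll(List<List<Integer>> arrays, int m) {
    int n = arrays.size();
    int[] matchOfArray = new int[n];               // number assigned to array i, or -1 if free
    Arrays.fill(matchOfArray, -1);
    for (int number = 1; number <= m; number++) {
        if (!tryMatch(number, arrays, matchOfArray, new boolean[n])) {
            return false;                          // this number cannot be covered
        }
    }
    return true;
}

static boolean tryMatch(int number, List<List<Integer>> arrays,
                        int[] matchOfArray, boolean[] visited) {
    for (int i = 0; i < arrays.size(); i++) {
        if (visited[i] || !arrays.get(i).contains(number)) continue;
        visited[i] = true;
        // take array i if it is free, or try to re-route the number it currently serves
        if (matchOfArray[i] == -1 || tryMatch(matchOfArray[i], arrays, matchOfArray, visited)) {
            matchOfArray[i] = number;
            return true;
        }
    }
    return false;
}

For the example above, coversAll(List.of(List.of(1), List.of(1, 2), List.of(2, 3)), 3) returns true.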
I think a recursive method should do it.
If m is an empty list: PASS
Otherwise, look for members of n containing the first element of m
    if none found: FAIL
    for each one found:
        this member of n is part of a PASS if there is a PASS for m' = tail(m) and n' = the other members of n
I haven't tested this, but:
public boolean check(List<Integer> m, List<List<Integer>> n) {
    if (m.isEmpty()) {
        return true;
    }
    int head = head(m);
    List<Integer> tail = tail(m);
    for (List<Integer> nMember : n) {
        if (nMember.contains(head) && check(tail, nMinus(n, nMember))) {
            return true;
        }
    }
    return false;
}
Assumed methods:
head() returns the first element of the passed list.
tail() returns the passed list with the first element removed.
nMinus() returns a view or copy of n with nMember removed. It should not modify n.
You should use immutable collections, or at least treat them as immutable. Guava provides classes that would probably be useful. But you could quite trivially knock up a ListOmitting list wrapper class with which to implement nMinus() without Guava.
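One simple (copying) way to provide the assumed helpers, if you don't want to pull in Guava (a view-based ListOmitting wrapper would avoid the copies):

import java.util.ArrayList;
import java.util.List;

static int head(List<Integer> m) {
    return m.get(0);
}

static List<Integer> tail(List<Integer> m) {
    return m.subList(1, m.size());   // a view of m; treat it as immutable
}

static List<List<Integer>> nMinus(List<List<Integer>> n, List<Integer> nMember) {
    List<List<Integer>> copy = new ArrayList<>(n);
    copy.remove(nMember);            // removes the first occurrence; n itself is untouched
    return copy;
}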
I can't say for sure it's not too brute-forcey, but it "feels" adequately efficient for an interview answer to me.

Checking whether there is a subset of size k of an array which has a sum multiple of n

Good evening, I have an array in Java with n integer numbers. I want to check if there is a subset of size k of the entries that satisfies the condition:
The sum of those k entries is a multiple of m.
How may I do this as efficiently as possible? There are n!/(k!(n-k)!) subsets that I need to check.
You can use dynamic programming. The state is (prefix length, sum modulo m, number of elements in the subset). Transitions are obvious: we either add one more number (increasing the number of elements in the subset and computing the new sum modulo m), or we just increase the prefix length (not adding the current number). If you just need a yes/no answer, you can store only the last layer of values and apply bit optimizations to compute transitions faster. The time complexity is O(n * m * k), or about n * m * k / 64 operations with bit optimizations. The space complexity is O(m * k). It looks feasible for a few thousand elements. By bit optimizations I mean using things like bitset in C++, which can perform an operation on a group of bits at the same time using bitwise operations.
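A minimal Java sketch of that DP, without the bit optimizations (the method name is illustrative):

// DP over (number of chosen elements, sum mod m), keeping only the last layer.
// reachable[c][r] == true iff some subset of the elements processed so far has
// exactly c elements and sum congruent to r (mod m). O(n * m * k) time, O(m * k) space.
static boolean hasSubsetOfSizeKDivisibleByM(int[] a, int k, int m) {
    boolean[][] reachable = new boolean[k + 1][m];
    reachable[0][0] = true;
    for (int value : a) {
        int r = ((value % m) + m) % m;           // value modulo m, safe for negative values
        for (int c = k; c >= 1; c--) {           // go downwards so each element is used at most once
            for (int s = 0; s < m; s++) {
                if (reachable[c - 1][s]) {
                    reachable[c][(s + r) % m] = true;
                }
            }
        }
    }
    return reachable[k][0];
}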
I don't like this solution, but it may work for your needs
public boolean containsSubset(int[] a, int currentIndex, int currentSum, int depth, int divisor, int maxDepth) {
    // you could make a, maxDepth, and divisor static as well
    // If maxDepth is equal to depth, then our subset has k elements; in addition the sum of the
    // elements must be divisible by our divisor, m.
    // If this condition is satisfied, then there exists a subset of size k whose sum is divisible by m.
    if (depth == maxDepth && currentSum % divisor == 0)
        return true;
    // If the depth is greater than or equal to maxDepth, our subset has more than k elements, thus
    // adding more elements can not satisfy the necessary conditions.
    // Additionally we know that if it contained k elements and was divisible by m, it would've satisfied the above condition.
    if (depth >= maxDepth)
        return false;
    // boolean to be returned, initialized to false because we have not found any sets yet
    boolean ret = false;
    // iterate through all remaining elements of our array
    for (int i = currentIndex + 1; i < a.length; i++) {
        // this may be an optimization of this line:
        // for (int i = currentIndex + 1; i < a.length - maxDepth + depth; i++) {
        // by recursing, we add a[i] to our set; we then use an or operation on all the subsets that could
        // be constructed from the numbers we have so far, so that if any of them satisfies our condition (returns true)
        // then the value of the variable ret will be true
        ret |= containsSubset(a, i, currentSum + a[i], depth + 1, divisor, maxDepth);
    } // end for
    // return the variable storing whether any set of numbers constructed from the numbers so far satisfies the condition
    return ret;
}
Then invoke this method as such
// this invokes our method with "no numbers added to our subset so far", so it will try adding
// all combinations of other elements to determine if the condition is satisfied
boolean answer = containsSubset(myArray, -1, 0, 0, m, k);
EDIT:
You could probably optimize this by taking everything modulo (%) m and deleting repeats. For examples with large values of n and/or k, but small values of m, this could be a pretty big optimization.
EDIT 2:
The above optimization I listed isn't helpful. You may need the repeats to get the correct information. My bad.
Happy Coding! Let me know if you have any questions!
If numbers have lower and upper bounds, it might be better to:
Iterate over all multiples of m with lower_bound * k < multiple < upper_bound * k.
For each such multiple, check whether some subset of k array entries sums to it (see the Subset Sum problem), using dynamic programming.
Complexity is O(k^2 * (lower_bound + upper_bound)^2). This approach can be optimized further, I believe, with careful thinking.
Otherwise you can enumerate all subsets of size k; there are n!/(k!(n-k)!) of them. Using backtracking (pseudocode-ish):
function find_subsets(array, k, index, current_subset):
    if current_subset.size = k:
        add current_subset to your solutions list
        return
    if index = array.size:
        return
    number := array[index]
    add number to current_subset
    find_subsets(array, k, index + 1, current_subset)
    remove number from current_subset
    find_subsets(array, k, index + 1, current_subset)
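A direct Java translation of this backtracking pseudocode might look like the following:

import java.util.ArrayList;
import java.util.List;

// Backtracking enumeration of all subsets of size k, translated from the
// pseudocode above; each complete subset is copied into the solutions list.
static void findSubsets(int[] array, int k, int index,
                        List<Integer> currentSubset, List<List<Integer>> solutions) {
    if (currentSubset.size() == k) {
        solutions.add(new ArrayList<>(currentSubset));
        return;
    }
    if (index == array.length) {
        return;
    }
    int number = array[index];
    currentSubset.add(number);                          // branch 1: take array[index]
    findSubsets(array, k, index + 1, currentSubset, solutions);
    currentSubset.remove(currentSubset.size() - 1);     // branch 2: skip array[index]
    findSubsets(array, k, index + 1, currentSubset, solutions);
}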

What is the best way to represent a tree in this case?

I am trying to solve this question: https://www.hackerrank.com/challenges/journey-to-the-moon I.e. a problem of finding the connected components of a graph. What I have is a list of vertices (from 0 to N-1), and each line in the standard input gives me a pair of vertices that are connected by an edge (i.e. if I have 1, 3 it means that vertex 1 and vertex 3 are in one connected component). My question is: what is the best way to store the input, i.e. how to represent my graph? My idea is to use an ArrayList of ArrayLists - each position in the outer list stores another ArrayList of adjacent vertices. This is the code:
public static List<ArrayList<Integer>> graph;
and then in the main() method:
graph = new ArrayList<ArrayList<Integer>>(N);
for (int j = 0; j < N; j++) {
    graph.add(new ArrayList<Integer>());
}
// then for each line in the standard input I fill the corresponding values in the array:
for (int j = 0; j < I; j++) {
    String[] line2 = br.readLine().split(" ");
    int a = Integer.parseInt(line2[0]);
    int b = Integer.parseInt(line2[1]);
    graph.get(a - 1).add(b);
    graph.get(b - 1).add(a);
}
I'm pretty sure that for solving the question I have to put vertex a at position b-1 and then vertex b at position a-1, so this should not change. But what I am looking for is a better way to represent the graph.
Using Java's collections (ArrayList, for example) adds a lot of memory overhead. Each Integer object will take at least 12 bytes, in addition to the 4 bytes required for storing the int.
Just use a huge single int array (let's call it edgeArray) which represents the adjacency matrix. Enter 1 when the cell corresponds to an edge. E.g., if nodes k and m are seen in the input, then cell (k, m) will have 1, else 0. In row-major order, that is the index k * N + m, i.e., edgeArray[k * N + m] = 1. You can choose either column-major or row-major order. But then your int array will be very sparse. It's trivial to implement a sparse array: just keep an array of the non-zero indices, in sorted order. It should be in sorted order so that you can binary search. The number of elements will be on the order of the number of edges.
Of course, when you are building the adjacency matrix, you won't know how many edges there are. So you won't be able to allocate the array. Just use a hash set. Don't use HashSet, which is very inefficient. Look at IntOpenHashSet from fastutil. If you are not allowed to use libraries, implement one that is similar to that.
Let us say that the IntOpenHashSet variable you will be using is called adjacencyMatrix. So if you see 3 and 2, and there are 10^6 nodes in total (N = 10^6), then you will just do
adjacencyMatrix.add(3 * 1000000 + 2);
Once you have processed all the inputs, then you can make the sparse adjacency matrix implementation above:
final int[] edgeArray = adjacencyMatrix.toIntArray(new int[adjacencyMatrix.size()]);
IntArrays.sort(edgeArray);
Given a node, finding all adjacent nodes:
So if you need all the nodes connected to node p, you would binary search for the next value that is greater than or equal to p * N (O(log (number of edges))). Then you will just traverse the array until you hit a value that is greater than or equal to (p + 1) * N. All the values you encounter will be nodes connected to p.
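A small Java sketch of that lookup, assuming (as in the encoding above) that from * N + to fits in an int:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Collect all nodes adjacent to p from the sorted sparse edge array
// (each entry encoded as from * N + to): binary-search the first entry >= p * N,
// then scan while the entries stay below (p + 1) * N.
static List<Integer> neighbours(int[] edgeArray, int N, int p) {
    int lowKey = p * N;
    int pos = Arrays.binarySearch(edgeArray, lowKey);
    if (pos < 0) pos = -pos - 1;                    // insertion point when p * N itself is absent
    List<Integer> result = new ArrayList<>();
    while (pos < edgeArray.length && edgeArray[pos] < lowKey + N) {
        result.add(edgeArray[pos] - lowKey);        // decode the 'to' endpoint
        pos++;
    }
    return result;
}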
Comparing it with the approach you mentioned in your question:
It uses O(N*b) space, where N is the number of nodes and b is the branching factor. This is lower-bounded by the number of edges.
For the approach I mentioned, the space complexity is just O(E). In fact it's exactly E integers plus the header for the int array.
I used var graph = new Dictionary<long, List<long>>();
See here for complete solution in c# - https://gist.github.com/newton3/a4a7b4e6249d708622c1bd5ea6e4a338
PS - 2 years late, but just in case someone stumbles onto this.

Determine If An Array of Length N is in an Array of Length M [duplicate]

Possible Duplicate:
Algorithm to determine if array contains n…n+m?
Let's assume that M > N and you have 2 arrays: one of length M called A and one of length N called B. Is there a faster way to find out if array B exists in array A?
For Example:
A = [ 1 2 3 4 5 6 ]
B1 = [ 2 3 4 ]
so array B1 exists in A while something like [ 1 3 2 ] does not.
This is essentially implementing something like isSubstring() in Java with a char array.
The only method that I can think of is O(M*N), where you compare each element in A with the initial element of B and then iterate through B looking for a match.
I guess this question is fairly common in interviews, so my question is asking if there is a faster approach.
You want KMP
algorithm kmp_search:
    input:
        an array of characters, S (the text to be searched)
        an array of characters, W (the word sought)
    output:
        an integer (the zero-based position in S at which W is found)
    define variables:
        an integer, m ← 0 (the beginning of the current match in S)
        an integer, i ← 0 (the position of the current character in W)
        an array of integers, T (the table, computed elsewhere)
    while m + i is less than the length of S, do:
        if W[i] = S[m + i],
            if i equals the (length of W) - 1,
                return m
            let i ← i + 1
        otherwise,
            let m ← m + i - T[i],
            if T[i] is greater than -1,
                let i ← T[i]
            else
                let i ← 0
    (if we reach here, we have searched all of S unsuccessfully)
    return the length of S
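A Java translation of this search for int arrays, together with the partial-match table construction it assumes (returning -1 instead of the length of S when W does not occur), might look like this:

// KMP search over int arrays: returns the first index in haystack where needle
// occurs as a contiguous block, or -1 if it does not occur. O(M + N) time.
static int indexOfSubarray(int[] haystack, int[] needle) {
    if (needle.length == 0) return 0;
    int[] t = buildTable(needle);
    int m = 0, i = 0;                              // m: start of current match, i: position in needle
    while (m + i < haystack.length) {
        if (needle[i] == haystack[m + i]) {
            if (i == needle.length - 1) return m;  // full match found
            i++;
        } else {
            m = m + i - t[i];                      // shift the match start using the table
            i = (t[i] > -1) ? t[i] : 0;
        }
    }
    return -1;                                     // needle does not occur in haystack
}

// Standard KMP partial-match ("failure") table; t[i] is the length of the longest
// proper prefix of needle[0..i) that is also a suffix of it, with t[0] = -1.
static int[] buildTable(int[] needle) {
    int[] t = new int[needle.length];
    t[0] = -1;
    if (needle.length > 1) t[1] = 0;
    int pos = 2, cnd = 0;
    while (pos < needle.length) {
        if (needle[pos - 1] == needle[cnd]) {      // the prefix extends
            t[pos] = cnd + 1;
            cnd++;
            pos++;
        } else if (cnd > 0) {                      // fall back to a shorter prefix
            cnd = t[cnd];
        } else {                                   // no prefix matches here
            t[pos] = 0;
            pos++;
        }
    }
    return t;
}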
Complexity:
Assuming the prior existence of the table T, the search portion of the
Knuth–Morris–Pratt algorithm has complexity O(k), where k is the
length of S and the O is big-O notation. As except for the fixed
overhead incurred in entering and exiting the function, all the
computations are performed in the while loop, we will calculate a
bound on the number of iterations of this loop; in order to do this we
first make a key observation about the nature of T. By definition it
is constructed so that if a match which had begun at S[m] fails while
comparing S[m + i] to W[i], then the next possible match must begin at
S[m + (i - T[i])]. In particular the next possible match must occur at
a higher index than m, so that T[i] < i. Using this fact, we will show
that the loop can execute at most 2k times. For in each iteration, it
executes one of the two branches in the loop. The first branch
invariably increases i and does not change m, so that the index m + i
of the currently scrutinized character of S is increased. The second
branch adds i - T[i] to m, and as we have seen, this is always a
positive number. Thus the location m of the beginning of the current
potential match is increased. Now, the loop ends if m + i = k;
therefore each branch of the loop can be reached at most k times,
since they respectively increase either m + i or m, and m ≤ m + i: if
m = k, then certainly m + i ≥ k, so that since it increases by unit
increments at most, we must have had m + i = k at some point in the
past, and therefore either way we would be done. Thus the loop
executes at most 2k times, showing that the time complexity of the
search algorithm is O(k).
additional information:
Efficiency of the KMP algorithm
Since the two portions of the algorithm have, respectively,
complexities of O(k) and O(n), the complexity of the overall algorithm
is O(n + k). These complexities are the same, no matter how many
repetitive patterns are in W or S.
