Variant of 3sum with 4 numbers in n^2*log(n) time? - java

I am working on a project for an algorithms course I'm in and I'm completely at an impasse. The assignment is to find all sets of 4 numbers in an array for which i+j = k+l in O(n^2*log(n)) time.
I know this is similar to the 3sum problem, where you have to find all the sets in an array for which i+j+k = 0. We discussed a solution to this problem in lecture that solves it in O(n^2*log(n)) time by iterating through all unique pairs (n^2 time) and using a binary search on a sorted array to find a value that satisfies the problem (log(n) time).
However, I don't see how the problem with 4 numbers could be solved in this time. I think it's a safe guess that the log(n) in the complexity comes from a binary search, which I'll use to find the last number. But that would mean I'd have to iterate over all possible combinations of 3 numbers in n^2 time, which I just don't think is possible.
I'm not looking for someone to do my work for me, but maybe if someone knows how this can be solved, they could give me a gentle tap in the right direction. Thanks.
Edit: it might also be helpful to note that I am required to solve this using sorting. Once I do that, I have to write another implementation that makes it faster using hashing, but I think I'll be able to do that just fine on my own. I just need to figure out how I can solve the problem with sorting first.

Keep a sorted list of sums and pairs. For instance, given the list
[1, 2, 4, 8, 16, 32, 9, 82]
we will want to identify 1+9 = 2+8
Identify the largest two numbers in the list, O(N). In this case, they're (82, 32), a sum of 114. Allocate an array pair_sum of 115 locations (indices 0 through 114); set all locations to null pointers.
Now iterate through the list for your i, j pairs. For each pair, insert the two numbers as a tuple-value at index i+j. When you get a collision at some index, you're done: you found a second pair-sum.
I'll outline this in some not-quite-pseudo code; you can translate to your favorite implementation language.
bank = [1, 2, 4, 8, 16, 32, 9, 82]
size = length(bank)
// Find the two largest numbers and allocate the array ...
pair_sum = array[115], initialize to all nulls
for lo in 0:size-1
    for hi in lo+1:size
        sum = bank[lo] + bank[hi]
        if pair_sum[sum]
            // There is already a pair with that sum
            print pair_sum[sum], "and", bank[lo], bank[hi]
        else
            // Record new sum pair
            pair_sum[sum] = tuple(bank[lo], bank[hi])
This is O(N^2), with bounded space dependent on the array values.
If you aren't allowed to use the sum as an index for practical reasons, I think you can adapt this to a binary search and insertion, giving you the log(n) component.
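In Java, the direct-indexing idea might look like this sketch; I've used a HashMap keyed on the sum so the table size doesn't depend on the array's values (class and method names are mine):

```java
import java.util.HashMap;
import java.util.Map;

public class PairSums {
    // Prints every collision between pair sums and returns how many
    // collisions were found, so the behavior is easy to check.
    static int findEqualSums(int[] bank) {
        Map<Integer, int[]> pairSum = new HashMap<>(); // sum -> first pair seen
        int matches = 0;
        for (int lo = 0; lo < bank.length; lo++) {
            for (int hi = lo + 1; hi < bank.length; hi++) {
                int sum = bank[lo] + bank[hi];
                int[] seen = pairSum.get(sum);
                if (seen != null) {
                    System.out.println(seen[0] + "+" + seen[1] + " = "
                            + bank[lo] + "+" + bank[hi]);
                    matches++;
                } else {
                    pairSum.put(sum, new int[]{bank[lo], bank[hi]});
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        findEqualSums(new int[]{1, 2, 4, 8, 16, 32, 9, 82}); // finds 1+9 = 2+8
    }
}
```

Note that a collision is only reported against the first pair recorded for that sum; to list every quadruple, you would keep a list of pairs per sum instead.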

Related

How to solve the closest subset sum problem in Java for 100+ element arrays?

I came across a subset sum problem recently. I was able to solve it for smaller arrays in Java earlier, but in this case I really have no idea what to do. Brute force and recursion are probably not options, as I ran into out-of-memory problems.
So, let's say we have an array of {2500, 3200, 3300}. We are looking for the sum closest to the desired number K = 135000. The main difference is that we can use numbers from the array multiple times.
Ok, if we can use them multiple times, then we can change it to a more "traditional" form: just divide K by each of these numbers (that gives 54, 42 and 40) and create a new array that contains each number that many times. It would be {2500, 2500, 2500, ... , ... 3300, 3300}, and the new array would have a length of 136. Now this is much more than 3.
So - how to solve the closest subset sum problem, where we can pick more than 2 numbers from the array of 136 elements or more using Java?
The thing I want to get is not only the closest sum, but also a list of elements which gave that sum.
I have heard of and read about dynamic programming, approximation algorithms and genetic algorithms, but unfortunately I don't really know them. I wrote a genetic algorithm for a different case some time ago, but I am not sure how to use one here.
Any ideas? I will be really glad for help.
I am not going to solve it for you, but I'll give you the key ideas in pseudocode (aka Python).
We start with a state that represents the following statement: "I don't know how to arrive at any number yet. The best thing I can generate is 0. I have not yet processed the fact that I can get to 0."
In data:
can_generate = set()
todo = [0]
best = 0
K = 135000
What we will do is, while anything is in todo, take off a value, see if it is new to us. If it is, we might update best, and possibly add new values to todo. Like this:
while todo:
    value = todo.pop()
    if value not in can_generate:
        can_generate.add(value)
        if abs(K - value) < abs(K - best):
            best = value
        if value < K:
            for term in [2500, 3200, 3300]:
                todo.append(value + term)
Now that we know the values in can_generate, we search backwards to find how to get there.
answer = []
while 0 < best:
    for term in [3300, 3200, 2500]:
        if best - term in can_generate:
            answer.append(term)
            best -= term
            break
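Since the question asks about Java: the same idea translated into a Java sketch (class and method names are mine; `reconstruct` assumes the sum it is given is actually reachable):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClosestSum {
    // Explores every sum reachable from 0 by repeatedly adding terms,
    // stopping a branch once it reaches or passes the target.
    static Set<Integer> reachable(int[] terms, int target) {
        Set<Integer> canGenerate = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        todo.push(0);
        while (!todo.isEmpty()) {
            int value = todo.pop();
            if (canGenerate.add(value) && value < target) { // add() is true only if new
                for (int term : terms) {
                    todo.push(value + term);
                }
            }
        }
        return canGenerate;
    }

    // Returns the reachable sum closest to the target.
    static int closest(int[] terms, int target) {
        int best = 0;
        for (int value : reachable(terms, target)) {
            if (Math.abs(target - value) < Math.abs(target - best)) {
                best = value;
            }
        }
        return best;
    }

    // Walks backwards from a reachable sum to list the terms producing it.
    static List<Integer> reconstruct(int[] terms, int sum, Set<Integer> canGenerate) {
        List<Integer> answer = new ArrayList<>();
        while (sum > 0) {
            for (int term : terms) {
                if (canGenerate.contains(sum - term)) {
                    answer.add(term);
                    sum -= term;
                    break;
                }
            }
        }
        return answer;
    }
}
```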

Max contiguous subarray sum

So, I just had an online programming assessment with 2 complex coding questions plus 8 MCQs, to be completed in 1 hour, and one of the coding questions was this max contiguous subarray sum.
The tough part, I find, is handling negative numbers and contiguity. What I did was first apply Collections.sort(arr) to the given array, then re-sort the negative values by their absolute value (for each i where arr.get(i) != abs(arr.get(i)), for each j, if arr.get(i) > arr.get(j) then swap), so for a random input the final array looks like, say, -1, -2, 3, 4, 5. I then maintain a max across each i and all j iterations: if max < sum (i.e., sum + arr.get(all j) + arr.get(that i)), then max = sum. This gave me a max sum, but only 4 of 14 test cases passed, and I think the reason is that a sorted array doesn't preserve contiguity. Any suggestions on how I could build the contiguity requirement into this so it works for all cases?
I think you mistook the contiguous subarray problem for a subset problem, because you shouldn't be using sorting in the logic. You could refer to this question, which handles negative numbers as well: https://www.geeksforgeeks.org/largest-sum-contiguous-subarray/
There is an excellent explanation of the Maximum Subarray Problem on this page of Wikipedia. Essentially, you are looking for an implementation of Kadane's algorithm, which is explained in the article.
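For reference, a minimal Java sketch of Kadane's algorithm as described there (no sorting needed; class and method names are mine):

```java
public class Kadane {
    // Returns the maximum sum of any non-empty contiguous subarray.
    static int maxSubarray(int[] a) {
        int best = a[0];     // best sum seen anywhere so far
        int current = a[0];  // best sum of a subarray ending at the current index
        for (int i = 1; i < a.length; i++) {
            // Either extend the previous subarray or start fresh at a[i].
            current = Math.max(a[i], current + a[i]);
            best = Math.max(best, current);
        }
        return best;
    }
}
```

Keeping the "best subarray ending here" running value is what makes contiguity fall out naturally, with a single O(n) pass and no sorting.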

Find the only unique element in an array of a million elements

I was asked this question in a recent interview.
You are given an array that has a million elements. All the elements are duplicates except one. My task is to find the unique element.
var arr = [3, 4, 3, 2, 2, 6, 7, 2, 3........]
My approach was to go through the entire array in a for loop and build a map with the number from the array as the key and the frequency of that number as the value. Then loop through the map again and return the key whose value is 1.
I said my approach would take O(n) time complexity. The interviewer told me to optimize it to less than O(n); I said that we cannot, as we have to go through the entire array of a million elements.
Finally, he didn't seem satisfied and moved onto the next question.
I understand going through million elements in the array is expensive, but how could we find a unique element without doing a linear scan of the entire array?
PS: the array is not sorted.
I'm certain that you can't solve this problem without going through the whole array, at least without additional information (like the elements being sorted or restricted to certain values), so the problem has a minimum time complexity of O(n). However, if every duplicated element appears an even number of times (which seems to be the most common variant of the problem), you can reduce the memory complexity to O(1) with a XOR-based solution, if that's of any interest to you:
int unique(int[] array)
{
    int unpaired = array[0];
    for (int i = 1; i < array.length; i++)
        unpaired = unpaired ^ array[i];
    return unpaired;
}
Basically, every XORed element cancels out with the other one, so your result is the only element that didn't cancel out.
Assuming the array is un-ordered, you can't. Every value is independent of the next, so nothing can be deduced about one value from any of the others.
If it's an ordered array of values, then that's another matter and depends entirely on the ordering used.
I agree the easiest way is to have another container and store the frequency of the values.
In fact, since the number of elements in the array is fixed, you could do much better than what you have proposed.
By "creating a map with the number as the index and the frequency of that number as the value", you create a map with 2^32 positions (assuming the array holds 32-bit integers), and then you have to pass through that map to find the first position whose value is one. That means you are using a large auxiliary space, and in the worst case you are doing about 10^6 + 2^32 operations (one million to create the map and 2^32 to find the element).
Instead of doing so, you could sort the array with some n*log(n) algorithm and then search for the element in the sorted array, because in your case, n = 10^6.
For instance, using merge sort, you would use a much smaller auxiliary space (just an array of 10^6 integers) and would do about (10^6)*log(10^6) + 10^6 operations to sort and then find the element, which is approximately 21*10^6 (many times smaller than 10^6 + 2^32).
PS: sorting the array decreases the search from a quadratic to a linear cost, because with a sorted array we just have to check the adjacent positions to tell whether the current position is unique.
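A Java sketch of that sort-then-scan approach (class and method names are mine):

```java
import java.util.Arrays;

public class UniqueElement {
    // Sorts a copy of the input, then checks each element against its
    // neighbors: after sorting, duplicates sit next to each other, so the
    // one element that matches neither neighbor is the unique one.
    static int findUnique(int[] arr) {
        int[] a = arr.clone();
        Arrays.sort(a);
        for (int i = 0; i < a.length; i++) {
            boolean matchesPrev = i > 0 && a[i] == a[i - 1];
            boolean matchesNext = i < a.length - 1 && a[i] == a[i + 1];
            if (!matchesPrev && !matchesNext) {
                return a[i];
            }
        }
        throw new IllegalArgumentException("no unique element");
    }
}
```

This assumes exactly one element lacks a duplicate, as in the question; the adjacency check works even when duplicates appear more than twice.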
Your approach seems fine. It could be that he was looking for an edge case where the array is of even size, meaning there are either no unmatched elements or two or more. He just went about asking it the wrong way.

how to find the maximum subsequence sum in a circular linked list

I am aware of the maximum subarray sum problem and its O(n) algorithm. This questions modifies that problem by using a circular linked list:
Find the sequence of numbers in a circular linked list with maximum sum.
Now what if sum of all entries is zero?
To me, the only approach is to modify the array solution: have the algorithm loop around and start over at the beginning of the list once the first iteration is done, continue for up to twice the length of the list, and take the max. The downside is that there are many tricky cases to handle this way; for example, if the list looks like:
2 - 2 - 2 - 2 back to front
Then it's very tricky not to include the same element twice.
Is there a better algorithm?
Thanks!!
First of all, it doesn't matter whether the data structure is a linked list or an array, so I will use an array for simplicity.
I don't really understand your algorithm, but it seems you intend to duplicate the array at the back of the original and run Kadane's algorithm on this doubled array. That is a wrong approach, and a counterexample has been given by @RanaldLam.
To solve it, we need to discuss it in three cases:
All negative. In this case, the maximum of the array is the answer, and an O(N) scan will do the job;
The maximum sub-array does not require wrapping, for example a = {-1, 1, 2, -3}. In this case, normal Kadane's algorithm will do the job, with time complexity O(N);
The maximum sub-array requires wrapping, for example a = {1, -10, 1}. This case implies another fact: since the elements inside the maximum sub-array wrap around, the elements that are not inside it do not. Therefore, as long as we know the (minimum) sum of these non-contributing elements, we can calculate the correct sum of the contributing elements by subtracting it from the total sum of the array.
But how do we calculate that minimum non-contributing sum in case 3? This is a bit tricky: since the non-contributing elements do not wrap, we can simply invert the sign of every element and run Kadane's algorithm on the inverted array; its result, negated, is the minimum contiguous sum, and this takes O(N).
Finally, we compare the non-wrapping sum (case 2) and the wrapping sum (case 3); the answer is the bigger one.
As a summary, all cases require O(N), thus the total complexity of the algorithm is O(N).
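A Java sketch that puts the three cases together (class and helper names are mine):

```java
public class CircularMax {
    // Standard Kadane: maximum sum of a non-empty contiguous subarray.
    static int kadane(int[] a) {
        int best = a[0], current = a[0];
        for (int i = 1; i < a.length; i++) {
            current = Math.max(a[i], current + a[i]);
            best = Math.max(best, current);
        }
        return best;
    }

    static int maxCircular(int[] a) {
        int total = 0, max = a[0];
        for (int x : a) {
            total += x;
            max = Math.max(max, x);
        }
        int nonWrapping = kadane(a);          // case 2
        if (nonWrapping < 0) {
            return max;                       // case 1: all negative
        }
        int[] inverted = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            inverted[i] = -a[i];
        }
        // case 3: total minus the minimum contiguous sum, where the minimum
        // contiguous sum is the negated maximum sum of the inverted array.
        int wrapping = total + kadane(inverted);
        return Math.max(nonWrapping, wrapping);
    }
}
```

On the 2 - 2 - 2 - 2 list from the question, this returns 8: the whole list counted once, with no element used twice.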
You're absolutely right. There is no better algorithm.

A[j] = 2∗A[i] in list with better than O(n^2) runtime

Below is a problem I'm having some trouble with. It is a simple nested loop away from an O(n^2) solution, but I need it to be O(n). Any ideas how this should be tackled? Would it be possible to form two equations?
Given an integer array A, check if there are two indices i and j such that A[j] = 2∗A[i]. For example, on the array (25, 13, 16, 7, 8) the algorithm should output “true” (since 16 = 2 * 8), whereas on the array (25, 17, 44, 24) the algorithm should output “false”. Describe an algorithm for this problem with worst-case running time that is better than O(n^2), where n is the length of A.
Thanks!
This is a great spot to use a hash table. Create a hash table and enter each number in the array into the hash table. Then, iterate across the array one more time and check whether 2*A[i] exists in the hash table for each i. If so, then you know a pair of indices exists with this property. If not, you know no such pair exists.
On expectation, this takes time O(n), since n operations on a hash table take expected amortized O(1) time.
Hope this helps!
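In Java, that might look like the following sketch (class and method names are mine; I also guard the x == 0 case, where 2*x == x would let an element match itself):

```java
import java.util.HashSet;
import java.util.Set;

public class DoubleCheck {
    // Returns true if some A[j] equals 2 * A[i] for distinct indices i, j.
    static boolean hasDouble(int[] a) {
        Set<Integer> seen = new HashSet<>();
        int zeros = 0;
        for (int x : a) {            // one O(n) pass to load the table
            if (x == 0) {
                zeros++;
            }
            seen.add(x);
        }
        for (int x : a) {            // one O(n) pass to look up doubles
            if (x == 0) {
                if (zeros >= 2) {    // 0 = 2*0 needs two distinct indices
                    return true;
                }
            } else if (seen.contains(2 * x)) {
                return true;
            }
        }
        return false;
    }
}
```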
templatetypedef's suggestion to use a hash table is a good one. I want to explain a little more about why.
The key here is realizing that you are essentially searching for some value in a set. You have a set of numbers you are searching in (2 * each value in the input array), and a set of numbers you are searching for (each value in the input array). Your brute-force naive case is just looking up values directly in the search-in array. What you want to do is pre-load your "search-in" set into something with faster lookups than an array (like a hash table), then you can search from there.
You can also further prune your results by not searching for A[i] where A[i] is odd; because you know that A[i] = 2 * A[j] can never be true if A[i] is odd. You can also compute the minimum and maximum values in the "search-in" array on the fly during initialization and prune all A[i] outside that range.
The performance there is hard to express in big O form since it depends on the nature of the data, but you can calculate a best- and worst- case and an amortized case as well.
However, a proper choice of hash table size may actually make pruning more costly than not in some cases (if your value range is small, you can simply choose a capacity larger than the value range, with the value itself as the hash function); you'd have to profile it to find out.
