Find k-th smallest number of a subsequence in a circular array - java

Hi I am trying to solve this problem from IEEEXtreme 2014:
You are given N integers arranged circularly. There are N ways to pick a consecutive subsequence of length M (M < N). For any such subsequence we can find its “K”-value: the K-th smallest number in that subsequence. Given the array of N integers, find the smallest K-value over all possible subsequences. For example, N=5, M=3, K=2 and the array 1 5 3 4 2 gives the result 2.
My approach: first I create a sorted array list which inserts each new input at the correct position. I add the first M integers into the list and record the K-th smallest value. Then I keep removing the oldest integer, adding the next integer into the list, and comparing the new K-th value with the old one. This is my sorted array list:
class SortedArrayList extends ArrayList<Integer> {
    // Inserts value so that the list stays sorted in ascending order.
    public void insertSorted(int value) {
        for (int i = size() - 1; i >= 0; i--) {
            if (value >= get(i)) {   // direct comparison avoids the overflow risk of value - get(i) >= 0
                add(i + 1, value);
                return;
            }
        }
        add(0, value);
    }
}
I think this brute-force method is inefficient, but I haven't been able to come up with any better ideas yet. Do you know a better solution for this? Thanks.

Here is a more efficient solution:
1. Let's get rid of circularity to keep things simpler. We can do it by appending the given array to itself.
2. We can assume that all numbers in the input are unique. If that is not the case, we may use a pair (element, position) instead of each element.
3. Let's sort the given array. Now we binary search over the answer, i.e. over positions in this sorted array (the answer is always one of its elements).
4. How do we check that a fixed candidate x is at least as large as the k-th smallest number? Mark all positions holding numbers less than or equal to x with 1 and the rest with 0. Now we just need to check whether there is a subarray of length M that contains at least k ones. We can do that in linear time using rolling sums.
The time complexity is O(N log N) for sorting the input plus O(N log N) for the binary search over the answer (there are O(log N) checks, and each of them is done in linear time as described in step 4). Thus, the total time complexity is O(N log N).
P.S. I can think of several other solutions with the same time complexity, but this one seems to be the simplest one to implement(it does not require any custom data structures).
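For concreteness, here is a minimal sketch of this approach in Java (the method names are mine, and it assumes 1 <= K <= M < N; the check walks all N circular windows with a rolling count instead of physically doubling the array):
import java.util.Arrays;

public class MinKValue {

    // Returns the smallest K-value over all N circular windows of length m.
    static int minKValue(int[] a, int m, int k) {
        int n = a.length;
        int[] sorted = a.clone();
        Arrays.sort(sorted);
        int lo = 0, hi = n - 1;                  // binary search over candidate values
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (hasWindowWithKSmall(a, m, k, sorted[mid])) hi = mid;
            else lo = mid + 1;
        }
        return sorted[lo];
    }

    // Checks in O(N) whether some circular window of length m contains
    // at least k elements <= x, using a rolling sum of 0/1 marks.
    static boolean hasWindowWithKSmall(int[] a, int m, int k, int x) {
        int n = a.length, ones = 0;
        for (int i = 0; i < n + m - 1; i++) {
            if (a[i % n] <= x) ones++;                   // element enters the window
            if (i >= m && a[(i - m) % n] <= x) ones--;   // element leaves the window
            if (i >= m - 1 && ones >= k) return true;
        }
        return false;
    }
}
With the example above, minKValue(new int[]{1, 5, 3, 4, 2}, 3, 2) returns 2.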

A more elegant solution for the circular-array part of the problem is to simply use modulo. So, if you're just looking for a way to simulate a circular array, I would suggest something like this:
int n = somevalue;       // the starting point of the subsequence
int m = someothervalue;  // the index within the subsequence
int absolute_index = (n + m) % N;
where N is the total number of elements in the sequence.
The next step towards more efficiency would be to store the index of the K-th value. That way, you'd only have to calculate a new K-value every M-th step (worst case) and simply compare it to one new value on every other step.
But I'll leave that to you ;)

Related

Best way to retrieve K largest elements from large unsorted arrays?

I recently had a coding test during an interview. I was told:
There is a large unsorted array of one million ints. User wants to retrieve K largest elements. What algorithm would you implement?
During this, I was strongly hinted that I needed to sort the array.
So, I suggested using the built-in sort() or maybe a custom implementation if performance really mattered. I was then told that by using a Collection or array to store the k largest and a for-loop, it is possible to achieve approximately O(N). In hindsight, I think it's O(N*k), because each iteration needs to compare against the K-sized array to find the smallest element to replace, while sorting the whole array would make the code at least O(N log N).
I then reviewed this link on SO that suggests priority queue of K numbers, removing the smallest number every time a larger element is found, which would also give O(N log N). Write a program to find 100 largest numbers out of an array of 1 billion numbers
Is the for-loop method bad? How should I justify the pros and cons of the for-loop versus the PriorityQueue/sorting methods? I'm thinking that if the array is already sorted, it could help by not needing to iterate through the whole array again, i.e. if some other retrieval method is called on the sorted array, it should be constant time. Is there some performance factor when running the actual code that I didn't consider when theorizing pseudocode?
Another way of solving this is using Quickselect. This should give you a total average time complexity of O(n). Consider this:
Find the kth largest number x using Quickselect (O(n))
Iterate through the array again (or just through the right-side partition) (O(n)) and save all elements ≥ x
Return your saved elements
(If there are repeated elements, you can avoid them by keeping count of how many duplicates of x you need to add to the result.)
The difference between your problem and the one in the SO question you linked to is that you have only one million elements, so they can definitely be kept in memory to allow normal use of Quickselect.
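A minimal sketch of what that could look like in Java (the names and the random-pivot Lomuto partition are my own choices; it assumes 1 <= k <= arr.length, and duplicates come out right because the selection leaves the k largest values in the tail of the array):
import java.util.Arrays;
import java.util.Random;

public class TopK {
    private static final Random RND = new Random();

    // Returns the k largest elements (in no particular order) in average O(n).
    static int[] topK(int[] arr, int k) {
        int[] a = arr.clone();
        quickselect(a, 0, a.length - 1, a.length - k);  // k-th largest = (n-k)-th smallest, 0-based
        return Arrays.copyOfRange(a, a.length - k, a.length);
    }

    // After this call, a[target] holds the value it would hold if a were fully
    // sorted; everything left of it is <= and everything right of it is >=.
    static void quickselect(int[] a, int lo, int hi, int target) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == target) return;
            if (p < target) lo = p + 1;
            else hi = p - 1;
        }
    }

    // Lomuto partition around a random pivot; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + RND.nextInt(hi - lo + 1), hi);
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}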
There is a large unsorted array of one million ints. The user wants to retrieve the K largest elements.
During this, I was strongly hinted that I needed to sort the array.
So, I suggested using a built-in sort() or maybe a custom implementation
That wasn't really a hint I guess, but rather a sort of trick to deceive you (to test how strong your knowledge is).
If you choose to approach the problem by sorting the whole source array using the built-in Dual-Pivot Quicksort, you can't obtain time complexity better than O(n log n).
Instead, we can maintain a PriorityQueue holding the result. While iterating over the source array, for each element we check whether the queue has reached size K. If not, the element is added to the queue; otherwise (size equals K) we compare the element against the lowest element in the queue: if the new element is smaller or equal we ignore it, and if it is greater, the lowest element is removed and the new element added.
The time complexity of this approach is O(n log k), because adding a new element to a PriorityQueue of size k costs O(log k), and in the worst-case scenario this operation is performed n times (we iterate over an array of size n).
Note that the best-case time complexity would be Ω(n), i.e. linear.
So in terms of Big O, the difference between sorting and using a PriorityQueue boils down to O(n log n) versus O(n log k). When k is much smaller than n, this approach gives a significant performance gain.
Here's an implementation:
import java.util.Arrays;
import java.util.Collection;
import java.util.PriorityQueue;
import java.util.Queue;

public static int[] getHighestK(int[] arr, int k) {
    Queue<Integer> queue = new PriorityQueue<>();  // min-heap: smallest kept element on top
    for (int next : arr) {
        // If the heap is full and the new element beats the current minimum, evict the minimum.
        if (queue.size() == k && queue.peek() < next) queue.remove();
        if (queue.size() < k) queue.add(next);
    }
    return toIntArray(queue);
}

public static int[] toIntArray(Collection<Integer> source) {
    return source.stream().mapToInt(Integer::intValue).toArray();
}
main()
public static void main(String[] args) {
    System.out.println(Arrays.toString(getHighestK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3)));
}
Output:
[9, 12, 27]
Sorting in O(n)
We can achieve worst case time complexity of O(n) when there are some constraints regarding the contents of the given array. Let's say it contains only numbers in the range [-1000,1000] (sure, you haven't been told that, but it's always good to clarify the problem requirements during the interview).
In this case, we can use Counting sort which has linear time complexity. Or better, just build a histogram (first step of Counting Sort) and look at the highest-valued buckets until you've seen K counts. (i.e. don't actually expand back to a fully sorted array, just expand counts back into the top K sorted elements.) Creating a histogram is only efficient if the array of counts (possible input values) is smaller than the size of the input array.
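A minimal sketch of that histogram variant, under the assumed [-1000, 1000] constraint (the names are mine; it also assumes k <= arr.length):
// Builds a histogram of the values, then expands only the top k counts.
static int[] topKByHistogram(int[] arr, int k) {
    final int MIN = -1000, MAX = 1000;     // assumed value range
    int[] counts = new int[MAX - MIN + 1];
    for (int v : arr) counts[v - MIN]++;   // O(n) counting pass
    int[] result = new int[k];
    int idx = 0;
    for (int v = MAX; v >= MIN && idx < k; v--) {        // scan buckets from the top
        for (int c = counts[v - MIN]; c > 0 && idx < k; c--) {
            result[idx++] = v;             // emit each value as many times as it occurred
        }
    }
    return result;                         // the k largest, in descending order
}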
Another possibility is when the given array is partially sorted, consisting of several sorted chunks. In this case, we can use Timsort, which is good at finding sorted runs; it will deal with them in linear time.
And Timsort is already implemented in Java, it's used to sort objects (not primitives). So we can take advantage of the well-optimized and thoroughly tested implementation instead of writing our own, which is great. But since we are given an array of primitives, using built-in Timsort would have an additional cost - we need to copy the contents of the array into a list (or array) of wrapper type.
This is a classic problem that can be solved with so-called heapselect, a simple variation on heapsort. It can also be solved with quickselect, but, like quicksort, that has poor quadratic worst-case time complexity.
Simply keep a priority queue, implemented as binary heap, of size k of the k smallest values. Walk through the array, and insert values into the heap (worst case O(log k)). When the priority queue is too large, delete the minimum value at the root (worst case O(log k)). After going through the n array elements, you have removed the n-k smallest elements, so the k largest elements remain. It's easy to see the worst-case time complexity is O(n log k), which is faster than O(n log n) at the cost of only O(k) space for the heap.
Here is one idea: create a counting array that covers the range of possible values, and then for every number in the original array increment the count at the index equal to that number. Taken literally this would need an array of the maximum int size (2147483647 entries), which is impractical, so it only pays off when the value range is known and small.
At the end of this pass you will have something like [1, 0, 2, 0, 3] (the array you created), which represents the numbers [0, 2, 2, 4, 4, 4] (the initial array).
To find the K biggest elements, iterate backwards over the created array and count down from K to 0 each time you encounter an element different from 0. If an entry is, for example, 2, you have to count that number twice.
The limitation of this approach is that it works only with integers, because of the nature of the array.
Also, since int in Java spans -2147483648 to 2147483647 and array indices are non-negative, only the positive numbers can be placed in the created array directly.
NOTE: if you know the maximum int that can occur, you can shrink the created array to that size. For example, if the max int is 1000, the array you need to create only has size 1001, and this algorithm should perform very fast.
I think you misunderstood what you needed to sort.
You need to keep the K-sized list sorted; you don't need to sort the original N-sized input array. That way the time complexity is O(N * log(K)) in the worst case (assuming you need to update the K-sized list almost every time).
The requirements said that N was very large, but K is much smaller, so O(N * log(K)) is also smaller than O(N * log(N)).
You only need to update the K-sized list for each record that is larger than the K-th largest element before it. For a randomly distributed list with N much larger than K, that will be negligible, so the time complexity will be closer to O(N).
For the K-sized list, you can take a look at the implementation of Is there a PriorityQueue implementation with fixed capacity and custom comparator?, which uses a PriorityQueue with some additional logic around it.
There is an algorithm to do this in worst-case time complexity O(n*log(k)) with very benign constants (since there is just one pass through the original array, and the inner part that contributes to the log(k) is only touched relatively seldom if the input data is well-behaved).
Initialize a priority queue implemented with a binary heap A of maximum size k (internally using an array for storage). In the worst case, this has O(log(k)) for inserting, deleting and searching/manipulating the minimum element (in fact, retrieving the minimum is O(1)).
Iterate through the original unsorted array, and for each value v:
If A is not yet full then
insert v into A,
else, if v>min(A) then (*)
insert v into A,
remove the lowest value from A.
(*) Note that A can end up holding repeated values if some of the highest k values occur repeatedly in the source set. You can avoid that with a search operation that makes sure v is not yet in A. You'd also want a suitable data structure for that (as searching the priority queue itself has linear complexity), i.e. a secondary hash table or balanced binary search tree or something like that, both of which are available in java.util.
The java.util.PriorityQueue helpfully guarantees the time complexity of its operations:
this implementation provides O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).
Note that as laid out above, we only ever remove the lowest (first) element from A, so we enjoy the O(log(k)) for that. If you want to avoid duplicates as mentioned above, then you also need to search for any new value added to it (with O(k)), which opens you up to a worst-case overall scenario of O(n*k) instead of O(n*log(k)) in case of a pre-sorted input array, where every single element v causes the inner loop to fire.
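If you do want the duplicate-free variant, here is a rough sketch (my own names; it assumes the input has at least k distinct values) that pairs the heap with a HashSet so membership tests are O(1) instead of the priority queue's linear contains(Object):
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

// Keeps the k largest distinct values; the HashSet mirrors the heap's contents.
static PriorityQueue<Integer> topKDistinct(int[] arr, int k) {
    PriorityQueue<Integer> heap = new PriorityQueue<>(k);  // min-heap of kept values
    Set<Integer> inHeap = new HashSet<>();
    for (int v : arr) {
        if (inHeap.contains(v)) continue;       // O(1) duplicate check
        if (heap.size() < k) {
            heap.offer(v);
            inHeap.add(v);
        } else if (heap.peek() < v) {
            inHeap.remove(heap.poll());         // evict the current minimum
            heap.offer(v);
            inHeap.add(v);
        }
    }
    return heap;
}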

Find the Kth min element in the grid?

We are given a grid of size N * N where each element A[i][j] is calculated by the formula (i + j)^2 + (j - i) * 10^5.
We need to find the K-th smallest element in an optimized way.
Constraints :
1 <= number of test cases <= 20
1 <= N <= 5*10^4
1 <= k <= N^2
How to solve this problem in an efficient way ?
The problem that you are trying to solve (finding kth smallest element in an unordered array) is called the selection problem. There are many ways to solve this problem, and one of the best known ways is the quick select algorithm.
You can solve this problem in O(N log N) time. Note that that is sublinear in the size of the matrix, which is N*N. Of course you should never actually construct the matrix.
First, it helps to see what the values in the matrix look like:
Note that over the possible range of matrix sizes, the smallest element is always at (i,j) = (N-1,0).
The largest element will be either (0,N-1) or (N-1,N-1), depending on the size of the matrix.
Consider the elements on the line from the smallest to the largest. If you pick any one of these elements, you can trace the contour to count the elements <= it in O(N) time.
Furthermore, the elements on this line are monotonically increasing from smallest to largest, so do a binary search on this line to find the largest element such that the number of <= elements is < k. At O(N) time per test, this takes O(N log N) altogether.
Let's say the element you discover has value x, and the number of elements <= x is r, with r < k. Next trace the contour for the next higher element on the line, call its value y, and make a list of all the values v such that x < v <= y. There will be O(N) elements in this list, and building it takes only O(N) time.
Finally, just use sorting or quickselect to pick the (k-r)-th element from this list. Again this takes at most O(N log N) time.
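For comparison, here is a hedged sketch of a simpler (though slightly slower, O(N log N log V)) scheme: binary search over the value range rather than along that line. It relies only on the fact that A[i][j] is strictly increasing in j for fixed i (its j-derivative is 2(i+j) + 10^5 > 0) plus the min/max corner observations above; the names are mine, and k is a long since K can reach N^2:
static long f(long i, long j) {
    return (i + j) * (i + j) + (j - i) * 100000L;
}

// Finds the k-th smallest matrix value without constructing the matrix.
static long kthSmallest(int n, long k) {
    long lo = f(n - 1, 0);                                // smallest element
    long hi = Math.max(f(0, n - 1), f(n - 1, n - 1));     // largest element
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        if (countAtMost(n, mid) >= k) hi = mid;
        else lo = mid + 1;
    }
    return lo;
}

// Counts entries <= x in O(N log N): each row is sorted in j, so binary
// search for the first column whose value exceeds x.
static long countAtMost(int n, long x) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        int lo = 0, hi = n;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (f(i, mid) <= x) lo = mid + 1;
            else hi = mid;
        }
        total += lo;   // lo = number of entries <= x in row i
    }
    return total;
}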

How to efficiently sort one million elements?

I need to compare about 60,000 elements with a list of 935,000 elements, and if they match I need to perform a calculation.
I already implemented everything needed, but the process takes about 40 minutes. I have a unique 7-digit number in both lists. The 935,000-element and the 60,000-element files are unsorted. Is it more efficient to sort (which sort?) the big list before I try to find the elements? Keep in mind that I have to do this calculation only once a month, so I don't need to repeat the process every day.
Basically which is faster:
unsorted linear search
sort list first and then search with another algorithm
Try it out.
You've got Collections.sort() which will do the heavy lifting for you, and Collections.binarySearch() which will allow you to find the elements in the sorted list.
When you search the unsorted list, you have to look through half the elements on average before you find the one you're looking for. When you do that 60,000 times on a list of 935,000 elements, that works out to about
935,000 * 1/2 * 60,000 = 28,050,000,000 operations
If you sort the list first (using mergesort) it will take about n * log(n) operations. Then you can use binary search to find each element in log(n) lookups for each of the 60,000 elements in your shorter list. That's about
935,000 * log(935,000) + log(935,000) * 60,000 = 19,735,434 operations
It should be a lot faster if you sort the list first, then use a search algorithm that takes advantage of the sorted list.
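As a rough sketch of that in Java (hypothetical names; it assumes the 7-digit numbers are held as Long values):
import java.util.Collections;
import java.util.List;

// Sort the big list once, then binary-search it for each key from the short list.
static void matchAndProcess(List<Long> bigList, List<Long> keys) {
    Collections.sort(bigList);                             // O(n log n), done once
    for (Long key : keys) {
        int pos = Collections.binarySearch(bigList, key);  // O(log n) per key
        if (pos >= 0) {
            // match found: perform the calculation here
        }
    }
}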
What would work quite well is to sort both lists and then iterate over both at the same time.
Use Collections.sort() to sort the lists.
You start with an index into each sorted list and just basically walk straight through them. You start with the first element of the short list and compare it to the first elements of the long list. If you reach an element in the long list with a higher 7-digit number than the current number in the short list, increment your index into the short list. This way there is no need to check elements twice.
But actually, since you want the intersection of the two lists, you might be better off using longList.retainAll(new HashSet<>(shortList)). Note that retainAll does a contains check against its argument for every element of the long list; against a plain list each check is linear, but against a HashSet each lookup is about O(1), so the whole intersection costs roughly one pass over the long list.
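A minimal sketch of that set-based intersection (hypothetical names, again assuming Long values):
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Builds a HashSet from the short list so each lookup on the long list is ~O(1),
// making the whole match roughly O(n + m) instead of O(n * m).
static List<Long> intersection(List<Long> longList, List<Long> shortList) {
    Set<Long> keys = new HashSet<>(shortList);  // O(m) to build
    List<Long> matches = new ArrayList<>();
    for (Long v : longList) {                   // one O(n) scan
        if (keys.contains(v)) matches.add(v);
    }
    return matches;
}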
You can sort both lists and compare them element by element incrementing first or second index (i and j in the example below) as needed:
List<Comparable> first = ....
List<Comparable> second = ...
Collections.sort(first);
Collections.sort(second);
int i = 0;
int j = 0;
while (i < first.size() && j < second.size()) {
    if (first.get(i).compareTo(second.get(j)) == 0) {
        // Action for equals
    }
    if (first.get(i).compareTo(second.get(j)) > 0) {
        j++;
    } else {
        i++;
    }
}
The complexity of this code is O(n log(n)) where n is the biggest list size.

Algorithm for finding the longest sequence of the same element in a 1D array - looking for a better solution

I need to find an algorithm which will find the longest sequence of the same element in a one-dimensional array.
For example:
int[] myArr = {1, 1, 1, 3, 4, 5, 5, 5, 5, 5, 3, 3, 4, 3, 3};
the solution will be 5, because the sequence of 5s is the longest.
This is my solution to the problem:
static int findSequence(int[] arr, int arrLength) {
    int frequency = 1;
    int bestNumber = arr[0];
    int bestFrequency = 0;
    for (int n = 1; n < arrLength; n++) {
        if (arr[n] != arr[n - 1]) {
            if (frequency > bestFrequency) {
                bestNumber = arr[n - 1];
                bestFrequency = frequency;
            }
            frequency = 1;
        } else {
            frequency++;
        }
    }
    if (frequency > bestFrequency) {
        bestNumber = arr[arrLength - 1];
        bestFrequency = frequency;
    }
    return bestNumber;
}
but I'm not satisfied. Maybe someone knows a more efficient solution?
You can skip some numbers in the array using the following pattern:
Maintain an integer jump_count holding the number of elements to skip (which will be bestFrequency/2). The divisor 2 can be changed according to the data set. Update jump_count every time you update bestFrequency.
Now, after every jump
If the previous element is not equal to the current element and frequency <= jump_count, scan backwards from the current element to count the duplicates and update the frequency.
e.g. 2 2 2 2 3 3 and frequency = 0 (bold are previous and current elements), then scan backwards to find number of 3's and update the frequency = 2
If the previous element is not equal to the current element and frequency > jump_count, scan every element in the jump to update the frequency, and update bestFrequency if needed.
e.g. 2 2 2 2 2 3 3 and frequency = 1 (bold are previous and current elements), scan for number of 2's in this jump and update the frequency = 1 + 4. Now, frequency < bestFrequency, scan backwards to find number of 3's and update the frequency = 2.
If the previous element is equal to the current element, scan the jump to make sure it is a continuous sequence. If yes, update the frequency to frequency + jump_count; else treat this the same as case 2.
Here, we will consider two examples:
a) 2 2 2 2 2 2 (bold are previous and current elements), check if the jump contains all 2's. Yes in this case, so add the jump_count to frequency.
b) 2 2 2 2 3 2 (bold are previous and current elements), check if the jump contains all the 2's. No in this case, so considering this as in step 2. So, scan for number of 2's in this jump and update the frequency = 1 + 4. Now, frequency < bestFrequency, scan backwards to find number of 2's(from the current element) and update the frequency = 1.
Optimization: You can save some loops in many cases.
P.S. Since this is my first answer, I hope I am able to convey myself.
Try this:
public static void longestSequence(int[] a) {
    int count = 1, max = 1;
    for (int i = 1; i < a.length; i++) {
        if (a[i] == a[i - 1]) {
            count++;
        } else {
            if (count > max) {
                max = count;
            }
            count = 1;
        }
    }
    if (count > max)
        System.out.println(count);
    else
        System.out.println(max);
}
Your algorithm is pretty good.
It touches each array element (except the last) only once. That puts it at O(n) runtime, which for this problem is the best worst-case runtime you can get and is quite good as algorithms go.
One possible suggestion: when you find a new bestFrequency and n + bestFrequency > arrLength, you can break out of the loop, because you know a longer sequence cannot be found.
The only optimization that seems possible is:
for (int n = 1; n < arrLength && frequency + (arrLength - n) >= bestFrequency; n++) {
because you don't need to search any further once you can't possibly exceed the best frequency with the number of elements remaining (it's probably possible to simplify that even further given a little more thought).
But as others point out, you're doing a O(n) search on n elements for a sequence - there's really no more efficient algorithm available.
I was thinking this must be an O(n) problem, but now I'm wondering if it has to be, and whether you could potentially make it O(log n) using a binary search (I don't think what @BlackJack posted actually works quite right, but it was inspiring):
I was thinking of something like keeping track of the first and last element in a block (probably a recursive algorithm). Do a binary split, so start with the middle element. If it matches either the first or last element, you possibly have a run of at least that length. Check whether the total length could exceed the current known max run; if so, continue, and if not, break.
Then repeat the process: do a binary split of one of those halves to see if the middle item matches. Repeat, recursing up and down to get the maximum length of a single run within a branch. Stop searching a branch when it can't possibly exceed the maximum run length.
I think this still comes out to be an O(n) algorithm, because the worst case is still searching every single element (consider a max run length of 1 or 2). But if you limit yourself to checking each item once and search into the most-likely-longest branches first (based on start/middle/end matches), it could potentially skip some fairly long runs. A breadth-first rather than depth-first search would also help.

Find nearest number in unordered array

Given a large unordered array of long random numbers and a target long, what's the most efficient algorithm for finding the closest number?
@Test
public void findNearest() throws Exception {
    final long[] numbers = {90L, 10L, 30L, 50L, 70L};
    Assert.assertEquals("nearest", 10L, findNearest(numbers, 12L));
}
Iterate through the array of longs once. Store the current closest number and the distance to that number. Continue checking each number if it is closer, and just replace the current closest number when you encounter a closer number.
This gets you best performance of O(n).
Building a binary tree, as suggested by another answerer, will take O(n log n). Of course future searches will then only take O(log n)... so it may be worth it if you do a lot of searches.
If you are a pro, you can parallelize this with OpenMP or a thread library, but I am guessing that is out of the scope of your question.
If you do not intend to do multiple such requests on the array, there is no better way than the brute-force linear-time check of each number.
If you will do multiple requests on the same array, sort it first and then do a binary search on it. This reduces the time per request to O(log(n)), but you still pay O(n*log(n)) for the sort, so it is only reasonable if the number of requests k is reasonably large, i.e. k*n >> (is a lot bigger than) n*log(n) + k*log(n).
If the array will change, create a binary search tree and do a lower-bound request on it. This again is only reasonable if the number of nearest-number requests is relatively large in comparison to the array-change requests and the number of elements. As the cost of building the tree is O(n*log(n)) and the cost of updating it is O(log(n)), you need k*log(n) + n*log(n) + k*log(n) << (a lot smaller than) k*n.
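For the changing-array case, java.util.TreeSet already gives you the lower-bound style lookups; a small sketch (my own names, assuming the set is non-empty):
import java.util.TreeSet;

// floor() and ceiling() return the nearest neighbours on each side in O(log n).
static long findNearest(TreeSet<Long> set, long target) {
    Long below = set.floor(target);    // greatest element <= target, or null
    Long above = set.ceiling(target);  // least element >= target, or null
    if (below == null) return above;
    if (above == null) return below;
    return (target - below <= above - target) ? below : above;
}
With the numbers from the question loaded into the set, findNearest(set, 12L) returns 10.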
IMHO, you should use a Binary Heap (http://en.wikipedia.org/wiki/Binary_heap), which has an insertion time of O(log n), i.e. O(n log n) for the entire array. For me, the coolest thing about the binary heap is that it can be built inside your own array, without overhead. Take a look at the heapify section.
"Heapifying" your array makes it possible to get the largest/smallest element in O(1).
If you build a binary search tree from your numbers and search against it, O(log n) would be the complexity in the worst case. In your case you won't search for equality; instead, you'll look for the smallest result of the subtraction.
I would check the difference between the numbers while iterating through the array and save the minimum of that difference.
If you plan to use findNearest multiple times, I would sort the array (with a sorting algorithm of complexity n*log(n)) after each change of values in it and search against the sorted array.
The time complexity of this job is O(n), where n is the length of numbers.
final long[] numbers = {90L, 10L, 30L, 50L, 70L};
long tofind = 12L;
long delta = Long.MAX_VALUE;
int index = -1;
int i = 0;
while (i < numbers.length) {
    long tmp = Math.abs(tofind - numbers[i]);   // distance to the current number
    if (tmp < delta) {
        delta = tmp;
        index = i;
    }
    i++;
}
System.out.println(numbers[index]); // if index is not -1
But if you want to find many times with different values such as 12L against the same numbers array, you may sort the array first and binary search against the sorted numbers array.
If your search is a one-off, you can partition the array like in quicksort, using the input value as pivot.
If you keep track, while partitioning, of the min item in the right half and the max item in the left half, you should have it in O(n) with a single pass over the array.
I'd say it's not possible to do it in less than O(n) since it's not sorted and you have to scan the input at the very least.
If you need to do many subsequent search, then a BST could help indeed.
You could do it with the steps below:
Step 1: Sort the array
Step 2: Find the index where the search element would belong in the sorted array
Step 3: Based on that index, look at the numbers immediately to its right and left and pick the closer one
Let me know in case of any queries...
