I had an interview with Facebook and they asked me this question.
Suppose you have an unordered array with N distinct values
$input = [3,6,2,8,9,4,5]
Implement a function that finds the Kth largest value.
E.g., if K = 0, return 9; if K = 1, return 8.
Here is the method I came up with:
private static int getMax(Integer[] input, int k)
{
    List<Integer> list = Arrays.asList(input);
    Set<Integer> set = new TreeSet<Integer>(list);
    list = new ArrayList<Integer>(set);
    int value = (list.size() - 1) - k;
    return list.get(value);
}
I just tested it and the method works fine for the question as posed. However, the interviewer then said: to make your life harder, let's assume the array contains millions of numbers; then your list-based approach becomes too slow. What would you do in that case?
As a hint, he suggested using a min heap. Based on my understanding, no child value in a heap should be greater than the root value. So in this case, if we assume 3 is the root, then its child 6 is greater than the root's value. I'm probably wrong, but what do you think, and what would a min-heap implementation look like?
He has actually given you the whole answer. Not just a hint.
And your understanding describes a max heap, not a min heap. Its workings are self-explanatory.
In a min heap, the root has the minimum value (less than its children).
So what you need to do is iterate over the array and populate the min heap with K elements.
Once that's done, the heap automatically keeps the lowest of those at the root.
Now, for each next element you read from the array:
-> check if the value is greater than the root of the min heap;
-> if yes, remove the root from the min heap and add the value to the heap.
After you traverse the whole array, the root of the min heap automatically contains the kth largest element, and all other elements in the heap (k-1 elements, to be precise) are larger than it.
Here is an implementation of the min heap using PriorityQueue in Java. Complexity: O(n log k).
import java.util.PriorityQueue;

public class LargestK {

    private static Integer largestK(Integer array[], int k) {
        // Min heap holding k+1 elements (the question's K is 0-indexed)
        PriorityQueue<Integer> queue = new PriorityQueue<Integer>(k+1);
        int i = 0;
        // Seed the heap with the first k+1 elements
        while (i <= k) {
            queue.add(array[i]);
            i++;
        }
        for (; i < array.length; i++) {
            Integer value = queue.peek();
            // Keep only the k+1 largest elements seen so far
            if (array[i] > value) {
                queue.poll();
                queue.add(array[i]);
            }
        }
        return queue.peek();
    }

    public static void main(String[] args) {
        Integer array[] = new Integer[] {3,6,2,8,9,4,5};
        System.out.println(largestK(array, 3));
    }
}
Output: 5
The code loops over the array, which is O(n). The size of the PriorityQueue (min heap) is bounded by k, so any heap operation is O(log k). In the worst case, in which the numbers are sorted in ascending order, the complexity is O(n log k), because for each element you need to remove the top of the heap and insert the new element.
Edit: Check this answer for an O(n) solution.
You can probably make use of PriorityQueue as well to solve this problem:
public int findKthLargest(int[] nums, int k) {
    int p = 0;
    int numElements = nums.length;
    // create priority queue where all the elements of nums will be stored
    PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

    // place all the elements of the array into this priority queue
    for (int n : nums) {
        pq.add(n);
    }

    // extract the kth largest element: poll the (numElements - k + 1)
    // smallest elements off the min heap; the last one polled is the answer
    while (numElements - k + 1 > 0) {
        p = pq.poll();
        k++;
    }

    return p;
}
From the Java doc:
Implementation note: this implementation provides O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).
The loops run O(n) times in total, and each queue operation costs O(log n), so the complexity of the above algorithm is O(n log n).
A heap-based solution is perfect if the number of elements in the array/stream is unknown. But what if they are finite, and you still want an optimized solution in linear time?
We can use Quick Select, discussed here.
Array = [3,6,2,8,9,4,5]
Let's choose the first element as the pivot:
pivot = 3 (at the 0th index).
Now partition the array so that all elements less than or equal to the pivot are on the left side and all numbers greater than 3 are on the right side, just like in quicksort (discussed on my blog).
So after the first pass: [2,3,6,8,9,4,5]
The pivot index is 1 (i.e., it's the second lowest element). Now apply the same process again.
Choose 6 now, the value at the index after the previous pivot: [2,3,4,5,6,8,9]
So now 6 is at the proper place.
After each partition, check whether the pivot landed at the position of the number you want (the kth largest or kth lowest). If so, you are done; otherwise recurse into the side that contains it. See the sketch below.
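For concreteness, here is a minimal quickselect sketch in Java (my own illustration, not taken from the linked blog). Note that it uses the usual 1-indexed k, whereas the question's K is 0-indexed:

import java.util.Random;

// Finds the kth largest (k = 1 returns the maximum) by selecting the element
// whose ascending rank is n - k. Average O(n); worst case O(n^2).
static int quickSelectKthLargest(int[] a, int k) {
    int lo = 0, hi = a.length - 1, target = a.length - k;
    Random rnd = new Random();
    while (lo < hi) {
        // Lomuto partition around a randomly chosen pivot
        swap(a, lo + rnd.nextInt(hi - lo + 1), hi);
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        if (store == target) return a[store];
        if (store < target) lo = store + 1;
        else hi = store - 1;
    }
    return a[lo];
}

static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}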
One approach for constant values of k is to use a partial insertion sort.
(This assumes distinct values, but can easily be altered to work with duplicates as well)
last_min = -inf
output = []
for i in (0..k)
    min = +inf
    for value in input_array
        if value < min and value > last_min
            min = value
    output[i] = min
    last_min = min
print output[k-1]
(That's pseudocode, but it should be easy enough to implement in Java; see the sketch below.)
The overall complexity is O(n*k), which means it works well if and only if k is constant or known to be less than log(n).
On the plus side, it is a really simple solution. On the minus side, it is not as efficient as the heap solution.
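A direct Java translation of the pseudocode above (names are mine). The pseudocode collects successive minima, i.e. the kth smallest; since the original question asks for the kth largest, this version flips the comparisons. It assumes distinct values that stay clear of Integer.MIN_VALUE/MAX_VALUE:

// O(n*k) time, O(k) space; each pass scans for the largest value
// strictly below the one found by the previous pass.
static int kthLargestPartialSort(int[] input, int k) {
    int lastMax = Integer.MAX_VALUE;
    int[] output = new int[k];
    for (int i = 0; i < k; i++) {
        int max = Integer.MIN_VALUE;
        for (int value : input) {
            if (value > max && value < lastMax) max = value;
        }
        output[i] = max;
        lastMax = max; // the next pass looks strictly below this value
    }
    return output[k - 1];
}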
Related
I've been given an exercise in class that requires the following:
An array v of N integers is circularly ordered if either the array is ordered, or else v[N-1] ≤ v[0] and ∃k with 0 < k < N such that ∀i ≠ k, v[i] ≤ v[i+1].
Example:
Given a circularly ordered array with at most 10 positive items, calculate the sum of the positive values. For this last example, the answer would be 27.
I'm required to implement it in Java using a divide-and-conquer scheme, such that the worst-case complexity is O(log N), where N is the array size.
So far I've tried probing values until I find a positive one; then, knowing the other positive values are adjacent, it's possible to sum the at most 10 positive values with O(1) complexity.
I thought of doing a binary search to achieve O(log N) complexity, but this would not follow the divide-and-conquer pattern.
I can easily implement it with O(N) complexity like this:
public static int addPositives(int[] vector){
    return addPositives(vector, 0, vector.length - 1);
}

public static int addPositives(int[] vector, int i0, int iN){
    int k = (i0 + iN) / 2;
    if (iN - i0 > 1){
        return addPositives(vector, i0, k) + addPositives(vector, k + 1, iN);
    } else {
        int temp = 0;
        for (int i = i0; i <= iN; i++) {
            if (vector[i] > 0) temp += vector[i];
        }
        return temp;
    }
}
However, my attempts to reach O(log N) have gone nowhere. How could I achieve it?
You can improve your divide and conquer implementation to meet the required running time if you prune irrelevant branches of the recursion.
After you divide the current array into two sub-arrays, compare the first and last elements of each sub-array. If both are negative and the first is smaller than the last, you know for sure that all the elements in this sub-array are negative and you don't have to make the recursive call on it (since you know it will contribute 0 to the total sum).
You can also stop the recursion if all the elements in a sub-array are positive (which, again, can be verified by comparing the first and last elements of the sub-array); in that case you have to sum all the elements of that sub-array, so there is no point in continuing the recursion.
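Here is a sketch of that pruning grafted onto the question's method (the method name is mine). Both early exits rely on the circular ordering: a sub-array whose endpoints are in order cannot hide the rotation point, and the all-positive case costs O(1) because there are at most 10 positive items in total:

public static int addPositivesPruned(int[] vector, int i0, int iN) {
    if (vector[i0] <= vector[iN]) {
        // Endpoints both negative and in order: the whole sub-array is negative.
        if (vector[iN] <= 0) return 0;
        // Endpoints both positive and in order: the whole sub-array is positive,
        // and it has at most 10 elements, so summing it directly is O(1).
        if (vector[i0] > 0) {
            int sum = 0;
            for (int i = i0; i <= iN; i++) sum += vector[i];
            return sum;
        }
    }
    if (i0 == iN) return vector[i0] > 0 ? vector[i0] : 0;
    int k = (i0 + iN) / 2;
    return addPositivesPruned(vector, i0, k) + addPositivesPruned(vector, k + 1, iN);
}

Only the sub-arrays straddling the rotation point or the negative/positive boundary keep recursing, which is what brings the cost down toward the required O(log N).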
My advice for the O(log N) check would be a direct comparison to test the second of the two criteria: the last item being no greater than the first.
return vector[0] >= vector[iN-1]
If you want something more elaborate (I forget the algorithm's name), you could split the array at the halfway point and do two ordered searches from there: from the middle to the start, and from the middle to the end.
What is the fastest way to find the k largest elements in an array in order (i.e. starting from the largest element to the kth largest element)?
One option would be the following:
Using a linear-time selection algorithm like median-of-medians or introselect, find the kth largest element and rearrange the elements so that all elements from the kth position onward are at least as large as the kth element.
Sort all elements from the kth forward using a fast sorting algorithm like heapsort or quicksort.
Step (1) takes time O(n), and step (2) takes time O(k log k). Overall, the algorithm runs in time O(n + k log k), which is very, very fast.
Hope this helps!
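As a rough Java sketch of this recipe (reusing the quickselect sketch from earlier on this page in place of a worst-case linear selection algorithm, so the bound becomes expected rather than worst-case):

import java.util.Arrays;

// Step 1: partition so positions n-k .. n-1 hold the k largest values.
// Step 2: sort that slice and reverse it so it runs largest-first.
// Expected O(n + k log k) with a randomized pivot.
static int[] topKInOrder(int[] a, int k) {
    int n = a.length;
    quickSelectKthLargest(a, k);  // after this, a[n-k..n-1] are the k largest
    int[] top = Arrays.copyOfRange(a, n - k, n);
    Arrays.sort(top);
    for (int i = 0, j = top.length - 1; i < j; i++, j--) {
        int t = top[i]; top[i] = top[j]; top[j] = t;
    }
    return top;
}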
C++ also provides the partial_sort algorithm, which solves the problem of selecting the smallest k elements (sorted), with a time complexity of O(n log k). No algorithm is provided for selecting the greatest k elements since this should be done by inverting the ordering predicate.
For Perl, the module Sort::Key::Top, available from CPAN, provides a set of functions to select the top n elements from a list using several orderings and custom key extraction procedures. Furthermore, the Statistics::CaseResampling module provides a function to calculate quantiles using quickselect.
Python's standard library (since 2.4) includes heapq.nsmallest() and nlargest(), returning sorted lists, the former in O(n + k log n) time, the latter in O(n log k) time.
Radix sort solution:
Sort the array in descending order, using radix sort;
Print first K elements.
Time complexity: O(N*L), where L is the number of digits in the largest element; L can be assumed O(1) for fixed-width integers.
Space used: O(N) for radix sort.
However, I think radix sort has costly overhead, making its linear time complexity less attractive.
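For reference, a compact sketch of those two steps, assuming non-negative ints so sign handling can be skipped (an LSD radix sort on 8-bit digits, then reading off the top K):

// Four counting passes, each O(N + 256), then copy the top k descending.
static int[] topKRadix(int[] a, int k) {
    int n = a.length;
    int[] src = a.clone(), dst = new int[n];
    for (int shift = 0; shift < 32; shift += 8) {
        int[] count = new int[257];
        for (int x : src) count[((x >>> shift) & 0xFF) + 1]++;
        for (int i = 1; i < 257; i++) count[i] += count[i - 1];
        for (int x : src) dst[count[(x >>> shift) & 0xFF]++] = x; // stable pass
        int[] tmp = src; src = dst; dst = tmp;
    }
    // src is now sorted ascending; read the top k in descending order
    int[] top = new int[k];
    for (int i = 0; i < k; i++) top[i] = src[n - 1 - i];
    return top;
}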
1) Build a max heap in O(n).
2) Use extract-max k times to get the k maximum elements from the max heap, O(k log n).
Time complexity: O(n + k log n)
A C++ implementation using STL is given below:
#include <iostream>
#include<bits/stdc++.h>
using namespace std;
int main() {
int arr[] = {4,3,7,12,23,1,8,5,9,2};
//Lets extract 3 maximum elements
int k = 3;
//First convert the array to a vector to use STL
vector<int> vec;
for(int i=0;i<10;i++){
vec.push_back(arr[i]);
}
//Build heap in O(n)
make_heap(vec.begin(), vec.end());
//Extract max k times
for(int i=0;i<k;i++){
cout<<vec.front()<<" ";
pop_heap(vec.begin(),vec.end());
vec.pop_back();
}
return 0;
}
@templatetypedef's solution is probably the fastest one, assuming you can modify or copy the input.
Alternatively, you can use heap or BST (set in C++) to store k largest elements at given moment, then read array's elements one by one. While this is O(n lg k), it doesn't modify input and only uses O(k) additional memory. It also works on streams (when you don't know all the data from the beginning).
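For example, the BST variant might look like this in Java (a sketch; it assumes distinct values, since a TreeSet drops duplicates):

import java.util.TreeSet;

// Keep only the k largest values seen so far; O(n log k) time, O(k) space.
static int kthLargestStreaming(int[] stream, int k) {
    TreeSet<Integer> top = new TreeSet<>();
    for (int x : stream) {
        top.add(x);
        if (top.size() > k) {
            top.pollFirst(); // evict the smallest of the current candidates
        }
    }
    return top.first(); // the kth largest; the set holds all k in order
}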
Here's a solution with O(N + k lg k) complexity.
int[] kLargest_Dremio(int[] A, int k) {
    int[] result = new int[k];
    shouldGetIndex = true;
    int q = AreIndicesValid(0, A.Length - 1)
        ? RandomizedSelet(0, A.Length - 1, A.Length - k + 1)
        : -1;
    Array.Copy(A, q, result, 0, k);
    Array.Sort(result, (a, b) => b.CompareTo(a)); // sort descending
    return result;
}
AreIndicesValid and RandomizedSelet are defined in this github source file.
There was a question on performance & restricted resources.
Make a value class for the top 3 values. Use such an accumulator for reduction in a parallel stream. Limit the parallelism according to the context (memory, power).
class BronzeSilverGold {
    int[] values = new int[] {Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE};

    // For reduction (body filled in as one possible implementation:
    // insert x, letting it displace smaller values down the podium).
    void add(int x) {
        for (int i = 0; i < values.length; i++) {
            if (x > values[i]) {
                int tmp = values[i];
                values[i] = x;  // x takes this slot...
                x = tmp;        // ...and the displaced value moves down
            }
        }
    }

    // For combining the results of two threads.
    void merge(BronzeSilverGold other) {
        for (int v : other.values) {
            add(v);
        }
    }
}
The parallelism must be restricted in your setting, hence specify an N_THREADS in:
try {
    ForkJoinPool threadPool = new ForkJoinPool(N_THREADS);
    BronzeSilverGold result = threadPool.submit(
        () -> IntStream.of(...).parallel().collect(
            BronzeSilverGold::new,
            BronzeSilverGold::add,     // accumulator: (bsg, n) -> bsg.add(n)
            BronzeSilverGold::merge)   // combiner for two partial results
    ).get();
    ...
} catch (InterruptedException | ExecutionException e) {
    prrtl();
}
This is a common interview question.
You have a stream of numbers coming in (let's say more than a million). The numbers are in the range [0, 999].
Implement a class which supports three methods in O(1)
* insert(int i);
* getMean();
* getMedian();
This is my code.
public class FindAverage {
    private int[] store;
    private long size;
    private long total;
    private int highestIndex;
    private int lowestIndex;

    public FindAverage() {
        store = new int[1000];
        size = 0;
        total = 0;
        highestIndex = Integer.MIN_VALUE;
        lowestIndex = Integer.MAX_VALUE;
    }

    public void insert(int item) throws OutOfRangeException {
        if (item < 0 || item > 999) {
            throw new OutOfRangeException();
        }
        store[item]++;
        size++;
        total += item;
        highestIndex = Integer.max(highestIndex, item);
        lowestIndex = Integer.min(lowestIndex, item);
    }

    public float getMean() {
        return (float) total / size;
    }

    public float getMedian() {
    }
}
I can't seem to think of a way to get the median in O(1) time.
Any help appreciated.
You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.
You simply start iterating over the store, summing up the counts until you reach half of size. That is your median value, if size is odd. For even size, you'll grab the two surrounding values and get their average.
Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n, i.e. performance is unchanged even if n reaches into the billions.
Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.
In your case, that bound is 1000.
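A sketch of getMedian() along these lines, dropped into the FindAverage class from the question (it assumes size > 0):

// Walk the counters until half of the elements have been passed.
// The range is fixed at 1000 buckets, so this is O(1) in n.
public float getMedian() {
    long loRank = (size - 1) / 2;  // 0-based rank of the lower median
    long hiRank = size / 2;        // equals loRank when size is odd
    int lo = -1, hi = -1;
    long seen = 0;
    for (int v = 0; v < store.length; v++) {
        seen += store[v];
        if (lo < 0 && seen > loRank) lo = v;
        if (seen > hiRank) { hi = v; break; }
    }
    return (lo + hi) / 2.0f;
}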
The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.
To implement the median in constant time, you will need two numbers: the median index (i.e. the value of the median) and the number of values you've read that are to the left (or right) of the median. I will just stop here, hoping you will be able to figure out how to continue on your own.
EDIT (as pointed out in the comments): you already have the array with the sorted element counts (store) and you know the number of elements to the left of the median (size/2). You only need to glue the logic together. I would also point out that with linear additional memory you won't need to iterate over the whole array on each insert.
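A sketch of that incremental bookkeeping (names are mine; it tracks the lower median, and for an even size you would also peek at the next non-empty bucket):

private int medianVal = 0; // value of the current lower median
private long below = 0;    // how many stored items are strictly < medianVal

public void insertTracking(int item) {
    insert(item); // reuse the counting insert from the question
    if (item < medianVal) below++;
    long target = (size - 1) / 2; // 0-based rank of the lower median
    // Median moved left: step down to the previous non-empty bucket.
    while (below > target) {
        do { medianVal--; } while (store[medianVal] == 0);
        below -= store[medianVal];
    }
    // Median moved right: step up past the current bucket.
    while (below + store[medianVal] <= target) {
        below += store[medianVal];
        do { medianVal++; } while (store[medianVal] == 0);
    }
}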
For the general case, where the range of elements is unbounded, no such data structure can exist under any comparison-based algorithm, as it would allow O(n) sorting, contradicting the Ω(n log n) lower bound for comparison sorts.
Proof: Assume such a DS exists; call it D.
Let A be the input array to be sorted. (Assume A.size() is even for simplicity; that can be relaxed pretty easily by adding a garbage element and discarding it later.)
sort(A):
    ds = new D()
    for each x in A:
        ds.add(x)
    m1 = min(A) - 1
    m2 = max(A) + 1
    for (i = 0; i < A.size(); i++):
        ds.add(m1)
    # at this point, ds.median() is the smallest element in A
    for (i = 0; i < A.size(); i++):
        yield ds.median()
        # each two insertions advance the median by 1
        ds.add(m2)
        ds.add(m2)
Claim 1: This algorithm runs in O(n).
Proof: Since add() and median() are constant-time operations, each iteration costs O(1), and the number of iterations is linear, so the total complexity is linear.
Claim 2: The output is sorted(A).
Proof (guidelines): After inserting m1 n times, the median is the smallest element in A. Each pair of insertions after that advances the median by one item, and since the items are yielded in increasing rank order, the total output is sorted.
Since the above algorithm would sort in O(n), which is impossible in the comparison model, such a DS does not exist.
QED.
Good evening, I have an array in Java with n integer numbers. I want to check if there is a subset of size k of the entries that satisfies the condition:
The sum of those k entries is a multiple of m.
How may I do this as efficiently as possible? There are n!/(k!(n-k)!) subsets that I need to check.
You can use dynamic programming. The state is (prefix length, sum modulo m, number of elements in the subset). The transitions are straightforward: either we add the current number to the subset (increasing the element count and updating the sum modulo m), or we just increase the prefix length (skipping the current number). If you only need a yes/no answer, you can store just the last layer of values and apply bit optimizations to compute transitions faster. By bit optimizations I mean using things like bitset in C++, which can perform an operation on a group of bits at once using bitwise operations. The time complexity is O(n * m * k), or about n * m * k / 64 operations with bit optimizations. The space complexity is O(m * k). It looks feasible for a few thousand elements.
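A sketch of this DP in Java, keeping only the yes/no layer (without the bit optimizations):

// reachable[j][s] == true means some subset of j elements seen so far
// sums to s modulo m. O(n * k * m) time, O(k * m) space.
static boolean hasSubset(int[] a, int k, int m) {
    boolean[][] reachable = new boolean[k + 1][m];
    reachable[0][0] = true;
    for (int x : a) {
        int r = ((x % m) + m) % m;           // works for negative values too
        for (int j = k; j >= 1; j--) {        // descending: use each x at most once
            for (int s = 0; s < m; s++) {
                if (reachable[j - 1][s]) {
                    reachable[j][(s + r) % m] = true;
                }
            }
        }
    }
    return reachable[k][0];
}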
I don't like this solution, but it may work for your needs
public boolean containsSubset(int[] a, int currentIndex, int currentSum, int depth, int divisor, int maxDepth){
    // you could make a, maxDepth, and divisor static as well

    // If depth equals maxDepth, then our subset has k elements; in addition, the sum of the
    // elements must be divisible by our divisor, m.
    // If this condition is satisfied, then there exists a subset of size k whose sum is divisible by m.
    if (depth == maxDepth && currentSum % divisor == 0)
        return true;

    // If the depth is greater than or equal to maxDepth, our subset already has k elements, so
    // adding more elements cannot satisfy the necessary conditions.
    // Additionally, if it had k elements and were divisible by m, it would have satisfied the condition above.
    if (depth >= maxDepth)
        return false;

    // boolean to be returned, initialized to false because we have not found any sets yet
    boolean ret = false;

    // iterate through all remaining elements of our array
    for (int i = currentIndex + 1; i < a.length; i++){
        // this may be an optimization of the line above:
        // for (int i = currentIndex+1; i < a.length-maxDepth+depth; i++){

        // By recursing, we add a[i] to our set. We then OR together the results for all subsets that
        // could be constructed from the numbers so far, so that if any of them satisfies our
        // condition (returns true), the variable ret will be true.
        ret |= containsSubset(a, i, currentSum + a[i], depth + 1, divisor, maxDepth);
    } // end for

    // return whether any set of numbers constructible from the numbers so far satisfies the condition
    return ret;
}
Then invoke this method as such:
// this invokes our method with "no numbers added to our subset so far"; it will try adding
// all combinations of other elements to determine whether the condition is satisfied
boolean answer = containsSubset(myArray, -1, 0, 0, m, k);
EDIT:
You could probably optimize this by taking everything modulo (%) m and deleting repeats. For examples with large values of n and/or k, but small values of m, this could be a pretty big optimization.
EDIT 2:
The above optimization I listed isn't helpful. You may need the repeats to get the correct information. My bad.
Happy Coding! Let me know if you have any questions!
If numbers have lower and upper bounds, it might be better to:
Iterate over all multiples of m with lower_bound * k < multiple < upper_bound * k.
Check whether the array has a k-element subset whose sum equals that multiple (see the subset sum problem), using dynamic programming.
Complexity is O(k^2 * (lower_bound + upper_bound)^2). This approach can be optimized further, I believe with careful thinking.
Otherwise you can enumerate all subsets of size k, of which there are n!/(k!(n-k)!), which is exponential in the worst case. Using backtracking (pseudocode-ish):
function find_subsets(array, k, index, current_subset):
    if current_subset.size = k:
        add current_subset to your solutions list
        return
    if index = array.size:
        return
    number := array[index]
    add number to current_subset
    find_subsets(array, k, index + 1, current_subset)
    remove number from current_subset
    find_subsets(array, k, index + 1, current_subset)
I have an int[] array of length N containing the values 0, 1, 2, .... (N-1), i.e. it represents a permutation of integer indexes.
What's the most efficient way to determine if the permutation has odd or even parity?
(I'm particularly keen to avoid allocating objects for temporary working space if possible....)
I think you can do this in O(n) time and O(n) space by simply computing the cycle decomposition.
You can compute the cycle decomposition in O(n) by simply starting with the first element and following the path until you return to the start. This gives you the first cycle. Mark each node as visited as you follow the path.
Then repeat for the next unvisited node until all nodes are marked as visited.
The parity of a cycle of length k is (k-1)%2, so you can simply add up the parities of all the cycles you have discovered to find the parity of the overall permutation.
Saving space
One way of marking the nodes as visited would be to add N to each value in the array when it is visited. You would then be able to do a final tidying O(n) pass to turn all the numbers back to the original values.
I selected the answer by Peter de Rivaz as the correct answer as this was the algorithmic approach I ended up using.
However I used a couple of extra optimisations so I thought I would share them:
Examine the size of the data first:
- If it is greater than 64, use a java.util.BitSet to store the visited elements.
- If it is less than or equal to 64, use a long with bitwise operations to store the visited elements. This makes it O(1) space for the many applications that only use small permutations.
Actually return the swap count rather than the parity. This gives you the parity if you need it, but is potentially useful for other purposes, and is no more expensive to compute.
Code below:
public int swapCount() {
    if (length() <= 64) {
        return swapCountSmall();
    } else {
        return swapCountLong();
    }
}

private int swapCountLong() {
    int n = length();
    int swaps = 0;
    BitSet seen = new BitSet(n);
    for (int i = 0; i < n; i++) {
        if (seen.get(i)) continue;
        seen.set(i);
        // walk the cycle starting at i, adding its length minus one
        for (int j = data[i]; !seen.get(j); j = data[j]) {
            seen.set(j);
            swaps++;
        }
    }
    return swaps;
}

private int swapCountSmall() {
    int n = length();
    int swaps = 0;
    long seen = 0;
    for (int i = 0; i < n; i++) {
        long mask = (1L << i);
        if ((seen & mask) != 0) continue;
        seen |= mask;
        // walk the cycle starting at i, marking visits in the bitmask
        for (int j = data[i]; (seen & (1L << j)) == 0; j = data[j]) {
            seen |= (1L << j);
            swaps++;
        }
    }
    return swaps;
}
You want the parity of the number of inversions. You can do this in O(n * log n) time using merge sort, but either you lose the initial array, or you need extra memory on the order of O(n).
A simple algorithm that uses O(n) extra space and is O(n * log n):
inv = 0
mergesort A into a copy B
for i from 1 to length(A):
    binary search for position j of A[i] in B
    remove B[j] from B
    inv = inv + (j - 1)
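Here is the merge-and-count variant sketched in Java (counting inversions during the merge instead of binary-searching a copy; note it sorts the array passed in, so hand it a clone):

// Returns the number of inversions in a[lo..hi); the parity is the result % 2.
// Sorts a[lo..hi) as a side effect, using tmp as O(n) scratch space.
static long countInversions(int[] a, int lo, int hi, int[] tmp) {
    if (hi - lo < 2) return 0;
    int mid = (lo + hi) >>> 1;
    long inv = countInversions(a, lo, mid, tmp)
             + countInversions(a, mid, hi, tmp);
    int i = lo, j = mid, out = lo;
    while (i < mid && j < hi) {
        if (a[i] <= a[j]) {
            tmp[out++] = a[i++];
        } else {
            tmp[out++] = a[j++];
            inv += mid - i; // a[j] jumps over everything left in a[i..mid)
        }
    }
    while (i < mid) tmp[out++] = a[i++];
    while (j < hi) tmp[out++] = a[j++];
    System.arraycopy(tmp, lo, a, lo, hi - lo);
    return inv;
}

// Usage:
// boolean odd = countInversions(perm.clone(), 0, perm.length, new int[perm.length]) % 2 == 1;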
That said, I don't think it's possible to do it in sublinear memory. See also:
https://cs.stackexchange.com/questions/3200/counting-inversion-pairs
https://mathoverflow.net/questions/72669/finding-the-parity-of-a-permutation-in-little-space
Consider this approach...
From the permutation, get the inverse permutation by swapping the rows and
sorting according to the top row order. This is O(n log n).
Then simulate performing the inverse permutation and count the swaps, which is O(n). This should give the parity of the permutation, according to this:
An even permutation can be obtained as the composition of an even
number, and only an even number, of exchanges (called transpositions) of
two elements, while an odd permutation can be obtained by (only) an odd
number of transpositions.
from Wikipedia.
Here's some code I had lying around that performs an inverse permutation. I modified it a bit to count swaps; you can remove all mention of a, since p contains the inverse permutation.
size_t
permute_inverse (std::vector<int> &a, std::vector<size_t> &p) {
    size_t cnt = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        while (i != p[i]) {
            ++cnt;
            std::swap (a[i], a[p[i]]);
            std::swap (p[i], p[p[i]]);
        }
    }
    return cnt;
}