Finding mean and median in constant time

Finding mean and median in constant time - java

This is a common interview question.
You have a stream of numbers coming in (let's say more than a million). The numbers are between [0-999]).
Implement a class which supports three methods in O(1)
* insert(int i);
* getMean();
* getMedian();
This is my code.
public class FindAverage {
private int[] store;
private long size;
private long total;
private int highestIndex;
private int lowestIndex;
public FindAverage() {
store = new int[1000];
size = 0;
total = 0;
highestIndex = Integer.MIN_VALUE;
lowestIndex = Integer.MAX_VALUE;
}
public void insert(int item) throws OutOfRangeException {
if(item < 0 || item > 999){
throw new OutOfRangeException();
}
store[item] ++;
size ++;
total += item;
highestIndex = Integer.max(highestIndex, item);
lowestIndex = Integer.min(lowestIndex, item);
}
public float getMean(){
return (float)total/size;
}
public float getMedian(){
}
}
I can't seem to think of a way to get the median in O(1) time.
Any help appreciated.

You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.
You simply start iterating the store, summing up the counts until you reach half of size. That is your median value, if size is odd. For even size, you'll grab the two surrounding values and get their average.
Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n, i.e. performance is unchanged even if n reaches into the billions.
Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.
In your case, that bound is 1000.

The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.
To implement the median in constant time, you will need two numbers - the median index(i.e. the value of the median) and the number of values you've read and that are on the left(or right) of the median. I will just stop here hoping you will be able to figure out how to continue on your own.
EDIT(as pointed out in the comments): you already have the array with the sorted elements(stored) and you know the number of elements to the left of the median(size/2). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.

For the general case, where range of elements is unlimited, such data structure does not exist based on any comparisons based algorithm, as it will allow O(n) sorting.
Proof: Assume such DS exist, let it be D.
Let A be input array for sorting. (Assume A.size() is even for simplicity, that can be relaxed pretty easily by adding a garbage element and discarding it later).
sort(A):
ds = new D()
for each x in A:
ds.add(x)
m1 = min(A) - 1
m2 = max(A) + 1
for (i=0; i < A.size(); i++):
ds.add(m1)
# at this point, ds.median() is smallest element in A
for (i = 0; i < A.size(); i++):
yield ds.median()
# Each two insertions advances median by 1
ds.add(m2)
ds.add(m2)
Claim 1: This algorithm runs in O(n).
Proof: Since we have constant operations of add() and median(), each of them is O(1) per iteration, and the number of iterations is linear - the complexity is linear.
Claim 2: The output is sorted(A).
Proof (guidelines): After inserting n times m1, the median is the smallest element in A. Each two insertions after it advances the median by one item, and since the advance is sorted, the total output is sorted.
Since the above algorithm sorts in O(n), and not possible under comparisons model, such DS does not exist.
QED.

Related

Find two numbers in array x,y where x<y, x repeats at least n/3 times and y at least n/4 times

I have been struggling to solve an array problem with linear time,
The problem is:
Assuming we are given an array A [1...n] write an algorithm that return true if:
There are two numbers in the array x,y that have the following:
x < y
x repeats more than n/3 times
y repeats more than n/4 times
I have tried to write the following java program to do so assuming we have a sorted array but I don't think it is the best implementation.
public static boolean solutionManma(){
int [] arr = {2,2,2,3,3,3};
int n = arr.length;
int xCount = 1;
int yCount = 1;
int maxXcount= xCount,maxYCount = yCount;
int currX = arr[0];
int currY = arr[n-1];
for(int i = 1; i < n-2;i++){
int right = arr[n-2-i+1];
int left = arr[i];
if(currX == left){
xCount++;
}
else{
maxXcount = Math.max(xCount,maxXcount);
xCount = 1;
currX = left;
}
if(currY == right){
yCount++;
}
else {
maxYCount = Math.max(yCount,maxYCount);
yCount = 1;
currY = right;
}
}
return (maxXcount > n/3 && maxYCount > n/4);
}
If anyone has an algorithm idea for this kind of issue (preferably O(n)) I would much appreciate it because I got stuck with this one.

The key part of this problem is to find in linear time and constant space the values which occur more than n/4 times. (Note: the text of your question says "more than" and the title says "at least". Those are not the same condition. This answer is based on the text of your question.)
There are at most three values which occur more than n/4 times, and a list of such values must also include any value which occurs more than n/3 times.
The algorithm we'll use returns a list of up to three values. It only guarantees that all values which satisfy the condition are in the list it returns. The list might include other values, and it does not provide any information about the precise frequencies.
So a second pass is necessary, which scans the vector a second time counting the occurrences of each of the three values returned. Once you have the three counts, it's simple to check whether the smallest value which occurs more than n/3 times (if any) is less than the largest value which occurs more than n/4 times.
To construct the list of candidates, we use a generalisation of the Boyer-Moore majority vote algorithm, which finds a value which occurs more than n/2 times. The generalisation, published in 1982 by J. Misra and D. Gries, uses k-1 counters, each possibly associated with a value, to identify values which might occur more than 1/k times. In this case, k is 4 and so we need three counters.
Initially, all of the counters are 0 and are not associated with any value. Then for each value in the array, we do the following:
If there is a counter associated with that value, we increment it.
If no counter is associated with that value but some counter is at 0, we associate that counter with the value and increment its count to 1.
Otherwise, we decrement every counter's count.
Once all the values have been processed, the values associated with counters with positive counts are the candidate values.
For a general implementation where k is not known in advance, it would be possible to use a hash-table or other key-value map to identify values with counts. But in this case, since it is known that k is a small constant, we can just use a simple vector of three value-count pairs, making this algorithm O(n) time and O(1) space.

I will suggest the following solution, using the following assumption:
In an array of length n there will be at most n different numbers
The key feature will be to count the frequency of occurance for each different input using a histogram with n bins, meaning O(n) space. The algorithm will be as follows:
create a histogram vector with n bins, initialized to zeros
for index ii in the length of the input array a
2.1. Increase the value: hist[a[ii]] +=1
set found_x and found_y to False
for the iith bin in the histogram, check:
4.1. if found_x == False
4.1.1. if hist[ii] > n/3, set found_x = True and set x = ii
4.2. else if found_y == False
4.2.1. if hist[ii] > n/4, set y = ii and return x, y
Explanation
In the first run over the array you document the occurance frequency of all the numbers. In the run over the histogram array, which also has a length of n, you check the occurrence. First you check if there is a number that occurred more than n/3 times and if there is, for the rest of the numbers (by default larger than x due to the documentation in the histogram) you check if there is another number which occurred more than n/4 times. if there is, you return the found x and y and if there isn't you simply return not found after covering all the bins in the histogram.
As far as time complexity, you goover the input array once and you go over the histogram with the same length once, therefore the time complexity is O(n) is requested.

Find all the ways you can go up an n step staircase if you can take k steps at a time such that k <= n

This is a problem I'm trying to solve on my own to be a bit better at recursion(not homework). I believe I found a solution, but I'm not sure about the time complexity (I'm aware that DP would give me better results).
Find all the ways you can go up an n step staircase if you can take k steps at a time such that k <= n
For example, if my step sizes are [1,2,3] and the size of the stair case is 10, I could take 10 steps of size 1 [1,1,1,1,1,1,1,1,1,1]=10 or I could take 3 steps of size 3 and 1 step of size 1 [3,3,3,1]=10
Here is my solution:
static List<List<Integer>> problem1Ans = new ArrayList<List<Integer>>();
public static void problem1(int numSteps){
int [] steps = {1,2,3};
problem1_rec(new ArrayList<Integer>(), numSteps, steps);
}
public static void problem1_rec(List<Integer> sequence, int numSteps, int [] steps){
if(problem1_sum_seq(sequence) > numSteps){
return;
}
if(problem1_sum_seq(sequence) == numSteps){
problem1Ans.add(new ArrayList<Integer>(sequence));
return;
}
for(int stepSize : steps){
sequence.add(stepSize);
problem1_rec(sequence, numSteps, steps);
sequence.remove(sequence.size()-1);
}
}
public static int problem1_sum_seq(List<Integer> sequence){
int sum = 0;
for(int i : sequence){
sum += i;
}
return sum;
}
public static void main(String [] args){
problem1(10);
System.out.println(problem1Ans.size());
}
My guess is that this runtime is k^n where k is the numbers of step sizes, and n is the number of steps (3 and 10 in this case).
I came to this answer because each step size has a loop that calls k number of step sizes. However, the depth of this is not the same for all step sizes. For instance, the sequence [1,1,1,1,1,1,1,1,1,1] has more recursive calls than [3,3,3,1] so this makes me doubt my answer.
What is the runtime? Is k^n correct?

TL;DR: Your algorithm is O(2n), which is a tighter bound than O(kn), but because of some easily corrected inefficiencies the implementation runs in O(k2 × 2n).
In effect, your solution enumerates all of the step-sequences with sum n by successively enumerating all of the viable prefixes of those step-sequences. So the number of operations is proportional to the number of step sequences whose sum is less than or equal to n. [See Notes 1 and 2].
Now, let's consider how many possible prefix sequences there are for a given value of n. The precise computation will depend on the steps allowed in the vector of step sizes, but we can easily come up with a maximum, because any step sequence is a subset of the set of integers from 1 to n, and we know that there are precisely 2n such subsets.
Of course, not all subsets qualify. For example, if the set of step-sizes is [1, 2], then you are enumerating Fibonacci sequences, and there are O(φn) such sequences. As k increases, you will get closer and closer to O(2n). [Note 3]
Because of the inefficiencies in your coded, as noted, your algorithm is actually O(k2 αn) where α is some number between φ and 2, approaching 2 as k approaches infinity. (φ is 1.618..., or (1+sqrt(5))/2)).
There are a number of improvements that could be made to your implementation, particularly if your intent was to count rather than enumerate the step sizes. But that was not your question, as I understand it.
Notes
That's not quite exact, because you actually enumerate a few extra sequences which you then reject; the cost of these rejections is a multiplier by the size of the vector of possible step sizes. However, you could easily eliminate the rejections by terminating the for loop as soon as a rejection is noticed.
The cost of an enumeration is O(k) rather than O(1) because you compute the sum of the sequence arguments for each enumeration (often twice). That produces an additional factor of k. You could easily eliminate this cost by passing the current sum into the recursive call (which would also eliminate the multiple evaluations). It is trickier to avoid the O(k) cost of copying the sequence into the output list, but that can be done using a better (structure-sharing) data-structure.
The question in your title (as opposed to the problem solved by the code in the body of your question) does actually require enumerating all possible subsets of {1…n}, in which case the number of possible sequences would be exactly 2n.

If you want to solve this recursively, you should use a different pattern that allows caching of previous values, like the one used when calculating Fibonacci numbers. The code for Fibonacci function is basically about the same as what do you seek, it adds previous and pred-previous numbers by index and returns the output as current number. You can use the same technique in your recursive function , but add not f(k-1) and f(k-2), but gather sum of f(k-steps[i]). Something like this (I don't have a Java syntax checker, so bear with syntax errors please):
static List<Integer> cache = new ArrayList<Integer>;
static List<Integer> storedSteps=null; // if used with same value of steps, don't clear cache
public static Integer problem1(Integer numSteps, List<Integer> steps) {
if (!ArrayList::equal(steps, storedSteps)) { // check equality data wise, not link wise
storedSteps=steps; // or copy with whatever method there is
cache.clear(); // remove all data - now invalid
// TODO make cache+storedSteps a single structure
}
return problem1_rec(numSteps,steps);
}
private static Integer problem1_rec(Integer numSteps, List<Integer> steps) {
if (0>numSteps) { return 0; }
if (0==numSteps) { return 1; }
if (cache.length()>=numSteps+1) { return cache[numSteps] } // cache hit
Integer acc=0;
for (Integer i : steps) { acc+=problem1_rec(numSteps-i,steps); }
cache[numSteps]=acc; // cache miss. Make sure ArrayList supports inserting by index, otherwise use correct type
return acc;
}

Java, Finding Kth largest value from the array [duplicate]

This question already has answers here:
How to find the kth largest element in an unsorted array of length n in O(n)?
(32 answers)
Closed 7 years ago.
I had an interview with Facebook and they asked me this question.
Suppose you have an unordered array with N distinct values
$input = [3,6,2,8,9,4,5]
Implement a function that finds the Kth largest value.
EG: If K = 0, return 9. If K = 1, return 8.
What I did was this method.
private static int getMax(Integer[] input, int k)
{
List<Integer> list = Arrays.asList(input);
Set<Integer> set = new TreeSet<Integer>(list);
list = new ArrayList<Integer>(set);
int value = (list.size() - 1) - k;
return list.get(value);
}
I just tested and the method works fine based on the question. However, interviewee said, in order to make your life complex! lets assume that your array contains millions of numbers then your listing becomes too slow. What you do in this case?
As hint, he suggested to use min heap. Based on my knowledge each child value of heap should not be more than root value. So, in this case if we assume that 3 is root then 6 is its child and its value is grater than root's value. I'm probably wrong but what you think and what is its implementation based on min heap?

He has actually given you the whole answer. Not just a hint.
And your understanding is based on max heap. Not min heap. And it's workings are self-explanatory.
In a min heap, the root has the minimum (less than it's children) value.
So, what you need is, iterate over the array and populate K elements in min heap.
Once, it's done, the heap automatically contains the lowest at the root.
Now, for each (next) element you read from the array,
-> check if the value is greater than root of min heap.
-> If yes, remove root from min heap, and add the value to it.
After you traverse your whole array, the root of min heap will automtically contain the kth largest element.
And all other elements (k-1 elements to be precise) in the heap will be larger than k.

Here is the implementation of the Min Heap using PriorityQueue in java. Complexity: n * log k.
import java.util.PriorityQueue;
public class LargestK {
private static Integer largestK(Integer array[], int k) {
PriorityQueue<Integer> queue = new PriorityQueue<Integer>(k+1);
int i = 0;
while (i<=k) {
queue.add(array[i]);
i++;
}
for (; i<array.length; i++) {
Integer value = queue.peek();
if (array[i] > value) {
queue.poll();
queue.add(array[i]);
}
}
return queue.peek();
}
public static void main(String[] args) {
Integer array[] = new Integer[] {3,6,2,8,9,4,5};
System.out.println(largestK(array, 3));
}
}
Output: 5
The code loop over the array which is O(n). Size of the PriorityQueue (Min Heap) is k, so any operation would be log k. In the worst case scenario, in which all the number are sorted ASC, complexity is n*log k, because for each element you need to remove top of the heap and insert new element.

Edit: Check this answer for O(n) solution.
You can probably make use of PriorityQueue as well to solve this problem:
public int findKthLargest(int[] nums, int k) {
int p = 0;
int numElements = nums.length;
// create priority queue where all the elements of nums will be stored
PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
// place all the elements of the array to this priority queue
for (int n : nums){
pq.add(n);
}
// extract the kth largest element
while (numElements-k+1 > 0){
p = pq.poll();
k++;
}
return p;
}
From the Java doc:
Implementation note: this implementation provides O(log(n)) time for
the enqueing and dequeing methods (offer, poll, remove() and
add); linear time for the remove(Object) and contains(Object)
methods; and constant time for the retrieval methods (peek,
element, and size).
The for loop runs n times and the complexity of the above algorithm is O(nlogn).

Heap based solution is perfect if the number of elements in array/stream is unknown. But, what if they are finite but still you want an optimized solution in linear time.
We can use Quick Select, discussed here.
Array = [3,6,2,8,9,4,5]
Let's chose the pivot as first element:
pivot = 3 (at 0th index),
Now partition the array in such a way that all elements less than or equal are on left side and numbers greater than 3 on right side. Like it's done in Quick Sort (discussed on my blog).
So after first pass - [2,3,6,8,9,4,5]
pivot index is 1 (i.e it's the second lowest element). Now apply the same process again.
chose, 6 now, the value at index after previous pivot - [2,3,4,5,6,8,9]
So now 6 is at the proper place.
Keep checking if you have found the appropriate number (kth largest or kth lowest in each iteration). If it's found you are done else continue.

One approach for constant values of k is to use a partial insertion sort.
(This assumes distinct values, but can easily be altered to work with duplicates as well)
last_min = -inf
output = []
for i in (0..k)
min = +inf
for value in input_array
if value < min and value > last_min
min = value
output[i] = min
print output[k-1]
(That's pseudo code, but should be easy enough to implement in Java).
The overall complexity is O(n*k), which means it works pretty well if and only if k is constant or known to be less that log(n).
On the plus side, it is a really simple solution. On the minus side, it is not as efficient as the heap solution

Efficiently determine the parity of a permutation

I have an int[] array of length N containing the values 0, 1, 2, .... (N-1), i.e. it represents a permutation of integer indexes.
What's the most efficient way to determine if the permutation has odd or even parity?
(I'm particularly keen to avoid allocating objects for temporary working space if possible....)

I think you can do this in O(n) time and O(n) space by simply computing the cycle decomposition.
You can compute the cycle decomposition in O(n) by simply starting with the first element and following the path until you return to the start. This gives you the first cycle. Mark each node as visited as you follow the path.
Then repeat for the next unvisited node until all nodes are marked as visited.
The parity of a cycle of length k is (k-1)%2, so you can simply add up the parities of all the cycles you have discovered to find the parity of the overall permutation.
Saving space
One way of marking the nodes as visited would be to add N to each value in the array when it is visited. You would then be able to do a final tidying O(n) pass to turn all the numbers back to the original values.

I selected the answer by Peter de Rivaz as the correct answer as this was the algorithmic approach I ended up using.
However I used a couple of extra optimisations so I thought I would share them:
Examine the size of data first
If it is greater than 64, use a java.util.BitSet to store the visited elements
If it is less than or equal to 64, use a long with bitwise operations to store the visited elements. This makes it O(1) space for many applications that only use small permutations.
Actually return the swap count rather than the parity. This gives you the parity if you need it, but is potentially useful for other purposes, and is no more expensive to compute.
Code below:
public int swapCount() {
if (length()<=64) {
return swapCountSmall();
} else {
return swapCountLong();
}
}
private int swapCountLong() {
int n=length();
int swaps=0;
BitSet seen=new BitSet(n);
for (int i=0; i<n; i++) {
if (seen.get(i)) continue;
seen.set(i);
for(int j=data[i]; !seen.get(j); j=data[j]) {
seen.set(j);
swaps++;
}
}
return swaps;
}
private int swapCountSmall() {
int n=length();
int swaps=0;
long seen=0;
for (int i=0; i<n; i++) {
long mask=(1L<<i);
if ((seen&mask)!=0) continue;
seen|=mask;
for(int j=data[i]; (seen&(1L<<j))==0; j=data[j]) {
seen|=(1L<<j);
swaps++;
}
}
return swaps;
}

You want the parity of the number of inversions. You can do this in O(n * log n) time using merge sort, but either you lose the initial array, or you need extra memory on the order of O(n).
A simple algorithm that uses O(n) extra space and is O(n * log n):
inv = 0
mergesort A into a copy B
for i from 1 to length(A):
binary search for position j of A[i] in B
remove B[j] from B
inv = inv + (j - 1)
That said, I don't think it's possible to do it in sublinear memory. See also:
https://cs.stackexchange.com/questions/3200/counting-inversion-pairs
https://mathoverflow.net/questions/72669/finding-the-parity-of-a-permutation-in-little-space

Consider this approach...
From the permutation, get the inverse permutation, by swapping the rows and
sorting according to the top row order. This is O(nlogn)
Then, simulate performing the inverse permutation and count the swaps, for O(n). This should give the parity of the permutation, according to this
An even permutation can be obtained as the composition of an even
number and only an even number of exchanges (called transpositions) of
two elements, while an odd permutation be obtained by (only) an odd
number of transpositions.
from Wikipedia.
Here's some code I had lying around, which performs an inverse permutation, I just modified it a bit to count swaps, you can just remove all mention of a, p contains the inverse permutation.
size_t
permute_inverse (std::vector<int> &a, std::vector<size_t> &p) {
size_t cnt = 0
for (size_t i = 0; i < a.size(); ++i) {
while (i != p[i]) {
++cnt;
std::swap (a[i], a[p[i]]);
std::swap (p[i], p[p[i]]);
}
}
return cnt;
}

Fastest strategy to form and sort an array of positive integers

In Java, what is faster: to create, fill in and then sort an array of ints like below
int[] a = int[1000];
for (int i = 0; i < a.length; i++) {
// not sure about the syntax
a[i] = Maths.rand(1, 500) // generate some random positive number less than 500
}
a.sort(); // (which algorithm is best?)
or insert-sort on the fly
int[] a = int[1000];
for (int i = 0; i < a.length; i++) {
// not sure about the syntax
int v = Maths.rand(1, 500) // generate some random positive number less than 500
int p = findPosition(a, v); // where to insert
if (a[p] == 0) a[p] = v;
else {
shift a by 1 to the right
a[p] = v;
}
}

There are many ways that you could do this:
Build the array and sort as you go. This is likely to be very slow, since the time required to move array elements over to make space for the new element will almost certainly dominate the sorting time. You should expect this to take at best Ω(n2) time, where n is the number of elements that you want to put into the array, regardless of the algorithm used. Doing insertion sort on-the-fly will take expected O(n2) time here.
Build the array unsorted, then sort it. This is probably going to be extremely fast if you use a good sorting algorithm, such as quicksort or radix sort. You should expect this to take O(n log n) time (for quicksort) or O(n lg U) time (for radix sort), where n is the number of values and U is the largest value.
Add the numbers incrementally to a priority queue, then dequeue all elements from the priority queue. Depending on how you implement the priority queue, this could be very fast. For example, using a binary heap here would cause this process to take O(n log n) time, and using a van Emde Boas tree would take O(n lg lg U) time, where U is the largest number that you are storing. That said, the constant factors here would likely make this approach slower than just sorting the values.
Hope this helps!

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.