Sorting an array while moving duplicates to the end? - java

This was a question in one of my friend's programming classes.
Q. How do you sort an array of ints and then arrange them such that all duplicate elements appear at the end of the array?
For example, given the input
{5, 2, 7, 6, 1, 1, 5, 6, 2}
The output would be
{1, 2, 5, 6, 7, 1, 2, 5, 6}
Note that the numbers are sorted and duplicate numbers are after 7, which is the maximum in the array.
This has to be achieved without using any Java library packages/utils.
I suggested sorting the array first using insertion or bubble sort, and then going over the array and performing something like the following:
for (int i = 0; i < nums.length - 2; i++) {
    for (int j = i + 1; j < nums.length; j++) {
        // current and next are same, move elements up
        // and place the next number at the end.
        if (nums[i] == nums[j]) {
            int temp = nums[j];
            for (int k = j; k < nums.length - 1; k++) {
                nums[k] = nums[k + 1];
            }
            nums[nums.length - 1] = temp;
            break;
        }
    }
}
I tried this myself later (that is how I got the code above). As I tried it out, I felt this could be achieved with less code and more efficiently, and that maybe I gave the wrong advice.
Any thoughts?

Depending on the parameters of your problem, there are many approaches to solving this.
If you are not allowed to use O(n) external memory, then one option would be to use a standard sorting algorithm to sort the array in-place in O(n log n) time, then to run a second pass over it to move the duplicates to the end (as you've suggested). The code you posted above takes O(n^2) time, but I think that this step can be done in O(n log n) time using a slightly more complicated algorithm. The idea works in two steps. In the first step, in O(n log n) time you bring all non-duplicated elements to the front in sorted order and bring all the duplicates to the back in non-sorted order. Once you've done that, you then sort the back half of the array in O(n log n) time using the sorting algorithm from the first step.
I'm not going to go into the code to sort the array. I really love sorting, but there are so many other good resources on how to sort arrays in-place that it's not a good use of my time/space here to go into them. If it helps, here are links to Java implementations of heapsort, quicksort, and smoothsort, all of which run in O(n log n) time. Heapsort and smoothsort use only O(1) external memory, while quicksort can use O(n) in the worst case (though good implementations can limit this to O(log n) using cute tricks).
The interesting code is the logic to bring all the non-duplicated elements to the front of the range. Intuitively, the code works by storing two pointers - a read pointer and a write pointer. The read pointer points to the next element to read, while the write pointer points to the location where the next unique element should be placed. For example, given this array:
1 1 1 1 2 2 3 4 5 5
We start with the read and write pointers initially pointing at 1:
write v
      1 1 1 1 2 2 3 4 5 5
read  ^
Next, we skip the read pointer ahead to the next element that isn't 1. This finds 2:
write v
      1 1 1 1 2 2 3 4 5 5
read          ^
Then, we bump the write pointer to the next location:
write   v
      1 1 1 1 2 2 3 4 5 5
read          ^
Now, we swap the 2 into the spot held by the write pointer:
write   v
      1 2 1 1 1 2 3 4 5 5
read          ^
advance the read pointer to the next value that isn't 2:
write   v
      1 2 1 1 1 2 3 4 5 5
read              ^
then advance the write pointer:
write     v
      1 2 1 1 1 2 3 4 5 5
read              ^
Again, we exchange the values pointed at by 'read' and 'write' and move the write pointer forward, then move the read pointer to the next unique value:
write       v
      1 2 3 1 1 2 1 4 5 5
read                ^
Once more yields
write         v
      1 2 3 4 1 2 1 1 5 5
read                  ^
and the final iteration gives
write           v
      1 2 3 4 5 2 1 1 1 5
read                      ^
If we now sort from the write pointer to the read pointer, we get
write           v
      1 2 3 4 5 1 1 1 2 5
read                      ^
and bingo! We've got the answer we're looking for.
In (untested, sorry...) Java code, this fixup step might look like this:
int read = 0;
int write = 0;
while (read < array.length) {
    /* Swap the values pointed at by read and write. This moves the next unique
     * value into the write slot and parks the stale duplicate that was there
     * at the read slot.
     */
    int temp = array[write];
    array[write] = array[read];
    array[read] = temp;

    /* Step past the duplicate we just parked, then advance the read pointer to
     * the next unique value. Since we moved the unique value to the write
     * location, we compare values against array[write] instead of array[read].
     */
    ++read;
    while (read < array.length && array[write] == array[read])
        ++read;

    /* Advance the write pointer. */
    ++write;
}
This algorithm runs in O(n) time, which leads to an overall O(n log n) algorithm for the problem. Since the reordering step uses O(1) memory, the overall memory usage would be either O(1) (for something like smoothsort or heapsort) or O(log n) (for something like quicksort).
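To tie the pieces together, the overall flow might look like the sketch below. Here moveDuplicatesBack is assumed to be a wrapper around the fixup loop above that returns the final value of write, and sortRange is a placeholder for whichever in-place O(n log n) sort you pick; neither name comes from the original code.
// Hypothetical glue code for the two-step approach described above.
static void sortWithDuplicatesAtEnd(int[] array) {
    sortRange(array, 0, array.length);          // step 1: sort everything in place
    int dupStart = moveDuplicatesBack(array);   // step 2: fixup loop above; returns 'write'
    sortRange(array, dupStart, array.length);   // step 3: sort the block of duplicates
}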
EDIT: After talking this over with a friend, I think that there is a much more elegant solution to the problem based on a modification of quicksort. Typically, when you run quicksort, you end up partitioning the array into three regions:
+----------------+----------------+----------------+
| values < pivot | values = pivot | values > pivot |
+----------------+----------------+----------------+
The recursion then sorts the first and last regions to put them into sorted order. However, we can modify this for our version of the problem. We'll need as a primitive the rotation algorithm, which takes two adjacent blocks of values in an array and exchanges them in O(n) time. It does not change the relative order of the elements in those blocks. For example, we could use rotation to convert the array
1 2 3 4 5 6 7 8
into
3 4 5 6 7 8 1 2
and can do so in O(n) time.
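As a concrete illustration (not from the original answer), the classic three-reversal trick implements this rotation primitive in place; the names rotate and reverse are mine.
// Rotate a[lo, hi) so that the block starting at mid comes first, using three
// reversals: O(hi - lo) time, O(1) extra space. (Sketch; names are mine.)
static void rotate(int[] a, int lo, int mid, int hi) {
    reverse(a, lo, mid);
    reverse(a, mid, hi);
    reverse(a, lo, hi);
}

static void reverse(int[] a, int from, int to) {   // reverses a[from, to)
    for (int i = from, j = to - 1; i < j; i++, j--) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
// rotate(a, 0, 2, 8) on 1 2 3 4 5 6 7 8 produces 3 4 5 6 7 8 1 2, as in the example above.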
The modified version of quicksort would work by using the Bentley-McIlroy three-way partitioning algorithm (described here) to rearrange the array elements into the configuration shown above using O(1) extra space. Next, we apply a rotation to reorder the elements so that they look like this:
+----------------+----------------+----------------+
| values < pivot | values > pivot | values = pivot |
+----------------+----------------+----------------+
Next, we perform a swap so that we move exactly one copy of the pivot element into the set of elements at least as large as the pivot. This may leave extra copies of the pivot behind. We then recursively apply the sorting algorithm to the < and > ranges. When we do this, the resulting array will look like this:
+---------+-------------+---------+-------------+---------+
| < pivot | dup < pivot | > pivot | dup > pivot | = pivot |
+---------+-------------+---------+-------------+---------+
We then apply two rotations to the range to put it into the final order. First, rotate the duplicate values less than the pivot with the values greater than the pivot. This gives
+---------+---------+-------------+-------------+---------+
| < pivot | > pivot | dup < pivot | dup > pivot | = pivot |
+---------+---------+-------------+-------------+---------+
At this point, this first range is the unique elements in ascending order:
+---------------------+-------------+-------------+---------+
| sorted unique elems | dup < pivot | dup > pivot | = pivot |
+---------------------+-------------+-------------+---------+
Finally, do one last rotation of the duplicate elements greater than the pivot and the elements equal to the pivot to yield this:
+---------------------+-------------+---------+-------------+
| sorted unique elems | dup < pivot | = pivot | dup > pivot |
+---------------------+-------------+---------+-------------+
Notice that these last three blocks are just the sorted duplicate values:
+---------------------+-------------------------------------+
| sorted unique elems | sorted duplicate elements |
+---------------------+-------------------------------------+
and voila! We've got everything in the order we want. Using the same analysis that you'd do for normal quicksort, plus the fact that we're only doing O(n) extra work at each level (three extra rotations), this works out to O(n log n) on average with O(log n) memory usage. It's still O(n^2) in the worst case with O(log n) memory, but that happens with extremely low probability.
If you are allowed to use O(n) memory, one option would be to build a balanced binary search tree out of all of the elements that stores key/value pairs, where each key is an element of the array and the value is the number of times it appears. You could then sort the array in your format as follows:
For each element in the array:
If that element already exists in the BST, increment its count.
Otherwise, add a new node to the BST with that element having count 1.
Do an inorder walk of the BST. When encountering a node, output its key.
Do a second inorder walk of the BST. When encountering a node, if it has count greater than one, output n - 1 copies of that node, where n is the number of times it appears.
The runtime of this algorithm is O(n log n), but it would be pretty tricky to code up a BST from scratch. It also requires external space, which I'm not sure you're allowed to do.
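If library structures were allowed, the same idea could be sketched with java.util.TreeMap standing in for the hand-rolled balanced BST (purely for illustration, since the original question forbids library utils; needs the java.util.TreeMap and java.util.Map imports, and array is the input):
TreeMap<Integer, Integer> counts = new TreeMap<>();        // key = value, value = frequency
for (int value : array) counts.merge(value, 1, Integer::sum);
int pos = 0;
for (Map.Entry<Integer, Integer> e : counts.entrySet())    // first inorder walk: each key once
    array[pos++] = e.getKey();
for (Map.Entry<Integer, Integer> e : counts.entrySet())    // second walk: the remaining copies
    for (int c = 1; c < e.getValue(); c++) array[pos++] = e.getKey();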
However, if you are allowed external space and the arrays you are sorting are small and contain small integers, you could modify the above approach by using a modified counting sort. Just replace the BST with an array large enough for each integer in the original array to be a key. This reduces the runtime to O(n + k), with memory usage O(k), where k is the largest element in the array.
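A sketch of that counting-sort variant, assuming all values lie in 0..k (method and variable names are mine):
static int[] countingSortWithDupsAtEnd(int[] array, int k) {
    int[] count = new int[k + 1];
    for (int value : array) count[value]++;        // frequency of each value
    int[] out = new int[array.length];
    int pos = 0;
    for (int v = 0; v <= k; v++)                   // unique values first, in sorted order
        if (count[v] > 0) out[pos++] = v;
    for (int v = 0; v <= k; v++)                   // then the duplicates, also in sorted order
        for (int c = 1; c < count[v]; c++) out[pos++] = v;
    return out;
}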
Hope this helps!

A modified merge sort could do the trick: on the last merge pass, keep track of the last number you pushed onto the front of the result array, and if the lowest of the next numbers is equal, add it to the end instead of the front.
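A rough sketch of that last merge pass (my reading of the one-line answer, with two assumptions it doesn't spell out: the two halves are already sorted by the earlier merge-sort passes, and the duplicate block is reversed at the end so it comes out ascending):
// Final merge pass that sends duplicates to the back of the result array.
static int[] lastMergeWithDupsAtEnd(int[] left, int[] right) {
    int n = left.length + right.length;
    int[] result = new int[n];
    int front = 0, back = n - 1;          // write cursors
    int i = 0, j = 0;
    boolean haveLast = false;
    int last = 0;                          // last value pushed on the front
    while (i < left.length || j < right.length) {
        int next;
        if (j >= right.length || (i < left.length && left[i] <= right[j]))
            next = left[i++];
        else
            next = right[j++];
        if (haveLast && next == last) {
            result[back--] = next;         // duplicate: goes to the end
        } else {
            result[front++] = next;        // new value: goes to the front
            last = next;
            haveLast = true;
        }
    }
    // the tail was filled right to left, so reverse it to keep the duplicates ascending
    for (int lo = front, hi = n - 1; lo < hi; lo++, hi--) {
        int t = result[lo]; result[lo] = result[hi]; result[hi] = t;
    }
    return result;
}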

Welcome to the world of Data Structures and Algorithms. You're absolutely right that you could sort that faster. You could also do it a dozen different ways. PhDs are spent on this stuff :)
Here's a link where you can see an optimized bubble sort
You might also want to check out Big O Notation
Have fun and good luck!

Use quicksort to sort the array. When implementing the sort, you can modify it slightly by adding all duplicates to a separate duplicate array. When done, simply append the duplicate array to the end of the sorted array.
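One way to read this (a sketch: here the duplicates are collected in a single pass after the quicksort rather than inside the partitioning itself, and quicksort stands for any hand-written in-place sort):
quicksort(array, 0, array.length - 1);          // any hand-written in-place sort
int[] dups = new int[array.length];             // holds the duplicate copies
int uniqueCount = 0, dupCount = 0;
for (int i = 0; i < array.length; i++) {
    if (i > 0 && array[i] == array[i - 1]) dups[dupCount++] = array[i];
    else array[uniqueCount++] = array[i];
}
for (int i = 0; i < dupCount; i++)              // append the duplicate block
    array[uniqueCount + i] = dups[i];
On the example input this leaves {1, 2, 5, 6, 7, 1, 2, 5, 6}, matching the expected output.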

Related

Create 2 arrays - 1st has random integers, 2nd has unique random integers

I am working on a school homework problem. I need to create 2 int[] arrays. The first array int[10] is filled with random integers. The second array has the same numbers as in the first array, but without any duplicates.
For example, assume my first array is 1,2,2,3,1,5,5,7,9,9. My second array would then be 1,2,3,5,7,9.
Could someone please point me in the right direction to solving this problem.
Put the numbers into a Set. Then retrieve numbers from the Set. Simple! Duplicates will automatically be removed!
I would do the following (assuming that it is homework and you shouldn't be doing anything too complicated)...
1. Sort the array using java.util.Arrays.sort(myArray); - this will order the numbers, and make sure that all repeating numbers are next to each other.
2. Loop through the array and keep a count of the number of unique numbers (i.e. compare the current number to the next number - if they're different, increment the counter by 1).
3. Create your second int[] array to the correct size (from point 2).
4. Repeat the same process as point 2, but fill your new array with the unique numbers, rather than incrementing a counter.
This should be enough to get you moving in the right direction. When you have some code, if you still have questions, come back to us and ask.
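If it helps to see the four steps in code once you've tried them yourself, a sketch might look like this (assuming the input array, here called first, is non-empty, as in the int[10] setup; variable names are mine):
int[] sorted = first.clone();
java.util.Arrays.sort(sorted);                         // step 1
int unique = 1;                                        // step 2: count the unique values
for (int i = 1; i < sorted.length; i++)
    if (sorted[i] != sorted[i - 1]) unique++;
int[] second = new int[unique];                        // step 3
second[0] = sorted[0];                                 // step 4: fill with the unique values
for (int i = 1, j = 1; i < sorted.length; i++)
    if (sorted[i] != sorted[i - 1]) second[j++] = sorted[i];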
I recommend using a Set, but here's a way to do it without using a Set. (Note: This will work, but don't ask me about the efficiency of this!)
Have a function like this -
public static boolean isNumberInArray(int[] array, int number)
{
    for (int i = 0; i < array.length; i++)
    {
        if (number == array[i])
            return true;
    }
    return false;
}
Now use this function before you make an insert into the new array. I leave you to figure out that part. It's homework after all!
Hints (WATTO explains it better):
a = sorted first array
lastItem = a[0]
append lastItem into new array
for i in 1 to length(a):
    if a[i] != lastItem:
        append a[i] into new array
        lastItem = a[i]
@WATTO Studios has a good approach. Sorting is always useful when duplicates are involved.
I will suggest an alternative method using hash tables:
Create a hashing structure with an integer as key (the number in the original array) and a counter as a value.
Go through the original array and, for each number encountered, increment its corresponding counter value in the hash table.
Go through the original array again. For each number, check the hash table. If the associated counter is greater than 1, remove the value and decrement the counter.
Let's see a practical case:
4 5 6 4 1 1 3
First pass will create the following table:
1 -> 2
3 -> 1
4 -> 2
5 -> 1
6 -> 1
Second pass step by step:
4 5 6 4 1 1 3
^
4 has a counter of 2 -> remove and decrement:
1 -> 2
3 -> 1
4 -> 1
5 -> 1
6 -> 1
5 6 4 1 1 3
^
5 has a counter of 1 -> ignore
6 has a counter of 1 -> ignore
4 has a counter of 1 -> ignore
1 has a counter of 2 -> remove and decrement
1 -> 1
3 -> 1
4 -> 1
5 -> 1
6 -> 1
5 6 4 1 3
^
1 has a counter of 1 -> ignore
3 has a counter of 1 -> ignore
Final array:
5 6 4 1 3
There are, of course, more efficient ways to handle the removal (since using an array implies shifting), like inserting the items into a linked list for example. I'll let you decide that. :)
Edit: An even faster approach, requiring a single pass:
Use the same hashing structure as above.
Go through the original array. For each item check the table. If the associated counter is 0, increment it to 1. If it's already 1, remove the item.
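A sketch of that single-pass variant (uses java.util.HashMap and ArrayList; original is the input array; note that it keeps the first occurrence of each value, so the resulting order differs from the two-pass example above, which keeps the last occurrence):
Map<Integer, Integer> counter = new HashMap<>();
List<Integer> result = new ArrayList<>();
for (int value : original) {
    if (counter.getOrDefault(value, 0) == 0) {   // counter is 0: first time we see this value
        counter.put(value, 1);                    // increment it to 1
        result.add(value);                        // keep this occurrence
    }                                             // counter is already 1: drop the repeat
}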

Reversing a part of an array (or any other data structure)

I want to reverse an array (or any other data structure), but because this operation is going to be done on the array n times, I'm looking for the best solution possible. I have the sorted array, which is obtained in O(n lg n) time. I start looking for the first element of the sorted array inside the unsorted array (which is equivalent to finding the smallest key in the unsorted array), then I reverse the array from the beginning to the index of that value. Then I do the same for the rest: find the second smallest value's index, and reverse the array again, from the second index to the end of the array, and so on:
for example , consider this array :
*3* 4 5 (*1*) 6 2 // array is reversed from its 0th index to the 3rd index (0 based)
1 *5* 4 3 6 (*2*) // array is reversed from its 1st index (0 based ) to the 5th index
1 2 *6* *3* 4 5 // and ...
1 2 3 *6* *4* 5
1 2 3 4 *6* *5*
1 2 3 4 5 6
Well, I have to sort the array in order to have the values I'm looking for in the unsorted array, which takes O(n lg n) time, and doing the algorithm above takes O(n^2). Any idea how to make it quicker, so that it can be done in O(n lg n) time? So the question is reversing a sub-array of the array in the least time order, because it's done many times on large arrays. (I can get the indices i and j, which are the first and last index of the sub-array, in O(n) time, because I have the sorted array and I'll just look up the numbers in the unsorted array.) So I'm looking for the best time order for reversing an array from its i-th index to its j-th index.
Thanks in advance!
Here comes an O(n) solution (I think; reading your description was hard). It's a data structure which allows 1) reversing a sub-array in O(1), 2) getting a value from the array in O(r), where r is the number of reversals done so far, and 3) finding the index of an element in O(n), where n is the length of the list.
Just store the array as usual, and keep a list of ReversedRange(imin, imax) elements. Reversing part of the array is then as easy as inserting another element into this list.
Whenever you need to get a value from the modified array at index i, you look through all the ReversedRange for which imin <= i <= imax and calculate the index j which corresponds to the original array index. You need to check r reversals, so it is O(r).
To get the index i of a value v, look through the original array and find its index j. Done in O(n) time. Then do the same traversal of the ReversedRanges, only in the opposite direction, to calculate i. Done in O(r) time, total O(n + r), which is O(n).
Example
Consider the following list: 0 1 2 3 4 5 6 7 8 9. Now say we reverse the list from indexes 1 through 6, and then from 0 through 5. So we have:
I   0 1 2 3 4 5 6 7 8 9
      |         |
II  0 6 5 4 3 2 1 7 8 9
    |         |
III 2 3 4 5 6 0 1 7 8 9
Now let us map the index i = 2 to the original array. From the diagram we see that we should end up with III(2) = 4.
1) since i = 2 is in [0, 5] we calculate
i <- 5 - (i - 0) = 5 - 2 = 3
2) since i = 3 is in [1, 6] we calculate
i <- 6 - (i - 1) = 6 - 2 = 4
3) there are no more ranges, we are done with result 4!
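A minimal sketch of this idea (class and method names are mine, not from the answer): reverse() just records the range in O(1); get() maps the index back through the recorded ranges, most recent first, in O(r).
import java.util.ArrayList;
import java.util.List;

class LazyReversibleArray {
    private final int[] data;
    private final List<int[]> ranges = new ArrayList<>();   // each entry is {imin, imax}

    LazyReversibleArray(int[] data) { this.data = data; }

    void reverse(int imin, int imax) {        // O(1): just remember the range
        ranges.add(new int[] { imin, imax });
    }

    int get(int i) {                          // O(r): undo the reversals, newest first
        for (int k = ranges.size() - 1; k >= 0; k--) {
            int[] r = ranges.get(k);
            if (r[0] <= i && i <= r[1]) i = r[1] - (i - r[0]);
        }
        return data[i];
    }
}
Matching the example above: after reverse(1, 6) and reverse(0, 5) on 0..9, get(2) returns 4.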
(Possibly) memory heavy...
Why don't you maintain two Collections? Traverse the original collection until you have found your reverse point. Then traverse backwards from there with an iterator, or by index if you have it (ArrayList), adding each element to a new collection. Then merge the two collections (the reversed portion and the previously untouched portion). Repeat this until finished.
Seems like a very simple answer to a complicated question so maybe I'm missing something. If you are looking for an efficient way to reverse part of an array then something like the following should work. You could easily make it generic of course.
// endIndex and startIndex are inclusive here
int half = startIndex + ((endIndex + 1) - startIndex) / 2;
int endCount = endIndex;
for (int startCount = startIndex; startCount < half; startCount++) {
    int store = array[startCount];
    array[startCount] = array[endCount];
    array[endCount] = store;
    endCount--;
}
Seems like the rest of your code would be much more complex than this. I'm not sure what the O() is here, because it is not doing comparisons (which are traditionally the measure), but it is doing about 1.5 x N assignments to reverse the range. I don't see any way that this can be done faster.

Find all differences in an array in O(n)

Question: Given a sorted array A find all possible difference of elements from A.
My solution:
for (int i = 0; i < n - 1; ++i) {
    for (int j = i + 1; j < n; ++j) {
        System.out.println(Math.abs(a[i] - a[j]));
    }
}
Sure, it's O(n^2), but I don't over count things at all. I looked online and I found this: http://www.careercup.com/question?id=9111881. It says you can't do better, but at an interview I was told you can do O(n). Which is right?
A first thought is that you aren't using the fact that the array is sorted. Let's assume it's in increasing order (decreasing can be handled analogously).
We can also use the fact that the differences telescope (i>j):
a_i - a_j = (a_i - a_(i-1)) + (a_(i-1) - a_(i-2)) + ... + (a_(j+1) - a_j)
Now build a new sequence, call it s, that holds the simple differences, meaning (a_i - a_(i-1)). This takes only one pass (O(n)) to do, and you may as well skip over repeats, meaning skip a_i if a_i = a_(i+1).
All possible differences a_i-a_j with i>j are of the form s_i + s_(i+1) + ... + s_(j+1). So maybe if you count that as having found them, then you did it in O(n) time. To print them, however, may take as many as n(n-1)/2 calls, and that's definitely O(n^2).
For example, for an array with the elements {2^1, 2^2, ..., 2^n} there are n(n-1)/2 possible differences, and no two of them are equal. So there are O(n^2) differences.
Since you have to enumerate all of them, you also need at least O(n^2) time.
Sorted or unsorted doesn't matter: if you have to calculate each difference, there is no way to do it in less than n^2.
Either the question was asked wrong, or you just do O(n) work and then print 42 the other N times :D
You can get another counter-example by assuming the array contents are random integers before sorting. Then the chance that two differences, A_i - A_j vs A_k - A_l, or even A_i - A_j vs A_j - A_k, are the same is too small for there to be only O(n) distinct differences A_i - A_j.
Given that, the question to your interviewer is to explain the special circumstances that allow an O(n) solution. One possibility is that the array values are all numbers in the range 0..n, because in this case the maximum absolute difference is only n.
I can do this in O(n lg n) but not O(n). Represent the array contents by an indicator array of size n+1 with element i set to 1 wherever the value i appears in the array. Then use an FFT to convolve this array with its reverse (equivalently, compute its autocorrelation): there is a difference A_i - A_j = k exactly where the kth element of the result is non-zero.
If the interviewer is fond of theoretical games, perhaps he was thinking of using a table of inputs and results? Any problem with a limit on the size of the input, and that has a known solution, can be solved by table lookup, given that you have first created and stored that table, which might be large.
So if the array size is limited, the problem can be solved by table lookup, which (given some assumptions) can even be done in constant time. Granted, even for a maximum array size of two (assuming 32-bit integers) the table will not fit in a normal computer's memory, or on the disks. For larger max sizes of the array, you're into "won't fit in the known universe" size. But, theoretically, it can be done.
(But in reality, I think that Jens Gustedt's comment is more likely.)
Yes, you can surely do that; it's a slightly tricky method.
To find the differences in O(n) you will need to use a bitset (C++ bitset, or a similar data structure in your language).
Initialize two bitsets, say A and B.
You can do it as follows.
For each iteration through the array:
1. store the consecutive difference in bitset A
2. left-shift B
3. store the consecutive difference in bitset B
4. take A = A or B
For example, here is the code (N is the size of the array):
for (int i = 1; i < N; i++) {
    int diff = arr[i] - arr[i-1];
    A[diff] = 1;
    B <<= diff;
    B[diff] = 1;
    A = A | B;
}
Bits in A which are 1 will be the differences.
First of all, the array needs to be sorted.
Let's consider a sorted array ar = {1,2,3,4}.
This is what we were doing in the O(n^2) approach:
for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
        sum += abs(ar[i] - ar[j]);
If we carry out the operations here in detail, it looks like this:
when i = 0 | sum = sum + {(2-1)+(3-1)+(4-1)}
when i = 1 | sum = sum + {(3-2)+(4-2)}
when i = 2 | sum = sum + {(4-3)}
If we write them all out:
sum = (-1 -1 -1) + (2 -2 -2) + (3 +3 -3) + (4 +4 +4)
We can see that:
the number at index 0 is added to the sum 0 times and subtracted from the sum 3 times.
the number at index 1 is added to the sum 1 time and subtracted from the sum 2 times.
the number at index 2 is added to the sum 2 times and subtracted from the sum 1 time.
the number at index 3 is added to the sum 3 times and subtracted from the sum 0 times.
So we can say that
the number at index i will be added to the sum i times
and will be subtracted from the sum (n-i-1) times.
The generalized expression for each element is then:
sum = sum + (i*a[i]) - ((n-i)-1)*a[i];
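In code, the derived O(n) loop might look like this (a sketch; it assumes ar is sorted ascending and n = ar.length):
long sum = 0;
for (int i = 0; i < n; i++)
    sum += (long) i * ar[i] - (long) (n - i - 1) * ar[i];
// For ar = {1, 2, 3, 4} this gives 10, the same total as the O(n^2) double loop.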

How do I use the median of an array as the pivot for quicksort

I have to write a quicksort algorithm that uses the median of the array as the pivot. From my general understanding of what I read in a book, I have to use a select algorithm in which I split the array into n/5 sub-arrays, sort each of the sub-arrays using insertion sort, find their medians, then recursively call the select algorithm to find the median of the medians. I'm not even sure how to start this and I'm pretty confused. The selectMedian algorithm call is supposed to look something like this: SelectMedian(int first, int last, int i), where i is the i-th index I want to select (in this case it would be the middle index, so array.length/2). The book I'm reading gives this description of it:
The algorithm in words (if n>1):
1. Divide the n elements into groups of 5.
2. Find the median of each group (use insertion sort for this).
3. Use Select() recursively to find the median x of the n/5 medians.
4. Partition the n elements around x. Let k = rank(x).
5. If (i == k) then return x.
   If (i < k) then use Select() recursively to find the i-th smallest element in the first partition,
   else (i > k) use Select() recursively to find the (i-k)-th smallest element in the last partition.
can anyone assist me in writing this algorithm? thanks!
Would that really be necessary? Why not use the median of three, where you select the pivot based on the median of three values, i.e. the first, middle and last values?
Or you could even use a random pivot, which will drastically lower the chances of ending up with QuickSort's worst case time of O(N²), which may also be appropriate for your implementation.
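For illustration, median-of-three pivot selection can be as small as this (a sketch; the method name is mine):
// Returns the index of the median of a[lo], a[mid] and a[hi].
static int medianOfThreeIndex(int[] a, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    int x = a[lo], y = a[mid], z = a[hi];
    if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;
    if ((y <= x && x <= z) || (z <= x && x <= y)) return lo;
    return hi;
}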
I assume you can figure out creating the n/5 sub-arrays of 5 elements each.
Finding the median of a 5-element subarray is fairly easy: you look at each element and find the element which has two smaller elements.
For example, you have 1 4 2 3 5. 1 has no smaller elements. 4 has three smaller elements. 2 has one smaller element. 3 has two smaller elements; this is the one you want.
Now you have found n/5 medians. You want to find the median of the medians, so you run the algorithm again.
Example:
1 7 2 4 9 0 3 8 5 6 1 4 7 2 3
[1 7 2 4 9][0 3 8 5 6][1 4 7 2 3]
findMedian([1 7 2 4 9]) = 4;
findMedian([0 3 8 5 6]) = 5;
findMedian([1 4 7 2 3]) = 3;
[4 5 3]
findMedian([4 5 3]) = 4;
4 is your pivot.
The reason you do this is to try and split your array evenly; if your array is split lopsided, you'll get O(N^2) performance; if your array is split evenly, you get O(NlogN) performance.
Selecting a random pivot means you could get either one - in practice it would balance out to O(NlogN) but a lot of applications want consistent performance, and random quicksort is not consistent from run to run.
The reason we use 5 (instead of 3 or 7) is because we're adding another term of time complexity searching for the median - this term has to be less than O(NlogN) but you want it to be as small as possible. Using 3 gets you O(N^2), using 5 gets you O(NlogN), and 5 is the smallest number for which this is true.
(the algorithm to find the median in linear time was given by Blum, Floyd, Pratt, Rivest and Tarjan in their 1973 paper "Time bounds for selection", and answered a famous open problem)
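A sketch of the "median of each group of 5" step described above (names are mine; it insertion-sorts each block in place and collects the middle elements):
static int[] groupMedians(int[] a) {
    int groups = (a.length + 4) / 5;
    int[] medians = new int[groups];
    for (int g = 0; g < groups; g++) {
        int lo = g * 5, hi = Math.min(lo + 5, a.length);
        for (int i = lo + 1; i < hi; i++) {            // insertion sort the block
            int key = a[i], j = i - 1;
            while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
        medians[g] = a[lo + (hi - lo) / 2];            // middle element of the block
    }
    return medians;
}
// On the example above, groupMedians({1,7,2,4,9,0,3,8,5,6,1,4,7,2,3}) returns {4, 5, 3}.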
QUICKSORT2(A, p, r)
    if p < r
        median = floor((p + r) / 2) - p + 1
        q = SELECT(A, p, r, median)
        q = PARTITION2(A, p, r, q)
        QUICKSORT2(A, p, q-1)
        QUICKSORT2(A, q+1, r)

PARTITION2(A, p, r, q)
    swap A[r] and A[q]
    return PARTITION(A, p, r)

How does one output a new array, containing all numbers of a previous array that occur at most k times

How does one create a new array, containing all numbers of the array that occur at most k times in java?
For example, if the array was:
{1,4,4,3,4,3,5,2,5,1,5}
and k = 2, the new array would be:
{1,3,2}.
I assume that I would need to compare the elements within the array and store the number of times each occurs in a variable. Then I would compare that variable to k and, if it is smaller than or equal to it, add the element to a new ArrayList. I can't really implement this, though.
Thanks
There are many ways to do this, depending on what your constraints are.
If you don't need to worry about memory constraints, you can solve this problem quite easily by using a HashMap that maps from array elements to their frequencies. The algorithm would work by scanning across the array. For each element, if that element is already in the HashMap, you update the key/value pair in the HashMap to increment the frequency of that element. Otherwise, if the element hasn't been seen, you update the frequency to be 1. Once you've finished populating the HashMap, you can then iterate across the map and copy over all elements that have frequency at most k into a new ArrayList. For example:
for (Map.Entry<T, Integer> entry : myMap.entrySet()) {
    if (entry.getValue() <= k)
        myArrayList.add(entry.getKey());
}
This runs in O(n) time and O(n) memory, which is quite good.
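For completeness, the counting pass described above might be sketched like this (variable names match the snippet above, array is the input, and it assumes Java 8 collections are available):
Map<Integer, Integer> myMap = new HashMap<>();
for (int value : array)
    myMap.merge(value, 1, Integer::sum);   // frequency of each element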
If you don't want to store everything in memory, another option would be to sort the array in O(n log n) time and O(log n) space using Arrays.sort. Once you've sorted the array, all of the copies of each value will be stored consecutively and you can more easily count their frequency. For example, after sorting the array you mentioned in your original post, you would get the array
1 1 2 3 3 4 4 4 5 5 5
From here, it should be much easier to determine how many copies of each element there are. You could do this by walking across the array, keeping track of the element you're currently looking for and how many copies there are. Whenever you see a new element, if it matches the current element, you increment the counter. If not, you reset the element you're checking to be the new element and set the counter back to one. For example, with the above array, you might start off by looking at this element:
1 1 2 3 3 4 4 4 5 5 5
^
Element: 1
Frequency: 1
Now, you look at the next element of the array. Because it's a 1, you increment the frequency count to 2:
1 1 2 3 3 4 4 4 5 5 5
  ^
Element: 1
Frequency: 2
When you look at the next element, you'll see that it's a 2. This means that there aren't any more 1's left. Because there are two copies of 1, you could then append 1 to the resulting array. You'd then reset the counter to 1, leaving this state:
1 1 2 3 3 4 4 4 5 5 5
    ^
Element: 2
Frequency: 1
The one thing to watch out for when doing this is to remember to handle the case where you visit the final array element. It's important that when you hit the end of the array, if you have at most k copies of the last element, you add the last number to the output array.
Whenever you do this, if you ever find an element that appears at most k times, you can add it to the new array. This second step runs in O(n) time and O(m) space (where m is the total number of elements in the resulting array), so the total complexity is O(n log n) time and O(m + log n) space.
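A sketch of this sort-then-scan approach (uses java.util.Arrays and ArrayList; note it yields the qualifying values in sorted order rather than their original order):
static List<Integer> atMostKTimes(int[] array, int k) {
    int[] sorted = array.clone();
    Arrays.sort(sorted);
    List<Integer> result = new ArrayList<>();
    int count = 1;
    for (int i = 1; i <= sorted.length; i++) {
        if (i < sorted.length && sorted[i] == sorted[i - 1]) {
            count++;                                    // another copy of the current run
        } else {
            if (count <= k) result.add(sorted[i - 1]);  // run ended: keep it if rare enough
            count = 1;
        }
    }
    return result;
}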
Hope this helps!
You have at least two ways to do it:
Scan linearly through the array, updating a hash table which stores the frequency of each encountered number (average time: O(n), space: O(n));
Sort the array, keeping a count of how many times you've seen the current element (reset it to one every time the current number differs from the previous one). Time: O(n lg n), space: O(m), where m is the number of returned elements (assuming you use an O(1)-space sorter, like heapsort).
The first one is time-optimal, the second one is space-optimal.
