Minimum absolute difference of a set of numbers - java

I am given a number set of size n, and a list of Q inputs. Each input will either remove or add a number to this set. After each input, I am supposed to output the minimum absolute difference of the set of numbers.
Constraints:
2 <= N <= 10^6
1 <= Q <= 10^6
-10^9 <= set[i] <= 10^9
Example:
input:
set = [2, 4, 7]
ADD 6
REMOVE 7
REMOVE 4
ADD 2
output:
1
2
4
0
I am tasked to solve this using an algorithm of time complexity O((N+Q)log(N+Q)) or better.
My current implementation is not fast enough, but it is as follows:
TreeSet<Integer> tree = new TreeSet<>();
HashMap<Integer, Integer> numberFreq = new HashMap<>();
int dupeCount = 0;
for (int i : set) {
tree.add(i);
if (numberFreq.getOrDefault(i, 0) > 0) dupeCount++; // getOrDefault avoids a NullPointerException
numberFreq.put(i, numberFreq.getOrDefault(i, 0) + 1);
}
void add(int i) {
if (numberFreq.getOrDefault(i, 0) > 0) dupeCount++;
numberFreq.put(i, numberFreq.getOrDefault(i, 0) + 1);
tree.add(i); // if duplicate nothing gets added anyway
if (dupeCount > 0) System.out.println(0);
else {
int current = tree.first();
int minAbsDiff = Integer.MAX_VALUE;
Integer next;
while ((next = tree.higher(current)) != null) { // O(n) scan over the whole set
minAbsDiff = Math.min(Math.abs(current - next), minAbsDiff);
current = next;
}
System.out.println(minAbsDiff);
}
}
void remove(int i) {
if (numberFreq.getOrDefault(i, 0) > 1) dupeCount--; // still a duplicate left
else tree.remove(i);
numberFreq.put(i, numberFreq.get(i) - 1);
if (dupeCount > 0) System.out.println(0);
else {
int current = tree.first();
int minAbsDiff = Integer.MAX_VALUE;
Integer next;
while ((next = tree.higher(current)) != null) { // O(n) scan over the whole set
minAbsDiff = Math.min(Math.abs(current - next), minAbsDiff);
current = next;
}
System.out.println(minAbsDiff);
}
}
I've tried at it for 2 days now and I'm quite lost.

Here's one algorithm that should work (though I don't know if this is the intended algorithm):
Sort the list of numbers L (if not already sorted): L = [2, 4, 7]
Build a corresponding list D of "sorted adjacent absolute differences" (i.e., the differences between adjacent pairs in the sorted list, sorted themselves in ascending order): D = [2, 3]
For each operation, update L and D incrementally. Suppose the operation is ADD 6, as an example:
a) Insert the number (6) into L in the correct (sorted) location: L = [2, 4, 6, 7].
b) Based on where you inserted it, determine the corresponding adjacent absolute difference that is now obsolete and remove it from D (in this case, the difference 7-4=3 is no longer relevant and can be removed, since 4 and 7 are no longer adjacent with 6 separating them): D = [2].
c) Add the two new adjacent absolute differences to the correct (sorted) locations (in this case, 6-4=2 and 7-6=1): D = [1, 2, 2].
d) Print out the first element of D.
If you encounter a remove operation in step 3, the logic is similar but slightly different. You'd find and remove the element from L, remove the two adjacent differences from D that have been made irrelevant by the remove operation, add the new relevant adjacent difference to D, and print the first element of D.
The proof of correctness is straightforward. The minimum adjacent absolute difference will definitely also be the minimum absolute difference, because the absolute difference between two non-adjacent numbers will always be greater than or equal to the absolute difference between two adjacent numbers which lie "between them" in sorted order. This algorithm outputs the minimum adjacent absolute difference after each operation.
You have a few options for the sorted list data structures. But since you want to be able to quickly insert, remove, and read ordered data, I'd suggest something like a self-balancing binary tree. Suppose we use an AVL tree.
Step 1 is O(N log(N)). If the input is an array or something, you could just build an AVL tree; insertion in an AVL tree is log(N), and you have to do it N times to build the tree.
Step 2 is O(N log(N)); you just have to iterate over the AVL tree for L in ascending order, computing adjacent differences as you go, and insert each difference into a new AVL tree for D (again, N insertions each with log(N) complexity).
For a single operation, steps 3a), 3b), 3c), and 3d) are all O(log(N+Q)), since they each involve inserting, deleting, or reading one or two elements from an AVL tree of size < N+Q. So for a single operation, step 3 is O(log(N+Q)). Step 3 repeats this across Q operations, giving you O(Q log(N+Q)).
So the final algorithmic runtime complexity is O(N log(N)) + O(Q log(N+Q)), which is within the required O((N+Q) log(N+Q)).
Edit:
I just realized that the "list of numbers" (L) is actually a set (at least, it is according to the question title, but that might be misleading). Sets don't allow for duplicates. But that's fine either way; whenever inserting, just check if it's a duplicate (after determining where to insert it). If it's a duplicate, the whole operation becomes a no-op. This doesn't change the complexity. Though I suppose that's what a TreeSet does anyway. (Note, however, that the example's last operation, ADD 2 followed by output 0, suggests duplicates are allowed and should make the answer 0; in that case you'd track a duplicate count rather than treating duplicates as no-ops.)
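The steps above can be sketched in Java (my own sketch, untested against the original judge; TreeMaps serve as sorted multisets for L and D, and duplicates are handled with a counter rather than as no-ops, matching the example's expected output of 0):

```java
import java.util.TreeMap;

class MinAbsDiff {
    // multiset L of the numbers currently in the set
    private final TreeMap<Integer, Integer> values = new TreeMap<>();
    // multiset D of adjacent differences in sorted order
    private final TreeMap<Integer, Integer> diffs = new TreeMap<>();
    private int dupeCount = 0; // extra copies present -> min diff is 0

    public void add(int x) {
        int freq = values.merge(x, 1, Integer::sum);
        if (freq > 1) { dupeCount++; return; } // duplicates don't change D
        Integer lo = values.lowerKey(x), hi = values.higherKey(x);
        if (lo != null && hi != null) removeDiff(hi - lo); // lo,hi no longer adjacent
        if (lo != null) addDiff(x - lo);
        if (hi != null) addDiff(hi - x);
    }

    public void remove(int x) {
        Integer freq = values.get(x);
        if (freq == null) return;                    // not present
        if (freq > 1) { values.put(x, freq - 1); dupeCount--; return; }
        values.remove(x);
        Integer lo = values.lowerKey(x), hi = values.higherKey(x);
        if (lo != null) removeDiff(x - lo);
        if (hi != null) removeDiff(hi - x);
        if (lo != null && hi != null) addDiff(hi - lo); // lo,hi adjacent again
    }

    /** Minimum absolute difference of the current multiset. */
    public int minDiff() {
        if (dupeCount > 0) return 0;
        return diffs.isEmpty() ? Integer.MAX_VALUE : diffs.firstKey();
    }

    private void addDiff(int d) { diffs.merge(d, 1, Integer::sum); }
    private void removeDiff(int d) {
        // only ever called for a difference that is present in D
        if (diffs.merge(d, -1, Integer::sum) == 0) diffs.remove(d);
    }
}
```

Every operation touches a constant number of TreeMap entries, so each ADD/REMOVE plus query is O(log(N+Q)).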

Related

Finding mode for every window of size k in an array

Given an array of size n and k, how do you find the mode for every contiguous subarray of size k?
For example
arr = 1 2 2 6 6 1 1 7
k = 3
ans = 2 2 6 6 1 1
I was thinking of having a hash map where the key is the number and the value is its frequency, a treemap where the key is the frequency and the value is the number, and a queue to remove the first element when the size > k. Here the time complexity is O(n log(n)). Can we do this in O(1) per window?
This can be done in O(n) time
I was intrigued by this problem in part because, as I indicated in the comments, I felt certain that it could be done in O(n) time. I had some time over this past weekend, so I wrote up my solution to this problem.
Approach: Mode Frequencies
The basic concept is this: the mode of a collection of numbers is the number(s) which occur with the highest frequency within that set.
This means that whenever you add a number to the collection, if the number added was not already one of the mode-values then the frequency of the mode would not change. So with the collection (8 9 9) the mode-values are {9} and the mode-frequency is 2. If you add say a 5 to this collection ((8 9 9 5)) neither the mode-frequency nor the mode-values change. If instead you add an 8 to the collection ((8 9 9 8)) then the mode-values change to {9, 8} but the mode-frequency is still unchanged at 2. Finally, if you instead added a 9 to the collection ((8 9 9 9)), now the mode-frequency goes up by one.
Thus in all cases when you add a single number to the collection, the mode-frequency is either unchanged or goes up by only one. Likewise, when you remove a single number from the collection, the mode-frequency is either unchanged or goes down by at most one. So all incremental changes to the collection result in only two possible new mode-frequencies. This means that if we had all of the distinct numbers of the collection indexed by their frequencies, then we could always find the new Mode in a constant amount of time (i.e., O(1)).
To accomplish this I use a custom data structure ("ModeTracker") that has a multiset ("numFreqs") to store the distinct numbers of the collection along with their current frequency in the collection. This is implemented with a Dictionary<int, int> (I think that this is a Map in Java). Thus given a number, we can use this to find its current frequency within the collection in O(1).
This data structure also has an array of sets ("freqNums") that given a specific frequency will return all of the numbers that have that frequency in the current collection.
I have included the code for this data structure class below. Note that this is implemented in C# as I do not know Java well enough to implement it there, but I believe that a Java programmer should have no trouble translating it.
(pseudo)Code:
class ModeTracker
{
HashSet<int>[] freqNums; //numbers at each frequency
Dictionary<int, int> numFreqs; //frequencies for each number
int modeFreq_ = 0; //frequency of the current mode
public ModeTracker(int maxFrequency)
{
freqNums = new HashSet<int>[maxFrequency + 2];
// populate every frequency slot (a count can transiently reach
// maxFrequency + 1 while the window slides), so we don't have to check later
for (int i=0; i<maxFrequency+2; i++)
{
freqNums[i] = new HashSet<int>();
}
numFreqs = new Dictionary<int, int>();
}
public int Mode { get { return freqNums[modeFreq_].First(); } }
public void addNumber(int n)
{
int newFreq = adjustNumberCount(n, 1);
// new mode-frequency is one greater or the same
if (freqNums[modeFreq_+1].Count > 0) modeFreq_++;
}
public void removeNumber(int n)
{
int newFreq = adjustNumberCount(n, -1);
// new mode-frequency is the same or one less
if (freqNums[modeFreq_].Count == 0) modeFreq_--;
}
int adjustNumberCount(int num, int adjust)
{
// make sure we already have this number
if (!numFreqs.ContainsKey(num))
{
// add entries for it
numFreqs.Add(num, 0);
freqNums[0].Add(num);
}
// now adjust this number's frequency
int oldFreq = numFreqs[num];
int newFreq = oldFreq + adjust;
numFreqs[num] = newFreq;
// move this number from its old frequency's set to the new one
freqNums[oldFreq].Remove(num);
freqNums[newFreq].Add(num);
return newFreq;
}
}
Also, below is a small C# function that demonstrates how to use this datastructure to solve the problem originally posed in the question.
int[] ModesOfSubarrays(int[] arr, int subLen)
{
ModeTracker tracker = new ModeTracker(subLen);
int[] modes = new int[arr.Length - subLen + 1];
for (int i=0; i < arr.Length; i++)
{
//add every number into the tracker
tracker.addNumber(arr[i]);
if (i >= subLen)
{
// remove the number that just rotated out of the window
tracker.removeNumber(arr[i-subLen]);
}
if (i >= subLen - 1)
{
// add the new Mode to the output
modes[i - subLen + 1] = tracker.Mode;
}
}
return modes;
}
I have tested this and it does appear to work correctly for all of my tests.
Complexity Analysis
Going through the individual steps of the `ModesOfSubarrays()` function:
The new ModeTracker object is created in O(n) time or less.
The modes[] array is created in O(n) time.
The for(..) loop runs n times:
3a: the addNumber() function takes O(1) time
3b: the removeNumber() function takes O(1) time
3c: getting the new Mode takes O(1) time
So the total time is O(n) + O(n) + n*(O(1) + O(1) + O(1)) = O(n)
Please let me know of any questions that you might have about this code.
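For Java readers, here is one possible translation of the ModeTracker plus its sliding-window driver (my own sketch, not the original author's code; class and method names are mine):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ModeTracker {
    private final List<Set<Integer>> freqNums = new ArrayList<>(); // numbers at each frequency
    private final Map<Integer, Integer> numFreqs = new HashMap<>(); // frequency of each number
    private int modeFreq = 0; // frequency of the current mode

    ModeTracker(int maxFrequency) {
        // one slot per frequency 0..maxFrequency+1 (the count briefly
        // overshoots by one because we add before we remove)
        for (int i = 0; i < maxFrequency + 2; i++) freqNums.add(new HashSet<>());
    }

    int mode() { return freqNums.get(modeFreq).iterator().next(); }

    void addNumber(int n) {
        adjustNumberCount(n, 1);
        // the new mode-frequency is either the same or one greater
        if (!freqNums.get(modeFreq + 1).isEmpty()) modeFreq++;
    }

    void removeNumber(int n) {
        adjustNumberCount(n, -1);
        // the new mode-frequency is either the same or one less
        if (freqNums.get(modeFreq).isEmpty()) modeFreq--;
    }

    private void adjustNumberCount(int num, int adjust) {
        int oldFreq = numFreqs.getOrDefault(num, 0);
        int newFreq = oldFreq + adjust;
        numFreqs.put(num, newFreq);
        freqNums.get(oldFreq).remove(num);
        freqNums.get(newFreq).add(num);
    }

    // sliding-window driver, mirroring the C# ModesOfSubarrays
    static int[] modesOfSubarrays(int[] arr, int k) {
        ModeTracker tracker = new ModeTracker(k);
        int[] modes = new int[arr.length - k + 1];
        for (int i = 0; i < arr.length; i++) {
            tracker.addNumber(arr[i]);
            if (i >= k) tracker.removeNumber(arr[i - k]);
            if (i >= k - 1) modes[i - k + 1] = tracker.mode();
        }
        return modes;
    }
}
```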

Should I use a splay tree?

So, for an assignment we're asked to write pseudocode that, for a given sequence, finds the largest frequency of a number from the sequence. So, a quick example would be:
[ 1, 8, 5, 6, 6, 7, 6, 7, 6, 1, 1, 5, 8 ] => The number with the largest frequency is 6. The "winner" is 6.
We have to implement it in O(nlogm) time where m is the number of distinct numbers. So, in the example above, there are 5 different numbers (m=5).
My approach was to go through each number in the sequence, add it to a binary tree (if not already there), and increment its frequency. Thus, for every number in the sequence, my program goes to the binary tree, finds the element (in log m time) and increments the frequency by one. It does this log m work n times, so the program runs in O(n log m). However, to find out which number had the largest frequency would take another O(m). I'm left with O(n log m + m); by dropping the lower-order terms this leaves me with O(m), which is not what the professor is asking for.
I remember from class that a splay tree would be a good option to use in order to keep the most frequently access item at the root, thus giving me O(1) or maybe O(logn) at most to get me the "winner"? I don't know where to begin to implement a splay tree.
If you could provide any insight, I would highly appreciate it.
public E theWinner(E[] C) {
int i = 0;
while (i < C.length) {
findNumber(C[i], this.root);
i++;
}
// This is where I'm stuck, returning the winner in < O(n) time.
}
public void findNumber(E number, Node<E> root) {
if (root.left == null && root.right == null) {
this.add(number);
//splay tree?
} else if (root.data.compareTo(number) == 0) {
root.freqCount = root.freqCount + 1;
//splay tree?
} else {
if ( root.data.compareTo(number) < 0) {
findNumber(number, root.right);
} else {
findNumber(number, root.left);
}
}
}
You don't need a splay tree. O(n log m + m) is O(n log m), since the number of distinct elements m is at most the total number of elements n; the m term is the lower-order one, not the n log m term. So you can simply iterate over all m elements in the tree after processing the input sequence to find the maximum frequency.
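In code, that final O(m) scan is trivial; a sketch of the whole tree-then-scan idea using Java's TreeMap as the balanced tree (names are mine):

```java
import java.util.Map;
import java.util.TreeMap;

class WinnerDemo {
    // n inserts into a balanced tree at O(log m) each, then one O(m) scan:
    // O(n log m + m) = O(n log m) total, since m <= n.
    static int theWinner(int[] seq) {
        TreeMap<Integer, Integer> freq = new TreeMap<>();
        for (int x : seq) freq.merge(x, 1, Integer::sum); // count occurrences
        int winner = seq[0], best = 0;
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            if (e.getValue() > best) { best = e.getValue(); winner = e.getKey(); }
        }
        return winner;
    }
}
```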

Longest sequence of numbers

I was recently asked this question in an interview for which i could give an O(nlogn) solution, but couldn't find a logic for O(n) . Can someone help me with O(n) solution?
In an array, find the length of the longest sequence of consecutive numbers.
Example :
Input : 2 4 6 7 3 1
Output: 4 (because 1,2,3,4 is a sequence even though they are not in consecutive positions)
The solution should also be realistic in terms of space consumed, i.e. the solution should be realistic even with an array of 1 billion numbers.
For non-consecutive numbers you need a means of sorting them in O(n). In this case you can use a BitSet (note that a BitSet only stores non-negative values).
import java.util.BitSet;
import java.util.stream.IntStream;

int[] ints = {2, 4, 6, 7, 3, 1};
BitSet bs = new BitSet();
IntStream.of(ints).forEach(bs::set);
// now search for the longest consecutive run of set bits
int last = 0, max = 0;
do {
int set = bs.nextSetBit(last);
if (set < 0) break; // no more set bits
int clear = bs.nextClearBit(set + 1);
int len = clear - set;
if (len > max)
max = len;
last = clear;
} while (last > 0);
System.out.println(max);
Traverse the array once and build a hash map whose key is a number from the input array and whose value is a boolean indicating whether that element has been processed (initially all false). Traverse once more and do the following: when you check number a, mark it true in the hash map and immediately check the map for the elements a-1 and a+1. If found, mark their values true as well and proceed checking their neighbors, incrementing the length of the current contiguous subsequence. Stop when there are no more neighbors, and update the longest length. Move forward in the array and continue checking unprocessed elements. It is not obvious at first glance that this solution is O(n), but there are only two array traversals, and the hash map ensures that every element of the input is processed only once.
Main lesson: if you have to reduce time complexity, it is often necessary to use additional space.
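A Java sketch of the hash-map approach just described (my own rendering; `longestConsecutive` is a name I chose):

```java
import java.util.HashMap;
import java.util.Map;

class LongestRun {
    // Two passes plus neighbor expansion; each element is marked processed
    // at most once, so the expansion loops do O(n) total work overall.
    static int longestConsecutive(int[] arr) {
        Map<Integer, Boolean> processed = new HashMap<>();
        for (int x : arr) processed.put(x, false);  // first traversal
        int longest = 0;
        for (int x : arr) {                         // second traversal
            if (processed.get(x)) continue;
            processed.put(x, true);
            int len = 1;
            // expand to smaller neighbors (null means "not in the array")
            for (int lo = x - 1; Boolean.FALSE.equals(processed.get(lo)); lo--) {
                processed.put(lo, true);
                len++;
            }
            // expand to larger neighbors
            for (int hi = x + 1; Boolean.FALSE.equals(processed.get(hi)); hi++) {
                processed.put(hi, true);
                len++;
            }
            longest = Math.max(longest, len);
        }
        return longest;
    }
}
```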

Reverse Engineer Sorting Algorithm

I have been given 3 algorithms to reverse engineer and explain how they work. So far I have worked out that I have been given a quick sorting algorithm and a bubble sorting algorithm; however, I'm not sure what algorithm this is. I understand how the quick sort and bubble sort work, but I just can't get my head around this algorithm. I'm unsure what the variables are and was hoping someone out there would be able to tell me what's going on here:
public static ArrayList<Integer> SortB(ArrayList<Integer> a)
{
ArrayList<Integer> array = CopyArray(a);
Integer[] zero = new Integer[a.size()];
Integer[] one = new Integer[a.size()];
int i,b;
Integer x,p;
//Change from 8 to 32 for whole integers - will run 4 times slower
for(b=0;b<8;++b)
{
int zc = 0;
int oc = 0;
for(i=0;i<array.size();++i)
{
x = array.get(i);
p = 1 << b;
if ((x & p) == 0)
{
zero[zc++] = array.get(i);
}
else
{
one[oc++] = array.get(i);
}
}
for(i=0;i<oc;++i) array.set(i,one[i]);
for(i=0;i<zc;++i) array.set(i+oc,zero[i]);
}
return(array);
}
This is a Radix Sort, limited to the least significant eight bits. It does not complete the sort unless you change the loop to go 32 times instead of 8.
Each iteration processes a single bit b. It prepares a mask called p by shifting 1 left b times. This produces a power of two: 1, 2, 4, 8, ..., or in binary: 1, 10, 100, 1000, 10000, ...
For each bit, the number of elements in the original array with bit b set to 1 and to 0 are separated into two buckets called one and zero. Once the separation is over, the elements are placed back into the original array, and the algorithm proceeds to the next iteration.
This implementation uses twice as much storage as the original array, and goes through the array a total of 16 times (64 times in the full 32-bit version: once for reading and once for writing per bit). The asymptotic complexity of the algorithm is linear.
Looks like a bit-by-bit radix sort to me, but it seems to be sorting backwards.
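For reference, the same bucket idea extended to a full ascending sort over all 32 bits might look like this (my own sketch, not part of the assignment; the zero bucket goes first on each pass, except the sign bit, where negatives must come first):

```java
import java.util.ArrayList;
import java.util.List;

class RadixSortDemo {
    // LSD binary radix sort, ascending. Each pass is stable, so after
    // processing bits 0..30 the values are ordered by their low 31 bits;
    // the final pass on the sign bit (bit 31) puts negatives first.
    static List<Integer> sortAscending(List<Integer> a) {
        List<Integer> array = new ArrayList<>(a);
        for (int b = 0; b < 32; b++) {
            List<Integer> zero = new ArrayList<>();
            List<Integer> one = new ArrayList<>();
            for (int x : array) {
                if ((x & (1 << b)) == 0) zero.add(x);
                else one.add(x);
            }
            array.clear();
            if (b == 31) {            // sign bit set means negative
                array.addAll(one);
                array.addAll(zero);
            } else {
                array.addAll(zero);
                array.addAll(one);
            }
        }
        return array;
    }
}
```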

Finding unique numbers from sorted array in less than O(n)

I had an interview and there was the following question:
Find unique numbers from sorted array in less than O(n) time.
Ex: 1 1 1 5 5 5 9 10 10
Output: 1 5 9 10
I gave the solution but that was of O(n).
Edit: Sorted array size is approx 20 billion and unique numbers are approx 1000.
Divide and conquer:
look at the first and last element of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
If both are equal, the only element in the sequence is the first (no matter how long the sequence is).
If they are different, divide the sequence in half and repeat for each subsequence.
Solves in roughly O(m log(n/m)) time when there are m distinct values, which is close to O(log(n)) when m is small, and O(n) only in the worst case (when each element is different).
Java code:
public static List<Integer> findUniqueNumbers(int[] data) {
List<Integer> result = new LinkedList<Integer>();
findUniqueNumbers(data, 0, data.length - 1, result, false);
return result;
}
private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {
int a = data[i1];
int b = data[i2];
// homogenous sequence a...a
if (a == b) {
if (!skipFirst) {
result.add(a);
}
}
else {
//divide & conquer
int i3 = (i1 + i2) / 2;
findUniqueNumbers(data, i1, i3, result, skipFirst);
findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
}
}
I don't think it can be done in less than O(n). Take the case where the array contains 1 2 3 4 5: in order to get the correct output, each element of the array would have to be looked at, hence O(n).
If your sorted array of size n has m distinct elements, you can do O(m log n).
Note that this is going to be efficient when m << n (e.g. m=2 and n=100)
Algorithm:
Initialization: Current element y = first element x[0]
Step 1: Do a binary search for the last occurrence of y in x (can be done in O(log(n)) time). Let its index be i
Step 2: y = x[i+1] and go to step 1
Edit: In cases where m = O(n) this algorithm is going to work badly. To alleviate it you can run it in parallel with regular O(n) algorithm. The meta algorithm consists of my algorithm and O(n) algorithm running in parallel. The meta algorithm stops when either of these two algorithms complete.
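A Java sketch of these steps (my own; the helper names are made up):

```java
import java.util.ArrayList;
import java.util.List;

class SortedUniques {
    // Binary search for the index of the last occurrence of y in x[from..].
    // Only values >= y appear from `from` onward, so x[mid] <= y implies
    // x[mid] == y.
    static int lastIndexOf(int[] x, int from, int y) {
        int lo = from, hi = x.length - 1, ans = from;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (x[mid] <= y) { ans = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return ans;
    }

    // One O(log n) search per distinct value: O(m log n) overall.
    static List<Integer> uniques(int[] x) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < x.length) {
            result.add(x[i]);                 // current value y = x[i]
            i = lastIndexOf(x, i, x[i]) + 1;  // jump past its last occurrence
        }
        return result;
    }
}
```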
Since the data consists of integers, there are a finite number of unique values that can occur between any two values. So, start by looking at the first and last value in the array. If a[length-1] - a[0] < length - 1, there will be some repeating values. Put a[0] and a[length-1] into some constant-access-time container like a hash set. If the two values are equal, you know that there is only one unique value in the array and you are done. You know that the array is sorted. So, if the two values are different, you can look at the middle element now. If the middle element is already in the set of values, you know that you can skip the whole left part of the array and only analyze the right part recursively. Otherwise, analyze both left and right part recursively.
Depending on the data in the array you will be able to get the set of all unique values in a different number of operations. You get them in constant time O(1) if all the values are the same since you will know it after only checking the first and last element. If there are "relatively few" unique values, your complexity will be close to O(log N) because after each partition you will "quite often" be able to throw away at least one half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1, you can also "define" the set in constant time because they have to be consecutive numbers from a[0] to a[length-1]. However, in order to actually list them, you will have to output each number, and there are N of them.
Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time regardless of array size if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes this algorithm "better than O(N)" (or, strictly: "not worse than O(N) and better in many cases").
import java.util.*;
/**
* collect the unique values of a sorted array in average O(log(n)), worst O(n)
* @author XXX
*/
public class UniqueValue {
public static void main(String[] args) {
int[] test = {-1, -1, -1, -1, 0, 0, 0, 0, 2, 3, 4, 5, 5, 6, 7, 8};
UniqueValue u = new UniqueValue();
System.out.println(u.getUniqueValues(test, 0, test.length - 1));
}
// i must be start index, j must be end index
public List<Integer> getUniqueValues(int[] array, int i, int j) {
if (array == null || array.length == 0) {
return new ArrayList<Integer>();
}
List<Integer> result = new ArrayList<>();
if (array[i] == array[j]) {
result.add(array[i]);
} else {
int half = (i + j) / 2;
result.addAll(getUniqueValues(array, i, half));
// skip past the remaining duplicates of the middle value
// so it isn't added twice
int mid = half + 1;
while (mid < j && array[mid] == array[half]) mid++;
if (mid <= j && array[mid] != array[half]) {
result.addAll(getUniqueValues(array, mid, j));
}
}
return result;
}
}
