Should I use a splay tree? - java

So, for an assignment we're asked to write pseudocode that, for a given sequence, finds the largest frequency of a number in the sequence. A quick example would be:
[ 1, 8, 5, 6, 6, 7, 6, 7, 6, 1, 1, 5, 8 ] => The number with the largest frequency is 6, so the "winner" is 6.
We have to implement it in O(n log m) time, where m is the number of distinct numbers. So, in the example above, there are 5 distinct numbers (m = 5).
My approach was to go through each number in the sequence and add it to a binary tree (if not already there), incrementing its frequency. Thus, for every number in the sequence, my program goes to the binary tree, finds the element (in log m time) and increments its frequency by one. It does this log m work n times, so the program runs in O(n log m). However, finding out which number has the largest frequency would take another O(m). I'm left with O(n log m + m), which, after dropping lower-order terms, leaves me with O(m), and that is not what the professor is asking for.
I remember from class that a splay tree would be a good option for keeping the most frequently accessed item at the root, thus giving me O(1), or maybe O(log n) at most, to get the "winner"? But I don't know where to begin implementing a splay tree.
If you could provide any insight, I would highly appreciate it.
public E theWinner(E[] C) {
    int i = 0;
    while (i < C.length) {
        findNumber(C[i], this.root);   // O(log m) per element
        i++;
    }
    // This is where I'm stuck, returning the winner in < O(n) time.
    return null;
}
public void findNumber(E number, Node<E> root) {
    if (root == null) {
        this.add(number);                      // not in the tree yet, insert it
        // splay tree?
    } else if (root.data.compareTo(number) == 0) {
        root.freqCount = root.freqCount + 1;   // found it, bump its frequency
        // splay tree?
    } else if (root.data.compareTo(number) < 0) {
        findNumber(number, root.right);
    } else {
        findNumber(number, root.left);
    }
}

You don't need a splay tree. O(n log m + m) is O(n log m), because the number of distinct elements m is not greater than the total number of elements n, so the extra m is the lower-order term. You can simply iterate over all the elements in the tree after processing the input sequence to find the maximum frequency.
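For what it's worth, here is a minimal sketch of this answer (not the asker's code, and the class name is made up), using java.util.TreeMap, a balanced search tree, with plain int keys instead of the generic E[]. Each of the n updates costs O(log m), and the final O(m) scan over the distinct keys picks the winner:
import java.util.Map;
import java.util.TreeMap;

class Winner {
    // n updates at O(log m) each, then one O(m) pass over the m distinct keys
    static int theWinner(int[] seq) {
        TreeMap<Integer, Integer> freq = new TreeMap<>();
        for (int x : seq) {
            freq.merge(x, 1, Integer::sum);   // find/insert + increment in O(log m)
        }
        int winner = seq[0], best = 0;        // assumes a non-empty sequence
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                winner = e.getKey();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        int[] seq = {1, 8, 5, 6, 6, 7, 6, 7, 6, 1, 1, 5, 8};
        System.out.println(theWinner(seq));   // prints 6
    }
}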

Related

Minimum absolute difference of a set of numbers

I am given a number set of size n, and a list of Q inputs. Each input will either remove or add a number to this set. After each input, I am supposed to output the minimum absolute difference of the set of numbers.
Constraints:
2 <= N <= 10^6
1 <= Q <= 10^6
-10^9 <= set[i] <= 10^9
Example:
input:
set = [2, 4, 7]
ADD 6
REMOVE 7
REMOVE 4
ADD 2
output:
1
2
4
0
I am tasked to solve this using an algorithm of time complexity O((N+Q)log(N+Q)) or better.
My current implementation is not fast enough, but it is as follows:
TreeSet<Integer> tree = new TreeSet<>();
HashMap<Integer, Integer> numberFreq = new HashMap<>();
int dupeCount = 0;

for (int i : set) {
    tree.add(i);
    if (numberFreq.getOrDefault(i, 0) > 0) dupeCount++;
    numberFreq.put(i, numberFreq.getOrDefault(i, 0) + 1);
}

void add(int i) {
    if (numberFreq.getOrDefault(i, 0) > 0) dupeCount++;
    numberFreq.put(i, numberFreq.getOrDefault(i, 0) + 1);
    tree.add(i); // if duplicate nothing gets added anyway
    if (dupeCount > 0) System.out.println(0);
    else {
        // full scan over all adjacent pairs in the tree: this is the O(N) part
        int smallestBall = tree.first();
        int absDiff;
        int minAbsDiff = Integer.MAX_VALUE;
        while (tree.higher(smallestBall) != null) {
            absDiff = Math.abs(smallestBall - tree.higher(smallestBall));
            minAbsDiff = Math.min(absDiff, minAbsDiff);
            smallestBall = tree.higher(smallestBall);
        }
        System.out.println(minAbsDiff);
    }
}

void remove(int i) {
    if (numberFreq.get(i) > 1) dupeCount--;
    else tree.remove(i);
    numberFreq.put(i, numberFreq.get(i) - 1);
    if (dupeCount > 0) System.out.println(0);
    else {
        // same O(N) scan as in add()
        int smallestBall = tree.first();
        int absDiff;
        int minAbsDiff = Integer.MAX_VALUE;
        while (tree.higher(smallestBall) != null) {
            absDiff = Math.abs(smallestBall - tree.higher(smallestBall));
            minAbsDiff = Math.min(absDiff, minAbsDiff);
            smallestBall = tree.higher(smallestBall);
        }
        System.out.println(minAbsDiff);
    }
}
I've tried at it for 2 days now and I'm quite lost.
Here's one algorithm that should work (though I don't know if this is the intended algorithm):
Sort the list of numbers L (if not already sorted): L = [2, 4, 7]
Build a corresponding list D of "sorted adjacent absolute differences" (i.e., the differences between adjacent pairs in the sorted list, sorted themselves in ascending order): D = [2, 3]
For each operation... Suppose the operation is ADD 6, as an example. a) insert the number (6) into L in the correct (sorted) location: L = [2, 4, 6, 7]; b) based on where you inserted it, determine the corresponding adjacent absolute difference that is now obsolete and remove it from D (in this case, the difference 7-4=3 is no longer relevant and can be removed, since 4 and 7 are no longer adjacent with 6 separating them): D = [2]; c) Add in the two new adjacent absolute differences to the correct (sorted) locations (in this case, 6-4=2 and 7-6=1): D = [1, 2, 2]; d) print out the first element of D
If you encounter a remove operation in step 3, the logic is similar but slightly different. You'd find and remove the element from L, remove the two adjacent differences from D that have been made irrelevant by the remove operation, add the new relevant adjacent difference to D, and print the first element of D.
The proof of correctness is straightforward. The minimum adjacent absolute difference will definitely also be the minimum absolute difference, because the absolute difference between two non-adjacent numbers will always be greater than or equal to the absolute difference between two adjacent numbers which lie "between them" in sorted order. This algorithm outputs the minimum adjacent absolute difference after each operation.
You have a few options for the sorted list data structures. But since you want to be able to quickly insert, remove, and read ordered data, I'd suggest something like a self-balancing binary tree. Suppose we use an AVL tree.
Step 1 is O(N log(N)). If the input is an array or something, you could just build an AVL tree; insertion in an AVL tree is log(N), and you have to do it N times to build the tree.
Step 2 is O(N log(N)); you just have to iterate over the AVL tree for L in ascending order, computing adjacent differences as you go, and insert each difference into a new AVL tree for D (again, N insertions each with log(N) complexity).
For a single operation, steps 3a), 3b), 3c), and 3d) are all O(log(N+Q)), since they each involve inserting, deleting, or reading one or two elements from an AVL tree of size < N+Q. So for a single operation, step 3 is O(log(N+Q)). Step 3 repeats this across Q operations, giving you O(Q log(N+Q)).
So the final runtime complexity is O(N log(N)) + O(Q log(N+Q)), which is within O((N+Q) log(N+Q)).
Edit:
I just realized that the "list of numbers" (L) is actually a set (at least, it is according to the question title, but that might be misleading). Sets don't allow for duplicates. But that's fine either way; whenever inserting, just check if it's a duplicate (after determining where to insert it). If it's a duplicate, the whole operation becomes a no-op. This doesn't change the complexity. Though I suppose that's what a TreeSet does anyways.
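To make the bookkeeping concrete, here is a rough sketch of this approach (not the poster's code; the class and method names are made up). It keeps both the values L and the adjacent differences D as count-based multisets in a TreeMap, so every add, remove and query is O(log size). L is treated as a multiset here, so adding a duplicate simply creates a zero gap, which matches the 0 in the example output:
import java.util.TreeMap;

class MinAdjacentDiff {
    // value -> how many copies of it are in the multiset L
    private final TreeMap<Integer, Integer> values = new TreeMap<>();
    // adjacent difference -> how many adjacent pairs currently have it (the multiset D)
    private final TreeMap<Integer, Integer> diffs = new TreeMap<>();

    private static void bump(TreeMap<Integer, Integer> m, int key, int delta) {
        int c = m.getOrDefault(key, 0) + delta;
        if (c == 0) m.remove(key);
        else m.put(key, c);
    }

    void add(int x) {
        if (values.containsKey(x)) {
            bump(diffs, 0, +1);                  // a duplicate creates one zero gap
        } else {
            Integer lo = values.lowerKey(x), hi = values.higherKey(x);
            if (lo != null && hi != null) bump(diffs, hi - lo, -1); // old gap is split
            if (lo != null) bump(diffs, x - lo, +1);
            if (hi != null) bump(diffs, hi - x, +1);
        }
        bump(values, x, +1);
    }

    void remove(int x) {
        if (values.getOrDefault(x, 0) > 1) {
            bump(diffs, 0, -1);                  // one zero gap disappears
        } else {
            Integer lo = values.lowerKey(x), hi = values.higherKey(x);
            if (lo != null) bump(diffs, x - lo, -1);
            if (hi != null) bump(diffs, hi - x, -1);
            if (lo != null && hi != null) bump(diffs, hi - lo, +1); // neighbours rejoin
        }
        bump(values, x, -1);
    }

    int minDiff() {                              // valid once at least two elements remain
        return diffs.firstKey();
    }
}
After each ADD or REMOVE you would print minDiff(), provided at least two elements remain in the multiset.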

Circular Array Loop, detection

I am working on a problem, and have spent some time on it.
Problem statement:
You are given an array of positive and negative integers. If a number n at an index is positive, then move forward n steps. Conversely, if it's negative (-n), move backward n steps. Assume the first element of the array is forward next to the last element, and the last element is backward next to the first element. Determine if there is a loop in this array. A loop starts and ends at a particular index with more than 1 element along the loop. The loop must be "forward" or "backward".
Example 1: Given the array [2, -1, 1, 2, 2], there is a loop, from index 0 -> 2 -> 3 -> 0.
Example 2: Given the array [-1, 2], there is no loop.
Note: The given array is guaranteed to contain no element "0".
Can you do it in O(n) time complexity and O(1) space complexity?
And this is my solution in progress; however, I am not sure how I should end the do-while loop when no loop is detected. I believe my code will run infinitely if there is no loop.
public static boolean circularArrayLoop(int[] nums) {
    int size = nums.length;
    if (size < 2) return false;
    int loopStart = nums[0];
    int index = 0;
    int start = nums[0];
    do {
        if (nums[index] > 0) {
            index = moveForward(index, nums[index], size);
        } else {
            index = moveBackward(index, Math.abs(nums[index]), size);
        }
    } while (loopStart != nums[index]);
}
This can be seen as a version of cycle detection in a directed (possibly disconnected) graph, or more like finding minimum spanning trees for all the connected subgraphs of the given graph. The numbers in the array are vertices, and an edge is formed between vertices based on the vertex values. There are no known graph traversal algorithms that can solve it in O(1) space complexity. It might be solved in O(n) time complexity, as the best graph traversal algorithms run in O(V+E) time and V=E in this case, which makes it possible to solve in O(n) time in some cases. The best-known algorithm is Kruskal's: http://www.geeksforgeeks.org/greedy-algorithms-set-2-kruskals-minimum-spanning-tree-mst/ which runs in O(n log n) time.
Since there are guaranteed to be no elements with value 0, following the indices will always lead into a cycle. The qualifier is that a loop must be longer than a single element.
With this condition, if advancing to the next index as directed by the element's value results in the same index being reached, "no" loop is present.
The fast- and slow-moving cursors can be used to find a point inside the loop. Then advancing a single cursor until it returns to the same index lets you iterate over the elements of the loop. If a single advancement returns the cursor to the same index, no loop is present.
public static void main(String[] args) {
    int[] loop = {2, -1, 1, 2, 2};
    int[] noloop = {-1, 2};
    System.out.println(circularArrayLoop(loop));
    System.out.println(circularArrayLoop(noloop));
}
static int nextIndex(int[] nums, int cur) {
    // Get next index based on value, taking wrapping into account in both
    // directions (Java's % can be negative, hence the extra + n)
    int n = nums.length;
    return ((cur + nums[cur]) % n + n) % n;
}

static boolean circularArrayLoop(int[] nums) {
    int fast = 0;
    int slow = 0;
    do {
        // advance fast cursor twice
        fast = nextIndex(nums, nextIndex(nums, fast));
        // advance slow cursor once
        slow = nextIndex(nums, slow);
    } while (fast != slow);
    // the cursors met inside a cycle; it only counts as a loop
    // if it has more than a single element
    int next = nextIndex(nums, fast);
    return next != fast;
}
Am I wrong to think there is no guarantee that the loop will include the first element? If so, you can't just do int loopStart = nums[0];
What if your example 1 was instead [2, -1, 1, 4, 2]? Then the loop would be from index 0 -> 2 -> 3 -> 2, and your check with loopStart wouldn't work, since it checks nums[0].
A good solution is to use 2 variables and move them at different speeds (one at twice the speed of the other). If the array/linked list is circular, you'll get to a point where var1 equals var2.
Here's the pseudocode:
if array.length <= 1
    return false

int i = 0;
// loop is when "var1 == var2"
// end is when "var1 == array.length"
loop (until var1 == var2 or var1 reaches the end)
    var1 = moveToNext(var1)
    if (i++ % 2 == 0)
        var2 = moveToNext(var2)
return var1 == var2;
This is quite similar to a question generally asked using linked list: How to detect a loop in a linked list?

Check if a value in an array corresponds to its place

I was confronted not so long ago to an algorithmic problem.
I needed to find if a value stored in an array was at its "place".
An example will be easier to understand.
Let's take an Array A = {-10, -3, 3, 5, 7}. The algorithm would return 3, because the number 3 is at A[2] (3rd place).
On the contrary, if we take an Array B = {5, 7, 9, 10}, the algorithm will return 0 or false or whatever.
The array is always sorted!
I wasn't able to find a solution with a good complexity. (Looking at each value individually is not good!) Maybe it is possible to solve this problem with an approach similar to merge sort, by cutting the array in half and checking those halves?
Can somebody help me with this one?
A Java algorithm would be best, but pseudocode would also help me a lot!
Here is an algorithm (based on binary search) to find all matching indices that has a best-case complexity of O(log(n)) and a worst case complexity of O(n):
1- Check the element at position m = array.length / 2
2- if the value array[m] is strictly smaller than m, you can forget about the left half of the array (from index 0 to index m-1), and apply recursively to the right half.
3- if array[m]==m, add one to the counter and apply recursively to both halves
4- if array[m]>m, forget about the right half of the array and apply recursively to the left half.
Using threads can accelerate things here. I assume that there are no repetitions in the array.
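Here is a short sketch of that recursion (my code, not the asker's), assuming distinct sorted values and the 1-based matching convention from the example, i.e. a value "matches" when array[m] == m + 1:
// Counts the indices whose value is at its (1-based) place.
// Best case O(log n), worst case O(n), as described above.
static int countMatches(int[] a, int lo, int hi) {
    if (lo > hi) return 0;
    int m = (lo + hi) / 2;
    if (a[m] < m + 1) {
        // everything to the left of m is also too small: keep only the right half
        return countMatches(a, m + 1, hi);
    } else if (a[m] > m + 1) {
        // everything to the right of m is also too large: keep only the left half
        return countMatches(a, lo, m - 1);
    } else {
        // a[m] is at its place; matches may still exist on both sides
        return 1 + countMatches(a, lo, m - 1) + countMatches(a, m + 1, hi);
    }
}
For A = {-10, -3, 3, 5, 7}, countMatches(A, 0, 4) returns 1, since only the 3 at A[2] is at its place.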
Since there can be no duplicates, you can use the fact that the function f(x) = A[x] - x is monotonic and apply binary search to solve the problem in O(log n) worst-case complexity.
You want to find a point where the function A[x] - x takes the value zero. This code should work:
boolean binarySearch(int[] data, int size)
{
    int low = 0;
    int high = size - 1;
    while (high >= low) {
        int middle = (low + high) / 2;
        if (data[middle] - 1 == middle) {
            return true;
        }
        if (data[middle] - 1 < middle) {
            low = middle + 1;
        }
        if (data[middle] - 1 > middle) {
            high = middle - 1;
        }
    }
    return false;
}
Watch out for the fact that arrays in Java are 0-indexed; that is the reason why I subtract 1 from the array value.
If you want to find the first number in the array that is at its own place, you just have to iterate over the array:
static int find_in_place(int[] a) {
    for (int i = 0; i < a.length; i++) {
        if (a[i] == i + 1) {
            return a[i];
        }
    }
    return 0;
}
It has a complexity of O(n), and an average cost of n/2
You can skip iterating when there can be no such element by adding a special condition:
if (a[0] > 1 && a[a.length - 1] > a.length) {
    // then don't iterate through the array and return false
    return false;
} else {
    // make a loop here
}
Using binary search (or a similar algorithm) you could get better than O(n). Since the array is sorted, we can make the following assumptions:
if the value at index x is smaller than its (1-based) place (a[x] <= x), all previous values must also be smaller than their places (because no duplicates are allowed);
if a[x] > x + 1, all following values must be greater than their places (again, no duplicates allowed).
Using that, you can take a binary approach: pick the center value, check its index, and discard the left/right part if it matches one of the conditions above. Of course, you stop when a[x] == x + 1.
Simply do a binary search for 0, comparing on the value in the array minus its index. O(log n)

Finding unique numbers from sorted array in less than O(n)

I had an interview and there was the following question:
Find unique numbers from sorted array in less than O(n) time.
Ex: 1 1 1 5 5 5 9 10 10
Output: 1 5 9 10
I gave the solution but that was of O(n).
Edit: Sorted array size is approx 20 billion and unique numbers are approx 1000.
Divide and conquer:
look at the first and last element of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
If both are equal, the only distinct element in the sequence is the first one (no matter how long the sequence is).
If they are different, divide the sequence and repeat for each subsequence.
Solves in O(log(n)) in the average case, and O(n) only in the worst case (when each element is different).
Java code:
public static List<Integer> findUniqueNumbers(int[] data) {
    List<Integer> result = new LinkedList<Integer>();
    findUniqueNumbers(data, 0, data.length - 1, result, false);
    return result;
}

private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {
    int a = data[i1];
    int b = data[i2];
    // homogenous sequence a...a
    if (a == b) {
        if (!skipFirst) {
            result.add(a);
        }
    } else {
        // divide & conquer
        int i3 = (i1 + i2) / 2;
        findUniqueNumbers(data, i1, i3, result, skipFirst);
        findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
    }
}
I don't think it can be done in less than O(n). Take the case where the array contains 1 2 3 4 5: in order to get the correct output, each element of the array would have to be looked at, hence O(n).
If your sorted array of size n has m distinct elements, you can do it in O(m log n).
Note that this is going to be efficient when m << n (e.g. m = 2 and n = 100).
Algorithm:
Initialization: Current element y = first element x[0]
Step 1: Do a binary search for the last occurrence of y in x (this can be done in O(log(n)) time). Let its index be i
Step 2: y = x[i+1] and go to step 1 (stop once i is the last index)
Edit: In cases where m = O(n) this algorithm is going to perform badly. To alleviate that, you can run it in parallel with the regular O(n) algorithm. The meta-algorithm consists of my algorithm and the O(n) algorithm running in parallel; it stops when either of the two completes.
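A possible sketch of that algorithm (my code, not part of the answer), where the inner binary search finds the last index of the current value:
import java.util.ArrayList;
import java.util.List;

class DistinctValues {
    // Jump from one run of equal values to the next: for each of the m
    // distinct values, one O(log n) binary search for its last occurrence.
    static List<Integer> distinctValues(int[] x) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < x.length) {
            int y = x[i];
            result.add(y);
            int lo = i, hi = x.length - 1;
            while (lo < hi) {                      // last index in [i, n-1] holding y
                int mid = lo + (hi - lo + 1) / 2;  // round up so the loop terminates
                if (x[mid] == y) lo = mid;
                else hi = mid - 1;
            }
            i = lo + 1;                            // first index of the next distinct value
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(distinctValues(a));     // [1, 5, 9, 10]
    }
}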
Since the data consists of integers, there is a finite number of unique values that can occur between any two values. So, start by looking at the first and last value in the array. If a[length-1] - a[0] < length - 1, there will be some repeating values. Put a[0] and a[length-1] into some constant-access-time container like a hash set. If the two values are equal, you know that there is only one unique value in the array and you are done. You know that the array is sorted. So, if the two values are different, you can look at the middle element now. If the middle element is already in the set of values, you know that you can skip the whole left part of the array and only analyze the right part recursively. Otherwise, analyze both the left and right parts recursively.
Depending on the data in the array, you will be able to get the set of all unique values in a different number of operations. You get them in constant time O(1) if all the values are the same, since you will know it after checking only the first and last element. If there are "relatively few" unique values, your complexity will be close to O(log N), because after each partition you will "quite often" be able to throw away at least one half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1, you can also "define" the set in constant time, because they have to be consecutive numbers from a[0] to a[length-1]. However, in order to actually list them, you will have to output each number, and there are N of them.
Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time regardless of array size if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes the algorithm "better than O(N)" (or, strictly: "not worse than O(N) and better in many cases").
import java.util.*;

/**
 * Remove duplicates in a sorted array in average O(log(n)), worst O(n).
 * @author XXX
 */
public class UniqueValue {
    public static void main(String[] args) {
        int[] test = {-1, -1, -1, -1, 0, 0, 0, 0, 2, 3, 4, 5, 5, 6, 7, 8};
        UniqueValue u = new UniqueValue();
        System.out.println(u.getUniqueValues(test, 0, test.length - 1));
    }

    // i must be the start index, j must be the end index
    public List<Integer> getUniqueValues(int[] array, int i, int j) {
        if (array == null || array.length == 0) {
            return new ArrayList<Integer>();
        }
        List<Integer> result = new ArrayList<>();
        if (array[i] == array[j]) {
            result.add(array[i]);
        } else {
            int mid = (i + j) / 2;
            result.addAll(getUniqueValues(array, i, mid));
            // avoid duplicate divide
            while (mid < j && array[mid] == array[++mid]);
            if (array[(i + j) / 2] != array[mid]) {
                result.addAll(getUniqueValues(array, mid, j));
            }
        }
        return result;
    }
}

Find the sum of all even-valued terms in the Fibonacci Sequence

I'm having trouble figuring why the following code isn't producing the expected output. Instead, result = 272 which does not seem right.
/*
 * Each new term in the Fibonacci sequence is generated by adding the previous two terms.
 * By starting with 1 and 2, the first 10 terms will be: 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
 * Find the sum of all the even-valued terms in the sequence which do not exceed four million.
 */
public class Fibonacci
{
    public static void main(String[] args)
    {
        int result = 0;
        for (int i = 2; i <= 33; i++)
        {
            System.out.println("i:" + fib(i));
            if (i % 2 == 0) // if i is even
            {
                result += i;
                System.out.println("result:" + result);
            }
        }
    }

    public static long fib(int n)
    {
        if (n <= 1)
            return n;
        else
            return fib(n - 1) + fib(n - 2);
    }
}
The line result += i; doesn't add a Fibonacci number to result.
You should be able to figure out how to make it add a Fibonacci number to result.
Hint: Consider making a variable that stores the number you're trying to work with.
First of all, you got one thing wrong for the Fibonacci sequence. The definition can be found here: http://en.wikipedia.org/wiki/Fibonacci_number.
Second of all, (i % 2 == 0) is true for every other index (2, 4, 6 and so on), which means your check is true for fib(2), fib(4), and so on, rather than for the even Fibonacci values.
And last, result += i adds the index. What you want to add is the result of fib(i). So first you need to calculate fib(i), store that in a variable, check whether THAT is an even or odd number, and if it is even, add the variable to the result.
[Edit]
One last point: computing fib recursively when you want to add up all the numbers can be really expensive. With high enough inputs you might even end up with a StackOverflowError, so it is always a good idea to find a way to avoid calculating the same numbers over and over again. In this example you want to sum the numbers, so instead of computing fib(0), then fib(1), and so on from scratch, you should just walk along the sequence, check every number on the way, and add it to the result if it matches your criteria.
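Following that advice, here is a minimal iterative sketch (not the asker's code): it keeps only the last two terms, so nothing is recomputed, and it uses the four-million cap from the problem statement instead of a fixed loop count:
public class EvenFibSum {
    public static void main(String[] args) {
        long a = 1, b = 2;            // the sequence as defined in the problem: 1, 2, 3, 5, ...
        long sum = 0;
        while (a <= 4_000_000) {
            if (a % 2 == 0) sum += a; // only the even-valued terms contribute
            long next = a + b;
            a = b;
            b = next;
        }
        System.out.println(sum);      // prints 4613732
    }
}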
Well, here's a good starting point, but it's C, not Java. Still, it might help: link text
