Why do these loop & hashing operations take O(N) time complexity? - java

Given the array :
int arr[]= {1, 2, 3, 2, 3, 1, 3}
You are asked to find a number within the array that occurs an odd number of times. It's 3 (occurring 3 times). The time complexity should be no worse than O(n).
The solution is to use a HashMap. Elements become keys and their counts become values of the HashMap.
// Code belongs to geeksforgeeks.org
// Function to find the element occurring an odd number of times
static int getOddOccurrence(int arr[], int n)
{
    HashMap<Integer, Integer> hmap = new HashMap<>();
    // Putting all elements into the HashMap
    for (int i = 0; i < n; i++)
    {
        if (hmap.containsKey(arr[i]))
        {
            // If the array element is already present,
            // increase the count of that element.
            int val = hmap.get(arr[i]);
            hmap.put(arr[i], val + 1);
        }
        else
        {
            // If the array element is not present, put it
            // into the HashMap and initialize the count to one.
            hmap.put(arr[i], 1);
        }
    }
    // Checking for an odd occurrence of each element present in the HashMap
    for (Integer a : hmap.keySet())
    {
        if (hmap.get(a) % 2 != 0)
            return a;
    }
    return -1;
}
I don't get why this overall operation takes O(N) time complexity. If I think about it, the loop alone takes O(N). The hmap.put (an insert operation) and hmap.get (a find operation) calls also take O(N) each, and they are nested within the loop, so I would normally think this function takes O(N^2) time. Why does it instead take O(N)?

I don't get why this overall operation takes O(N) time complexity.
You must examine all elements of the array - O(N)
For each element of the array you call containsKey, get and put on the HashMap. These are O(1) operations. Or more precisely, they are O(1) on average, amortized over the lifetime of the HashMap. This is because a HashMap grows its hash array when the ratio of the number of elements to the hash array size exceeds the load factor.
O(N) repetitions of 2 or 3 O(1) operations is O(N). QED
Reference:
Is a Java hashmap really O(1)?
Strictly speaking there are a couple of scenarios where a HashMap is not O(1).
If the hash function is poor (or the key distribution is pathological) the hash chains will be unbalanced. With early HashMap implementations, this could lead to (worst case) O(N) operations because operations like get had to search a long hash chain. With recent implementations, HashMap will construct a balanced binary tree for any hash chain that is too long. That leads to worst-case O(log N) operations.
HashMap is unable to grow the hash array beyond 2^30 hash buckets. So at that point HashMap operations start transitioning to O(log N). However, if you have a map that size, other secondary effects will probably have affected the real performance anyway.

The algorithm first iterates the array of numbers, of size n, to generate the map with counts of occurrences. It should be clear why this is an O(n) operation. Then, after the hashmap has been built, it iterates that map and finds all entries whose counts are odd numbers. The size of this map would in practice be somewhere between 1 (in the case of all input numbers being the same), and n (in the case where all inputs are different). So, this second operation is also bounded by O(n), leaving the entire algorithm O(n).

Related

Best way to retrieve K largest elements from large unsorted arrays?

I recently had a coding test during an interview. I was told:
There is a large unsorted array of one million ints. User wants to retrieve K largest elements. What algorithm would you implement?
During this, I was strongly hinted that I needed to sort the array.
So, I suggested using the built-in sort() or maybe a custom implementation if performance really mattered. I was then told that by using a Collection or array to store the k largest and a for-loop, it is possible to achieve approximately O(N). In hindsight, I think it's O(N*k), because each iteration needs to compare against the K-sized array to find the smallest element to replace, while sorting the whole array would make the code at least O(N log N).
I then reviewed this link on SO that suggests priority queue of K numbers, removing the smallest number every time a larger element is found, which would also give O(N log N). Write a program to find 100 largest numbers out of an array of 1 billion numbers
Is the for-loop method bad? How should I justify the pros/cons of using the for-loop versus the PriorityQueue/sorting methods? I'm thinking that if the array is already sorted, it could help by not needing to iterate through the whole array again, i.e. if some other retrieval method is called on the sorted array, it should be constant time. Is there some performance factor when running the actual code that I didn't consider when theorizing about pseudocode?
Another way of solving this is using Quickselect. This should give you a total average time complexity of O(n). Consider this:
Find the kth largest number x using Quickselect (O(n))
Iterate through the array again (or just through the right-side partition) (O(n)) and save all elements ≥ x
Return your saved elements
(If there are repeated elements, you can avoid them by keeping count of how many duplicates of x you need to add to the result.)
The difference between your problem and the one in the SO question you linked to is that you have only one million elements, so they can definitely be kept in memory to allow normal use of Quickselect.
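Here is a rough Java sketch of that Quickselect-based approach. This is a minimal illustration, not the answerer's code; the partition/topK method names and the randomized pivot choice are assumptions added for the example.

import java.util.Arrays;
import java.util.Random;

public class QuickselectTopK {
    private static final Random RND = new Random();

    // Partition arr[lo..hi] around a randomly chosen pivot (Lomuto scheme);
    // returns the pivot's final index.
    static int partition(int[] arr, int lo, int hi) {
        swap(arr, lo + RND.nextInt(hi - lo + 1), hi);
        int pivot = arr[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (arr[j] < pivot) swap(arr, i++, j);
        }
        swap(arr, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // Rearranges arr so that the k largest elements occupy the last k positions,
    // then copies them out. Average time O(n).
    static int[] topK(int[] arr, int k) {
        int lo = 0, hi = arr.length - 1, target = arr.length - k; // position of the kth largest
        while (lo < hi) {
            int p = partition(arr, lo, hi);
            if (p == target) break;
            if (p < target) lo = p + 1; else hi = p - 1;
        }
        return Arrays.copyOfRange(arr, target, arr.length);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3)));
        // prints the three largest values (9, 12, 27) in some order
    }
}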
There is a large unsorted array of one million ints. The user wants to retrieve the K largest elements.
During this, I was strongly hinted that I needed to sort the array.
So, I suggested using a built-in sort() or maybe a custom implementation
That wasn't really a hint I guess, but rather a sort of trick to deceive you (to test how strong your knowledge is).
If you choose to approach the problem by sorting the whole source array using the built-in Dual-Pivot Quicksort, you can't obtain time complexity better than O(n log n).
Instead, we can maintain a PriorityQueue to store the result. While iterating over the source array, for each element we check whether the queue has reached size K. If not, the element is added to the queue; otherwise (its size equals K) we compare the next element against the lowest element in the queue: if the next element is smaller or equal, we ignore it; if it is greater, the lowest element is removed and the new element added.
The time complexity of this approach would be O(n log k), because adding a new element into a PriorityQueue of size k costs O(log k), and in the worst case this operation is performed n times (because we're iterating over an array of size n).
Note that the best case time complexity would be Ω(n), i.e. linear.
So the difference between sorting and using a PriorityQueue in terms of Big O boils down to the difference between O(n log n) and O(n log k). When k is much smaller than n, this approach gives a significant performance gain.
Here's an implementation:
public static int[] getHighestK(int[] arr, int k) {
    Queue<Integer> queue = new PriorityQueue<>();
    for (int next : arr) {
        // If the queue already holds k elements and its smallest element is
        // less than the next value, evict that smallest element.
        if (queue.size() == k && queue.peek() < next) queue.remove();
        // Add the next value whenever there is room (either because the queue
        // isn't full yet, or because we just evicted the smallest element).
        if (queue.size() < k) queue.add(next);
    }
    return toIntArray(queue);
}
public static int[] toIntArray(Collection<Integer> source) {
    return source.stream().mapToInt(Integer::intValue).toArray();
}
main()
public static void main(String[] args) {
    System.out.println(Arrays.toString(getHighestK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3)));
}
Output:
[9, 12, 27]
Sorting in O(n)
We can achieve worst case time complexity of O(n) when there are some constraints regarding the contents of the given array. Let's say it contains only numbers in the range [-1000,1000] (sure, you haven't been told that, but it's always good to clarify the problem requirements during the interview).
In this case, we can use Counting sort which has linear time complexity. Or better, just build a histogram (first step of Counting Sort) and look at the highest-valued buckets until you've seen K counts. (i.e. don't actually expand back to a fully sorted array, just expand counts back into the top K sorted elements.) Creating a histogram is only efficient if the array of counts (possible input values) is smaller than the size of the input array.
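As a minimal sketch of this histogram variant, assuming the values are known to lie in [-1000, 1000] (the MIN/MAX constants and the topK method name are illustrative, not from the original answer):

import java.util.Arrays;

public class HistogramTopK {
    // Assumes every value is in [MIN, MAX]; adjust the constants for other known ranges.
    static final int MIN = -1000, MAX = 1000;

    static int[] topK(int[] arr, int k) {
        int[] counts = new int[MAX - MIN + 1];      // histogram: the first step of counting sort
        for (int v : arr) counts[v - MIN]++;
        int[] result = new int[k];
        int filled = 0;
        // Walk the buckets from the highest value down, expanding only the top k counts.
        for (int bucket = counts.length - 1; bucket >= 0 && filled < k; bucket--) {
            for (int c = counts[bucket]; c > 0 && filled < k; c--) {
                result[filled++] = bucket + MIN;
            }
        }
        return result;                               // the k largest values, in descending order
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3)));
        // [27, 12, 9]
    }
}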
Another possibility is when the given array is partially sorted, consisting of several sorted chunks. In this case, we can use Timsort, which is good at finding sorted runs; it will deal with them in linear time.
Timsort is already implemented in Java, where it's used to sort objects (not primitives). So we can take advantage of a well-optimized and thoroughly tested implementation instead of writing our own, which is great. But since we are given an array of primitives, using the built-in Timsort has an additional cost: we need to copy the contents of the array into a list (or array) of the wrapper type.
This is a classic problem that can be solved with so-called heapselect, a simple variation on heapsort. It can also be solved with quickselect, but, like quicksort, that has a poor quadratic worst-case time complexity.
Simply keep a priority queue, implemented as binary heap, of size k of the k smallest values. Walk through the array, and insert values into the heap (worst case O(log k)). When the priority queue is too large, delete the minimum value at the root (worst case O(log k)). After going through the n array elements, you have removed the n-k smallest elements, so the k largest elements remain. It's easy to see the worst-case time complexity is O(n log k), which is faster than O(n log n) at the cost of only O(k) space for the heap.
Here is one idea. Create an int array of counts whose size covers the whole int range (2147483647 being the maximum int value). Then, for every number read from the original array in a for-each loop, increment the slot at that index (the number itself) in the new array.
So at the end of this loop I will have something like [1, 0, 2, 0, 3] (the array that I created), which represents the numbers [0, 2, 2, 4, 4, 4] (the initial array).
To find the K largest elements, iterate backwards over the created array and count down from K to 0 every time you hit an entry different from 0. If the entry is, for example, 2, you have to count that number twice.
The limitation of this approach is that it works only with integers, because of the nature of array indices.
Also, the range of int in Java is -2147483648 to 2147483647, which means that only the non-negative numbers can be placed in the created array.
NOTE: if you know the maximum value that can occur, you can shrink the created array to that size. For example, if the maximum is 1000, the array you need to create has size 1001, and this algorithm should perform very fast.
I think you misunderstood what you needed to sort.
You need to keep the K-sized list sorted, you don't need to sort the original N-sized input array. That way the time complexity would be O(N * log(K)) in the worst case (assuming you need to update the K-sized list almost every time).
The requirements said that N was very large, but K is much smaller, so O(N * log(K)) is also smaller than O(N * log(N)).
You only need to update the K-sized list for each record that is larger than the K-th largest element before it. For a randomly distributed list with N much larger than K, that will be negligible, so the time complexity will be closer to O(N).
For the K-sized list, you can take a look at the implementation of Is there a PriorityQueue implementation with fixed capacity and custom comparator? , which uses a PriorityQueue with some additional logic around it.
There is an algorithm to do this in worst-case time complexity O(n*log(k)) with very benign time constants (since there is just one pass through the original array, and the inner part that contributes to the log(k) is only accessed relatively seldom if the input data is well-behaved).
Initialize a priority queue implemented with a binary heap A of maximum size k (internally using an array for storage). In the worst case, this has O(log(k)) for inserting, deleting and searching/manipulating the minimum element (in fact, retrieving the minimum is O(1)).
Iterate through the original unsorted array, and for each value v:
    if A is not yet full, then
        insert v into A;
    else, if v > min(A), then (*)
        insert v into A,
        remove the lowest value from A.
(*) Note that A can return repeated values if some of the highest k values occur repeatedly in the source set. You can avoid that by a search operation to make sure that v is not yet in A. You'd also want to find a suitable data structure for that (as the priority queue has linear complexity), i.e. a secondary hash table or balanced binary search tree or something like that, both of which are available in java.util.
The java.util.PriorityQueue helpfully guarantees the time complexity of its operations:
this implementation provides O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).
Note that as laid out above, we only ever remove the lowest (first) element from A, so we enjoy the O(log(k)) for that. If you want to avoid duplicates as mentioned above, then you also need to search for any new value added to it (with O(k)), which opens you up to a worst-case overall scenario of O(n*k) instead of O(n*log(k)) in case of a pre-sorted input array, where every single element v causes the inner loop to fire.
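A minimal sketch of the duplicate-avoiding variant described above. The getDistinctTopK name and the choice of a HashSet as the secondary structure are assumptions for illustration, not part of the original answer; the HashSet gives O(1) membership checks instead of the O(k) contains() of the priority queue.

import java.util.Arrays;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

public class DistinctTopK {
    // Keeps the k largest *distinct* values seen so far.
    static int[] getDistinctTopK(int[] arr, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // the min-heap A from the text
        Set<Integer> inHeap = new HashSet<>();               // O(1) membership test
        for (int v : arr) {
            if (inHeap.contains(v)) continue;                // skip values already kept
            if (heap.size() < k) {                           // A is not yet full
                heap.add(v);
                inHeap.add(v);
            } else if (v > heap.peek()) {                    // v > min(A)
                inHeap.remove(heap.poll());                  // drop the current minimum
                heap.add(v);
                inHeap.add(v);
            }
        }
        return heap.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] top = getDistinctTopK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3);
        Arrays.sort(top);
        System.out.println(Arrays.toString(top)); // [9, 12, 27]
    }
}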

algorithm - sort most repeated number first (descending order)

I have a list of numbers, for example:
10 4
5 3
7 1
-2 2
The first line means the number 10 repeats 4 times, the second line means the number 5 repeats 3 times, and so on. The objective is to sort these numbers with the most repeated first, in descending order of count. I think using a HashMap to record the data and then feeding it into a TreeSet sorted by value would be the most efficient way -> O(n log n), but is there a more efficient way? I've heard this problem can be solved with a max-heap, but I don't think a heap can do better than O(n log n).
I think with bucket-style sorting, O(N) complexity is possible. At least in theory. But it comes with additional cost in terms of memory for the buckets. That may make the approach intractable in practice.
The buckets are HashSets. Each bucket holds all numbers with the same count. For fast access, we keep the buckets in an ArrayList, each bucket at the index position of its count.
Like the OP, we use a HashMap to associate numbers with counters. When a number arrives, we increment the counter and move the number from the bucket of the old count to the bucket of the new count. That keeps the numbers sorted at all times.
Each arriving number takes O(1) to process, so all take O(N).
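A minimal sketch of this bucket idea, assuming the numbers arrive one by one as a plain int[]. The byFrequencyDescending name and the final pass that reads the buckets from the highest count down are illustrative additions, not code from the answer.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CountBuckets {
    // Returns the distinct numbers ordered by how often they occur, most frequent first.
    static List<Integer> byFrequencyDescending(int[] input) {
        Map<Integer, Integer> counts = new HashMap<>();
        List<Set<Integer>> buckets = new ArrayList<>();    // buckets.get(c) holds the numbers seen c times
        buckets.add(new HashSet<>());                      // placeholder bucket for count 0
        for (int x : input) {
            int oldCount = counts.getOrDefault(x, 0);
            int newCount = oldCount + 1;
            counts.put(x, newCount);
            if (buckets.size() <= newCount) buckets.add(new HashSet<>());
            buckets.get(oldCount).remove(x);               // move x from the bucket of its old count...
            buckets.get(newCount).add(x);                  // ...to the bucket of its new count, in O(1)
        }
        List<Integer> result = new ArrayList<>();
        for (int c = buckets.size() - 1; c >= 1; c--) {    // walk the buckets from the highest count down
            result.addAll(buckets.get(c));
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 occurs 4 times, 5 occurs 3 times, -2 occurs 2 times, 7 occurs once
        int[] data = {10, 5, 7, -2, 10, 5, -2, 10, 5, 10};
        System.out.println(byFrequencyDescending(data));   // [10, 5, -2, 7]
    }
}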
You can get a list of Map.Entry<Integer,Integer> sorted by value (most repeated first) like this:
List<Map.Entry<Integer,Integer>> entries = map.entrySet()
        .stream()
        .sorted((a, b) -> b.getValue().compareTo(a.getValue()))
        .collect(Collectors.toList());
It should be in O(n*log(n)).
(I replaced my original answer.)
GeekForGeeks has example code for sorting a HashMap by value.
Since a HashMap can't be sorted, and the order of elements in a LinkedHashMap can't be changed unless they are removed and re-entered in a different order, the code copies the contents into a List, then sorts the list. The author(s) want to retain the data in a HashMap, so the sorted list is then copied to a new LinkedHashMap.
For the O/P's purposes, <Integer, Integer> would be used where the example has <String, Integer>. Also, since the desired order is descending, o2 and o1 would be reversed in the return statement of the comparator:
return (o2.getValue()).compareTo(o1.getValue());
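A condensed sketch of that pattern, adapted to <Integer, Integer> and to descending order; the sortByValueDescending method name is illustrative, not from the GeeksforGeeks example.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortByValue {
    // Copy the entries into a list, sort the list by value (descending),
    // then copy them into a LinkedHashMap to preserve that order.
    static Map<Integer, Integer> sortByValueDescending(Map<Integer, Integer> map) {
        List<Map.Entry<Integer, Integer>> list = new ArrayList<>(map.entrySet());
        list.sort((o1, o2) -> (o2.getValue()).compareTo(o1.getValue()));
        Map<Integer, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : list) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = new HashMap<>();
        counts.put(10, 4); counts.put(5, 3); counts.put(7, 1); counts.put(-2, 2);
        System.out.println(sortByValueDescending(counts)); // {10=4, 5=3, -2=2, 7=1}
    }
}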
There are mathematical proofs that no comparison sort can run in less than O(n log n).
Assuming the most frequent number is unique:
Why sort? Just loop through every key, keep track of the maximum occurrence, and you'll get it in O(n):
maxNum = 0
maxValue = -1
loop numbers:
    if (numbers[i].value > maxValue):
        maxValue = numbers[i].value
        maxNum = numbers[i].key
return maxNum

Find the only unique element in an array of a million elements

I was asked this question in a recent interview.
You are given an array that has a million elements. All the elements are duplicates except one. My task is to find the unique element.
var arr = [3, 4, 3, 2, 2, 6, 7, 2, 3........]
My approach was to go through the entire array in a for loop, then build a map whose keys are the numbers in the array and whose values are the frequencies with which each number occurs. Then loop through the map again and return the key whose value is 1.
I said my approach would take O(n) time complexity. The interviewer told me to optimize it in less than O(n) complexity. I said that we cannot, as we have to go through the entire array with a million elements.
Finally, he didn't seem satisfied and moved onto the next question.
I understand going through million elements in the array is expensive, but how could we find a unique element without doing a linear scan of the entire array?
PS: the array is not sorted.
I'm certain that you can't solve this problem without going through the whole array, at least if you don't have any additional information (like the elements being sorted and restricted to certain values), so the problem has a minimum time complexity of O(n). You can, however, reduce the memory complexity to O(1) with a XOR-based solution, if every element except the unique one appears an even number of times, which seems to be the most common variant of the problem, if that's of any interest to you:
int unique(int[] array)
{
    int unpaired = array[0];
    for (int i = 1; i < array.length; i++)
        unpaired = unpaired ^ array[i];
    return unpaired;
}
Basically, every XORed element cancels out with the other one, so your result is the only element that didn't cancel out.
Assuming the array is unordered, you can't. Every value is independent of the others, so nothing can be deduced about one value from any of the other values.
If it's an ordered array of values, then that's another matter and depends entirely on the ordering used.
I agree the easiest way is to have another container and store the frequency of the values.
In fact, since the number of elements in the array is fixed, you could do much better than what you have proposed.
By "creating a map with index as the number in the array and the value as the frequency of the number occurring in the array", you create a map with 2^32 positions (assuming the array has 32-bit integers), and then you have to pass through that map to find the first position whose value is one. It means that you are using a large auxiliary space and in the worst case you are doing about 10^6+2^32 operations (one million to create the map and 2^32 to find the element).
Instead of doing so, you could sort the array with some n*log(n) algorithm and then search for the element in the sorted array, because in your case, n = 10^6.
For instance, using the merge sort, you would use a much smaller auxiliary space (just an array of 10^6 integers) and would do about (10^6)*log(10^6)+10^6 operations to sort and then find the element, which is approximately 21*10^6 (many many times smaller than 10^6+2^32).
PS: sorting the array decreases the search from a quadratic to a linear cost, because with a sorted array we just have to access the adjacent positions to check if a current position is unique or not.
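A small sketch of the sort-then-scan idea, using the built-in Arrays.sort rather than a hand-written merge sort; the findUnique method name is illustrative.

import java.util.Arrays;

public class UniqueBySorting {
    // Sorts a copy of the array, then finds the one element with no equal neighbour.
    static int findUnique(int[] arr) {
        int[] a = arr.clone();
        Arrays.sort(a);                                   // O(n log n)
        for (int i = 0; i < a.length; i++) {
            boolean differsFromPrev = (i == 0) || a[i] != a[i - 1];
            boolean differsFromNext = (i == a.length - 1) || a[i] != a[i + 1];
            if (differsFromPrev && differsFromNext) {
                return a[i];                              // occurs exactly once
            }
        }
        throw new IllegalArgumentException("no unique element");
    }

    public static void main(String[] args) {
        System.out.println(findUnique(new int[]{3, 4, 3, 2, 2, 6, 7, 2, 3, 4, 6, 7, 9})); // 9
    }
}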
Your approach seems fine. It could be that he was looking for an edge case where the array is of even size, meaning there is either no unmatched element or there are two or more. He just went about asking it the wrong way.

Big-O of finding common elements in sorted and unsorted arrays

Having a difficult time grasping big O when it comes to printing out the common elements in two arrays. One of size a, the other of size b.
I know that with two unsorted arrays it is O(ab)
But what about when a is unsorted and b is sorted?
When both are sorted?
Any explanation would be great.
An efficient way to do this is to use a hash table (HashMap in Java). An algorithm is:
foreach element of the smaller array        (1) O(N)
    add element to the hash table           (2) O(1)
foreach element of the bigger array         (3) O(M)
    if element is in the hash table         (4) O(1)
        print element
The time complexity of every step is annotated in the pseudo code above.
Iterating over the elements of an array of size X has a time complexity of O(X) and a constant memory complexity (the size of an element)
Inserting an element in a hash table has a constant time and space complexity
Checking if an element is in a hash table has constant time and space complexity
So the overall time complexity is O(N + M), and the space complexity is O(N)
Note that
this complexity is not influenced by whether the arrays are sorted or not
this algorithm works only if the smaller array can fit in memory
it is better to build the hash table from the smaller array, because that results in a smaller hash table in memory
Hope that helps!
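A Java sketch of the pseudo-code above, using a HashSet since only membership matters; removing each match from the set so that duplicates print only once is an extra choice made here, not part of the answer.

import java.util.HashSet;
import java.util.Set;

public class CommonElements {
    // Prints the elements that occur in both arrays; O(N + M) time, O(N) space,
    // where N is the length of the smaller array and M the length of the bigger one.
    static void printCommon(int[] smaller, int[] bigger) {
        Set<Integer> seen = new HashSet<>();
        for (int x : smaller) seen.add(x);      // (1)+(2): build the hash table, O(N)
        for (int y : bigger) {                  // (3): scan the bigger array, O(M)
            if (seen.remove(y)) {               // (4): O(1) check; remove so each value prints once
                System.out.println(y);
            }
        }
    }

    public static void main(String[] args) {
        printCommon(new int[]{1, 4, 7, 9}, new int[]{7, 2, 9, 9, 5, 1}); // prints 7, 9, 1
    }
}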
I'm assuming that you want to use $O(1)$ additional memory, because otherwise, you could solve either problem (sorted or unsorted) in $O(a + b)$ by using a hash table. Also, I'm going to assume the elements in each array are unique, but the algorithms can easily be mended if not.
If both are unsorted, you can in fact improve on your approach by simply sorting one of the arrays, then using the approach below. This gives $O((a+b)\min(\lg a, \lg b))$.
If a is unsorted and b is sorted, you can iterate for all elements in a, and check if they're in b by using binary search. This gives an $O(b + a\lg b)$ approach.
If both are sorted, you can easily get $O(a + b)$ by using a simple two-pointer algorithm (successively increment indexes on both arrays).
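A minimal sketch of the two-pointer walk for the case where both arrays are sorted; the method name is illustrative.

public class CommonSorted {
    // Both arrays sorted ascending: O(a + b) time, O(1) extra space.
    static void printCommonSorted(int[] a, int[] b) {
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                            // a[i] too small, advance in a
            } else if (a[i] > b[j]) {
                j++;                            // b[j] too small, advance in b
            } else {
                System.out.println(a[i]);       // equal: a common element
                i++;
                j++;
            }
        }
    }

    public static void main(String[] args) {
        printCommonSorted(new int[]{1, 4, 7, 9}, new int[]{1, 2, 5, 7, 9, 9}); // prints 1, 7, 9
    }
}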

Find nearest number in unordered array

Given a large unordered array of long random numbers and a target long, what's the most efficient algorithm for finding the closest number?
@Test
public void findNearest() throws Exception {
    final long[] numbers = {90L, 10L, 30L, 50L, 70L};
    Assert.assertEquals("nearest", 10L, findNearest(numbers, 12L));
}
Iterate through the array of longs once. Store the current closest number and the distance to that number. Continue checking each number if it is closer, and just replace the current closest number when you encounter a closer number.
This gets you best performance of O(n).
Building a binary tree, as suggested by another answerer, will take O(n log n). Of course, future searches will only take O(log n)... so it may be worth it if you do a lot of searches.
If you are a pro, you can parallelize this with OpenMP or a thread library, but I am guessing that is out of the scope of your question.
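A sketch of that single pass, shaped to match the findNearest(long[], long) call in the question's test; the method body here is an assumption for illustration, not the original poster's code.

public class NearestNumber {
    static long findNearest(long[] numbers, long target) {
        long closest = numbers[0];
        long bestDistance = Math.abs(numbers[0] - target);   // note: can overflow for extreme long values
        for (int i = 1; i < numbers.length; i++) {
            long distance = Math.abs(numbers[i] - target);
            if (distance < bestDistance) {                   // strictly closer: replace the current best
                bestDistance = distance;
                closest = numbers[i];
            }
        }
        return closest;
    }

    public static void main(String[] args) {
        long[] numbers = {90L, 10L, 30L, 50L, 70L};
        System.out.println(findNearest(numbers, 12L)); // 10
    }
}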
If you do not intend to do multiple such requests on the array, there is no better way than the brute-force linear-time check of each number.
If you will do multiple requests on the same array, first sort it and then do a binary search on it - this reduces the time for each such request to O(log(n)), but you still pay O(n*log(n)) for the sort, so this is only reasonable if the number of requests k is reasonably large, i.e. k*n >> (a lot bigger than) n*log(n) + k*log(n).
If the array will change, then create a binary search tree and do a lower-bound request on it. This again is only reasonable if the number of nearest-number requests is relatively large in comparison to the number of array-change requests and to the number of elements. As the cost of building the tree is O(n*log(n)) and the cost of updating it is O(log(n)), you need k*log(n) + n*log(n) + k*log(n) << (a lot smaller than) k*n.
IMHO, I think that you should use a Binary Heap (http://en.wikipedia.org/wiki/Binary_heap), which has an insertion time of O(log n), i.e. O(n log n) for the entire array. For me, the coolest thing about the binary heap is that it can be built inside your own array, without overhead. Take a look at the heapify section.
"Heapifying" your array makes it possible to get the largest/smallest element in O(1).
If you build a binary search tree from your numbers and search against it, O(log n) would be the complexity in the worst case. In your case you won't search for equality; instead, you'll look for the smallest difference obtained by subtraction.
I would check the difference between the numbers while iterating through the array and save the min value for that difference.
If you plan to use findNearest multiple times, I would calculate the difference while sorting (with a sorting algorithm of complexity n*log(n)) after each change of values in that array.
The time complexity to do this job is O(n), where n is the length of numbers.
final long[] numbers = {90L, 10L, 30L, 50L, 70L};
long tofind = 12L;
long delta = Long.MAX_VALUE;
int index = -1;
int i = 0;
while (i < numbers.length) {
    long tmp = Math.abs(tofind - numbers[i]);
    if (tmp < delta) {
        delta = tmp;
        index = i;
    }
    i++;
}
System.out.println(numbers[index]); // if index is not -1
But if you want to find many times with different values such as 12L against the same numbers array, you may sort the array first and binary search against the sorted numbers array.
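A hedged sketch of the sort-once-then-binary-search idea, using Arrays.binarySearch and its insertion-point convention; the findNearestSorted name is illustrative.

import java.util.Arrays;

public class NearestInSorted {
    // numbers must already be sorted ascending; O(log n) per query after the one-time O(n log n) sort.
    static long findNearestSorted(long[] numbers, long target) {
        int pos = Arrays.binarySearch(numbers, target);
        if (pos >= 0) return numbers[pos];                   // exact match
        int insertion = -pos - 1;                            // index where target would be inserted
        if (insertion == 0) return numbers[0];
        if (insertion == numbers.length) return numbers[numbers.length - 1];
        long left = numbers[insertion - 1], right = numbers[insertion];
        return (target - left <= right - target) ? left : right;  // pick the closer neighbour
    }

    public static void main(String[] args) {
        long[] numbers = {90L, 10L, 30L, 50L, 70L};
        Arrays.sort(numbers);                                // pay the sort cost once
        System.out.println(findNearestSorted(numbers, 12L)); // 10
    }
}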
If your search is a one-off, you can partition the array like in quicksort, using the input value as pivot.
If you keep track - while partitioning - of the min item in the right half, and the max item in the left half, you should have it in O(n) and 1 single pass over the array.
I'd say it's not possible to do it in less than O(n) since it's not sorted and you have to scan the input at the very least.
If you need to do many subsequent search, then a BST could help indeed.
You could do it in the steps below:
Step 1: Sort the array
Step 2: Find the index where the search element is (or would be inserted)
Step 3: Based on that index, look at the numbers on the right and left side and display the closer one
Let me know in case of any queries...
