Would it be efficient to sort with indexes - java

So I was thinking of a new sorting algorithm that might be efficient but I am not too sure about that.
1) Imagine we have an array a of only positive numbers.
2) We go through the array and find the biggest number n.
3) We create a new array b of the size n+1.
4) We go through every entry in the unsorted array and increase the value in the second array at the index equal to the number we are looking at by one. (In pseudo-code this means: b[a[i]]++; where a[i] is the number we are currently looking at.)
5) Once we have done this with every element in a, the array b stores at every index the exact amount of numbers of this index. (For example: b[0] = 3 means that we had 3 zeros in the initial array a)
6) We go through the whole array b and skip all the empty fields and create a new List or Array out of it.
So I can imagine that this algorithm can be very fast and efficient, but only for small numbers, since at the end we have to go through the whole array b to build the sorted one, which can be really time consuming.
If we, for example, have an array a = {1000, 1}, it would still check 1001 elements in array b, whether or not they are 0, even though we only have 2 elements in the initial array.
With smaller numbers, however, we should get an almost O(n) result? I am not too sure about that, and that's why I am asking you. Maybe I am even missing something really important. Thanks for your help in advance :)

Congratulations on independently re-discovering the counting sort.
This is indeed a very good sorting strategy for situations where the range of values is limited and the number of items is significantly greater than that range.
In situations where the range is greater than the number of items in the array, a traditional sorting algorithm would give you better performance.
Algorithms of this kind are called pseudo-polynomial.

we should almost get a O(n) result
You get an O(N+M) result, where M is the maximum number in the first array. Plus, you spend O(M) memory, so it only makes sense if M is small. See counting sort.
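For reference, here is a minimal runnable sketch of the counting idea from the question, assuming non-negative ints as the question does (class and method names are made up for illustration):

import java.util.Arrays;

public class CountingSortSketch {
    // Counting sort for non-negative ints: O(n + max) time, O(max) extra space.
    static int[] countingSort(int[] a) {
        int max = 0;
        for (int v : a) max = Math.max(max, v);   // step 2: find the biggest number
        int[] b = new int[max + 1];               // step 3: new array b of size max+1
        for (int v : a) b[v]++;                   // step 4: b[a[i]]++
        int[] sorted = new int[a.length];
        int idx = 0;
        for (int v = 0; v <= max; v++)            // step 6: walk b, skipping empty fields
            while (b[v]-- > 0) sorted[idx++] = v;
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countingSort(new int[]{1000, 1, 3, 1})));
        // prints [1, 1, 3, 1000]
    }
}

Note how the a = {1000, 1} example from the question plays out here: b has 1001 entries, so the final loop dominates even though the input has only two elements.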

Related

Best way to retrieve K largest elements from large unsorted arrays?

I recently had a coding test during an interview. I was told:
There is a large unsorted array of one million ints. User wants to retrieve K largest elements. What algorithm would you implement?
During this, I was strongly hinted that I needed to sort the array.
So, I suggested using the built-in sort() or maybe a custom implementation if performance really mattered. I was then told that by using a Collection or array to store the k largest and a for-loop, it is possible to achieve approximately O(N); in hindsight, I think it's O(N*k), because each iteration needs to compare against the K-sized array to find the smallest element to replace, while sorting the array would make the code at least O(N log N).
I then reviewed this link on SO that suggests a priority queue of K numbers, removing the smallest number every time a larger element is found, which would also give O(N log N): Write a program to find 100 largest numbers out of an array of 1 billion numbers
Is the for-loop method bad? How should I justify the pros and cons of using the for-loop versus the priority-queue/sorting methods? I'm thinking that if the array is already sorted, it could help by not needing to iterate through the whole array again, i.e. if some other retrieval method is called on the sorted array, it should be constant time. Is there some performance factor when running the actual code that I didn't consider when theorizing about the pseudocode?
Another way of solving this is using Quickselect. This should give you a total average time complexity of O(n). Consider this:
Find the kth largest number x using Quickselect (O(n))
Iterate through the array again (or just through the right-side partition) (O(n)) and save all elements ≥ x
Return your saved elements
(If there are repeated elements, you can avoid them by keeping count of how many duplicates of x you need to add to the result.)
The difference between your problem and the one in the SO question you linked to is that you have only one million elements, so they can definitely be kept in memory to allow normal use of Quickselect.
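A rough sketch of this approach with a Lomuto-style partition (helper names are illustrative; the average is O(n), though this naive pivot choice has a quadratic worst case):

import java.util.Arrays;

public class QuickselectTopK {
    // Returns the k largest elements (in no particular order), average O(n).
    static int[] topK(int[] arr, int k) {
        int[] a = arr.clone();                            // quickselect mutates its input
        int x = select(a, 0, a.length - 1, a.length - k); // k-th largest = (n-k)-th smallest
        int[] result = new int[k];
        int idx = 0;
        for (int v : arr) if (v > x) result[idx++] = v;   // keep everything strictly above x
        while (idx < k) result[idx++] = x;                // pad with copies of x (duplicates)
        return result;
    }

    // Quickselect: returns the i-th smallest element (0-indexed) of a[lo..hi].
    static int select(int[] a, int lo, int hi, int i) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == i) return a[p];
            if (p < i) lo = p + 1; else hi = p - 1;
        }
        return a[lo];
    }

    // Lomuto partition around a[hi]; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], store = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[j]; a[j] = a[store]; a[store] = t; store++; }
        }
        int t = a[store]; a[store] = a[hi]; a[hi] = t;
        return store;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3)));
        // prints the top 3 in some order, e.g. [12, 27, 9]
    }
}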
There is a large unsorted array of one million ints. The user wants to retrieve the K largest elements.
During this, I was strongly hinted that I needed to sort the array.
So, I suggested using a built-in sort() or maybe a custom implementation
That wasn't really a hint, I guess, but rather a sort of trick to deceive you (to test how strong your knowledge is).
If you choose to approach the problem by sorting the whole source array using the built-in Dual-Pivot Quicksort, you can't obtain time complexity better than O(n log n).
Instead, we can maintain a PriorityQueue to store the result. While iterating over the source array, for each element we check whether the queue has reached size K; if not, the element is added to the queue. Otherwise (the size equals K), we compare the next element against the lowest element in the queue: if the next element is smaller or equal, we ignore it; if it is greater, the lowest element is removed and the new element is added.
The time complexity of this approach is O(n log k), because adding a new element into a PriorityQueue of size k costs O(log k), and in the worst-case scenario this operation is performed n times (because we're iterating over an array of size n).
Note that the best case time complexity would be Ω(n), i.e. linear.
So the difference between sorting and using a PriorityQueue in terms of Big O boils down to the difference between O(n log n) and O(n log k). When k is much smaller than n, this approach gives a significant performance gain.
Here's an implementation:
import java.util.*;

public static int[] getHighestK(int[] arr, int k) {
    Queue<Integer> queue = new PriorityQueue<>(); // min-heap: the smallest of the k is on top
    for (int next : arr) {
        // If the queue is full and the new element beats the smallest, evict the smallest.
        if (queue.size() == k && queue.peek() < next) queue.remove();
        if (queue.size() < k) queue.add(next);
    }
    return toIntArray(queue);
}

public static int[] toIntArray(Collection<Integer> source) {
    return source.stream().mapToInt(Integer::intValue).toArray();
}
main()
public static void main(String[] args) {
    System.out.println(Arrays.toString(getHighestK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3)));
}
Output:
[9, 12, 27]
Sorting in O(n)
We can achieve worst case time complexity of O(n) when there are some constraints regarding the contents of the given array. Let's say it contains only numbers in the range [-1000,1000] (sure, you haven't been told that, but it's always good to clarify the problem requirements during the interview).
In this case, we can use Counting sort which has linear time complexity. Or better, just build a histogram (first step of Counting Sort) and look at the highest-valued buckets until you've seen K counts. (i.e. don't actually expand back to a fully sorted array, just expand counts back into the top K sorted elements.) Creating a histogram is only efficient if the array of counts (possible input values) is smaller than the size of the input array.
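A sketch of that histogram variant under the assumed [-1000, 1000] range (class and constant names are illustrative):

import java.util.Arrays;

public class HistogramTopK {
    static final int MIN = -1000, MAX = 1000;   // assumed value range

    // Returns the k largest values in descending order in O(n + range) time.
    static int[] topK(int[] arr, int k) {
        int[] counts = new int[MAX - MIN + 1];
        for (int v : arr) counts[v - MIN]++;    // histogram: the first step of Counting Sort
        int[] result = new int[k];
        int idx = 0;
        // Walk the buckets from the highest value down until k elements are emitted.
        for (int v = MAX; v >= MIN && idx < k; v--)
            for (int c = 0; c < counts[v - MIN] && idx < k; c++)
                result[idx++] = v;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3)));
        // prints [27, 12, 9]
    }
}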
Another possibility is when the given array is partially sorted, consisting of several sorted chunks. In this case, we can use Timsort, which is good at finding sorted runs; it will deal with them in linear time.
Timsort is already implemented in Java, where it's used to sort objects (not primitives). So we can take advantage of a well-optimized and thoroughly tested implementation instead of writing our own, which is great. But since we are given an array of primitives, using the built-in Timsort has an additional cost: we need to copy the contents of the array into a list (or array) of the wrapper type.
This is a classic problem that can be solved with so-called heapselect, a simple variation on heapsort. It can also be solved with quickselect, but, like quicksort, quickselect has a poor quadratic worst-case time complexity.
Simply keep a priority queue, implemented as a binary heap, of size k, holding the k largest values seen so far. Walk through the array and insert values into the heap (worst case O(log k)). When the priority queue is too large, delete the minimum value at the root (worst case O(log k)). After going through the n array elements, you have removed the n-k smallest elements, so the k largest elements remain. It's easy to see that the worst-case time complexity is O(n log k), which is faster than O(n log n), at the cost of only O(k) space for the heap.
Here is one idea. Create an int array whose size covers the maximum int value (2147483647). Then, for every number obtained in a for-each over the original array, increment the entry at the index equal to that number.
So at the end of this for-each I will have something like [1,0,2,0,3] (the array that I created), which represents the numbers [0, 2, 2, 4, 4, 4] (the initial array).
To find the K biggest elements, you can iterate backward over the created array and count down from K every time you hit an element different from 0. If an entry is, for example, 2, you have to count that number twice.
The limitation of this approach is that it works only with integers, because of the nature of the array.
Also, the representation of int in Java runs from -2147483648 to 2147483647, which means that only the positive numbers can be placed in the created array.
NOTE: if you know the maximum value that occurs, you can lower the created array's size to that maximum. For example, if the max int is 1000, then the array you need to create has size 1000, and this algorithm should perform very fast.
I think you misunderstood what you needed to sort.
You need to keep the K-sized list sorted, you don't need to sort the original N-sized input array. That way the time complexity would be O(N * log(K)) in the worst case (assuming you need to update the K-sized list almost every time).
The requirements said that N was very large, but K is much smaller, so O(N * log(K)) is also smaller than O(N * log(N)).
You only need to update the K-sized list for each record that is larger than the K-th largest element before it. For a randomly distributed list with N much larger than K, that will be negligible, so the time complexity will be closer to O(N).
For the K-sized list, you can take a look at the implementation in Is there a PriorityQueue implementation with fixed capacity and custom comparator?, which uses a PriorityQueue with some additional logic around it.
There is an algorithm to do this in worst-case time complexity O(n*log(k)) with very benign time constants (since there is just one pass through the original array, and the inner part that contributes the log(k) is accessed relatively seldom if the input data is well-behaved).
Initialize a priority queue implemented with a binary heap A of maximum size k (internally using an array for storage). In the worst case, this has O(log(k)) for inserting, deleting and searching/manipulating the minimum element (in fact, retrieving the minimum is O(1)).
Iterate through the original unsorted array, and for each value v:
If A is not yet full then
insert v into A,
else, if v>min(A) then (*)
insert v into A,
remove the lowest value from A.
(*) Note that A can return repeated values if some of the highest k values occur repeatedly in the source set. You can avoid that by a search operation to make sure that v is not yet in A. You'd also want to find a suitable data structure for that (as the priority queue has linear complexity), i.e. a secondary hash table or balanced binary search tree or something like that, both of which are available in java.util.
The java.util.PriorityQueue helpfully guarantees the time complexity of its operations:
this implementation provides O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).
Note that as laid out above, we only ever remove the lowest (first) element from A, so we enjoy the O(log(k)) for that. If you want to avoid duplicates as mentioned above, then you also need to search for any new value added to it (with O(k)), which opens you up to a worst-case overall scenario of O(n*k) instead of O(n*log(k)) in case of a pre-sorted input array, where every single element v causes the inner loop to fire.

Length of list pattern

I have a list where I am trying to find the sums of combinations of the list's entries, except the pairs where both values to add are equal to each other (i.e. 2+2 would not be added), and add them to another list.
As an example:
[1,2,3] would yield the list of sums [3,4,5], because 1+2=3, 1+3=4, and 2+3=5
However, my issue arises with not knowing how many sums will be produced. I am working in Java and am limited to native arrays; therefore the size of the array has to be set before I can add the sum values to it.
I know I would not be able to find the exact size of the sum list due to the possibility that a sum would not get added if the two elements are the same, but I am trying to ballpark it so I don't have massive arrays.
The closest 'formula' I have gotten is the following, but it is never precisely the max value for any list
(list length of original numbers * list length of original numbers) / 2
I am trying to keep time complexity in mind, so keeping a running count of how many sums there are, setting an array to that size, and looping through the original list again would not be efficient.
Any suggestions?
Can you add equal sums to the array? I mean, if your array is {1,2,3,4,5}, would you print both results 1+5=6 and 2+4=6?
If your answer is yes, you can take the length of the array, multiply it by one less than the length, and divide by 2. For instance, for our array {1,2,3,4,5} the length is 5, so the length of the result array will be 5*4/2 = 10.
Or you can use lists in Java if you can't define a length for the array in advance. Keep that in mind.
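A minimal sketch of this sizing approach: allocate the n*(n-1)/2 upper bound, fill it, and trim to the number of sums actually produced (names are illustrative):

import java.util.Arrays;

public class PairSums {
    // Sums every pair of distinct positions, skipping pairs whose two values are equal.
    static int[] pairSums(int[] a) {
        int n = a.length;
        int[] sums = new int[n * (n - 1) / 2];   // upper bound on the number of pairs
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (a[i] != a[j]) sums[count++] = a[i] + a[j];
        return Arrays.copyOf(sums, count);       // trim to the sums actually produced
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pairSums(new int[]{1, 2, 3})));
        // prints [3, 4, 5]
    }
}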

merging and sorting 2 sorted arrays with big o looking for clarification.

This is schoolwork. I am not looking for code help, but since my teacher isn't helping, I came here.
I am asked to merge and sort two sorted arrays following two cases:
When sizes of the two arrays are equal
When sizes of the two arrays are different
Now I have done case 2, which also covers case 1 :/ I just don't get how I could write code for case 1 or how it could differ from case 2. The array length doesn't seem connected to the problem, or I am not understanding correctly.
Then I am asked to compute the Big O.
I am not looking for code here. If anyone by any chance understands what my teacher is asking really please give me hints to solve it.
It is very good to learn instead of copying.
As you suggest, there is no difference between case 1 and case 2, but the worst case depends on your solution. So I will describe my solution (no code) and give you its worst case.
In both cases the arrays must end with infinity, so append infinity to each of them. Then iterate over the elements of both arrays; at each step, pick the smaller of the two current elements and put it in your result array (the merge of the two arrays).
With this solution you can calculate the worst case easily: we iterate over both arrays once, and we added one infinity to each, so if their lengths are n and m, both the worst and best case are O(m + n) (you do m + n + 2 - 1 comparisons; the -1 is because you never compare the two infinities against each other).
But why add infinity at the end of each array? Because for that we must make a copy of each array with one more slot? That is one way, and the worst case of copying the arrays is also O(m + n). There is another solution, too: compare until you reach the end of one array, then append the rest of the array that was not fully consumed to the end of your result array. With infinity, this happens automatically.
I hope this helped you. If there is something wrong, comment on it.
Merging two sorted arrays is a linear-complexity operation. In terms of Big-O notation it is O(m+n), where m and n are the lengths of the two sorted arrays.
So when you say the array length doesn't connect with the problem, your understanding is correct. Irrespective of the lengths of the two sorted arrays, merging them involves repeatedly comparing the current element from each array, copying one of them to the new array (depending on whether you want the merged sorted array in ascending or descending order), and incrementing the counter of the array from which you copied the element.
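For concreteness, here is a sketch of the sentinel-based merge described in the first answer, assuming the inputs never contain Integer.MAX_VALUE (which plays the role of infinity):

import java.util.Arrays;

public class SentinelMerge {
    // Merges two sorted int arrays in O(m + n) using MAX_VALUE sentinels.
    static int[] merge(int[] a, int[] b) {
        int[] x = Arrays.copyOf(a, a.length + 1);
        int[] y = Arrays.copyOf(b, b.length + 1);
        x[a.length] = Integer.MAX_VALUE;          // "infinity" at the end of each array
        y[b.length] = Integer.MAX_VALUE;
        int[] result = new int[a.length + b.length];
        int i = 0, j = 0;
        for (int k = 0; k < result.length; k++)
            result[k] = (x[i] <= y[j]) ? x[i++] : y[j++];  // the smaller head wins
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(merge(new int[]{1, 3, 5}, new int[]{2, 4, 6, 8})));
        // prints [1, 2, 3, 4, 5, 6, 8]
    }
}

Because neither sentinel is ever smaller than a real element, the leftover tail of the longer array is copied out automatically, which is exactly the point of adding infinity.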
Another way to approach this question is to look at each array as having a head and a tail, and solving the problem recursively. This way, we can use a base case, two arrays of size 1, to sort through the entirety of the two arrays m and n. Since both arrays are already sorted, simply compare the two heads of each array and add the element that comes first to your newly-created merged array, and move to the next element in that array. Your function will call itself again after adding the element. This will keep happening until one of the two arrays is empty. Now, you can simply add what is left of the nonempty array to the end of your merged array, and you are done.
I'm not sure if your professor will allow you to use recursive calls, but this method could make the coding much easier. Runtime would still be O(m+n), as you are basically iterating through both arrays once.
Hope this helps.

Find the only unique element in an array of a million elements

I was asked this question in a recent interview.
You are given an array that has a million elements. All the elements are duplicates except one. My task is to find the unique element.
var arr = [3, 4, 3, 2, 2, 6, 7, 2, 3........]
My approach was to go through the entire array in a for loop, then create a map with the number from the array as the key and the frequency of that number as the value. Then loop through the map again and return the key that has a value of 1.
I said my approach would take O(n) time complexity. The interviewer told me to optimize it in less than O(n) complexity. I said that we cannot, as we have to go through the entire array with a million elements.
Finally, he didn't seem satisfied and moved onto the next question.
I understand going through million elements in the array is expensive, but how could we find a unique element without doing a linear scan of the entire array?
PS: the array is not sorted.
I'm certain that you can't solve this problem without going through the whole array, at least if you don't have any additional information (like the elements being sorted and restricted to certain values), so the problem has a minimum time complexity of O(n). You can, however, reduce the memory complexity to O(1) with a XOR-based solution, if every element except the unique one appears in the array an even number of times, which seems to be the most common variant of the problem, if that's of any interest to you:
static int unique(int[] array) {
    int unpaired = array[0];
    for (int i = 1; i < array.length; i++)
        unpaired = unpaired ^ array[i];  // pairs cancel out: x ^ x == 0
    return unpaired;
}
Basically, every XORed element cancels out with the other one, so your result is the only element that didn't cancel out.
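For instance, a hypothetical usage where every duplicate appears exactly twice:

public static void main(String[] args) {
    // Every value except 2 appears twice, so all the pairs cancel out.
    System.out.println(unique(new int[]{3, 4, 2, 4, 3})); // prints 2
}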
Assuming the array is unordered, you can't. Every value is independent of the next, so nothing can be deduced about one value from any of the other values.
If it's an ordered array of values, then that's another matter and depends entirely on the ordering used.
I agree the easiest way is to have another container and store the frequency of the values.
In fact, since the number of elements in the array is fixed, you could do much better than what you have proposed.
By "creating a map with the number in the array as the key and the frequency of the number as the value", you create a map with 2^32 positions (assuming the array holds 32-bit integers), and then you have to pass through that map to find the first position whose value is one. That means you are using a large auxiliary space, and in the worst case you are doing about 10^6 + 2^32 operations (one million to create the map and 2^32 to find the element).
Instead of doing so, you could sort the array with some n*log(n) algorithm and then search for the element in the sorted array, because in your case, n = 10^6.
For instance, using the merge sort, you would use a much smaller auxiliary space (just an array of 10^6 integers) and would do about (10^6)*log(10^6)+10^6 operations to sort and then find the element, which is approximately 21*10^6 (many many times smaller than 10^6+2^32).
PS: sorting the array decreases the search from a quadratic to a linear cost, because with a sorted array we just have to check the adjacent positions to see whether the current position is unique.
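A minimal sketch of that sort-and-scan idea (assuming exactly one element appears once and every other value appears at least twice; names are illustrative):

import java.util.Arrays;

public class UniqueBySorting {
    // Finds the single non-repeated value in O(n log n) by sorting and
    // comparing each element with its neighbors.
    static int findUnique(int[] arr) {
        int[] a = arr.clone();
        Arrays.sort(a);                           // duplicates become adjacent
        for (int i = 0; i < a.length; i++) {
            boolean equalsPrev = i > 0 && a[i] == a[i - 1];
            boolean equalsNext = i < a.length - 1 && a[i] == a[i + 1];
            if (!equalsPrev && !equalsNext) return a[i];
        }
        throw new IllegalArgumentException("no unique element");
    }

    public static void main(String[] args) {
        System.out.println(findUnique(new int[]{3, 4, 3, 2, 2, 6, 7, 2, 3, 4, 7}));
        // prints 6
    }
}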
Your approach seems fine. It could be that he was looking for an edge case where the array is of even size, meaning there are either no unmatched elements or there are two or more. He just went about asking it the wrong way.

how to find the maximum subsequence sum in a circular linked list

I am aware of the maximum subarray sum problem and its O(n) algorithm. This question modifies that problem by using a circular linked list:
Find the sequence of numbers in a circular linked list with maximum sum.
Now what if sum of all entries is zero?
To me, the only approach is to modify the array solution and have the algorithm loop around and start over at the beginning of the list once the first iteration is done, then do the same thing for up to 2 times the entire list and find the max. The downside is that there might be many tricky cases to handle if I do it this way; for example, if the list looks like:
2 - 2 - 2 - 2 back to front
then it's very tricky not to include the same element twice.
Is there a better algorithm?
Thanks!!
First of all, it doesn't matter whether the data structure is a linked list or an array, so I will use an array for simplicity.
I don't really understand your algorithm, but it seems that you are going to do something like duplicate the array at the back of the original and run Kadane's algorithm on this doubled array. This is a wrong approach, and a counterexample has been given by @RanaldLam.
To solve it, we need to discuss it in three cases:
All negative. In this case, the maximum of the array is the answer, and an O(N) scan will do the job;
The maximum sub-array does not require a wrapping, for example a = {-1, 1, 2, -3}. In this case, a normal Kadane's algorithm will do the job, time complexity O(N);
The maximum sub-array requires wrapping, for example a = {1, -10, 1}. Actually, this case implies another fact: since the elements inside the maximum sub-array require wrapping, the elements that are not inside it do not require wrapping. Therefore, as long as we know the minimum sum of these non-contributing elements (min_non_contributing_sum), we can calculate the correct sum of the contributing elements by subtracting it from the total sum of the array.
But how do we calculate min_non_contributing_sum in case 3? This is a bit tricky: since the non-contributing elements do not require wrapping, we can simply invert the sign of every element and run Kadane's algorithm on the inverted array; negating that result gives the minimum sub-array sum. This also requires O(N).
Finally, we should compare the non-wrapping sum (case 2) with the wrapping sum (case 3); the answer is the bigger one.
As a summary, all cases require O(N), thus the total complexity of the algorithm is O(N).
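A compact sketch of these three cases on a plain array, as the answer suggests (names are illustrative):

public class CircularMaxSum {
    // Standard Kadane: maximum sum of a non-empty contiguous subarray.
    static int kadane(int[] a) {
        int best = a[0], current = a[0];
        for (int i = 1; i < a.length; i++) {
            current = Math.max(a[i], current + a[i]);
            best = Math.max(best, current);
        }
        return best;
    }

    static int maxCircularSum(int[] a) {
        int best = kadane(a);                     // case 2: no wrapping needed
        if (best < 0) return best;                // case 1: all negative
        int total = 0;
        int[] inverted = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            total += a[i];
            inverted[i] = -a[i];                  // invert signs for case 3
        }
        // Case 3: wrapping sum = total - (minimum subarray sum)
        //                      = total + (maximum subarray sum of the inverted array).
        int wrapped = total + kadane(inverted);
        return Math.max(best, wrapped);
    }

    public static void main(String[] args) {
        System.out.println(maxCircularSum(new int[]{1, -10, 1}));  // prints 2 (wraps around)
        System.out.println(maxCircularSum(new int[]{-2, -3, -1})); // prints -1 (all negative)
    }
}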
You're absolutely right. There is no better algorithm.
