Let's say I have an array of size n and I want to divide it into k new arrays of size n/k. What is the running time of this step? I thought that since splitting an array in two is analyzed as 2^x = n => x = log n => O(log n), the same reasoning works here too: k^(n/k) = n => n/k = log n. But what's next?
Now I run the bubble sort algorithm on each of the k arrays, which is O(n^2), and I use a merge algorithm on all k arrays to produce a sorted array of size n. Let's say the merge complexity is O(kn).
In addition, I want to find a k that minimizes the running time of the algorithm. How can I do that? I thought that taking the derivative of the runtime function and finding its minimum would do. Is that the right way?
Merge sort splits the array into successively smaller pieces until it gets down to a bunch of 1-element subarrays. Then it begins to apply the merge algorithm on successively larger subarrays.
Imagine you have an array of 16 elements. The merge sort does its merges like this:
8 merges of two 1-item subarrays
4 merges of two 2-item subarrays
2 merges of two 4-item subarrays
1 merge of two 8-item subarrays
There are four (log2(16)) passes, and in each pass it examines every item. Each pass is O(n). So the running time for this merge sort is O(n * log2(n)).
Now, imagine you have an array with 81 items, and you want to merge it using a 3-way merge sort. Now you have the following sequence of merges:
27 merges of three 1-item subarrays (gives 27 3-item subarrays)
9 merges of three 3-item subarrays (gives 9 9-item subarrays)
3 merges of three 9-item subarrays (gives 3 27-item subarrays)
1 merge of three 27-item subarrays
There are four (log3(81)) passes. Each merge is O(m * log2(k)), where m is the total number of items to be merged, and k is the number of lists. So the first pass has 27 merges that do 3*log2(3) comparisons. The next pass has 9 merges that do 9*log2(3) comparisons, etc. It ends up that the total merge is O(n * log3(n) * log2(3))
You can see that the 3-way merge sort lets you do fewer passes (a 3-way merge sort of 16 items would require only three passes), but each pass is a little more expensive. What you have to determine is if:
n * logk(n) * log2(k) < n * log2(n)
Where k is the number of subarrays you want to split the array into. I'll let you do that math.
You have to be careful, though, because asymptotic analysis doesn't take into account real-world effects. For example, a 2-way merge is incredibly simple. When you go to a k-way merge where k > 2, you end up having to use a heap or other priority queue data structure, which has quite a bit of overhead. So even if the math above tells you that a 3-way merge sort should be faster, you'll want to benchmark it against the standard 2-way merge.
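To make the heap overhead concrete, here is a minimal sketch of a k-way merge driven by a `java.util.PriorityQueue`. The class name `KWayMerge` and the method `merge` are hypothetical names chosen for illustration; the sketch assumes every input run is already sorted in ascending order.

```java
import java.util.PriorityQueue;

final class KWayMerge {
    /** Merges k individually sorted runs into one sorted array using a min-heap of cursors. */
    static int[] merge(int[][] runs) {
        int total = 0;
        for (int[] run : runs) total += run.length;

        // Each heap entry is {runIndex, positionInRun}, ordered by the value it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> Integer.compare(runs[a[0]][a[1]], runs[b[0]][b[1]]));
        for (int r = 0; r < runs.length; r++) {
            if (runs[r].length > 0) heap.add(new int[]{r, 0});
        }

        int[] out = new int[total];
        int i = 0;
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();               // O(log k): smallest head among all runs
            out[i++] = runs[cur[0]][cur[1]];
            if (++cur[1] < runs[cur[0]].length) {  // advance that run's cursor
                heap.add(cur);                     // O(log k) re-insert
            }
        }
        return out;
    }
}
```

Every output element costs one `poll` and at most one `add`, which is where the O(m * log2(k)) per merge comes from. A 2-way merge needs only a single comparison per element and no heap at all, so the constant factor here is noticeably larger.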
Update
You're right. If you simplify the equation, you end up with the equations being the same. So the computational complexity is the same regardless of the value of k.
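Spelled out, the simplification only needs the change-of-base identity log_k(n) = log2(n) / log2(k):

```latex
\[
n \log_k n \cdot \log_2 k
  \;=\; n \cdot \frac{\log_2 n}{\log_2 k} \cdot \log_2 k
  \;=\; n \log_2 n
\]
```

The log2(k) factor from the heap cancels exactly against the reduced number of passes, so the comparison count does not depend on k in this model.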
That makes sense, because if k = n, then you end up with a heap sort.
So then you have to determine if there's a point where the merge overhead, which increases as k increases, is offset by the decreased number of passes. You'll probably need to determine that empirically.
Traditionally we use mergesort for external sorting algorithms, and the answer to this question has been dominated by one fact. A mergesort requires streaming data from multiple files and writing to a single one. The bottleneck is in the streaming, and not in CPU. If you are trying to stream from too many locations on a disk at once, the disk breaks down and starts to do random seeks. Your throughput on random seeks sucks.
The right answer on your hardware will vary (and especially if you are using SSD drives), but traditional Unix sort settled on a 16 way merge as a reasonable default.
I recently had a coding test during an interview. I was told:
There is a large unsorted array of one million ints. User wants to retrieve K largest elements. What algorithm would you implement?
During this, I was strongly hinted that I needed to sort the array.
So, I suggested to use built-in sort() or maybe a custom implementation if performance really mattered. I was then told that using a Collection or array to store the k largest and for-loop it is possible to achieve approximately O(N), in hindsight, I think it's O(N*k) because each iteration needs to compare to the K sized array to find the smallest element to replace, while the need to sort the array would cause the code to be at least O(N log N).
I then reviewed this link on SO that suggests priority queue of K numbers, removing the smallest number every time a larger element is found, which would also give O(N log N). Write a program to find 100 largest numbers out of an array of 1 billion numbers
Is the for-loop method bad? How should I justify pros/cons of using the for-loop or the priorityqueue/sorting methods? I'm thinking that if the array is already sorted, it could help by not needing to iterate through the whole array again, i.e. if some other method of retrieval is called on the sorted array, it should be constant time. Is there some performance factor when running the actual code that I didn't consider when theorizing pseudocode?
Another way of solving this is using Quickselect. This should give you a total average time complexity of O(n). Consider this:
Find the kth largest number x using Quickselect (O(n))
Iterate through the array again (or just through the right-side partition) (O(n)) and save all elements ≥ x
Return your saved elements
(If there are repeated elements, you can avoid them by keeping count of how many duplicates of x you need to add to the result.)
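The steps above could be sketched as follows. This is a minimal illustration, not a tuned implementation: the class name `QuickselectTopK` is hypothetical, the pivot is chosen at random, and the input is copied so the caller's array is not reordered. It takes the "right-side partition" shortcut, which also handles repeated elements because the result is picked by position rather than by comparing against x.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

final class QuickselectTopK {
    /** Returns the k largest values of arr in no particular order; arr itself is not modified. */
    static int[] topK(int[] arr, int k) {
        if (k <= 0) return new int[0];
        if (k >= arr.length) return arr.clone();
        int[] a = arr.clone();
        int target = a.length - k;  // index of the k-th largest if a were sorted ascending
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == target) break;
            if (p < target) lo = p + 1; else hi = p - 1;
        }
        // Everything from target onward is >= a[target], i.e. the k largest values.
        return Arrays.copyOfRange(a, target, a.length);
    }

    /** Lomuto partition around a random pivot; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        swap(a, ThreadLocalRandom.current().nextInt(lo, hi + 1), hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

The average running time is O(n), but an unlucky sequence of pivots can still degrade it to O(n^2), so this is an average-case guarantee only.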
The difference between your problem and the one in the SO question you linked to is that you have only one million elements, so they can definitely be kept in memory to allow normal use of Quickselect.
There is a large unsorted array of one million ints. The user wants to retrieve the K largest elements.
During this, I was strongly hinted that I needed to sort the array.
So, I suggested using a built-in sort() or maybe a custom implementation
That wasn't really a hint I guess, but rather a sort of trick to deceive you (to test how strong your knowledge is).
If you choose to approach the problem by sorting the whole source array using the built-in Dual-Pivot Quicksort, you can't obtain time complexity better than O(n log n).
Instead, we can maintain a PriorityQueue which would store the result. While iterating over the source array, for each element we check whether the queue has reached size K. If not, the element is added to the queue. Otherwise (the size equals K) we compare the next element against the lowest element in the queue: if the next element is smaller or equal we ignore it; if it is greater, the lowest element has to be removed and the new element added.
The time complexity of this approach would be O(n log k), because adding a new element into a PriorityQueue of size k costs O(log k), and in the worst-case scenario this operation can be performed n times (because we're iterating over the array of size n).
Note that the best case time complexity would be Ω(n), i.e. linear.
So the difference between sorting and using a PriorityQueue in terms of Big O boils down to the difference between O(n log n) and O(n log k). When k is much smaller than n this approach would give a significant performance gain.
Here's an implementation:
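import java.util.Arrays;
import java.util.Collection;
import java.util.PriorityQueue;
import java.util.Queue;

public static int[] getHighestK(int[] arr, int k) {
    if (k <= 0) return new int[0]; // nothing to keep; also avoids peek() on an empty queue
    Queue<Integer> queue = new PriorityQueue<>(); // min-heap: peek() is the smallest kept element
    for (int next: arr) {
        // the queue is full and next beats its smallest element: evict to make room
        if (queue.size() == k && queue.peek() < next) queue.remove();
        if (queue.size() < k) queue.add(next);
    }
    return toIntArray(queue);
}

// Copies the elements in the collection's iteration order (heap order for a PriorityQueue, not guaranteed sorted)
public static int[] toIntArray(Collection<Integer> source) {
    return source.stream().mapToInt(Integer::intValue).toArray();
}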
main()
public static void main(String[] args) {
    System.out.println(Arrays.toString(getHighestK(new int[]{3, -1, 3, 12, 7, 8, -5, 9, 27}, 3)));
}
Output:
[9, 12, 27]
Sorting in O(n)
We can achieve worst case time complexity of O(n) when there are some constraints regarding the contents of the given array. Let's say it contains only numbers in the range [-1000,1000] (sure, you haven't been told that, but it's always good to clarify the problem requirements during the interview).
In this case, we can use Counting sort which has linear time complexity. Or better, just build a histogram (first step of Counting Sort) and look at the highest-valued buckets until you've seen K counts. (i.e. don't actually expand back to a fully sorted array, just expand counts back into the top K sorted elements.) Creating a histogram is only efficient if the array of counts (possible input values) is smaller than the size of the input array.
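A minimal sketch of the histogram idea, assuming the value range is known up front. The class name `HistogramTopK` and the `min`/`max` parameters are hypothetical; in the example range above they would be -1000 and 1000.

```java
final class HistogramTopK {
    /** Returns the k largest values in descending order, assuming every value lies in [min, max]. */
    static int[] topK(int[] arr, int k, int min, int max) {
        int[] counts = new int[max - min + 1];
        for (int v : arr) counts[v - min]++;  // throws ArrayIndexOutOfBoundsException if v is out of range

        int[] out = new int[Math.min(Math.max(k, 0), arr.length)];
        int filled = 0;
        // Walk the buckets from the highest value down, expanding counts until k values are collected.
        for (int b = counts.length - 1; b >= 0 && filled < out.length; b--) {
            for (int c = counts[b]; c > 0 && filled < out.length; c--) {
                out[filled++] = b + min;
            }
        }
        return out;
    }
}
```

The cost is O(n + r) time and O(r) extra space, where r = max - min + 1 is the number of buckets, which is why this only pays off when the range is small compared to the input.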
Another possibility is when the given array is partially sorted, consisting of several sorted chunks. In this case, we can use Timsort which is good at finding sorted runs. It will deal with them in a linear time.
And Timsort is already implemented in Java, it's used to sort objects (not primitives). So we can take advantage of the well-optimized and thoroughly tested implementation instead of writing our own, which is great. But since we are given an array of primitives, using built-in Timsort would have an additional cost - we need to copy the contents of the array into a list (or array) of wrapper type.
This is a classic problem that can be solved with so-called heapselect, a simple variation on heapsort. It also can be solved with quickselect, but like quicksort has poor quadratic worst-case time complexity.
Simply keep a priority queue, implemented as a binary min-heap, of size k holding the k largest values seen so far. Walk through the array, and insert values into the heap (worst case O(log k)). When the priority queue is too large, delete the minimum value at the root (worst case O(log k)). After going through the n array elements, you have removed the n-k smallest elements, so the k largest elements remain. It's easy to see the worst-case time complexity is O(n log k), which is faster than O(n log n) at the cost of only O(k) space for the heap.
Here is one idea. Create a counting array of int whose length covers the largest possible value (up to 2147483647, the maximum value of int). Then, for every number in the original array, add 1 to the counting array at the index equal to that number.
At the end of this loop I will have something like [1,0,2,0,3] (the counting array), which represents the numbers [0, 2, 2, 4, 4, 4] (the initial array).
To find the K biggest elements, loop backward over the counting array and count down from K to 0 every time you meet an entry other than 0. If the entry is, for example, 2, you have to count that number 2 times.
The limitation of this approach is that it works only with integers, because the values are used as array indices.
Also, an int in Java ranges from -2147483648 to 2147483647, which means that only non-negative numbers can be placed in the counting array.
NOTE: if you know the maximum value in the input, you can shrink the counting array accordingly. For example, if the maximum is 1000, the array only needs size 1001 (indices 0 to 1000), and this algorithm should then perform very fast.
I think you misunderstood what you needed to sort.
You need to keep the K-sized list sorted, you don't need to sort the original N-sized input array. That way the time complexity would be O(N * log(K)) in the worst case (assuming you need to update the K-sized list almost every time).
The requirements said that N was very large, but K is much smaller, so O(N * log(K)) is also smaller than O(N * log(N)).
You only need to update the K-sized list for each record that is larger than the K-th largest element before it. For a randomly distributed list with N much larger than K, that will be negligible, so the time complexity will be closer to O(N).
For the K-sized list, you can take a look at the implementation of Is there a PriorityQueue implementation with fixed capacity and custom comparator? , which uses a PriorityQueue with some additional logic around it.
There is an algorithm to do this in worst-case time complexity O(n*log(k)) with very benign time constants (since there is just one pass through the original array, and the inner part that contributes the log(k) is reached relatively seldom if the input data is well-behaved).
Initialize a priority queue implemented with a binary heap A of maximum size k (internally using an array for storage). In the worst case, this has O(log(k)) for inserting, deleting and searching/manipulating the minimum element (in fact, retrieving the minimum is O(1)).
Iterate through the original unsorted array, and for each value v:
    If A is not yet full then
        insert v into A,
    else, if v > min(A) then (*)
        insert v into A,
        remove the lowest value from A.
(*) Note that A can return repeated values if some of the highest k values occur repeatedly in the source set. You can avoid that by a search operation to make sure that v is not yet in A. You'd also want to find a suitable data structure for that (as the priority queue has linear complexity), i.e. a secondary hash table or balanced binary search tree or something like that, both of which are available in java.util.
The java.util.PriorityQueue helpfully guarantees the time complexity of its operations:
this implementation provides O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).
Note that as laid out above, we only ever remove the lowest (first) element from A, so we enjoy the O(log(k)) for that. If you want to avoid duplicates as mentioned above, then you also need to search for any new value added to it (with O(k)), which opens you up to a worst-case overall scenario of O(n*k) instead of O(n*log(k)) in case of a pre-sorted input array, where every single element v causes the inner loop to fire.
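One way to avoid the O(k) `contains` search is the secondary hash table suggested above. The following is a sketch under that assumption; the class name `DistinctTopK` and the variable `inHeap` are hypothetical, and `heap` plays the role of A.

```java
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

final class DistinctTopK {
    /** Returns up to k distinct largest values; the result is in heap order, not sorted. */
    static int[] topKDistinct(int[] arr, int k) {
        if (k <= 0) return new int[0];
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap, "A" in the text
        Set<Integer> inHeap = new HashSet<>();               // expected O(1) membership test
        for (int v : arr) {
            if (inHeap.contains(v)) continue;                // v is already among the kept values
            if (heap.size() < k) {
                heap.add(v);
                inHeap.add(v);
            } else if (v > heap.peek()) {
                inHeap.remove(heap.poll());                  // evict the current minimum, O(log k)
                heap.add(v);
                inHeap.add(v);
            }
        }
        return heap.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

With the hash set the per-element cost stays at O(log k) in the expected case, so the overall bound remains O(n*log(k)) instead of O(n*k), at the price of O(k) additional memory.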
The following quote is from the "Comparison with other sort algorithms" section of the Wikipedia Merge Sort page:
On typical modern architectures, efficient quicksort implementations generally outperform mergesort for sorting RAM-based arrays.[citation needed] On the other hand, merge sort is a stable sort and is more efficient at handling slow-to-access sequential media.
My questions:
Why does Quicksort outperform Mergesort when the data to be sorted can all fit into memory? If all data needed are cached or in memory wouldn't it be fast for both Quicksort and Mergesort to access?
Why is Mergesort more efficient at handling slow-to-access sequential data (such as from disk in the case where the data to be sorted can't all fit into memory)?
(Moved here from my comments below.) Consider an array arr of n primitives (the data is sequential). The pair of elements that has to be read and compared in MergeSort is arr[0] and arr[n/2] (this happens in the final merge). Now, the pair of elements that has to be read and compared in QuickSort is arr[1] and arr[n] (this happens in the first partition, assuming we swap the randomly chosen pivot with the first element). We know data is read in blocks and loaded into cache, or from disk into memory (correct me if I am wrong), so isn't there a better chance that the needed data gets loaded together in one block when using MergeSort? It just seems to me MergeSort would always have the upper hand because it is likely comparing elements that are closer together. I know this is false (see graph below) because QuickSort is obviously faster. I know MergeSort is not in place and requires extra memory, and that is likely to slow things down. Other than that, what pieces am I missing in my analysis?
images are from Princeton CS MergeSort and QuickSort slides
My Motive:
I want to understand the above concepts because they are among the main reasons why mergeSort is preferred when sorting a LinkedList or other non-sequential data, while quickSort is preferred when sorting an array or other sequential data, and why mergeSort is used to sort objects in Java while quickSort is used to sort primitive types.
update: Java 7 API actually uses TimSort to sort Object, which is a hybrid of MergeSort and InsertionSort. For primitives Dual-Pivot QuickSort. These changes were implemented starting in Java SE 7. This has to do with the stability of the sorting algorithm. Why does Java's Arrays.sort method use two different sorting algorithms for different types?
Edit:
I will appreciate an answer that addresses the following aspects:
I know the two sorting algorithms differ in the number of moves, reads, and comparisons. If those are the reasons behind the behaviors listed in my questions (I suspect so), then a thorough explanation of how the steps of each sorting algorithm give it advantages or disadvantages when fetching data from disk or memory will be much appreciated.
Examples are welcome. I learn better with examples.
Note: if you are reading #rcgldr's answer, check out our conversation in the chat room; it has lots of good explanations and details. https://chat.stackoverflow.com/rooms/161554/discussion-between-rcgldr-and-oliver-koo
The main difference is that merge sort does more moves, but fewer compares than quick sort. Even in the case of sorting an array of native types, quick sort is only around 15% faster, at least when I've tested it on large arrays of pseudo random 64 bit unsigned integers, which should be quick sort's best case, on my system (Intel 3770K 3.5 GHz, Windows 7 Pro 64 bit, Visual Studio 2015, sorting 16 million pseudo random 64 bit unsigned integers, 1.32 seconds for quick sort, 1.55 seconds for merge sort, 1.32/1.55 ~= 0.85, so quick sort was about 15% faster than merge sort). My test was with a quick sort that had no checks to avoid worst case O(n^2) time or O(n) space. As checks are added to quick sort to reduce or prevent worst case behavior (like falling back to heap sort if recursion becomes too deep), the speed advantage decreases to less than 10% (which is the difference I get between VS2015's implementation of std::sort (modified quick sort) versus std::stable_sort (modified merge sort)).
If sorting "strings", it's more likely that what is being sorted is an array of pointers (or references) to those strings. This is where merge sort is faster, because the moves involve pointers, while the compares involve a level of indirection and comparison of strings.
The main reason for choosing quick sort over merge sort is not speed, but space requirement. Merge sort normally uses a second array the same size as the original. Quick sort and top down merge sort also need log(n) stack frames for recursion, and for quick sort limiting stack space to log(n) stack frames is done by only recursing on the smaller partition, and looping back to handle the larger partition.
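The stack-limiting trick described above (recurse on the smaller partition, loop on the larger one) could look like this. It is a sketch only: the class name `BoundedStackQuicksort` is hypothetical, and the middle-element pivot is an assumption made for brevity, so the O(n^2) worst-case time is still possible.

```java
final class BoundedStackQuicksort {
    static void sort(long[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(long[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                sort(a, lo, p - 1);  // recurse only on the smaller (left) side
                lo = p + 1;          // then loop to handle the larger (right) side
            } else {
                sort(a, p + 1, hi);  // recurse only on the smaller (right) side
                hi = p - 1;          // then loop to handle the larger (left) side
            }
        }
    }

    /** Lomuto partition using the middle element as pivot; returns the pivot's final index. */
    private static int partition(long[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        long pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(long[] a, int i, int j) {
        long t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Because each recursive call receives at most half of the current range, the recursion depth is bounded by about log2(n) frames even when the partitions are badly unbalanced.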
In terms of cache issues, most recent processors have 4 or 8 way associative caches. For merge sort, during a merge, the two input runs will end up in 2 of the cache lines, and the one output run in a 3rd cache line. Quick sort scans the data before doing swaps, so the scanned data will be in cache, although in separate lines if the two elements being compared / swapped are located far enough from each other.
For an external sort, some variation of bottom up merge sort is used. This is because merge sort merge operations are sequential (the only random access occurs when starting up a new pair of runs), which is fast in the case of hard drives, or in legacy times, tape drives (a minimum of 3 tape drives is needed). Each read or write can be for very large blocks of data, reducing average access time per element in the case of a hard drive, since a large number of elements are read or written at a time with each I/O.
It should also be noted that most merge sorts in libraries are also some variation of bottom up merge sort. Top down merge sort is mostly a teaching environment implementation.
If sorting an array of native types on a processor with 16 registers, such as an X86 in 64 bit mode, 8 of the registers used as start + end pointers (or references) for 4 runs, then a 4-way merge sort is often about the same or a bit faster than quick sort, assuming a compiler optimizes the pointers or references to be register based. It's a similar trade off, like quick sort, 4-way merge sort does more compares (1.5 x compares), but fewer moves (0.5 x moves) than traditional 2-way merge sort.
It should be noted that these sorts are cpu bound, not memory bound. I made a multi-threaded version of a bottom up merge sort, and in the case of using 4 threads, the sort was 3 times faster. Link to Windows example code using 4 threads:
https://codereview.stackexchange.com/questions/148025/multithreaded-bottom-up-merge-sort
I am on a mission of sorting somewhat large array of unsigned, 64-bit, randomly generated integers (over 5E7 elements). Can you direct me to a parallel sorting algorithm that might exhibit almost linear speedup at least in the case of random data?
I am working with Java, in case it makes any difference with regard to fast sorting.
Edit: Note that this question is primarily concerned with parallel sorts capable of achieving near-linear speedup. (Meaning, when the number of executing cores grows from P to 2P, the time spent by a parallel sort drops to 55 - 50 percent of the computation performed on P cores.)
Well if you got a lot of memory you can use Bucketsort. One other algorithm that goes well with parallelism is Quicksort
From the Wikipedia article on Quicksort,
Like merge sort, quicksort can also be parallelized due to its divide-and-conquer nature. Individual in-place partition operations are difficult to parallelize, but once divided, different sections of the list can be sorted in parallel. The following is a straightforward approach: If we have p processors, we can divide a list of n elements into p sublists in O(n) average time, then sort each of these in O((n/p) log(n/p)) average time. Ignoring the O(n) preprocessing and merge times, this is linear speedup. If the split is blind, ignoring the values, the merge naïvely costs O(n). If the split partitions based on a succession of pivots, it is tricky to parallelize and naïvely costs O(n). Given O(log n) or more processors, only O(n) time is required overall, whereas an approach with linear speedup would achieve O(log n) time for overall.
Obviously mergesort is another alternative. I think quicksort gives better average-case performance.
Quicksort and merge sort are both fairly easy to parallelize. Oracle has a fork/join-based integer merge sort here, which you could probably use (if not as-is, then at least as inspiration).
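As a rough illustration of the fork/join approach (this is not Oracle's code, just a sketch with hypothetical names), a merge sort over `long[]` can be written with `RecursiveAction`. The cutoff value is an assumption to be tuned by benchmarking, and the values are compared as signed longs, so unsigned data would need a different comparison.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

final class ForkJoinMergeSort {
    private static final int SEQUENTIAL_CUTOFF = 1 << 13; // assumed value, tune by benchmarking

    static void sort(long[] a) {
        long[] scratch = new long[a.length];
        ForkJoinPool.commonPool().invoke(new SortTask(a, scratch, 0, a.length));
    }

    private static final class SortTask extends RecursiveAction {
        private final long[] a;
        private final long[] scratch;
        private final int from;
        private final int to; // half-open range [from, to)

        SortTask(long[] a, long[] scratch, int from, int to) {
            this.a = a;
            this.scratch = scratch;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= SEQUENTIAL_CUTOFF) {
                Arrays.sort(a, from, to);  // small ranges are sorted sequentially
                return;
            }
            int mid = from + (to - from) / 2;
            // Sort both halves in parallel, then merge them in this task.
            invokeAll(new SortTask(a, scratch, from, mid), new SortTask(a, scratch, mid, to));
            merge(mid);
        }

        private void merge(int mid) {
            // Each task touches only its own [from, to) slice of scratch, so tasks do not interfere.
            System.arraycopy(a, from, scratch, from, to - from);
            int i = from, j = mid, k = from;
            while (i < mid && j < to) a[k++] = (scratch[i] <= scratch[j]) ? scratch[i++] : scratch[j++];
            while (i < mid) a[k++] = scratch[i++];
            while (j < to) a[k++] = scratch[j++];
        }
    }
}
```

Note that each merge here runs on a single thread, and the final merge touches all n elements, so this sketch by itself cannot reach linear speedup; it only parallelizes the sorting of the halves.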
Say you have a few computers (5 on amazon cluster right?) and you want ascending sorting. Split your array into smaller chunks so it fits on each machine.
Assume you have n chunks/arrays. Have each machine quicksort its chunk. This sorting will happen in parallel (more or less, depending on chunk size, machine speed, etc.).
When done sorting, have the machines merge the chunks.
You can do this in 2 ways:
2 machines at a time (you're building a merge tree). The merging will happen, again, in parallel. The problem is that the array will grow big due to merging and you have to cache to disk, so when you merge again the machine reads from disk. So some penalty here.
You can do n machines at a time. Have one coordinator machine which takes the min from all the other machines' arrays. This way the coordinator machine builds the entire sorted array by taking the smallest number from each of the other sorted arrays.
Bitonic sort is an algorithm targeted for parallel machines. Here is a sequential Java version and a parallel C++ version to help you get started.
Java 6's mergesort implementation in Arrays.java uses an insertion-sort if the array length is less than some threshold. This value is hard-coded to 7. As the algorithm is recursive, this eventually happens many times for a large array. The canonical merge-sort algorithm does not do this, just using merge-sort all the way down until there is only 1 element in the list.
Is this an optimisation? If so, how is it supposed to help? And why 7? The insertion sort (even of <=7 things) increases the number of comparisons required to sort a large array dramatically - so will add cost to a sort where compareTo() calls are slow.
(x-axis is size of array, y-axis is # of comparisons, for different values of INSERTIONSORT_THRESHOLD)
Yes this is intentional. While the Big-O of mergesort is less than that of quadratic sorts such as insertion sort, the operations it does are more complex and thus slower.
Consider sorting an array of length 8. Merge sort makes ~14 recursive calls to itself in addition to 7 merge operations. Each recursive call contributes some non-trivial overhead to the run-time. Each merge operation involves a loop where index variables must be initialized, incremented, and compared, temporary arrays must be copied, etc. All in all, you can expect well over 300 "simple" operations.
On the other hand, insertion sort is inherently simple and uses about 8^2=64 operations which is much faster.
Think about it this way. When you sort a list of 10 numbers by hand, do you use merge sort? No, because your brain is much better at doing simple things like insertion sort. However if I gave you a year to sort a list of 100,000 numbers, you might be more inclined to merge sort it.
As for the magic number 7, it is empirically derived to be optimal.
EDIT: In a standard insertion sort of 8 elements, the worst case scenario leads to ~36 comparisons. In a canonical merge sort, you have ~24 comparisons. Adding in the overhead from the method calls and complexity of operations, insertion sort should be faster. Additionally if you look at the average case, insertion sort would make far fewer comparisons than 36.
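To show where the cutoff sits in the recursion, here is a minimal sketch of a merge sort that hands small ranges to insertion sort. It is not the JDK source: the class name `HybridMergeSort` is hypothetical, and only the constant name and its value 7 are taken from the question.

```java
final class HybridMergeSort {
    static final int INSERTIONSORT_THRESHOLD = 7;

    static void sort(int[] a) {
        sort(a, new int[a.length], 0, a.length);
    }

    // Sorts the half-open range [from, to) of a, using tmp as scratch space.
    private static void sort(int[] a, int[] tmp, int from, int to) {
        if (to - from < INSERTIONSORT_THRESHOLD) {
            insertionSort(a, from, to);  // small range: no further recursion, no merge
            return;
        }
        int mid = (from + to) >>> 1;
        sort(a, tmp, from, mid);
        sort(a, tmp, mid, to);
        merge(a, tmp, from, mid, to);
    }

    private static void insertionSort(int[] a, int from, int to) {
        for (int i = from + 1; i < to; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= from && a[j] > v) {
                a[j + 1] = a[j];  // shift larger elements one slot to the right
                j--;
            }
            a[j + 1] = v;
        }
    }

    private static void merge(int[] a, int[] tmp, int from, int mid, int to) {
        System.arraycopy(a, from, tmp, from, to - from);
        int i = from, j = mid;
        for (int k = from; k < to; k++) {
            // Take from the left run while it has elements and its head is not larger.
            if (j >= to || (i < mid && tmp[i] <= tmp[j])) a[k] = tmp[i++];
            else a[k] = tmp[j++];
        }
    }
}
```

The cutoff removes the bottom levels of the recursion tree, which is where most of the method calls and tiny merges would otherwise happen.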
Insertion sort is n(n-1)/2 and merge sort is n*log2(n).
Considering this:
For an array of length 5 => insertion sort = 10 and merge sort = 11.609
For an array of length 6 => insertion sort = 15 and merge sort = 15.509
For an array of length 7 => insertion sort = 21 and merge sort = 19.651
For an array of length 8 => insertion sort = 28 and merge sort = 24
From the above data it is clear that up to length 6 insertion sort is faster, and from length 7 onward merge sort is more efficient.
That explains why 7 is used.
My understanding is that this is an empirically derived value, where the time required for an insertion sort is actually lower, despite a (possible) higher number of comparisons required. This is so because near the end of a mergesort, the data is likely to be almost sorted, which makes insertion sort perform well.
I have a Big O notation question. Say I have a Java program that does the following things:
1. Read an Array of Integers into a HashMap that keeps track of how many occurrences of each Integer exist in the array. [1,2,3,1] would be [1->2, 2->1, 3->1].
2. Then I grab the Keys from the HashMap and place them in an Array:
   Set<Integer> keys = dictionary.keySet();
   Integer[] keysToSort = new Integer[keys.size()];
   keys.toArray(keysToSort);
3. Sort the keyArray using Arrays.sort.
4. Then iterate through the sorted keyArray grabbing the corresponding value from the HashMap, in order to display or format the results.
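For reference, the four steps could be put together roughly as follows. This is a sketch with hypothetical names (`FrequencyReport`, `report`), and the "key -> count" output format is only an assumption about what "display or format the results" means.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class FrequencyReport {
    /** Returns one line per distinct value, such as "1 -> 2", in ascending key order. */
    static String[] report(int[] input) {
        // Step 1: count occurrences; expected O(n) when hash codes are well spread.
        Map<Integer, Integer> dictionary = new HashMap<>();
        for (int v : input) dictionary.merge(v, 1, Integer::sum);

        // Step 2: copy the keys into an array.
        Integer[] keysToSort = dictionary.keySet().toArray(new Integer[0]);

        // Step 3: sort the keys; O(m log m) for m distinct keys, with m <= n.
        Arrays.sort(keysToSort);

        // Step 4: look up each count in sorted key order.
        String[] lines = new String[keysToSort.length];
        for (int i = 0; i < keysToSort.length; i++) {
            lines[i] = keysToSort[i] + " -> " + dictionary.get(keysToSort[i]);
        }
        return lines;
    }
}
```

For the input [1,2,3,1] this yields the lines "1 -> 2", "2 -> 1" and "3 -> 1".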
I think I know the following:
Step 1 is O(n)
Step 3 is O(n log n) if I'm to believe the Java API
Step 4 is O(n)
Step 2: When doing this type of calculation I should know how Java implements the Set class toArray method. I would assume that it iterates through the HashMap retrieving the Keys. If that's the case I'll assume it's O(n).
If sequential operations dictate I add each part then the final calculation would be
O(n + n·log n + n+n) = O(3n+n·log n).
Skip the constants and you have O(n+n log n). Can this be reduced any further or am I just completely wrong?
I believe O(n + nlogn) can be further simplified to just O(nlogn). This is because the n becomes asymptotically insignificant compared to the nlogn because they are different orders of complexity. The nlogn is of a higher order than n. This can be verified on the wikipedia page by scrolling down to the Order of Common Functions section.
When using complex data structures like hash maps you do need to know how it retrieves the object, not all data structures have the same retrieval process or time to retrieve elements.
This might help you with the finding the Big O of complex data types in Java:
http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf
Step 2 takes O(capacity of the map).
Step 1 and 4 can get bad if you have many keys with same hash code (i.e. O(number of those keys) for a single lookup or change, multiply with the number of those lookups/changes).
O(n + n·log n) = O(n·log n)
You are correct to worry a little about step 2. As far as I can tell the Java API does not specify running times for these operations.
As for O(n + n log n), Treebranch is right. You can reduce that to O(n log n), the reason being that for every constant c > 0 there is some n0 such that n log n > c*n for all n > n0. This is obviously the case, since no matter what number you choose for c you can set n0 to 2^c + 1.
First,
Step 1 is only O(n) if inserting integers into a HashMap is O(1). In Perl, the worst case for inserting into a hash is O(N) for N items (aka amortised O(1)), and that's if you discount the length of the key (which is acceptable here). HashMap could be less efficient depending on how it addresses certain issues.
Second,
O(N) is O(N log N), so O(N + N log N) is O(N log N).
One thing big O doesn't tell you is how big the scaling factor is. It also assumes you have an ideal machine. The reason this is important is that reading from a file is likely to be far more expensive than everything else you do.
If you actually time this you will get something which is startup cost + read time. The startup cost is likely to be the largest for even one million records. The read time will be proportional to the number of bytes read (i.e. the length of the numbers can matter). If you have 100 million records, the read time is likely to be more important. If you have one billion records, a lot will depend on the number of unique entries rather than the total number of entries. The number of unique entries is limited to ~2 billion.
BTW: To perform the counting more efficiently, try TIntIntHashMap which can minimise object creation making it several times faster.
Of course I am only talking about real machines which big O doesn't consider ;)
The point I am making is that you can do a big O calculation but it will not be informative as to how a real application will behave.