In Java, why is comparing two elements of an array relatively time consuming?

So I am working on a program that uses insertion sort, selection sort, and merge sort. I time all the programs and make a table of which are the fastest. I understand why merge sort is more efficient than selection sort and insertion sort (because of the effectiveness of comparing elements).
My question is: why is comparing elements of an array relatively time consuming, and why does it make insertion and selection sort less efficient?
Note: I am new to Java and couldn't find anything on this topic. Thanks for your responses.

My question is why comparing 2 elements of an array relatively consuming ....
Relative to what?
In fact, the time taken to compare two instances of some class depends on how the compareTo or compare method is implemented. However, comparison is typically expensive because of the nature of the computation.
For example, if you have to compare two strings that are equal (but different objects), you have to compare each character in one string with the corresponding character in the other one. For strings of length M, that is M character comparisons plus the overheads of looping over the characters. (Obviously, the comparison is cheaper in other cases .... depending on how different the strings are, for example.)
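To make that concrete, here is a minimal sketch (mine, not part of the original answer) of what a lexicographic comparison has to do; the real String.compareTo is similar in spirit:
static int compareStrings(String a, String b) {
    int limit = Math.min(a.length(), b.length());
    for (int i = 0; i < limit; i++) {
        char ca = a.charAt(i);
        char cb = b.charAt(i);
        if (ca != cb) {
            return ca - cb;          // the first differing character decides the order
        }
    }
    return a.length() - b.length();  // equal prefixes: the shorter string sorts first
}
For two equal strings of length M, the loop runs all M times before the method can return 0, which is where the cost comes from.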
and why does it make insertion and selection sort less efficient.
The reason that insertion and selection sort are slower (for large datasets) is because they do more comparisons than other more complicated algorithms. Given a dataset with N elements:
The number of comparisons for quicksort and similar is proportional to N * log N.
The number of comparisons for insertion sort and similar is proportional to N * N.
As N gets bigger, N * N grows much faster than N * log N, irrespective of the constants of proportionality.
Assuming that the datasets and element classes are the same, if you do more comparisons, that takes more CPU time.
The other thing to note is that the number of comparisons performed by a sort algorithm is typically proportional to other CPU overheads of the algorithm. That means that it is typically safe (though not mathematically sound) to use the comparison count as a proxy for the overall complexity of a sort algorithm.
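As a rough illustration of that proxy (my own sketch, assuming an Integer[] of random data), you can count the comparisons made by Arrays.sort through a counting Comparator and compare that with a plain selection sort, which always makes n(n-1)/2 comparisons:
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class ComparisonCounts {

    // Selection sort always does n(n-1)/2 comparisons, whatever the input order looks like.
    static long selectionSortComparisons(Integer[] a) {
        long count = 0;
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                count++;
                if (a[j].compareTo(a[min]) < 0) {
                    min = j;
                }
            }
            Integer tmp = a[i]; a[i] = a[min]; a[min] = tmp;
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 10_000;
        Integer[] data = new Random(42).ints(n).boxed().toArray(Integer[]::new);

        AtomicLong libraryComparisons = new AtomicLong();
        Arrays.sort(data.clone(), (x, y) -> {      // counting comparator wrapped around compareTo
            libraryComparisons.incrementAndGet();
            return x.compareTo(y);
        });

        System.out.println("Arrays.sort comparisons:    " + libraryComparisons.get());
        System.out.println("selection sort comparisons: " + selectionSortComparisons(data.clone()));
    }
}
For n = 10,000 the selection sort makes exactly 49,995,000 comparisons, while the library sort stays on the order of n * log2(n) (roughly 130,000), which is the N * N versus N * log N gap in practice.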

why comparing 2 elements of an array relatively consuming
As asked in Stephen C's answer, relative to what?
Selection sort and insertion sort have time complexity O(n^2), while merge sort has time complexity O(n log(n)), so for reasonably large n, merge sort will be faster, but not because of compare overhead compared to the O(n^2) sorts.
For merge sort with an optimizing compiler, where compared elements are loaded into registers (assuming the elements fit in registers), the compare overhead is small, since the subsequent move writes the value already held in a register rather than reading it from memory again.
As for compare overhead: when sorting an array of primitives, indexing accesses the values directly, but when sorting an array of objects, which is usually implemented as an array of references to objects, the compare overhead increases because those references must be dereferenced. This would matter in a comparison of quick sort versus merge sort (more moves, fewer compares), but the issue with merge sort versus insertion sort or selection sort is their O(n^2) time complexity versus merge sort's O(n log(n)).
In the case of sorting an array of objects, there's also the issue of sorting the pointers versus sorting the objects, which is a cache locality issue. Depending on object size, it may be better to sort the objects rather than sort the pointers, but this isn't really related to compare overhead as asked in the original question.
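A quick, unscientific way to see the reference-chasing cost (my sketch, not from the answer; timings will vary by JVM and hardware) is to sort the same values once as int[] and once as Integer[]:
import java.util.Arrays;
import java.util.Random;

public class PrimitiveVsBoxedSort {
    public static void main(String[] args) {
        final int n = 1_000_000;
        int[] primitives = new Random(42).ints(n).toArray();
        Integer[] boxed = Arrays.stream(primitives).boxed().toArray(Integer[]::new);

        long t0 = System.nanoTime();
        Arrays.sort(primitives);   // dual-pivot quicksort on plain values
        long t1 = System.nanoTime();
        Arrays.sort(boxed);        // TimSort on object references; every compare dereferences two objects
        long t2 = System.nanoTime();

        System.out.printf("int[]:     %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("Integer[]: %d ms%n", (t2 - t1) / 1_000_000);
    }
}
The Integer[] sort is typically noticeably slower: every compare goes through compareTo on two heap objects instead of comparing two values sitting in registers.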

Related

Which of the given sorting algorithms will be fastest when run on an array that happens to already be in order?

Question:
Which sorting algorithm will be fastest when run on an array that happens to already be in order?
(A) It is not possible to know which will be fastest.
(B) selection sort
(C) insertion sort
(D) binary sort
(E) All of these algorithms will run at the same speed.
I have been doing some research for a homework assignment and have been getting conflicting answers. Some places say it is insertion, while some say both are equal and yet others say it can't be determined. Very confused right now, would appreciate some help.
(C) Insertion sort
It is normally the fastest and easiest to implement when an array is already nearly or completely sorted, because it performs fewer operations.
Selection sort will still do its pairwise comparisons, and binary sort will also be slightly slower.
I would say that insertion sort because:
Insertion sort is a simple sorting algorithm: it builds the final sorted array one item at a time. It is much less efficient on large lists than more advanced sorting algorithms.
Advantages of Insertion Sort:
1) It is very simple.
2) It is very efficient for small data sets.
3) It is stable; i.e., it does not change the relative order of elements with equal keys.
4) In-place; i.e., only requires a constant amount O(1) of additional memory space.
Insertion sort iterates through the list, consuming one input element at each repetition and growing a sorted output list. At each repetition, insertion sort removes one element from the input data, finds the location it belongs within the sorted list, and inserts it there. It repeats until no input elements remain.
I read this at http://www.java2novice.com/java-sorting-algorithms/insertion-sort/
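Here is a short sketch of that loop in Java (my own, assuming an int[] for simplicity). Note that on an already sorted array the inner while loop never fires, so the whole sort degenerates into a single linear scan, which is why answer (C) wins:
static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int key = a[i];                   // next element consumed from the unsorted part
        int j = i - 1;
        while (j >= 0 && a[j] > key) {    // shift larger elements right; never runs on sorted input
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;                   // insert into its place in the sorted prefix
    }
}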

Sorting an array of partially sorted primitive integers in Java

The array in question may hold any integer greater than or equal to zero, and the numbers are unique. The numbers have to be in ascending order.
The array's size will usually be less than 100.
Most of the array is already sorted. By most I mean, on average, at least 90% of it.
I've found this implementation of TimSort but it is not for primitive values. Autoboxing would cause a lot of overhead.
Performance is most crucial as the sorting-algorithm will be called many times.
Use Arrays.sort:
int[] array = /* something */;
Arrays.sort(array);
Being only one line, this is (obviously) both extremely simple to use and very readable. That should be your #1 priority when writing code. It's also going to be pretty darn fast, because the writers of the standard library have put a lot of effort into performance, particularly relating to sorting algorithms.
The only situation in which you should not use Arrays.sort is if you've almost entirely finished your system, profiled it carefully, and determined that the part of the code that sorts your array is the bottleneck. Even then, you still might not be able to write your own sorting algorithm that performs noticeably better.
It depends on what you mean by "almost sorted". Insertion sort is a very efficient algorithm if the array is nearly sorted (linear complexity when it is already sorted), but the performance can vary depending on whether the outliers are close to or far from their final sorted position. For example, [1,2,3,4,6,5,7,8,9] will be slightly faster to sort than [1,3,4,5,6,7,8,9,2].
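Each shift insertion sort performs corresponds to one inversion (a pair of elements that are out of order), so a quick way to see why the second array is slower (my sketch, not from the answer) is to count inversions:
static int inversions(int[] a) {
    int count = 0;
    for (int i = 0; i < a.length; i++) {
        for (int j = i + 1; j < a.length; j++) {
            if (a[i] > a[j]) {
                count++;   // each out-of-order pair costs one shift during insertion sort
            }
        }
    }
    return count;
}

// inversions(new int[]{1,2,3,4,6,5,7,8,9}) -> 1: the outlier sits right next to its final position
// inversions(new int[]{1,3,4,5,6,7,8,9,2}) -> 7: the 2 has to travel across almost the whole array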

Why does Java use merge sort when sorting an array with more than 7 elements?

According to Wikipedia:
"In Java, the Arrays.sort() methods use merge sort or a tuned
quicksort depending on the datatypes and for implementation efficiency
switch to insertion sort when fewer than seven array elements are
being sorted"
But why? Both merge sort and quick sort are O(n log n).
Where the algorithms differ is their typical-case behavior, and this is where insertion sort is one of the worst. On the other hand, for very small collections (n ≈ n^2) insertion sort's simplicity wins.
Java's algorithm selection rules prefer QuickSort first, and only fall back to something else due to specific restrictions. Namely, QuickSort is an unstable sort and thus is only acceptable for sorting primitives. For reference types, TimSort is used as of OpenJDK 7 (previously MergeSort).
It's not that ad-hoc:
Arrays.java's sort method uses quicksort for arrays of primitives and merge sort for arrays of objects.
Why does Java's Arrays.sort method use two different sorting algorithms for different types?
Also, according to the docs:
For example, the algorithm used by sort(Object[]) does not have to be a mergesort, but it does have to be stable.
And another quote from the javadoc:
This sort is guaranteed to be stable: equal elements will not be
reordered as a result of the sort.
Implementation note: This implementation is a stable, adaptive,
iterative mergesort that requires far fewer than n lg(n) comparisons
when the input array is partially sorted, while offering the
performance of a traditional mergesort when the input array is
randomly ordered. If the input array is nearly sorted, the
implementation requires approximately n comparisons. Temporary storage
requirements vary from a small constant for nearly sorted input arrays
to n/2 object references for randomly ordered input arrays.
The implementation takes equal advantage of ascending and descending
order in its input array, and can take advantage of ascending and
descending order in different parts of the same input array. It is
well-suited to merging two or more sorted arrays: simply concatenate
the arrays and sort the resulting array.
The implementation was adapted from Tim Peters's list sort for Python
( TimSort).
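To see the stability guarantee in action, here is a small sketch (mine; the student data is made up) that sorts pairs by one key and relies on equal keys keeping their original order:
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    public static void main(String[] args) {
        String[][] students = {                      // {name, grade}; hypothetical sample data
            {"Alice", "B"}, {"Bob", "A"}, {"Carol", "B"}, {"Dave", "A"}
        };
        // Arrays.sort(Object[], Comparator) is stable, so Bob stays ahead of Dave
        // and Alice stays ahead of Carol after sorting by grade.
        Arrays.sort(students, Comparator.comparing((String[] s) -> s[1]));
        System.out.println(Arrays.deepToString(students));
        // Prints: [[Bob, A], [Dave, A], [Alice, B], [Carol, B]]
    }
}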
Quicksort and mergesort are both O(n log n) in average performance. In the worst case, quicksort is O(n^2).
They moved to a tuned merge sort because they found that, statistically, for real partially sorted data sets this method may be a little faster than quicksort.
Well, Java 7 uses TimSort, a hybrid of merge sort and insertion sort. As a matter of fact it's widely used: Android, Octave, and Python use it too.
http://en.wikipedia.org/wiki/Timsort

Why does Java 6 Arrays#sort(Object[]) change from mergesort to insertionsort for small arrays?

Java 6's mergesort implementation in Arrays.java uses an insertion-sort if the array length is less than some threshold. This value is hard-coded to 7. As the algorithm is recursive, this eventually happens many times for a large array. The canonical merge-sort algorithm does not do this, just using merge-sort all the way down until there is only 1 element in the list.
Is this an optimisation? If so, how is it supposed to help? And why 7? The insertion sort (even of <= 7 elements) dramatically increases the number of comparisons required to sort a large array, so it will add cost to a sort where compareTo() calls are slow.
(x-axis is size of array, y-axis is # of comparisons, for different values of INSERTIONSORT_THRESHOLD)
Yes this is intentional. While the Big-O of mergesort is less than that of quadratic sorts such as insertion sort, the operations it does are more complex and thus slower.
Consider sorting an array of length 8. Merge sort makes ~14 recursive calls to itself in addition to 7 merge operations. Each recursive call contributes some non-trivial overhead to the run-time. Each merge operation involves a loop where index variables must be initialized, incremented, and compared, temporary arrays must be copied, etc. All in all, you can expect well over 300 "simple" operations.
On the other hand, insertion sort is inherently simple and uses about 8^2 = 64 simple operations, which is much faster.
Think about it this way. When you sort a list of 10 numbers by hand, do you use merge sort? No, because your brain is much better at doing simple things like insertion sort. However, if I gave you a year to sort a list of 100,000 numbers, you might be more inclined to merge sort it.
As for the magic number 7, it is empirically derived to be optimal.
EDIT: In a standard insertion sort of 8 elements, the worst case leads to ~28 comparisons. In a canonical merge sort, you have ~24 comparisons. Adding in the overhead from the method calls and the complexity of its operations, insertion sort should still be faster. Additionally, in the average case insertion sort makes far fewer comparisons than that worst case.
Insertion sort does n(n-1)/2 comparisons in the worst case and merge sort does about n * log2(n).
Considering this:
For an array of length 5 => insertion sort = 10 and merge sort ≈ 11.609
For an array of length 6 => insertion sort = 15 and merge sort ≈ 15.509
For an array of length 7 => insertion sort = 21 and merge sort ≈ 19.651
For an array of length 8 => insertion sort = 28 and merge sort = 24
From the above data it is clear that up to length 6 insertion sort does fewer comparisons, and from length 7 onwards merge sort is more efficient.
That explains why 7 is used.
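You can reproduce those numbers with a couple of lines (illustrative only; it just evaluates the two formulas above):
for (int n = 5; n <= 8; n++) {
    double insertionComparisons = n * (n - 1) / 2.0;           // worst-case insertion sort
    double mergeComparisons = n * (Math.log(n) / Math.log(2)); // ~ n * log2(n) for merge sort
    System.out.printf("n=%d  insertion=%.0f  merge=%.3f%n", n, insertionComparisons, mergeComparisons);
}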
My understanding is that this is an empirically derived value, where the time required for an insertion sort is actually lower, despite a (possible) higher number of comparisons required. This is so because near the end of a mergesort, the data is likely to be almost sorted, which makes insertion sort perform well.

Why does java.util.Arrays.sort(Object[]) use 2 kinds of sorting algorithms?

I found that java.util.Arrays.sort(Object[]) uses 2 kinds of sorting algorithms (in JDK 1.6).
pseudocode:
if (array.length < 7)
    insertionSort(array);
else
    mergeSort(array);
Why does it need 2 kinds of sorting here? for efficiency?
It's important to note that an algorithm that is O(N log N) is not always faster in practice than an O(N^2) algorithm. It depends on the constants, and the range of N involved. (Remember that asymptotic notation measures relative growth rate, not absolute speed).
For small N, insertion sort in fact does beat merge sort. It's also faster for almost-sorted arrays.
Here's a quote:
Although it is one of the elementary sorting algorithms with O(N^2) worst-case time, insertion sort is the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead).
For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divide-and-conquer sorting algorithms, such as merge sort or quick sort.
Here's another quote from the paper Best sorting algorithm for nearly sorted lists:
straight insertion sort is best for small or very nearly sorted lists
What this means is that, in practice:
Some algorithm A1 with a higher asymptotic upper bound may be preferable to another known algorithm A2 with a lower asymptotic upper bound
Perhaps A2 is just too complicated to implement
Or perhaps it doesn't matter in the range of N considered
See e.g. Coppersmith–Winograd algorithm
Some hybrid algorithms may adapt different algorithms depending on the input size
Related questions
Which sorting algorithm is best suited to re-sort an almost fully sorted list?
Is there ever a good reason to use Insertion Sort?
A numerical example
Let's consider these two functions:
f(x) = 2x^2; this function has a quadratic growth rate, i.e. "O(N^2)"
g(x) = 10x; this function has a linear growth rate, i.e. "O(N)"
Now let's plot the two functions together:
Source: WolframAlpha: plot 2x^2 and 10x for x from 0 to 10
Note that between x=0..5, f(x) <= g(x), but for any larger x, f(x) quickly outgrows g(x).
Analogously, if A1 is a quadratic algorithm with a low overhead, and A2 is a linear algorithm with a high overhead, for smaller input, A1 may be faster than A2.
Thus, you can, should you choose to do so, create a hybrid algorithm A3 which simply selects one of the two algorithms depending on the size of the input. Whether or not this is worth the effort depends on the actual parameters involved.
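Here is a sketch of such a hybrid A3 in Java (my own illustration, not the JDK's code): a top-down merge sort that hands small ranges to insertion sort, with the cutoff as a tunable constant.
import java.util.Arrays;

public class HybridSort {
    // Below this (illustrative) cutoff we fall back to insertion sort;
    // the legacy JDK mergesort used 7, and values around 8-20 are common in practice.
    private static final int INSERTION_THRESHOLD = 7;

    public static void sort(int[] a) {
        int[] aux = new int[a.length];
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(int[] a, int[] aux, int lo, int hi) {
        if (hi - lo + 1 < INSERTION_THRESHOLD) {   // small range: quadratic but low overhead
            insertionSort(a, lo, hi);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid)              a[k] = aux[j++];
            else if (j > hi)          a[k] = aux[i++];
            else if (aux[j] < aux[i]) a[k] = aux[j++];   // prefer the left run on ties: keeps the sort stable
            else                      a[k] = aux[i++];
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
        sort(data);
        System.out.println(Arrays.toString(data));  // [0, 1, 2, ..., 9]
    }
}
The exact cutoff is an empirical tuning parameter; whether 7 or something else wins depends on the element type, the cost of compares, and the hardware.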
Many tests and comparisons of sorting algorithms have been made, and it was decided that because insertion sort beats merge sort for small arrays, it was worth it to implement both for Arrays.sort.
It's for speed. The overhead of mergeSort is high enough that for short arrays it would be slower than insertion sort.
Quoted from: http://en.wikipedia.org/wiki/Insertion_sort
Some divide-and-conquer algorithms such as quicksort and mergesort sort by
recursively dividing the list into smaller sublists which are then sorted.
A useful optimization in practice for these algorithms is to use insertion
sort for sorting small sublists, where insertion sort outperforms these more
complex algorithms. The size of list for which insertion sort has the advantage
varies by environment and implementation, but is typically between eight and
twenty elements.
It appears that they believe mergeSort(array) is slower for short arrays. Hopefully they actually tested that.
