Finding big O time complexity function using only run time data [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
In a project for my algorithms class, we have to run 5 different sorting methods of unknown types and gather running-time data for each of them, using the doubling method for the problem size. We then have to use the ratio of the running times to work out the time complexity functions. The sorting methods used are selection sort, insertion sort, merge sort, and quicksort (randomized and non-randomized). We have to use empirical analysis to determine which sorting method each of the five unknown methods in the program is. My question is: how does one go from the ratio to the function? I know that N = 2^k, so we can use log2(ratio) = k, but I am not sure how that correlates with the time complexity of, say, merge sort, which is O(N log N).

The Big-O notation more or less describes a function, where the input N is the size of the collection, and the output is how much time will be taken. I would suggest benchmarking your algorithms by running a variety of sample input sizes, and then collecting the running times. For example, for selection sort you might collect this data:
N | running time (ms)
1000 | 0.1
10000 | 10
100000 | 1000
1000000 | 100000
If you plot this, using a tool like R or Matlab, or maybe Excel if you are feeling lazy, you will see that the running time varies with the square of the sample size N. That is, multiplying the sample size by 10 results in a 100-fold increase in running time. This is O(N^2) behavior.
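As a minimal sketch (the class and method names are hypothetical), the exponent b in T(N) ≈ c·N^b can be estimated from any two rows of such a table as log(t2/t1)/log(N2/N1). With the doubling method N2/N1 = 2, so this reduces to log2 of the running-time ratio: an estimate near 2 points to a quadratic sort, while one slightly above 1 points to an n log n sort.

```java
/** Estimates the exponent b in T(N) ~ c * N^b from two (size, time) measurements. */
final class ExponentEstimate {
    static double exponent(double n1, double t1, double n2, double t2) {
        // For T = c * N^b the constant c cancels, leaving exactly b
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // The (N, ms) pairs from the table above
        double[] n = {1_000, 10_000, 100_000, 1_000_000};
        double[] t = {0.1, 10, 1_000, 100_000};
        for (int i = 1; i < n.length; i++) {
            System.out.printf("N=%d -> N=%d: exponent ~ %.2f%n",
                    (long) n[i - 1], (long) n[i], exponent(n[i - 1], t[i - 1], n[i], t[i]));
        }
    }
}
```

For the table above, every consecutive pair of rows gives an exponent of 2.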
For the other algorithms, you may collect similar benchmark data, and also create plots.
Note that you have to keep in mind things like the JVM's startup and warm-up time before it runs your actual code at full speed. The way to deal with this is to take many data points. Overall, linear, logarithmic, etc. behavior should still be discernible.
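A minimal sketch of the doubling method in this spirit, assuming each unknown sort can be wrapped as a `Consumer<int[]>`; the names `timeSort` and `doublingRatios` are hypothetical, and `Arrays::sort` is only a stand-in for one of the unknown methods. It runs some warm-up calls first and takes the median of several trials per size to damp JIT and GC noise. For serious measurements a dedicated harness such as JMH is more reliable.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

final class DoublingBenchmark {
    /** Median wall-clock time (ns) of sorting a fresh random array of size n. */
    static long timeSort(Consumer<int[]> sorter, int n, int trials, Random rnd) {
        long[] times = new long[trials];
        for (int k = 0; k < trials; k++) {
            int[] data = rnd.ints(n).toArray();   // fresh unsorted input for each trial
            long start = System.nanoTime();
            sorter.accept(data);
            times[k] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[trials / 2];                 // the median resists outliers
    }

    /** Returns T(2N)/T(N) for N = startN, 2*startN, ... (steps ratios in total). */
    static double[] doublingRatios(Consumer<int[]> sorter, int startN, int steps, int trials) {
        Random rnd = new Random(42);
        for (int k = 0; k < 20; k++) sorter.accept(rnd.ints(startN).toArray()); // JIT warm-up
        double[] ratios = new double[steps];
        long prev = timeSort(sorter, startN, trials, rnd);
        int n = startN;
        for (int s = 0; s < steps; s++) {
            n *= 2;
            long cur = timeSort(sorter, n, trials, rnd);
            ratios[s] = (double) cur / Math.max(prev, 1);
            prev = cur;
        }
        return ratios;
    }

    public static void main(String[] args) {
        double[] r = doublingRatios(Arrays::sort, 10_000, 5, 7);
        for (double ratio : r) {
            System.out.printf("ratio %.2f  log2(ratio) %.2f%n", ratio, Math.log(ratio) / Math.log(2));
        }
    }
}
```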

On a log-log graph (log of size vs log of running time) you will find that O(n^k) is a line of slope k. That will let you tell O(n) from O(n^2) very easily.
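Instead of reading the slope off a plot, it can be computed with an ordinary least-squares fit of log(time) against log(n). This is a minimal sketch with hypothetical names; for T = c·n^k the fitted slope is k.

```java
/** Least-squares slope of log(time) against log(n); for T = c * n^k it returns k. */
final class LogLogSlope {
    static double slope(double[] n, double[] time) {
        int m = n.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < m; i++) {
            double x = Math.log(n[i]), y = Math.log(time[i]);
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }
        return (m * sxy - sx * sy) / (m * sxx - sx * sx);
    }

    public static void main(String[] args) {
        // The selection sort table from the previous answer
        double[] n = {1_000, 10_000, 100_000, 1_000_000};
        double[] ms = {0.1, 10, 1_000, 100_000};
        System.out.printf("log-log slope: %.3f%n", slope(n, ms));
    }
}
```

On that table's four rows the fitted slope is 2, up to floating-point rounding.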
To tell O(n) from O(n log(n)), graph f(n)/n against log(n). A function that grows linearly will look like a horizontal line, and a function that grows like c*n log(n) will look like a straight line with positive slope (equal to c when both logs use the same base).
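A sketch of that check with hypothetical names: it fits a least-squares line to f(n)/n against log2(n). A linear-time cost gives a slope near 0, and a cost of c·n·log2(n) gives a slope near c.

```java
/** Least-squares slope of f(n)/n against log2(n): ~0 for c*n, ~c for c*n*log2(n). */
final class NormalizedSlope {
    static double slope(double[] n, double[] f) {
        int m = n.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < m; i++) {
            double x = Math.log(n[i]) / Math.log(2);   // log2(n)
            double y = f[i] / n[i];                     // cost per element
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }
        return (m * sxy - sx * sy) / (m * sxx - sx * sx);
    }

    public static void main(String[] args) {
        double[] n = {1024, 2048, 4096, 8192};
        double[] linear = new double[n.length], nLogN = new double[n.length];
        for (int i = 0; i < n.length; i++) {
            linear[i] = 5 * n[i];
            nLogN[i] = 3 * n[i] * (Math.log(n[i]) / Math.log(2));
        }
        System.out.printf("linear slope %.3f, n log n slope %.3f%n",
                slope(n, linear), slope(n, nLogN));
    }
}
```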
Don't forget to throw both ordered and unordered data at your methods.
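Input order matters for telling the five methods apart: insertion sort runs in roughly linear time on already sorted input, and a non-randomized quicksort that picks the first or last element as pivot typically degrades toward quadratic time on it, while randomized quicksort does not. A minimal sketch of input generators (the class and method names are hypothetical):

```java
import java.util.Random;
import java.util.stream.IntStream;

/** Hypothetical input generators for benchmarking sorts on different orderings. */
final class SortInputs {
    static int[] ascending(int n) {
        return IntStream.range(0, n).toArray();                 // 0, 1, ..., n-1
    }

    static int[] descending(int n) {
        return IntStream.range(0, n).map(i -> n - 1 - i).toArray(); // n-1, ..., 0
    }

    static int[] random(int n, long seed) {
        return new Random(seed).ints(n, 0, n).toArray();        // values in [0, n)
    }
}
```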

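You can just look at how the time grows when you double the input size:
If linear (O(n)), doubling the input size doubles the time: t -> 2t
If quasi-linear (O(n log n)), doubling the input size multiplies the time by 2 log(2n)/log(n) = 2(1 + 1/log2(n)), slightly more than 2: t -> about 2.2t at n = 2^10 and about 2.1t at n = 2^20
If quadratic (O(n^2)), doubling the input size quadruples the time: t -> 4t
Note that these ratios are theoretical: expect measured ratios to scatter around them, and to approach them only once n is large enough.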
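The predicted ratios can be computed directly from a cost model f as f(2n)/f(n), since constant factors cancel. A minimal sketch with hypothetical names:

```java
import java.util.function.DoubleUnaryOperator;

final class DoublingPredictions {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Predicted T(2n)/T(n) for a cost model f; constant factors cancel out. */
    static double ratio(DoubleUnaryOperator f, double n) {
        return f.applyAsDouble(2 * n) / f.applyAsDouble(n);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator linear = x -> x;
        DoubleUnaryOperator nLogN = x -> x * log2(x);
        DoubleUnaryOperator quadratic = x -> x * x;
        for (double n : new double[] {1 << 10, 1 << 20}) {
            System.out.printf("n=%.0f  linear %.3f  n log n %.3f  quadratic %.3f%n",
                    n, ratio(linear, n), ratio(nLogN, n), ratio(quadratic, n));
        }
    }
}
```

At n = 1024 the n log n ratio is 2048·11 / (1024·10) = 2.2; the linear and quadratic ratios are exactly 2 and 4 at every n.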

Related

Heap Sort vs Merge Sort in Speed [duplicate]

This question already has answers here: Quicksort superiority over Heap Sort (6 answers)
Closed 4 years ago.
Which algorithm is faster when sorting a large array: heap sort or merge sort? Why is one of these algorithms faster than the other?

Although the time complexity is the same, the constant factors are not. Generally, merge sort will be significantly faster on a typical system with a 4-way or higher set-associative cache, since merge sort performs sequential reads from two runs and sequential writes to a single merged run. I recall a merge sort written in C being faster than an optimized heap sort written in assembly.
One issue is that heap sort swaps data, which costs two reads and two writes per swap, while merge sort moves data, which costs one read and one write per move.
The main drawback of merge sort is that it needs a second array (or vector) of the same size as the original (or, optionally, half the size of the original) as working storage. On a PC with 4 GB or more of RAM, this usually isn't an issue.
On my system (Intel 3770K 3.5 GHz, Windows 7 Pro 64-bit, Visual Studio 2015), sorting 2^24 = 16,777,216 64-bit unsigned integers takes 7.98 seconds with heap sort, 1.59 seconds with bottom-up merge sort, and 1.65 seconds with top-down merge sort.
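For reference, a minimal bottom-up merge sort sketch in Java (the class name is hypothetical; it compares signed `long` values, whereas the benchmark above sorted unsigned 64-bit integers in C/C++, and it is not the code that produced those timings). It shows the sequential read/write pattern and the second working array described above.

```java
/** Bottom-up merge sort: merges runs of width 1, 2, 4, ... between a and a scratch array. */
final class BottomUpMergeSort {
    static void sort(long[] a) {
        int n = a.length;
        long[] src = a, dst = new long[n];           // the extra working storage
        for (int width = 1; width < n; width *= 2) {
            for (int lo = 0; lo < n; lo += 2 * width) {
                int mid = Math.min(lo + width, n);
                int hi = Math.min(lo + 2 * width, n);
                merge(src, dst, lo, mid, hi);
            }
            long[] t = src; src = dst; dst = t;        // swap roles instead of copying back
        }
        if (src != a) System.arraycopy(src, 0, a, 0, n);
    }

    /** Reads src[lo..mid) and src[mid..hi) sequentially, writes the merged run to dst[lo..hi). */
    private static void merge(long[] src, long[] dst, int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++]; // ties: left first
        while (i < mid) dst[k++] = src[i++];
        while (j < hi) dst[k++] = src[j++];
    }
}
```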

Both sort methods have the same time complexity, and are optimal. The time required to merge in a merge sort is counterbalanced by the time required to build the heap in heapsort. The merge sort requires additional space. The heapsort may be implemented using additional space, but does not require it. Heapsort, however, is unstable: it doesn't guarantee to keep 'equal' elements in their original relative order. If you test both methods fairly and under the same conditions, the differences will be minimal.
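A minimal in-place heap sort sketch (class and method names hypothetical) illustrating that heapsort needs no additional array: the max-heap lives inside the input array, and each extraction step is a swap followed by a sift-down.

```java
/** In-place heap sort: O(n log n) time, O(1) extra space, not stable. */
final class HeapSort {
    static void sort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n); // build a max-heap bottom-up
        for (int end = n - 1; end > 0; end--) {
            swap(a, 0, end);        // move the current maximum to its final position
            siftDown(a, 0, end);    // restore the heap property on a[0..end)
        }
    }

    private static void siftDown(int[] a, int i, int size) {
        while (true) {
            int largest = i, l = 2 * i + 1, r = l + 1;
            if (l < size && a[l] > a[largest]) largest = l;
            if (r < size && a[r] > a[largest]) largest = r;
            if (largest == i) return;
            swap(a, i, largest);
            i = largest;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```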

Would you choose the speed of an algorithm over its efficient use of memory? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
In a past examination paper, there was a question that gave two methods for checking whether int[] A contained all the same values as int[] B (two unsorted arrays of size N); an engineer had to decide which implementation to use.
The first method used a single for loop with a nested call to linear search; its asymptotic run time was calculated to be Θ(n^2), and the additional memory usage for creating copies of A and B is around (N*4N + 4N) bytes.
The second method used an additional int[] C, a copy of B, which is sorted. A for loop is used with a nested binary search; its asymptotic run time was calculated as Θ(n log(n^2)), and the additional memory usage for creating copies of A and B is around (4N + 4N + N*4N + 4N) bytes (the first two 4N terms are due to C being a copy of B, and the last to the copy of C created in the sort(C) function).
The final question asks which implementation the engineer should use. I believe the faster algorithm would be the better option, since for larger inputs it would cut the computation time drastically; the drawback is that with larger inputs it runs the risk of an OutOfMemoryError. I understand one could argue for either method depending on the size of the arrays, but my question is: in the majority of cases, which implementation is better?
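A minimal sketch of the two methods as described (the names are hypothetical). Since the exam's exact code isn't given, this assumes "contains all the same values" means every value of A appears somewhere in B; it ignores multiplicities and does not make the extra copies of A and B that the question's memory figures count.

```java
import java.util.Arrays;

final class ContainsCheck {
    /** Method 1: for each value of a, linear search in b. Θ(n^2) time. */
    static boolean allInLinear(int[] a, int[] b) {
        for (int x : a) {
            boolean found = false;
            for (int y : b) {
                if (y == x) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }

    /** Method 2: sort a copy C of b, then binary search for each value of a. Θ(n log n) time. */
    static boolean allInBinary(int[] a, int[] b) {
        int[] c = Arrays.copyOf(b, b.length);   // C: 4N extra bytes for N ints
        Arrays.sort(c);
        for (int x : a) {
            if (Arrays.binarySearch(c, x) < 0) return false;
        }
        return true;
    }
}
```

Both methods return the same answers; only their time cost differs.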

The first algorithm has complexity Θ(n^2) and the second Θ(n log(n^2)).
Hence, the second one is much faster for large enough n.
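Since log(n^2) = 2 log n, the second bound is simply Θ(n log n). Ignoring constant factors, the ratio of the two step counts at n = 1000 (the same n as in the memory comparison) is roughly 50:

```latex
n \log(n^2) = 2\,n \log n \in \Theta(n \log n),
\qquad
\frac{n^2}{2\,n \log_2 n} = \frac{n}{2 \log_2 n}
\approx \frac{1000}{2 \cdot 9.97} \approx 50 \quad (n = 1000)
```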
As you mentioned, memory usage also comes into account, and one might argue that the second one consumes a lot more memory, but that is not true.
The first algorithm consumes: n*4n + 4n = 4n^2 + 4n bytes
The second algorithm consumes: 4n + 4n + 4n + n*4n = 4n^2 + 12n bytes
Suppose n is 1000: then the first algorithm consumes 4,004,000 bytes and the second 4,012,000 bytes, so there is no big difference in memory consumption between the two algorithms.
So, from a memory-consumption perspective it does not really matter which algorithm you choose; in terms of complexity, both consume Θ(n^2) memory.

A relevant question to many a programmer, but the answer depends entirely on the context. If the software runs on a system with a little memory, then the algorithm with a smaller footprint will likely be better. On the other hand, a real-time system needs speed; so the faster algorithm would likely be better.
It is important to also recognize the more subtle issues that arise during execution, e.g. when the additional memory requirements of the faster algorithm force the system to use paging and, in turn, slow down execution.
Thus, it is important to understand, in context, the benefit-cost trade-off of each algorithm. And as always, emphasize code readability and sound algorithm design and integration.

Why does Java documentation not include time complexity? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I find it really surprising that Java does not specify any time or space complexities for any of the Collection libraries. Granted, garbage collection in Java is unpredictable, so nothing is guaranteed, but wouldn't it be helpful to give at least an average time complexity? What am I missing here?

The time complexities are dependent on how you use the collections; however, they generally follow the standard time complexities. You can find the time complexity of an array, linked list, tree, or hash map anywhere, but there is no requirement that an implementation follow these complexities.
In short, time complexity is for an ideal machine, not real machine with real implementations, so even if you know the time complexity, the details of the actual use case can be more important.

The time complexities are mostly self-explanatory based on the implementation. LinkedList takes constant time to add items at the end, but approaching linear time to add items in the middle, since it must first walk to that position. HashMap has near-constant access time. ArrayList appends in constant time until it needs to grow its backing array, which takes linear time, so appends are amortized constant, etc.

I don't know what you're talking about. HashSet:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
ArrayList:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
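To see why "adding n elements requires O(n) time" even though an occasional add copies the whole array, here is a hypothetical growable int array that doubles its capacity and counts the element copies. This is not `java.util.ArrayList`'s code (OpenJDK's ArrayList grows by roughly 1.5x), but any constant growth factor above 1 gives the same amortized bound.

```java
/** Hypothetical growable array that doubles its capacity and counts resize copies. */
final class GrowableIntArray {
    private int[] data = new int[1];
    private int size;
    long copies;   // total elements copied during resizes

    void add(int x) {
        if (size == data.length) {
            int[] bigger = new int[data.length * 2];   // doubling growth policy
            System.arraycopy(data, 0, bigger, 0, size);
            copies += size;
            data = bigger;
        }
        data[size++] = x;
    }

    int size() {
        return size;
    }
}
```

After 1000 appends, `copies` is 1 + 2 + 4 + ... + 512 = 1023, which is under 2n: the total resize work is linear, so each append is amortized constant.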

There are time complexities for the major ones, warnings about methods' performance and references to original works. I think these comments are useful, and big-O's might not always be useful.
E.g. Arrays.sort(Object[] array):
Implementation note: This implementation is a stable, adaptive, iterative mergesort that requires far fewer than n lg(n) comparisons when the input array is partially sorted, while offering the performance of a traditional mergesort when the input array is randomly ordered. If the input array is nearly sorted, the implementation requires approximately n comparisons. Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays.
The implementation takes equal advantage of ascending and descending order in its input array, and can take advantage of ascending and descending order in different parts of the same input array. It is well-suited to merging two or more sorted arrays: simply concatenate the arrays and sort the resulting array.
The implementation was adapted from Tim Peters's list sort for Python (TimSort: http://svn.python.org/projects/python/trunk/Objects/listsort.txt). It uses techniques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.
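The "approximately n comparisons" claim can be checked by sorting with a comparator that counts its calls. A minimal sketch (the class and method names are hypothetical; `Arrays.sort(T[], Comparator)` is documented with the same implementation note):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Random;

final class ComparisonCount {
    /** Sorts a copy of data with a comparator that counts its calls; returns the count. */
    static long countComparisons(Integer[] data) {
        long[] count = {0};
        Comparator<Integer> cmp = (x, y) -> {
            count[0]++;
            return Integer.compare(x, y);
        };
        Arrays.sort(data.clone(), cmp);
        return count[0];
    }

    public static void main(String[] args) {
        int n = 100_000;
        Integer[] sorted = new Integer[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        Integer[] shuffled = sorted.clone();
        Collections.shuffle(Arrays.asList(shuffled), new Random(1)); // shuffles the array in place
        System.out.println("sorted input:   " + countComparisons(sorted));   // roughly n
        System.out.println("shuffled input: " + countComparisons(shuffled)); // roughly n log2 n
    }
}
```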
Or CopyOnWriteArrayList:
This is ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent threads. The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException.
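A small sketch of the snapshot behavior (the class and method names are hypothetical): appending while iterating over a `CopyOnWriteArrayList` does not throw, and the loop only sees the elements present when the iterator was created. Each `add` copies the whole backing array, which is the "ordinarily too costly" part.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class SnapshotIteration {
    /** Appends one element per visited element; returns how many elements the loop saw. */
    static int iterateWhileAdding(List<Integer> list) {
        int seen = 0;
        for (Integer x : list) {   // snapshot iterator: no ConcurrentModificationException
            list.add(x * 10);      // each add copies the backing array: O(n) per mutation
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> list = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(iterateWhileAdding(list) + " seen, list is now " + list);
    }
}
```

With a plain `ArrayList` the same loop would throw `ConcurrentModificationException` on the next iteration after the first `add`.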

Big Oh for (n log n) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I am currently studying basic algorithms for Big Oh. I was wondering if anyone could show me what Java code with O(n log n) running time would look like, or direct me to an SO page where one exists.
Since I am just a beginner, I can only imagine the code before I write it. So, theoretically (at least), it should contain one for loop that runs n times. Then, for the log n part, we can use a while loop. So the for loop is executed n times and the while loop is executed log2(n) times. At least that is how I am imagining it in my head, but seeing the code would clear things up.

int n = 100;
for (int i = 0; i < n; i++) {          // this loop is executed n times, so O(n)
    for (int j = n; j > 0; j /= 2) {   // this loop is executed O(log n) times
        // constant-time work goes here
    }
}
Explanation:
The outer for loop should be clear; it is executed n times. Now to the inner loop. In the inner loop, you take n and always divide it by 2. So, you ask yourself: How many times can I divide n by 2?
It turns out that this is O(log n). Strictly, the base of the log is 2, but in Big-O notation we drop the base, since changing the base only multiplies the logarithm by a constant factor, which we are not interested in.
So, you are executing a loop n times, and within that loop, you are executing another loop log(n) times. So, you have O(n) * O(log n) = O(n log n).
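A runnable version of the nested loop that counts how often the inner body executes (the class and method names are hypothetical). The exact count is n·(floor(log2 n) + 1), which is Θ(n log n).

```java
final class NLogNLoop {
    /** Runs the nested loop from the answer and counts inner-loop iterations. */
    static long count(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) {            // n iterations
            for (int j = n; j > 0; j /= 2) {     // floor(log2 n) + 1 iterations
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n = 100; n <= 100_000; n *= 10) {
            double nLog2N = n * (Math.log(n) / Math.log(2));
            System.out.printf("n=%6d  steps=%8d  n*log2(n)=%10.0f%n", n, count(n), nLog2N);
        }
    }
}
```

For n = 100 the inner loop runs 7 times (j = 100, 50, 25, 12, 6, 3, 1), so `count(100)` returns 700.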

A very popular O(n log n) algorithm is merge sort. See http://en.wikipedia.org/wiki/Merge_sort for an example of the algorithm and pseudocode. The log n part of the algorithm comes from breaking the problem down into smaller subproblems, so that the height of the recursion tree is log n.
A lot of sorting algorithms have a running time of O(n log n). Refer to http://en.wikipedia.org/wiki/Sorting_algorithm for more examples.

Algorithms with an O(.) time complexity involving log n typically involve some form of divide and conquer.
For example, in MergeSort the list is halved, each half is merge-sorted individually, and then the two halves are merged together. Each half is in turn halved recursively.
Whenever you have work being halved or reduced in size by some fixed factor, you'll usually end up with a log n component in the O(.).
In terms of code, take a look at the algorithm for MergeSort. The important feature of typical implementations is that they are recursive (note that TopDownSplitMerge calls itself twice in the code given on Wikipedia).
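A minimal top-down merge sort sketch in Java in the same split-and-merge style (the class and method names are hypothetical, not Wikipedia's code). The recursion halves the range until it has at most one element, so the depth is about log2 n, and each level does linear merging work.

```java
/** Top-down merge sort: halve, sort each half recursively, then merge. */
final class TopDownMergeSort {
    static void sort(int[] a) {
        int[] tmp = new int[a.length];
        splitMerge(a, tmp, 0, a.length);
    }

    /** Sorts a[lo..hi) using tmp as scratch space. */
    private static void splitMerge(int[] a, int[] tmp, int lo, int hi) {
        if (hi - lo < 2) return;               // 0 or 1 element: already sorted
        int mid = (lo + hi) >>> 1;
        splitMerge(a, tmp, lo, mid);           // the two recursive calls
        splitMerge(a, tmp, mid, hi);
        // Merge a[lo..mid) and a[mid..hi) into tmp, then copy back: O(hi - lo) work
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi) tmp[k++] = a[j++];
        System.arraycopy(tmp, lo, a, lo, hi - lo);
    }
}
```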
All good general-purpose comparison sorts have O(n log n) time complexity (worst-case or expected), and no comparison sort can do better in the worst case; see Comparison Sort.
To see what this looks like in Java code, just search! Here's one example.

http://en.wikipedia.org/wiki/Heapsort
A simple example is just what you described: execute an operation that takes log(n) time, n times.
Balanced binary trees have log(n) height, so some tree algorithms will have such complexity.
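One concrete instance of "n operations, each taking log(n) time": n inserts into a binary heap followed by n removals of the minimum. `java.util.PriorityQueue` documents O(log(n)) time for `offer` and `poll`, so this hypothetical helper runs in O(n log n), although unlike in-place heapsort it uses extra space for the queue.

```java
import java.util.PriorityQueue;

final class HeapBasedSort {
    /** n offers then n polls on a binary heap: each O(log n), so O(n log n) overall. */
    static int[] sort(int[] input) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : input) heap.offer(x);            // n times O(log n)
        int[] out = new int[input.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = heap.poll();                     // n times O(log n), smallest first
        }
        return out;
    }
}
```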

What sort does Java Collections.sort(nodes) use?

I think it is MergeSort, which is O(n log n).
However, the following output disagrees:
-1,0000000099000391,0000000099000427
1,0000000099000427,0000000099000346
5,0000000099000391,0000000099000346
1,0000000099000427,0000000099000345
5,0000000099000391,0000000099000345
1,0000000099000346,0000000099000345
I am sorting a nodelist of 4 nodes by sequence number, and the sort is doing 6 comparisons.
I am puzzled because 6 > (4 log(4)). Can someone explain this to me?
P.S. It is mergesort, but I still don't understand my results.
Thanks for the answers everyone. Thank you Tom for correcting my math.

O(n log n) doesn't mean that the number of comparisons will be equal to or less than n log n, just that the time taken will scale proportionally to n log n. Try doing tests with 8 nodes, or 16 nodes, or 32 nodes, and checking out the timing.
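A sketch of that experiment using comparison counts instead of timings, which avoids timer noise (the class and method names are hypothetical; the input is random with a fixed seed). For tiny lists the ratio to n log2 n jumps around; as n doubles it should settle near a constant.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SortScaling {
    /** Comparisons Collections.sort makes on n random Integers (fixed seed). */
    static long comparisons(int n, long seed) {
        Random rnd = new Random(seed);
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(rnd.nextInt());
        long[] count = {0};
        Collections.sort(list, (x, y) -> {
            count[0]++;
            return Integer.compare(x, y);
        });
        return count[0];
    }

    public static void main(String[] args) {
        for (int n = 4; n <= 4096; n *= 2) {
            long c = comparisons(n, 7);
            double nLog2N = n * (Math.log(n) / Math.log(2));
            System.out.printf("n=%5d  comparisons=%7d  comparisons/(n log2 n)=%.2f%n",
                    n, c, c / nLog2N);
        }
    }
}
```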

You sorted four nodes, so you didn't get merge sort; sort switched to insertion sort.
In Java, the Arrays.sort() methods use merge sort or a tuned quicksort depending on the datatypes and for implementation efficiency switch to insertion sort when fewer than seven array elements are being sorted. (Wikipedia, emphasis added)
Arrays.sort is used indirectly by the Collections classes.
A recently accepted bug report indicates that the Sun implementation of Java will use Python's timsort in the future: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6804124
(The timsort monograph, linked above, is well worth reading.)

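An algorithm A(n) that processes an amount of data n is in O(f(n)), for some function f, if there exist two strictly positive constants C_inf and C_sup such that, for all sufficiently large n:
$$C_{\mathrm{inf}} \cdot f(n) < \mathbb{E}\left[\mathrm{OperationCount}(A(n))\right] < C_{\mathrm{sup}} \cdot f(n)$$
(Strictly speaking, this two-sided bound defines Θ(f(n)); O(f(n)) only requires the upper bound.)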
Two things to note:
The actual constants C could be anything, and do depend on the relative costs of operations (depending on the language, the VM, the architecture, or your actual definition of an operation). On some platforms, for instance, + and * have the same cost; on others, the latter is an order of magnitude slower.
The quantity described as "in O(f(n))" is an expected operation count, based on some, probably arbitrary, model of the data you are dealing with. For instance, if your data is almost completely sorted, an adaptive merge sort is going to be mostly O(n), not O(n log n).

I've written some stuff you may be interested in about the Java sort algorithm and taken some performance measurements of Collections.sort(). The algorithm at present is a mergesort with an insertion sort once you get down to a certain size of sublists (N.B. this algorithm is very probably going to change in Java 7).
You should really take the Big O notation as an indication of how the algorithm will scale overall; for a particular sort, the precise time will deviate from the time predicted by this calculation (as you'll see on my graph, the two sort algorithms that are combined each have different performance characteristics, and so the overall time for a sort is a bit more complex).
That said, as a rough guide, for every time you double the number of elements, if you multiply the expected time by 2.2, you won't be far out. (It doesn't make much sense really to do this for very small lists of a few elements, though.)
