Estimating actual (not theoretic) runtime complexity of an implementation

Estimating actual (not theoretic) runtime complexity of an implementation - java

Anyone in computer science will know that HeapSort is O(n log n) worst case in theory, while QuickSort is O(n^2) worst case. However, in practice, a well implemented QuickSort (with good heuristics) will outperform HeapSort on every single data set. On one hand, we barely observe the worst case, and on the other hand e.g. CPU cache lines, prefetching etc. make an enormous difference in many simple tasks. And while e.g. QuickSort can handle presorted data (with a good heuristic) in O(n), HeapSort will always reorganize the data in O(n log n), as it doesn't exploit the existing structure.
For my toy project caliper-analyze, I've recently been looking into methods for estimating the actual average complexity of algorithms from benchmarking results. In particular, I've tried Lawson and Hanson's NNLS fitting with different polynomials.
However, it doesn't work too well yet. Sometimes I get usable results, sometimes I don't. I figure that it may help to just do larger benchmarks, in particular try more parameters.
The following results are for sorting Double objects, in a SAW pattern with 10% randomness. n was only up to 500 for this run, so it is not very representative for actual use... The numbers are the estimated runtime dependency on the size. The output is hand-edited and manually sorted, so it does not reflect what the tool currently provides!
BubbleSortTextbook LINEAR: 67.59 NLOG2N: 1.89 QUADRATIC: 2.51
BubbleSort LINEAR: 54.84 QUADRATIC: 1.68
BidirectionalBubbleSort LINEAR: 52.20 QUADRATIC: 1.36
InsertionSort LINEAR: 17.13 NLOG2N: 2.97 QUADRATIC: 0.86
QuickSortTextbook NLOG2N: 18.15
QuickSortBo3 LINEAR: 59.74 QUADRATIC: 0.12
Java LINEAR: 6.81 NLOG2N: 12.33
DualPivotQuickSortBo5 NLOG2N: 11.28
QuickSortBo5 LINEAR: 3.35 NLOG2N: 9.67
You can tell that while in this particular setting (often it does not at all work satisfactory) the results largely agree with the known behavior: bubble sort is really costly, and a good heuristic on QuickSort is much better. However, e.g. QuickSort with median-of-three heuristics ends up with an O(n + n^2) estimation, for example, while the other QuickSorts are estimated as O(n + n log n)
So now to my actual questions:
Do you know algorithms/approaches/tools that perform runtime complexity analysis from benchmark data, in order to predict which implementation (as you can see above, I'm interested in comparing different implementations of the same algorithm!) performs best on real data?
Do you know scientific articles with respect to this (estimating average complexity of implementations)?
Do you know robust fitting methods that will help getting more accurate estimates here? E.g. a regularized version of NNLS.
Do you know rules-of-thumb of how many samples one needs to get a reasonable estimate? (in particular, when should the tool refrain from giving any estimate, as it will likely be inaccurate anyway?)
And let me emphasizes once more that I'm not interested in theoretical complexity or formal analysis. I'm interested in seeing how implementations (of theoretically even identical algorithms) perform on benchmark data on real CPUs... the numerical factors for a common range are of key interest to me, more than the asymptotic behaviour. (and no, on the long run it is not just time complexity and sorting. But I'm interested in index structures and other parameters. And caliper can e.g. also measure memory consumption, if I'm not mistaken) Plus, I'm working in java. An approach that just calls a Matlab builtin is of no use to me, as I'm not living in the matlab world.
If I have time, I will try to re-run some of these benchmarks with a larger variety of sizes, so I get more data points. Maybe it will then just work... but I believe there are more robust regression methods that I could use to get better estimates even from smaller data sets. Plus, I'd like to detect when the sample is just too small to do any prediction at all!

If you want the actual complexity you are better of measuring it. Trying to guess how a program will perform without measuring is very unreliable.
The same program can perform very differently on a different machine. e.g. one algo might be faster on one machine but slower on another.
Your programs can be slower depending on what else the machine is doing so an algo which looks good but makes heavy use of resources like caches, can be slower and make other programs slower when it has to share those resources.
Testing an algo on a machine by itself can be up to 2-5x faster than trying to use it in a real program.
Do you know rules-of-thumb of how many samples one needs to get a reasonable estimate? (in particular, when should the tool refrain from giving any estimate, as it will likely be inaccurate anyway?)
For determining a percentile like 90% or 99% you need 1/(1-p)^2 i.e. for 99%tile you need at least 10,000 samples after warmup. For 99.9%tile you need one million.

Related

Runtime of BigInteger operations? [duplicate]

What complexity are the methods multiply, divide and pow in BigInteger currently? There is no mention of the computational complexity in the documentation (nor anywhere else).

If you look at the code for BigInteger (provided with JDK), it appears to me that
multiply(..) has O(n^2) (actually the method is multiplyToLen(..)). The code for the other methods is a bit more complex, but you can see yourself.
Note: this is for Java 6. I assume it won't differ in Java 7.

As noted in the comments on #Bozho's answer, Java 8 and onwards use more efficient algorithms to implement multiplication and division than the naive O(N^2) algorithms in Java 7 and earlier.
Java 8 multiplication adaptively uses either the naive O(N^2) long multiplication algorithm, the Karatsuba algorithm or the 3 way Toom-Cook algorithm depending in the sizes of the numbers being multiplied. The latter are (respectively) O(N^1.58) and O(N^1.46).
Java 8 division adaptively uses either Knuth's O(N^2) long division algorithm or the Burnikel-Ziegler algorithm. (According to the research paper, the latter is 2K(N) + O(NlogN) for a division of a 2N digit number by an N digit number, where K(N) is the Karatsuba multiplication time for two N-digit numbers.)
Likewise some other operations have been optimized.
There is no mention of the computational complexity in the documentation (nor anywhere else).
Some details of the complexity are mentioned in the Java 8 source code. The reason that the javadocs do not mention complexity is that it is implementation specific, both in theory and in practice. (As illustrated by the fact that the complexity of some operations is significantly different between Java 7 and 8.)

There is a new "better" BigInteger class that is not being used by the sun jdk for conservateism and lack of useful regression tests (huge data sets). The guy that did the better algorithms might have discussed the old BigInteger in the comments.
Here you go http://futureboy.us/temp/BigInteger.java

Measure it. Do operations with linearly increasing operands and draw the times on a diagram.
Don't forget to warm up the JVM (several runs) to get valid benchmark results.
If operations are linear O(n), quadratic O(n^2), polynomial or exponential should be obvious.
EDIT: While you can give algorithms theoretical bounds, they may not be such useful in practice. First of all, the complexity does not give the factor. Some linear or subquadratic algorithms are simply not useful because they are eating so much time and resources that they are not adequate for the problem on hand (e.g. Coppersmith-Winograd matrix multiplication).
Then your computation may have all kludges you can only detect by experiment. There are preparing algorithms which do nothing to solve the problem but to speed up the real solver (matrix conditioning). There are suboptimal implementations. With longer lengths, your speed may drop dramatically (cache missing, memory moving etc.). So for practical purposes, I advise to do experimentation.
The best thing is to double each time the length of the input and compare the times.
And yes, you do find out if an algorithm has n^1.5 or n^1.8 complexity. Simply quadruple
the input length and you need only the half time for 1.5 instead of 2. You get again nearly half the time for 1.8 if you multiply the length 256 times.

Best primitive type for fast number comparison?

I've got a function that does several hundred million iterations, trying to find the optimal combination of a given set of possibilities. All of my data is pre-calculated and nearly all the arithmetic is simple >= or <= comparison of these pre-calculated values.
I'm wondering if there's an advantage to using certain primitive types (int, long, double) when doing this simple comparison.
I know I could go and run a test to see which is "best" but it's also important to understand the underlying reasoning. For example, maybe int is most easily comparable because it takes up less memory, or maybe double's floating point more easily tells what power of 10 the value is which speeds up comparison in some cases. I'm interested to know these basics and a simple test wouldn't tell me that.

This is premature optimization. You should pick one data type, make an implementation based on it, and run a performance benchmark using your actual implementation, not some made-up test that compares tens of thousands random values of a specific type.
The reason to test using your specific implementation is that there is a number of factors that have a much greater effect on the speed than the timing of raw comparisons:
Cache hit ratio - accessing memory is multiple times slower than accessing a cached value. Re-structuring your loops when accessing large arrays of data could speed up your program by a large factor without changing the number of raw comparisons that your program performs
Branch predictions - keeping CPU pipeline going is very important. If your loops and your data are structured in a way that optimizes the number of correct branch predictions, your code runs a lot faster than code with large number of incorrect branch predictions
It is not possible to measure any of these metrics until you have your actual algorithm implemented. Once you optimized the actual implementation for cache and branching, switching the underlying data type becomes a relatively easy task.

Big-O notation of a function in Java by experiment

I need to write a program that determines the Big-O notation of an algorithm in Java.
I don't have access to the algorithms code, so it must be based on experimentation and execution times. I don't know where to start.
Can someone help me?
Edit: The only thing i know about the algorithm is that takes an integer value and doesn't have a return

Firstly, you need to be aware that what such a program does it to provided an evidence-based guess as to what complexity class the algorithm belongs to. It can give the wrong answer. (Indeed, in complicated cases where the complexity class is unusual, wrong answers are increasingly likely.)
In short, this is NOT complexity analysis.
The general approach would be:
Run the algorithm multiple times with values of N across the range, measuring the execution times. Repeat multiple times for each N, to ensure that you are getting consistent measurements.
Try to fit the experimental results to different kinds of curves; i.e. linear, quadratic, logarithmic. Note that it is the fit for large values of N that matters. So when you check for "goodness of fit", use a measure that gives increasing weight to the larger data points.
This is intended as a start point. For example, I'm expecting that you will do your own research on issues such as:
how to get reliable execution-time measurements (for Java),
how to do curve fitting in a mathematically sound way, and
dealing with the case where the execution times get too long to measure experimentally for large N.

You could do some experiments and graph the amount of input vs the time spent executing the function. Then you could compare it to the well known curves associated with Big-O or try to estimate the equation.

Since you don't have access to the algorithm's source code, the only thing you can do is to measure how long the algorithm takes for inputs of different size, and then try to extrapolate a function from that. Since you are doing experiments, you now enter the field of statistics, so maybe you can use ideas from that area, such as regression analysis.

Programmatically determine asymptotic runtime of a given algorithm? [duplicate]

I wonder whether there is any automatic way of determining (at least roughly) the Big-O time complexity of a given function?
If I graphed an O(n) function vs. an O(n lg n) function I think I would be able to visually ascertain which is which; I'm thinking there must be some heuristic solution which enables this to be done automatically.
Any ideas?
Edit: I am happy to find a semi-automated solution, just wondering whether there is some way of avoiding doing a fully manual analysis.

It sounds like what you are asking for is an extention of the Halting Problem. I do not believe that such a thing is possible, even in theory.
Just answering the question "Will this line of code ever run?" would be very difficult if not impossible to do in the general case.
Edited to add:
Although the general case is intractable, see here for a partial solution: http://research.microsoft.com/apps/pubs/default.aspx?id=104919
Also, some have stated that doing the analysis by hand is the only option, but I don't believe that is really the correct way of looking at it. An intractable problem is still intractable even when a human being is added to the system/machine. Upon further reflection, I suppose that a 99% solution may be doable, and might even work as well as or better than a human.

You can run the algorithm over various size data sets, and you could then use curve fitting to come up with an approximation. (Just looking at the curve you create probably will be enough in most cases, but any statistical package has curve fitting).
Note that some algorithms exhibit one shape with small data sets, but another with large... and the definition of large remains a bit nebulous. This means that an algorithm with a good performance curve could have so much real world overhead that (for small data sets) it doesn't work as well as the theoretically better algorithm.
As far as code inspection techniques, none exist. But instrumenting your code to run at various lengths and outputting a simple file (RunSize RunLength would be enough) should be easy. Generating proper test data could be more complex (some algorithms work better/worse with partially ordered data, so you would want to generate data that represented your normal use-case).
Because of the problems with the definition of "what is large" and the fact that performance is data dependent, I find that static analysis often is misleading. When optimizing performance and selecting between two algorithms, the real world "rubber hits the road" test is the only final arbitrator I trust.

A short answer is that it's impossible because constants matter.
For instance, I might write a function that runs in O((n^3/k) + n^2). This simplifies to O(n^3) because as n approaches infinity, the n^3 term will dominate the function, irrespective of the constant k.
However, if k is very large in the above example function, the function will appear to run in almost exactly n^2 until some crossover point, at which the n^3 term will begin to dominate. Because the constant k will be unknown to any profiling tool, it will be impossible to know just how large a dataset to test the target function with. If k can be arbitrarily large, you cannot craft test data to determine the big-oh running time.

I am surprised to see so many attempts to claim that one can "measure" complexity by a stopwatch. Several people have given the right answer, but I think that there is still room to drive the essential point home.
Algorithm complexity is not a "programming" question; it is a "computer science" question. Answering the question requires analyzing the code from the perspective of a mathematician, such that computing the Big-O complexity is practically a form of mathematical proof. It requires a very strong understanding of the fundamental computer operations, algebra, perhaps calculus (limits), and logic. No amount of "testing" can be substituted for that process.
The Halting Problem applies, so the complexity of an algorithm is fundamentally undecidable by a machine.
The limits of automated tools applies, so it might be possible to write a program to help, but it would only be able to help about as much as a calculator helps with one's physics homework, or as much as a refactoring browser helps with reorganizing a code base.
For anyone seriously considering writing such a tool, I suggest the following exercise. Pick a reasonably simple algorithm, such as your favorite sort, as your subject algorithm. Get a solid reference (book, web-based tutorial) to lead you through the process of calculating the algorithm complexity and ultimately the "Big-O". Document your steps and results as you go through the process with your subject algorithm. Perform the steps and document your progress for several scenarios, such as best-case, worst-case, and average-case. Once you are done, review your documentation and ask yourself what it would take to write a program (tool) to do it for you. Can it be done? How much would actually be automated, and how much would still be manual?
Best wishes.

I am curious as to why it is that you want to be able to do this. In my experience when someone says: "I want to ascertain the runtime complexity of this algorithm" they are not asking what they think they are asking. What you are most likely asking is what is the realistic performance of such an algorithm for likely data. Calculating the Big-O of a function is of reasonable utility, but there are so many aspects that can change the "real runtime performance" of an algorithm in real use that nothing beats instrumentation and testing.
For example, the following algorithms have the same exact Big-O (wacky pseudocode):
example a:
huge_two_dimensional_array foo
for i = 0, i < foo[i].length, i++
for j = 0; j < foo[j].length, j++
do_something_with foo[i][j]
example b:
huge_two_dimensional_array foo
for j = 0, j < foo[j].length, j++
for i = 0; i < foo[i].length, i++
do_something_with foo[i][j]
Again, exactly the same big-O... but one of them uses row ordinality and one of them uses column ordinality. It turns out that due to locality of reference and cache coherency you might have two completely different actual runtimes, especially depending on the actual size of the array foo. This doesn't even begin to touch the actual performance characteristics of how the algorithm behaves if it's part of a piece of software that has some concurrency built in.
Not to be a negative nelly but big-O is a tool with a narrow scope. It is of great use if you are deep inside algorithmic analysis or if you are trying to prove something about an algorithm, but if you are doing commercial software development the proof is in the pudding, and you are going to want to have actual performance numbers to make intelligent decisions.
Cheers!

This could work for simple algorithms, but what about O(n^2 lg n), or O(n lg^2 n)?
You could get fooled visually very easily.
And if its a really bad algorithm, maybe it wouldn't return even on n=10.

Proof that this is undecidable:
Suppose that we had some algorithm HALTS_IN_FN(Program, function) which determined whether a program halted in O(f(n)) for all n, for some function f.
Let P be the following program:
if(HALTS_IN_FN(P,f(n)))
{
while(1);
}
halt;
Since the function and the program are fixed, HALTS_IN_FN on this input is constant time. If HALTS_IN_FN returns true, the program runs forever and of course does not halt in O(f(n)) for any f(n). If HALTS_IN_FN returns false, the program halts in O(1) time.
Thus, we have a paradox, a contradiction, and so the program is undecidable.

A lot of people have commented that this is an inherently unsolvable problem in theory. Fair enough, but beyond that, even solving it for any but the most trivial cases would seem to be incredibly difficult.
Say you have a program that has a set of nested loops, each based on the number of items in an array. O(n^2). But what if the inner loop is only run in a very specific set of circumstances? Say, on average, it's run in aprox log(n) cases. Suddenly our "obviously" O(n^2) algorithm is really O(n log n). Writing a program that could determine if the inner loop would be run, and how often, is potentially more difficult than the original problem.
Remember O(N) isn't god; high constants can and will change the playing field. Quicksort algorithms are O(n log n) of course, but when the recursion gets small enough, say down to 20 items or so, many implementations of quicksort will change tactics to a separate algorithm as it's actually quicker to do a different type of sort, say insertion sort with worse O(N), but much smaller constant.
So, understand your data, make educated guesses, and test.

I think it's pretty much impossible to do this automatically. Remember that O(g(n)) is the worst-case upper bound and many functions perform better than that for a lot of data sets. You'd have to find the worst-case data set for each one in order to compare them. That's a difficult task on its own for many algorithms.

You must also take care when running such benchmarks. Some algorithms will have a behavior heavily dependent on the input type.
Take Quicksort for example. It is a worst-case O(n²), but usually O(nlogn). For two inputs of the same size.
The traveling salesman is (I think, not sure) O(n²) (EDIT: the correct value is 0(n!) for the brute force algotithm) , but most algorithms get rather good approximated solutions much faster.
This means that the the benchmarking structure has to most of the time be adapted on an ad hoc basis. Imagine writing something generic for the two examples mentioned. It would be very complex, probably unusable, and likely will be giving incorrect results anyway.

Jeffrey L Whitledge is correct. A simple reduction from the halting problem proves that this is undecidable...
ALSO, if I could write this program, I'd use it to solve P vs NP, and have $1million... B-)

I'm using a big_O library (link here) that fits the change in execution time against independent variable n to infer the order of growth class O().
The package automatically suggests the best fitting class by measuring the residual from collected data against each class growth behavior.
Check the code in this answer.
Example of output,
Measuring .columns[::-1] complexity against rapid increase in # rows
--------------------------------------------------------------------------------
Big O() fits: Cubic: time = -0.017 + 0.00067*n^3
--------------------------------------------------------------------------------
Constant: time = 0.032 (res: 0.021)
Linear: time = -0.051 + 0.024*n (res: 0.011)
Quadratic: time = -0.026 + 0.0038*n^2 (res: 0.0077)
Cubic: time = -0.017 + 0.00067*n^3 (res: 0.0052)
Polynomial: time = -6.3 * x^1.5 (res: 6)
Logarithmic: time = -0.026 + 0.053*log(n) (res: 0.015)
Linearithmic: time = -0.024 + 0.012*n*log(n) (res: 0.0094)
Exponential: time = -7 * 0.66^n (res: 3.6)
--------------------------------------------------------------------------------

I guess this isn't possible in a fully automatic way since the type and structure of the input differs a lot between functions.

Well, since you can't prove whether or not a function even halts, I think you're asking a little much.
Otherwise #Godeke has it.

I don't know what's your objective in doing this, but we had a similar problem in a course I was teaching. The students were required to implement something that works at a certain complexity.
In order not to go over their solution manually, and read their code, we used the method #Godeke suggested. The objective was to find students who used linked list instead of a balansed search tree, or students who implemented bubble sort instead of heap sort (i.e. implementations that do not work in the required complexity - but without actually reading their code).
Surprisingly, the results did not reveal students who cheated. That might be because our students are honest and want to learn (or just knew that we'll check this ;-) ). It is possible to miss cheating students if the inputs are small, or if the input itself is ordered or such. It is also possible to be wrong about students who did not cheat, but have large constant values.
But in spite of the possible errors, it is well worth it, since it saves a lot of checking time.

As others have said, this is theoretically impossible. But in practice, you can make an educated guess as to whether a function is O(n) or O(n^2), as long as you don't mind being wrong sometimes.
First time the algorithm, running it on input of various n. Plot the points on a log-log graph. Draw the best-fit line through the points. If the line fits all the points well, then the data suggests that the algorithm is O(n^k), where k is the slope of the line.
I am not a statistician. You should take all this with a grain of salt. But I have actually done this in the context of automated testing for performance regressions. The patch here contains some JS code for it.

If you have lots of homogenious computational resources, I'd time them against several samples and do linear regression, then simply take the highest term.

It's easy to get an indication (e.g. "is the function linear? sub-linear? polynomial? exponential")
It's hard to find the exact complexity.
For example, here's a Python solution: you supply the function, and a function that creates parameters of size N for it. You get back a list of (n,time) values to plot, or to perform regression analysis. It times it once for speed, to get a really good indication it would have to time it many times to minimize interference from environmental factors (e.g. with the timeit module).
import time
def measure_run_time(func, args):
start = time.time()
func(*args)
return time.time() - start
def plot_times(func, generate_args, plot_sequence):
return [
(n, measure_run_time(func, generate_args(n+1)))
for n in plot_sequence
]
And to use it to time bubble sort:
def bubble_sort(l):
for i in xrange(len(l)-1):
for j in xrange(len(l)-1-i):
if l[i+1] < l[i]:
l[i],l[i+1] = l[i+1],l[i]
import random
def gen_args_for_sort(list_length):
result = range(list_length) # list of 0..N-1
random.shuffle(result) # randomize order
# should return a tuple of arguments
return (result,)
# timing for N = 1000, 2000, ..., 5000
times = plot_times(bubble_sort, gen_args_for_sort, xrange(1000,6000,1000))
import pprint
pprint.pprint(times)
This printed on my machine:
[(1000, 0.078000068664550781),
(2000, 0.34400010108947754),
(3000, 0.7649998664855957),
(4000, 1.3440001010894775),
(5000, 2.1410000324249268)]

How does the modulus operator in java function?

I'm about to start optimizations on an enormous piece of code and I need to know exactly which operations are performed when the modulus operator is used. I have been searching for quite a while, but I can't find anything about the machine code behind it. Any ideas?

If you need to know
exactly which operations are performed when the modulus operator is used
then I would suggest you are "doing it wrong".
Modulus may be different depending on OS and underlying architecture. It may vary or it may not, but if you need to rely on the implementation, it is likely that your time could best be spent elsewhere. The implementation is not guaranteed to stay the same, or to be consistent across different machines.
Why do you believe modulus to be a major source of computation? Regardless of its implementation, the operation is very likely to be a constant - i.e, if it is operating within an algorithm which has big-O greater than constant time, optimize the algorithm first.
Ask yourself why you need to optimize. Is the computation taking (significantly) longer than expected?
Then ask yourself where 90 - 99% of the computation is being spent. Try using a profiler to get numbers, even if you think you know where time is being spent. It may give you a clue or shed light on a bug.

The modulus operator on integers is built in on most platforms. An instruction with the timing comparable to that of division is performed, producing the modulus.
The compiler can perform an optimization for divisors that are powers of two: instead of performing modulos of, say, x % 512, the compiler can use a potentially faster x & 0x01FF.

Any ideas?
Yes, don't waste your time on it. There's going to be other bits of code you can improve far more than trying to beat the compiler at it's own job

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.