I wonder whether there is any automatic way of determining (at least roughly) the Big-O time complexity of a given function?
If I graphed an O(n) function vs. an O(n lg n) function I think I would be able to visually ascertain which is which; I'm thinking there must be some heuristic solution which enables this to be done automatically.
Any ideas?
Edit: I am happy to find a semi-automated solution, just wondering whether there is some way of avoiding doing a fully manual analysis.
It sounds like what you are asking for is an extension of the Halting Problem. I do not believe that such a thing is possible, even in theory.
Just answering the question "Will this line of code ever run?" would be very difficult if not impossible to do in the general case.
Edited to add:
Although the general case is intractable, see here for a partial solution: http://research.microsoft.com/apps/pubs/default.aspx?id=104919
Also, some have stated that doing the analysis by hand is the only option, but I don't believe that is really the correct way of looking at it. An intractable problem is still intractable even when a human being is added to the system/machine. Upon further reflection, I suppose that a 99% solution may be doable, and might even work as well as or better than a human.
You can run the algorithm over data sets of various sizes and then use curve fitting to come up with an approximation. (Just looking at the curve you create will probably be enough in most cases, but any statistical package has curve fitting.)
Note that some algorithms exhibit one shape with small data sets, but another with large... and the definition of large remains a bit nebulous. This means that an algorithm with a good performance curve could have so much real world overhead that (for small data sets) it doesn't work as well as the theoretically better algorithm.
As far as code-inspection techniques go, none exist. But instrumenting your code to run at various input sizes and outputting a simple file (RunSize, RunLength would be enough) should be easy. Generating proper test data could be more complex (some algorithms work better/worse with partially ordered data, so you would want to generate data that represents your normal use case).
Because of the problems with the definition of "what is large" and the fact that performance is data dependent, I find that static analysis often is misleading. When optimizing performance and selecting between two algorithms, the real world "rubber hits the road" test is the only final arbitrator I trust.
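A rough sketch of that "run it at several sizes and fit a curve" idea (my own illustration, not anything from the question; the helper names and the choice of sorted() as the test subject are arbitrary) might look like this in Python, fitting log(time) against log(n) so the slope approximates the polynomial order:

    import math
    import random
    import time

    def measure(func, make_input, sizes, repeats=3):
        """Time func on inputs of the given sizes; return (n, seconds) pairs."""
        points = []
        for n in sizes:
            best = min(_timed(func, make_input(n)) for _ in range(repeats))
            points.append((n, best))
        return points

    def _timed(func, arg):
        start = time.perf_counter()
        func(arg)
        return time.perf_counter() - start

    def fitted_order(points):
        """Least-squares slope of log(time) vs log(n), i.e. the apparent polynomial order."""
        xs = [math.log(n) for n, _ in points]
        ys = [math.log(t) for _, t in points]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        return num / den

    # Example: time the built-in sort on shuffled lists of growing size.
    points = measure(sorted, lambda n: random.sample(range(n), n), [2000, 4000, 8000, 16000])
    print(points)
    print("apparent order ~ n^%.2f" % fitted_order(points))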
A short answer is that it's impossible because constants matter.
For instance, I might write a function that runs in O((n^3/k) + n^2). This simplifies to O(n^3) because as n approaches infinity, the n^3 term will dominate the function, irrespective of the constant k.
However, if k is very large in the above example function, the function will appear to run in almost exactly n^2 until some crossover point, at which the n^3 term will begin to dominate. Because the constant k will be unknown to any profiling tool, it will be impossible to know just how large a dataset to test the target function with. If k can be arbitrarily large, you cannot craft test data to determine the big-oh running time.
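To make the role of the constant concrete (a tiny illustration of my own, with an arbitrary k), the n^3/k term only overtakes the n^2 term once n grows past k, so any benchmark that stops below that unknown size will appear to be measuring an n^2 function:

    k = 1_000_000  # arbitrary large constant for illustration

    def f(n):
        return n ** 3 / k + n ** 2

    for n in (10, 1_000, 100_000, 1_000_000, 10_000_000):
        cubic_share = (n ** 3 / k) / f(n)
        print("n=%10d  cubic term is %5.1f%% of the total" % (n, 100 * cubic_share))
    # Below n = k the n^2 term dominates; only past n = k does the n^3 term take over.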
I am surprised to see so many attempts to claim that one can "measure" complexity by a stopwatch. Several people have given the right answer, but I think that there is still room to drive the essential point home.
Algorithm complexity is not a "programming" question; it is a "computer science" question. Answering the question requires analyzing the code from the perspective of a mathematician, such that computing the Big-O complexity is practically a form of mathematical proof. It requires a very strong understanding of the fundamental computer operations, algebra, perhaps calculus (limits), and logic. No amount of "testing" can be substituted for that process.
The Halting Problem applies, so the complexity of an algorithm is fundamentally undecidable by a machine.
The limits of automated tools apply, so it might be possible to write a program to help, but it would only be able to help about as much as a calculator helps with one's physics homework, or as much as a refactoring browser helps with reorganizing a code base.
For anyone seriously considering writing such a tool, I suggest the following exercise. Pick a reasonably simple algorithm, such as your favorite sort, as your subject algorithm. Get a solid reference (book, web-based tutorial) to lead you through the process of calculating the algorithm complexity and ultimately the "Big-O". Document your steps and results as you go through the process with your subject algorithm. Perform the steps and document your progress for several scenarios, such as best-case, worst-case, and average-case. Once you are done, review your documentation and ask yourself what it would take to write a program (tool) to do it for you. Can it be done? How much would actually be automated, and how much would still be manual?
Best wishes.
I am curious as to why it is that you want to be able to do this. In my experience when someone says: "I want to ascertain the runtime complexity of this algorithm" they are not asking what they think they are asking. What you are most likely asking is what is the realistic performance of such an algorithm for likely data. Calculating the Big-O of a function is of reasonable utility, but there are so many aspects that can change the "real runtime performance" of an algorithm in real use that nothing beats instrumentation and testing.
For example, the following algorithms have the same exact Big-O (wacky pseudocode):
example a:

    huge_two_dimensional_array foo
    for i = 0; i < foo.length; i++
        for j = 0; j < foo[i].length; j++
            do_something_with foo[i][j]

example b:

    huge_two_dimensional_array foo
    for j = 0; j < foo[0].length; j++
        for i = 0; i < foo.length; i++
            do_something_with foo[i][j]
Again, exactly the same Big-O... but one of them traverses the array row by row and the other column by column. It turns out that, due to locality of reference and cache behavior, you can get two completely different actual runtimes, especially depending on the actual size of the array foo. This doesn't even begin to touch the actual performance characteristics of how the algorithm behaves if it's part of a piece of software that has some concurrency built in.
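A hedged way to watch this happen from Python (my own sketch, not the poster's code; it relies on NumPy's default row-major layout, and the exact numbers depend on your machine and the array size):

    import time
    import numpy as np

    foo = np.zeros((4000, 4000))  # C order: rows are contiguous in memory

    def walk_rows(a):
        total = 0.0
        for i in range(a.shape[0]):   # contiguous access, cache-friendly
            total += a[i, :].sum()
        return total

    def walk_cols(a):
        total = 0.0
        for j in range(a.shape[1]):   # strided access, cache-unfriendly
            total += a[:, j].sum()
        return total

    for fn in (walk_rows, walk_cols):
        start = time.perf_counter()
        fn(foo)
        print(fn.__name__, "%.3f s" % (time.perf_counter() - start))
    # Same Big-O, same amount of arithmetic, typically quite different wall-clock times.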
Not to be a negative nelly but big-O is a tool with a narrow scope. It is of great use if you are deep inside algorithmic analysis or if you are trying to prove something about an algorithm, but if you are doing commercial software development the proof is in the pudding, and you are going to want to have actual performance numbers to make intelligent decisions.
Cheers!
This could work for simple algorithms, but what about O(n^2 lg n), or O(n lg^2 n)?
You could get fooled visually very easily.
And if it's a really bad algorithm, maybe it wouldn't even return on n=10.
Proof that this is undecidable:
Suppose that we had some algorithm HALTS_IN_FN(Program, f) which determined whether Program halts within O(f(n)) time, for some function f.
Let P be the following program:
if (HALTS_IN_FN(P, f))
{
    while(1);
}
halt;
Since the function and the program are fixed, HALTS_IN_FN on this input is constant time. If HALTS_IN_FN returns true, the program runs forever and of course does not halt in O(f(n)) for any f(n). If HALTS_IN_FN returns false, the program halts in O(1) time.
Thus we have a contradiction, so HALTS_IN_FN cannot exist and the problem is undecidable.
A lot of people have commented that this is an inherently unsolvable problem in theory. Fair enough, but beyond that, even solving it for any but the most trivial cases would seem to be incredibly difficult.
Say you have a program that has a set of nested loops, each based on the number of items in an array. O(n^2). But what if the inner loop is only run in a very specific set of circumstances? Say, on average, it's run in approximately log(n) cases. Suddenly our "obviously" O(n^2) algorithm is really O(n log n). Writing a program that could determine whether the inner loop would be run, and how often, is potentially more difficult than the original problem.
Remember, Big-O isn't god; high constants can and will change the playing field. Quicksort algorithms are O(n log n) of course, but when the recursion gets small enough, say down to 20 items or so, many implementations of quicksort will change tactics to a separate algorithm, because it's actually quicker to do a different type of sort, say insertion sort, with a worse Big-O but a much smaller constant.
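That cutoff idea looks roughly like this (a sketch of my own, not any particular library's code; the cutoff of 20 just echoes the number above and is not a tuned constant):

    CUTOFF = 20  # below this size the "worse Big-O" sort wins on constant factors

    def insertion_sort(a, lo, hi):
        """O(n^2) worst case, but very little overhead per element."""
        for i in range(lo + 1, hi + 1):
            x = a[i]
            j = i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def partition(a, lo, hi):
        mid = (lo + hi) // 2
        a[mid], a[hi] = a[hi], a[mid]          # move the pivot to the end
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store

    def quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        while lo < hi:
            if hi - lo + 1 <= CUTOFF:          # small range: switch tactics
                insertion_sort(a, lo, hi)
                return
            p = partition(a, lo, hi)
            quicksort(a, lo, p - 1)            # recurse on one half...
            lo = p + 1                         # ...and loop on the other

    import random
    data = random.sample(range(10_000), 10_000)
    quicksort(data)
    assert data == sorted(data)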
So, understand your data, make educated guesses, and test.
I think it's pretty much impossible to do this automatically. Remember that O(g(n)) is the worst-case upper bound and many functions perform better than that for a lot of data sets. You'd have to find the worst-case data set for each one in order to compare them. That's a difficult task on its own for many algorithms.
You must also take care when running such benchmarks. Some algorithms will have a behavior heavily dependent on the input type.
Take Quicksort, for example. It is O(n²) in the worst case but usually O(n log n), and two inputs of the same size can fall anywhere in that range.
The traveling salesman problem is O(n!) for the brute-force algorithm (I originally wrote O(n²); that was wrong), but most algorithms get fairly good approximate solutions much faster.
This means that the benchmarking setup usually has to be adapted on an ad hoc basis. Imagine writing something generic for the two examples mentioned: it would be very complex, probably unusable, and would likely give incorrect results anyway.
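As a quick, hedged illustration of "same size, very different runtime depending on the input" (my own example; it uses Python's built-in sort, which is adaptive, rather than a textbook quicksort, so the spread comes from the best case rather than the worst case):

    import random
    import timeit

    n = 200_000
    shuffled = random.sample(range(n), n)
    already_sorted = list(range(n))

    for label, data in (("shuffled", shuffled), ("already sorted", already_sorted)):
        t = timeit.timeit(lambda: sorted(data), number=20)
        print("%15s: %.3f s for 20 runs" % (label, t))
    # Same n, same function: the nearly-linear best case on sorted input is much faster
    # than the O(n log n) shuffled case. A worst-case-sensitive algorithm (e.g. a naive
    # quicksort on sorted input) would show an even larger spread.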
Jeffrey L Whitledge is correct. A simple reduction from the halting problem proves that this is undecidable...
ALSO, if I could write this program, I'd use it to solve P vs NP, and have $1million... B-)
I'm using the big_O library (link here), which fits the measured execution time against the independent variable n to infer the order-of-growth class O().
The package automatically suggests the best-fitting class by measuring the residuals of the collected data against each candidate class's growth behavior.
Check the code in this answer.
Example of output:
Measuring .columns[::-1] complexity against rapid increase in # rows
--------------------------------------------------------------------------------
Big O() fits: Cubic: time = -0.017 + 0.00067*n^3
--------------------------------------------------------------------------------
Constant: time = 0.032 (res: 0.021)
Linear: time = -0.051 + 0.024*n (res: 0.011)
Quadratic: time = -0.026 + 0.0038*n^2 (res: 0.0077)
Cubic: time = -0.017 + 0.00067*n^3 (res: 0.0052)
Polynomial: time = -6.3 * x^1.5 (res: 6)
Logarithmic: time = -0.026 + 0.053*log(n) (res: 0.015)
Linearithmic: time = -0.024 + 0.012*n*log(n) (res: 0.0094)
Exponential: time = -7 * 0.66^n (res: 3.6)
--------------------------------------------------------------------------------
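For reference, a minimal usage sketch of that big_O package (hedged: this reflects my understanding of its public API rather than the exact script that produced the output above, so check the project's documentation for the current signatures):

    import big_o

    # Measure how the running time of sorted() grows with the input length.
    # big_o.datagen.integers(n, lo, hi) builds a random integer list of length n.
    best, others = big_o.big_o(
        sorted,
        lambda n: big_o.datagen.integers(n, 0, 10000),
        min_n=100,
        max_n=100000,
        n_measures=20,
    )

    print(best)                       # the best-fitting class, e.g. a Linearithmic fit
    for klass, residual in others.items():
        print(klass, residual)        # residuals for every candidate class, as in the table above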
I guess this isn't possible in a fully automatic way since the type and structure of the input differs a lot between functions.
Well, since you can't prove whether or not a function even halts, I think you're asking a little much.
Otherwise #Godeke has it.
I don't know what your objective is in doing this, but we had a similar problem in a course I was teaching. The students were required to implement something that works at a certain complexity.
In order not to go over their solutions manually and read their code, we used the method #Godeke suggested. The objective was to find students who used a linked list instead of a balanced search tree, or students who implemented bubble sort instead of heap sort (i.e. implementations that do not work at the required complexity - but without actually reading their code).
Surprisingly, the results did not reveal students who cheated. That might be because our students are honest and want to learn (or just knew that we'd check this ;-) ). It is possible to miss cheating students if the inputs are small, or if the input itself is ordered or the like. It is also possible to wrongly flag students who did not cheat but whose implementations have large constant factors.
But in spite of the possible errors, it is well worth it, since it saves a lot of checking time.
As others have said, this is theoretically impossible. But in practice, you can make an educated guess as to whether a function is O(n) or O(n^2), as long as you don't mind being wrong sometimes.
First, time the algorithm, running it on inputs of various sizes n. Plot the points on a log-log graph. Draw the best-fit line through the points. If the line fits all the points well, then the data suggests that the algorithm is O(n^k), where k is the slope of the line.
I am not a statistician. You should take all this with a grain of salt. But I have actually done this in the context of automated testing for performance regressions. The patch here contains some JS code for it.
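A minimal sketch of that log-log fit (my own, assuming you already have a handful of (n, time) measurements; the numbers below are made up to look roughly quadratic):

    import numpy as np

    # hypothetical (n, seconds) measurements
    points = [(1000, 0.12), (2000, 0.47), (4000, 1.90), (8000, 7.65)]

    ns = np.array([n for n, _ in points], dtype=float)
    ts = np.array([t for _, t in points], dtype=float)

    slope, intercept = np.polyfit(np.log(ns), np.log(ts), 1)
    print("best-fit slope k = %.2f, so the data looks like O(n^%.1f)" % (slope, slope))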
If you have lots of homogeneous computational resources, I'd time it against several sample sizes, do a regression, and simply take the highest-order term.
It's easy to get an indication (e.g. "is the function linear? sub-linear? polynomial? exponential")
It's hard to find the exact complexity.
For example, here's a Python solution: you supply the function, and a function that creates parameters of size N for it. You get back a list of (n, time) values to plot or to perform regression analysis on. It times each run only once for speed; to get a really good indication you would have to time it many times to minimize interference from environmental factors (e.g. with the timeit module).
import time
def measure_run_time(func, args):
    start = time.time()
    func(*args)
    return time.time() - start

def plot_times(func, generate_args, plot_sequence):
    return [
        (n, measure_run_time(func, generate_args(n+1)))
        for n in plot_sequence
    ]
And to use it to time bubble sort:
def bubble_sort(l):
    for i in xrange(len(l)-1):
        for j in xrange(len(l)-1-i):
            if l[j+1] < l[j]:
                l[j], l[j+1] = l[j+1], l[j]
import random
def gen_args_for_sort(list_length):
    result = range(list_length) # list of 0..N-1
    random.shuffle(result) # randomize order
    # should return a tuple of arguments
    return (result,)
# timing for N = 1000, 2000, ..., 5000
times = plot_times(bubble_sort, gen_args_for_sort, xrange(1000,6000,1000))
import pprint
pprint.pprint(times)
This printed on my machine:
[(1000, 0.078000068664550781),
(2000, 0.34400010108947754),
(3000, 0.7649998664855957),
(4000, 1.3440001010894775),
(5000, 2.1410000324249268)]
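As a quick sanity check of those numbers (my addition, not part of the original answer), the slope between the first and last points on a log-log scale comes out close to 2, which is what you'd expect for bubble sort:

    import math

    # slope between the (1000, 0.078) and (5000, 2.141) measurements above
    print(math.log(2.141 / 0.078) / math.log(5000 / 1000))   # ~2.06, i.e. roughly O(n^2)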
I'd like to have a solid understanding of when (ignoring available memory space) it makes sense to store the result of a comparison instead of recalculating it. What is the tipping point for justifying the time cost incurred by storage? Is it 2, 3, or 4 comparisons? More?
For example, in this particular case, which option (in general) will perform better in terms of speed?
Option 1:
int result = id.compareTo(node.id);
return result > 0 ? 1 : result < 0 ? -1 : 0;
Option 2:
return id.compareTo(node.id) > 0 ? 1 : id.compareTo(node.id) < 0 ? -1 : 0;
I tried to profile the two options myself in order to answer my own question, but I don't have much experience with this sort of performance testing and, as such, would rather get a more definitive answer from someone with either more experience or else a better grasp of the theoretical elements involved.
I know it's not a big deal and that most of the time the difference will be negligible. However, I'm a perfectionist, and I'd really just like to resolve this particular issue so that I can get on with my life, haha.
Additionally, I think the answer is likely to prove enlightening in regards to similar situations I may encounter in the future wherein the difference might very well be significant (such as when the cost of a comparison or memory allocation is either unable to be incurred or else complex enough to cause a real issue concerning performance).
Answers should be relevant to programming with Java and not other languages, please.
I know I've mentioned it a few times already, but PLEASE focus answers ONLY on the SPEED DIFFERENCE! I am well aware that many other factors can and should be taken into account when writing code, but here I want just a straight-forward argument for which is FASTER.
Experience tells me that option 1 should be faster, because you're making just one call to the compare method and storing the result for reuse. Facts that support this belief are that local variables live on the stack and making a method call involves a lot more work from the stack than just pushing a value onto it. However profiling is the best and safest way to compare two implementations.
The first thing to realise is that the java compiler and JVM together may optimise your code how it wishes to get the job done most efficiently (as long as certain rules are followed). Chances are there is no difference in performance, and chances are also that what is actually executed is not what you think it is.
One really important difference, however, is in debugging: if you put a breakpoint on the return statement of the store-in-variable version, you can see what was returned from the call; otherwise you can't see that in a debugger. It can even be handy to store the value to be returned in a seemingly useless local variable and then return it, just so you can see what a method is about to return while debugging; otherwise there's no way to see it.
Option 1 cannot be slower than Option 2. If the compiler optimizes, both could be equal, but even then Option 1 is more readable, more compact, and easier to test.
So there is no argument for Option 2.
If you like, you could change it to final int result = ....
Although I expect the compiler is clever enough that the final keyword makes no difference in this case, and final makes the code a bit less readable.
Option 1 is always the preferred one, because of a real-world scenario:
A thread executing Option 2 evaluates id.compareTo(node.id) > 0 ? 1 first; some other thread could change node.id right after that check and before the second id.compareTo(node.id) < 0 ? -1 : 0 check, so the two calls may not see identical results.
Performance-wise, Option 1 is also faster whenever the comparison itself does any non-trivial work.
When does it make sense to store the result of a comparison versus recalculating a comparison in terms of speed?
Most of the time, micro-optimizations like option #1 versus option #2 don't make any significant difference. Indeed, it ONLY makes a significant performance difference if:
the comparison is expensive,
the comparison is performed a large number of times, AND
performance matters.
Indeed, the chances are that you have already spent more time and money thinking about this than will be saved over the entire useful lifetime of the application.
Instead of focussing on performance, you should be focussing on making your code readable. Think about the next person who has to read and modify the code, and make it so that he/she is less likely to misread it.
In this case, the first option is more readable than the second one. THAT is why you should use it, not performance reasons. (Though, if anything, the first version is probably faster.)
A simple question about Java performance. If I write a loop
for (int i = 0; i < n; ++i) buffer[(k++) % buffer.length] = something;
in which something is a non-trivial digital filter. With this code I have a modulo operation at every write. This feels a bit silly because the Java VM will perform a bounds check anyway. Therefore I would assume that a construct using an ArrayIndexOutOfBoundsException would be faster (the buffer contains 1,000,000 numbers, so we won't have that overflow too often):
int i;
try
{
    for (i = 0; i < n; ++i, ++k) buffer[k] = something;
}
catch (ArrayIndexOutOfBoundsException e)
{
    k = 0;
    for (; i < n; ++i, ++k) buffer[k] = something;
}
A third solution could be to calculate in advance at what point we would overflow and then split the loop manually in two. The code to determine how far the loop can go is executed every 768 samples, so from that perspective it might be slower than the catch method.
The problem here, aside from the silly duplication of code, which I will gladly sacrifice on the altar of performance, is that we have more code. And it often appears that Java doesn't optimize larger routines as well as smaller ones.
So my question is: which strategy is the most performant? Does anybody have experience with this type of construct? Also, can anybody shed light on the performance of both constructs on Android devices?
Your answer depends on your target platform. You've added the Android tag, so I'm going to answer in terms of Dalvik and (let's say) a Nexus 4.
First, the ARMv7-A architecture doesn't provide integer division instructions. Your modulus will be computed in software every time through the loop, which is going to slow you down a bit. (This is why it's best to use power-of-2 sizes for hash tables -- you can use a bit mask rather than a mod.)
Second, throwing an exception is expensive. The VM has to create the exception object, and initialize it with a snapshot of the current stack. In addition to the immediate overhead, you're creating X number of objects that have to be cleaned up later, and increasing the possibility that the VM will have to stop you mid-computation and collect garbage.
Third, generally speaking, any computation you can pull out of the inner loop represents a win, so manually testing for array overrun on every loop iteration is unsatisfying. You don't want to add a test for k vs. length to the loop header or body if you can avoid it. (A JIT compiler may do something like this -- if it can tell that the array index never walks off the end of the array, it doesn't have to do a per-element bounds check.)
Based on the (still slightly vague) sense of what you're doing and how many times you're doing it, I'd say the best option is to compute the "break" position ahead of the loop, and iterate the necessary number of times.
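A language-agnostic sketch of that "compute the break position ahead of the loop" idea (written in Python for brevity; this is my own illustration, not the poster's Java, and it assumes each batch of samples is no longer than the buffer):

    def write_ring(buffer, k, samples):
        """Copy samples into the ring buffer starting at index k, wrapping at most once.

        No per-element modulus and no exception handling: the wrap point is computed
        up front and the work is split into (at most) two plain loops.
        """
        size = len(buffer)
        first = min(len(samples), size - k)       # how many samples fit before the end
        for i in range(first):
            buffer[k + i] = samples[i]
        for i in range(first, len(samples)):      # the remainder wraps to the front
            buffer[i - first] = samples[i]
        return (k + len(samples)) % size          # new write position (a single modulus)

    buf = [0.0] * 1_000_000
    pos = 999_998
    pos = write_ring(buf, pos, [1.0, 2.0, 3.0, 4.0])  # two values land at the end, two wrap to the front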
I'm curious to know how this turns out in practice. :-)
Recently someone told me that a comparison involving smaller integers will be faster, e.g.,
// Case 1
int i = 0;
int j = 1000;
if (i < j) { /* do something */ }

// Case 2
int j = 6000000;
int i = 500000;
if (i < j) { /* do something else */ }
so, comparison (if condition) in Case 1 should be faster than that in Case 2. In Case 2 the integers will take more space to store but that can affect the comparison, I am not sure.
Edit 1: I was thinking of the binary representation of i and j, e.g., for i=0 it will be 0, and for j=1000 it is 1111101000 (in 32-bit representation it should be 22 zeros followed by 1111101000; I completely forgot about 32-bit vs. 64-bit representation, my bad!)
I tried to look at JVM Specification and bytecode of a simple comparison program, nothing made much sense to me. Now the question is how does comparison work for numbers in java? I guess that will also answer why (or why not) any of the cases will be faster.
Edit 2: I am just looking for a detailed explanation, I am not really worried about micro optimizations
If you care about performance, you really only need to consider what happens when the code is compiled to native code. In this case, it is the behaviour of the CPU which matters. For simple operations almost all CPUs work the same way.
so, comparison (if condition) in Case 1 should be faster than that in Case 2. In Case 2 the integers will take more space to store but that can affect the comparison, I am not sure.
An int is always 32-bit in Java. This means it always takes the same space. In any case, the size of the data type is not very important e.g. short and byte are often slower because the native word size is 32-bit and/or 64-bit and it has to extract the right data from a larger type and sign extend it.
I tried to look at JVM Specification and bytecode of a simple comparison program, nothing made much sense to me.
The JLS specifies behaviour, not performance characteristics.
Now the question is how does comparison work for numbers in java?
It works the same way it does in just about every other compiled language. It uses a single machine-code instruction to compare the two values and another to perform a conditional branch as required.
I guess that will also answer why (or why not) any of the cases will be faster.
Most modern CPUs use branch prediction. To keep the CPU's pipeline full, the CPU attempts to predict which branch will be taken (before it knows whether that was the right choice) so there is no break in the instructions executed. When this works well, the branch has almost no cost. When it mispredicts, the pipeline can be filled with instructions for the branch which was the wrong guess, and this can cause a significant delay as the CPU clears the pipeline and takes the right branch. In the worst case it can mean hundreds of clock cycles of delay. e.g.
Consider the following.
int i = 0; // happens to be 0
if (i > 0) {
    int j = 100 / i;
}
Say the branch is usually taken. This means the pipeline could be loaded with an instruction which triggers an interrupt (or Error in Java) before it knows the branch will not be taken. This can result in a complex situation which takes a while to unwind correctly.
These are called mispredicted branches. In short, a branch which goes the same way every time (or most of the time) is fast; a branch which suddenly changes or is quite random (e.g. in sorting and tree data structures) will be slower.
int might be faster on a 32-bit system and long might be faster on a 64-bit system.
Should I bother about it? No. You don't code for a system configuration; you code based on your requirements. Micro-optimizations never work and they might introduce unexpected issues.
Small Integer objects can be faster because Java treats them specially.
All Integer values for -128 <= i <= 127 are stored in IntegerCache, an inner class of Integer.
The following code in python takes very long to run. (I couldn't wait until the program ended, though my friend told me for him it took 20 minutes.)
But the equivalent code in Java runs in approximately 8 seconds and in C it takes 45 seconds.
I expected Python to be slow, but not this much, and C, which I expected to be faster than Java, was actually slower. Is the JVM using some loop-unrolling technique to achieve this speed? Is there any reason for Python being so slow?
import time
st = time.time()
for i in xrange(0, 100000):
    for j in xrange(0, 100000):
        continue
print "Time taken : ", time.time() - st
Your test is not measuring anything meaningful.
A language's performance in the real world has little to do with how quickly it executes a tight loop.
Frankly, I'm intrigued that C and Java took as long as they did; I would have expected both of their compilers to realize that there was nothing happening inside the inner loop, and have optimized both of them away into nonexistence (and 0 seconds).
Python, on the other hand, is still interpreted (I could be wrong about this). In any case, it looks like the outer loop needs to construct 100,000 xrange objects on which to run the empty inner loop, and that's unlikely to be optimized away.
So all you're really measuring is various compilers' ability to see through the fact that no real computing work is being done.
The lesson is: Performance is never what you expect. Therefore, always measure, never believe.
Some reasons why you might see these numbers (and from the first sentence, some of these might be completely wrong):
C is compiled for an "i586" processor (also called Pentium). That CPU was sold from 1993 to about 2000. Have you seen one lately? Guess not. So the C code isn't really optimized for what your CPU can do (or to put it another way around: Today's CPUs try very hard to be a fast Pentium CPU). Java, OTOH, is compiled for your CPU as the code is loaded. It can pull some tricks that C simply can't. The price is that the C program starts in 15ms while the Java program needs 4 Seconds.
Python has no JIT (just in time compiler), yet. All code is converted into bytecode which is then interpreted. This means the loop above is turned into a dozen bytecode instructions which are then interpreted by a C program. That just takes time. Python is not meant for huge loops, it's meant for smart algorithms which you simply can't express in any other language (at least not with the same amount of code and readability).
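To see the interpretation overhead this is describing, the standard dis module will print the bytecode that CPython has to dispatch, operation by operation, for a loop like the one in the question (a small illustration of my own, in Python 3 syntax; the exact opcodes vary between versions):

    import dis

    code = """
    for i in range(100000):
        for j in range(100000):
            pass
    """

    # Every line of this listing is an opcode the interpreter dispatches at run time;
    # a C compiler reduces the same loop to a handful of machine instructions,
    # or removes it entirely.
    dis.dis(compile(code, "<benchmark>", "exec"))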
So just as it doesn't make sense to go shopping with an 18-ton truck (you can transport anything, but you won't find a space to park it), choose your programming language according to the problem you need to solve. Has to be small and fast? C. Just fast? Java. Flexible? Python. Flexible and fast? Python with a helper library in C (like NumPy).
Is there any reason for Python being so slow?
Yes.
But what does it matter? You've created 100,000 xrange objects. Why? What does that matter? What is your real question on performance? What algorithm do you actually have that's actually too slow?
for i in xrange(0, 100000):     # creates one xrange object
    for j in xrange(0, 100000): # creates a fresh xrange object each time through the loop
for i in xrange(0, 10000):
    for j in xrange(0, 10000):
        pass

or

for i in xrange(0, 100000000):
    pass
Python 2.6.5 - Time taken : 8.50866484642
PyPy 1.3 - Time taken : 1.55999398232
So the reason for the slowness is not the creation of the xrange objects.
gcc 4.2 with the -O1 flag or higher optimizes away the loop, and the program takes 1 millisecond to execute.
This benchmark is not very representative as it is very far from any real world use.
You're doing a nested loop for a reason, and you never leave it empty.
Python doesn't optimize away the loop, although I see no technical reason why it couldn't.
Python is slower than C because it's further from the machine language. xrange is a nice abstraction but it adds a heavy layer of machine code compared to a simple C loop.
C source:
int main(void) {
    int i, j;
    for (i = 0; i < 100000; i++) {
        for (j = 0; j < 100000; j++) {
            continue;
        }
    }
    return 0;
}
A good compiler would optimise away the loop.
Assuming the loop isn't optimised away, I'd expect Python to be something like 100 times slower than the C version.