What I need is an explanation of how to determine it. Here are a few examples, and I hope you can help me find their complexity using Big-O notation:
For each of the following, find the dominant term(s) having the sharpest increase in n and give the time complexity using Big-O notation.
Consider that we always have n>m.
Expression | Dominant term(s) | O(…)
5 + 0.01n^3 + 25m^3 | |
500n + 100n^1.5 + 50n log n | |
0.3n + 5n^1.5 + 2.5n^1.75 | |
n^2 log n + n (log2 m)^2 | |
m log3 n + n log2 n | |
50n + 5^3 m + 0.01n^2 | |
It's fairly simple.
As n rises to large values (towards infinity), some parts of the expression become insignificant, so remove them.
Also, O() notation is relative, not absolute, meaning there is no scale, so constant factors are meaningless; remove them too.
Example: 100 + 2*n. At low values of n, 100 is the main contributor to the result, but as n increases it becomes insignificant. Since there is no scale, n and 2n are the same thing, i.e. a linear curve, so the result is O(n).
Or, said another way, you choose the most extreme curve in the expression from a chart of the common complexity curves:
[growth-rate comparison chart omitted; source: bigocheatsheet.com]
Let's take your second example: 500n + 100n^1.5 + 50n log n
The 1st part is O(n).
The 2nd part is O(n^1.5).
The 3rd part is O(n log n).
The fastest-rising curve is O(n^1.5), so that is the answer.
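To make this concrete, here is a small Java sketch (my own illustration, not part of the original question) that simply evaluates the three terms for growing n; the n^1.5 term quickly dwarfs the others:

public class DominantTermDemo {
    public static void main(String[] args) {
        // Evaluate each term of 500n + 100n^1.5 + 50n*log2(n) for growing n.
        for (long n = 1_000; n <= 1_000_000_000L; n *= 1_000) {
            double linear = 500.0 * n;
            double power = 100.0 * Math.pow(n, 1.5);
            double nLogN = 50.0 * n * (Math.log(n) / Math.log(2));
            System.out.printf("n=%d  500n=%.2e  100n^1.5=%.2e  50nlogn=%.2e%n",
                    n, linear, power, nLogN);
        }
    }
}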
Why do we take just the highest-degree term of a polynomial for Big-O notation? I understand that we can drop the constants, as they won't matter for very large values of n.
But say an algorithm takes (n log n + n) time; why do we ignore the n in this case, so that the Big-O comes out to be O(n log n)?
Big-O has to be an upper bound on the time taken by the algorithm. So shouldn't it be (n log n + n), even for very high values of n?
Because Big-O is an asymptotic comparison: it answers the question of how functions compare for large n. Lower-degree terms become insignificant to the function's behavior once n is sufficiently large.
One way to see that: "n log(n) + n" is smaller than "2n log(n)" once n ≥ 2. Now you can drop the 2.
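To spell that step out: for n ≥ 2 we have log2(n) ≥ 1, so n ≤ n·log(n), and therefore
n·log(n) ≤ n·log(n) + n ≤ 2·n·log(n).
The expression is squeezed between two constant multiples of n·log(n), and since constant factors are dropped, n log(n) + n is Θ(n log n) — so O(n log n) is still a valid upper bound.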
In a project for my algorithms class we have to run 5 different sorting methods of unknown types and gather running-time data for each of them, using the doubling method for the problem size. We then have to use the ratio of the running times to work out the time-complexity functions. The sorting methods used are selection sort, insertion sort, merge sort, and quicksort (randomized and non-randomized). We have to use empirical analysis to determine which type of sorting method is used in each of the five unknown methods in the program.
My question is how one goes from the ratio to the function. I know that since the sizes double (N = 2^k), taking log2 of the running-time ratio gives an exponent k, but I am not sure how that correlates with the time complexity of, say, mergesort, which is O(N * log N).
The Big-O notation more or less describes a function, where the input N is the size of the collection, and the output is how much time will be taken. I would suggest benchmarking your algorithms by running a variety of sample input sizes, and then collecting the running times. For example, for selection sort you might collect this data:
N | running time (ms)
1000 | 0.1
10000 | 10
100000 | 1000
1000000 | 100000
If you plot this, using a tool like R or Matlab, or maybe Excel if you are feeling lazy, you will see that the running time varies with the square of the sample size N. That is, multiplying the sample size by 10 results in a 100-fold increase in running time. This is O(N^2) behavior.
For the other algorithms, you may collect similar benchmark data, and also create plots.
Note that you also have to account for things like the startup and warm-up time the JVM needs before your actual code runs at full speed. The way to deal with this is to take many data points; overall, linear, logarithmic, etc. behavior should still be discernible.
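As an illustration of how such data might be collected, here is a rough Java sketch using the doubling method; sortUnderTest is a hypothetical placeholder for whichever unknown sorting method you are measuring, and a serious measurement would also repeat each size several times and discard warm-up runs:

import java.util.Arrays;
import java.util.Random;

public class DoublingBenchmark {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n = 1_000; n <= 1_024_000; n *= 2) {
            int[] data = rnd.ints(n).toArray();       // fresh random input of size n
            long start = System.nanoTime();
            sortUnderTest(data);                      // the method whose complexity is unknown
            long elapsed = System.nanoTime() - start;
            System.out.printf("N=%d  time=%.3f ms%n", n, elapsed / 1e6);
        }
    }

    // Placeholder: in the real project this would call one of the five unknown sorts.
    static void sortUnderTest(int[] a) {
        Arrays.sort(a);
    }
}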
On a log-log graph (log of input size vs log of running time), an O(n^k) running time shows up as a straight line of slope k. That will let you tell O(n) from O(n^2) very easily.
To tell O(n) from O(n log(n)), graph f(n)/n against log(n). A function that is O(n) will look like a horizontal line, and a function that is O(n log(n)) will look like a straight line with positive slope.
Don't forget to throw both ordered and unordered data at your methods.
You can also just look at how the running time grows:
If linear (O(n)), doubling the input size roughly doubles the time: t -> 2t.
If quasi-linear (O(n log n)), doubling the input size slightly more than doubles the time, since 2n log(2n) = 2n log(n) + 2n.
If quadratic (O(n^2)), doubling the input size roughly quadruples the time: t -> 4t.
Note that these ratios are theoretical; expect the measured values to be close to them, not exactly equal to them.
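A hypothetical worked example of going from the measured ratio to the function: suppose doubling N from 100,000 to 200,000 takes a method from 250 ms to about 1000 ms (numbers invented for illustration). The ratio is 1000 / 250 = 4 and log2(4) = 2, which suggests T(N) ≈ c·N^2 (selection or insertion sort). For mergesort, T(2N)/T(N) = 2N·log(2N) / (N·log N) = 2 + 2/log2(N), so the ratio sits slightly above 2 and creeps down toward 2 as N grows; that small excess over the exact ratio of 2 you would see for an O(N) method is how the doubling ratios distinguish O(N log N) from O(N).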
In a past examination paper, there was a question that gave two methods to check if int[] A contained all the same values as int[] B (two unsorted arrays of size N), an engineer had to decide which implementation he would use.
The first method used a single for loop with a nested call to linear search; its asymptotic run time was calculated to be theta(n^2) and additional memory usage for creating copies of A and B is around (N*4N + 4N) bytes
The second method used an additional int[] C, a sorted copy of B. A for loop is used with a nested binary search; its asymptotic run time was calculated as theta(n log(n^2)), and the additional memory usage for creating copies of A and B is around (4N + 4N + N*4N + 4N) bytes (the first two 4N's are due to C being a copy of B and the further copy of C created inside the sort(C) function).
The final question asks which implementation the engineer should use. I believe the faster algorithm would be the better option, as for larger inputs it cuts the computation time drastically; the drawback is that, with larger inputs, it runs the risk of an OutOfMemoryError. I understand one could argue for either method depending on the size of the arrays, but my question is: in the majority of cases, which implementation is the better one?
The first algorithm has complexity theta(n^2) and the second theta(n log(n^2)) = theta(2n log n) = theta(n log n).
Hence, the second one is much faster for n large enough.
As you mentioned, memory usage also comes into account, and one might argue that the second one consumes a lot more memory, but that is not true.
The first algorithm consumes: n*4n + 4n = 4n^2 + 4n bytes.
The second algorithm consumes: 4n + 4n + 4n + n*4n = 4n^2 + 12n bytes.
Suppose n is equal to 1000; then the first algorithm consumes 4,004,000 bytes and the second 4,012,000 bytes, so there is no big difference in memory consumption between the two algorithms.
So, from a memory-consumption perspective it does not really matter which algorithm you choose; in terms of complexity they both consume theta(n^2) memory.
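For reference, here is a minimal Java sketch of what the second method described in the question might look like, assuming the check is simply "every value in A also occurs in B" (a true multiset comparison would need extra bookkeeping for duplicates):

import java.util.Arrays;

public class ContainsCheck {
    // Second method: copy B into C, sort C, then binary-search each element of A.
    static boolean containsAllSorted(int[] a, int[] b) {
        int[] c = Arrays.copyOf(b, b.length);    // the extra copy C of B
        Arrays.sort(c);                          // O(n log n)
        for (int value : a) {                    // n binary searches: O(n log n) in total
            if (Arrays.binarySearch(c, value) < 0) {
                return false;                    // a value from A does not occur in B
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsAllSorted(new int[]{3, 1, 2}, new int[]{2, 3, 1})); // true
    }
}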
A relevant question for many a programmer, but the answer depends entirely on the context. If the software runs on a system with little memory, then the algorithm with the smaller footprint will likely be the better choice. On the other hand, a real-time system needs speed, so the faster algorithm would likely be better.
It is important to also recognize the more subtle issues that arise during execution, e.g. when the additional memory requirements of the faster algorithm force the system to use paging and, in turn, slow down execution.
Thus, it is important to understand in-context the benefit-cost trade off for each algorithm. And as always, emphasize code readability and sound algorithm design/integration.
I have a Rectangle class. It has a length, breadth and area (all ints). I want to hash it such that every rectangle with the same length and breadth hashes to the same value. What is a way to do this?
EDIT: I understand it's a broad question. That's why I asked for "a" way to do it, not the best way.
A good and simple scheme is to calculate the hash for a pair of integers as follows:
hash = length * CONSTANT + width
Empirically, you will get the best results (i.e. the fewest collisions) if CONSTANT is a prime number. A lot of people [1] recommend a value like 31, but the best choice depends on the most likely range of the length and width values. If they are strictly bounded, and small enough, then you could do better than 31.
However, 31 is probably good enough for practical purposes [2]. A few collisions at this level are unlikely to make a significant performance difference, and even a perfect hashing function does not eliminate collisions at the hash table level ... where you use the modulus of the hash value.
[1] I'm not sure where this number comes from, or whether there are empirical studies to back it up ... in the general case. I suspect it comes from hashing of (ASCII) strings. But 31 is prime ... and it is a Mersenne prime (2^5 - 1), which means the multiplication can be computed with a shift and a subtraction if the hardware multiply is slow.
[2] I'm excluding cases where you need to worry about someone deliberately creating hash collisions in an attempt to "break" something.
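Applied to the Rectangle class from the question, a minimal sketch of this scheme might look like the following (field names length and breadth are assumed from the question, 31 is used as the constant, and equals is overridden so it stays consistent with hashCode):

public class Rectangle {
    private final int length;
    private final int breadth;

    public Rectangle(int length, int breadth) {
        this.length = length;
        this.breadth = breadth;
    }

    @Override
    public int hashCode() {
        // Rectangles with equal length and breadth produce equal hash codes.
        return length * 31 + breadth;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Rectangle)) return false;
        Rectangle r = (Rectangle) o;
        return length == r.length && breadth == r.breadth;
    }
}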
You can use the Apache Commons library, which has a HashCodeBuilder class. Assuming you have a Rectangle class with a width and a height, you can add the following method:
@Override
public int hashCode() {
    // requires: import org.apache.commons.lang3.builder.HashCodeBuilder;
    // append only the fields that define equality (width and height)
    return new HashCodeBuilder().append(width).append(height).toHashCode();
}
What you want (as clarified in your comment on the question) is not possible. There are N possible hashCodes, one for each int value, where N is approximately 4.3 billion (2^32). Assuming rectangles must have positive dimensions, there are ((N * N) / 4) possible rectangles. How do you propose to make them fit into N hashCodes? Whenever N > 4, you have more possible rectangles than hashCodes, so some distinct rectangles must share a hash code.
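Worked out with concrete numbers: there are N = 2^32 ≈ 4.3 × 10^9 possible int hash codes, but N^2/4 = 2^62 ≈ 4.6 × 10^18 rectangles with positive int dimensions, so on average about a billion distinct rectangles must map to each hash code.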
I did an algorithms refresher and (re)read about a sorting algorithm that runs in linear time, namely counting sort.
To be honest, I had forgotten about it.
I understand the structure and logic, and the fact that it runs in linear time is a very attractive quality.
But I have the following question:
As I understand it, the concrete implementation of the algorithm relies on 2 things:
1) The range of input numbers is small (otherwise the intermediate array will be huge and have many gaps).
2) We actually know the range of numbers.
Assuming these 2 points are correct (please correct me otherwise), I was wondering what the best application domain for this algorithm is.
I mean, specifically in Java, is an implementation like the following counting sort sufficient:
// requires: import java.util.Arrays;
public static void countingSort(int[] a, int low, int high)
{
    int[] counts = new int[high - low + 1]; // one counter per possible value, from low to high
    for (int x : a)
        counts[x - low]++; // subtract low so the lowest possible value maps to index 0

    int current = 0;
    for (int i = 0; i < counts.length; i++)
    {
        Arrays.fill(a, current, current + counts[i], i + low); // write counts[i] copies of the value i + low
        current += counts[i]; // leap forward by counts[i] steps
    }
}
or is it not a trivial matter to come up with the low and high?
Is there a specific application in Java that counting sort is best suited for?
I assume there are subtleties like these; otherwise, why would anyone bother with all the O(n log n) algorithms?
Algorithms are not about the language, so this is language-agnostic. As you have said, use counting sort when the domain is small. If you have only three possible values - 1, 2, 3 - it is far better to sort them with counting sort than with quicksort, heapsort or any other O(n log n) algorithm. If you have a specific question, feel free to ask.
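For instance, calling the countingSort method from the question on such a small-range input might look like this (the method body is copied verbatim from the question above):

import java.util.Arrays;

public class CountingSortDemo {
    public static void main(String[] args) {
        int[] a = {3, 1, 2, 3, 1, 2, 2};
        countingSort(a, 1, 3);                   // low = 1, high = 3: the known value range
        System.out.println(Arrays.toString(a));  // [1, 1, 2, 2, 2, 3, 3]
    }

    // countingSort as defined in the question (low/high are the min and max possible values)
    public static void countingSort(int[] a, int low, int high) {
        int[] counts = new int[high - low + 1];
        for (int x : a)
            counts[x - low]++;
        int current = 0;
        for (int i = 0; i < counts.length; i++) {
            Arrays.fill(a, current, current + counts[i], i + low);
            current += counts[i];
        }
    }
}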
It is incorrect to say counting sort is O(n) in the general case; the "small range of elements" condition is not a recommendation but a key assumption.
Counting sort assumes that each of the elements is an integer in the range 1 to k, for some integer k. When k = O(n), counting sort runs in O(n) time.
In the general situation, the range of keys k is independent of the number of elements n and can be arbitrarily large. For example, consider the following array:
{1, 1000000000000000000000000000000000000000000000000000000000000000000000000000}
As for the exact break-even point of k and n at which counting sort outperforms a comparison-based sort, it is hugely dependent on the implementation and is best found via benchmarking (trying both out).
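A rough sketch of how such a benchmark might look (my own illustration; the sizes and ranges are arbitrary, and a serious measurement would use a harness such as JMH with warm-up iterations):

import java.util.Arrays;
import java.util.Random;

public class BreakEvenBenchmark {
    public static void main(String[] args) {
        Random rnd = new Random(1);
        int n = 1_000_000;
        for (int range : new int[]{10, 1_000, 100_000, 10_000_000}) {
            int[] original = rnd.ints(n, 0, range).toArray();

            int[] a = original.clone();
            long t0 = System.nanoTime();
            countingSort(a, 0, range - 1);        // range-dependent O(n + k) sort
            long countingTime = System.nanoTime() - t0;

            int[] b = original.clone();
            long t1 = System.nanoTime();
            Arrays.sort(b);                        // the JDK's O(n log n) sort for primitives
            long comparisonTime = System.nanoTime() - t1;

            System.out.printf("range=%d  counting=%.1f ms  Arrays.sort=%.1f ms%n",
                    range, countingTime / 1e6, comparisonTime / 1e6);
        }
    }

    // countingSort as given in the question
    static void countingSort(int[] a, int low, int high) {
        int[] counts = new int[high - low + 1];
        for (int x : a) counts[x - low]++;
        int current = 0;
        for (int i = 0; i < counts.length; i++) {
            Arrays.fill(a, current, current + counts[i], i + low);
            current += counts[i];
        }
    }
}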