Time complexity for Java ArrayList remove(element)

I was trying to graph the time complexity of ArrayList's remove(element) method.
My understanding is that it should be O(N); however, it's giving me O(1). Can anyone point out what I did wrong here?
Thank you in advance.
public static void arrayListRemoveTiming() {
    long startTime, midPointTime, stopTime;
    // Spin the computer until one second has gone by; this allows this
    // thread to stabilize.
    startTime = System.nanoTime();
    while (System.nanoTime() - startTime < 1000000000) {
    }
    long timesToLoop = 100000;
    int N;
    ArrayList<Integer> list = new ArrayList<Integer>();
    // Fill the list with 0 to 100000
    for (N = 0; N < timesToLoop; N++)
        list.add(N);
    startTime = System.nanoTime();
    for (int i = 0; i < list.size(); i++) {
        list.remove(i);
        midPointTime = System.nanoTime();
        // Run an empty loop to capture the loop overhead.
        for (int j = 0; j < timesToLoop; j++) {
        }
        stopTime = System.nanoTime();
        // Compute the time, subtracting the cost of running the empty loop
        // from the measured time.
        double averageTime = ((midPointTime - startTime) - (stopTime - midPointTime))
                / timesToLoop;
        System.out.println(averageTime);
    }
}

The cost of a remove is O(n) as you have to shuffle the elements to the "right" of that point "left" by one:
              Delete D
                     |
                     V
+-----+-----+-----+-----+-----+-----+-----+
|  A  |  B  |  C  |  D  |  E  |  F  |  G  |
+-----+-----+-----+-----+-----+-----+-----+
                       <------------------
                        Move E, F, G left
If your test code is giving you O(1) then I suspect you're not measuring it properly :-)
The OpenJDK source, for example, has this:
public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index, numMoved);
    elementData[--size] = null; // Let gc do its work

    return oldValue;
}
and the System.arraycopy call is the O(n) cost of this function.
In addition, I'm not sure you've thought this through very well:
for (int i = 0; i < list.size(); i++)
    list.remove(i);
This is going to remove the following elements from the original list:
0, 2, 4, 6, 8
and so on, because the act of removing element 0 shifts all the other elements left: the item that was originally at offset 1 will be at offset 0 once you've deleted the original offset 0, and you then move on to delete offset 1.
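You can see this directly with a small sketch (mine, not from the original answer) on a ten-element list:

import java.util.ArrayList;

public class RemoveSkipDemo {
    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<>();
        for (int n = 0; n < 10; n++)
            list.add(n);
        // Removing at an increasing index while the list shrinks
        // skips every other element.
        for (int i = 0; i < list.size(); i++)
            System.out.println("removed " + list.remove(i));
        System.out.println("left over: " + list); // [1, 3, 5, 7, 9]
    }
}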

First off, you are not measuring complexity in this code. What you are doing is measuring (or attempting to measure) performance. When you graph the numbers (assuming that they are correctly measured) you get a performance curve for a particular use-case over a finite range of values for your scaling variable.
That is not the same as a computational complexity measure; i.e. big O, or the related Bachmann–Landau notations. These are about mathematical limits as the scaling variable tends to infinity.
And this is not just a nitpick. It is quite easy to construct examples [1] where performance characteristics change markedly as N gets very large.
What you are doing when you graph performance over a range of values and fit a curve is estimating the complexity.
[1] A real example is the average complexity of various HashMap operations, which switch from O(1) to O(N) (with a very small constant) when N reaches 2^31. The change happens because the hash array cannot grow beyond 2^31 slots.
The second point is that the complexity of ArrayList.remove(index) is sensitive to the value of index as well as the list length.
The "advertised" complexity is O(N) for the average and worst cases.
In the best case, the complexity is actually O(1). Really!
This happens when you remove the last element of the list; i.e. index == list.size() - 1. That can be performed with zero copying; look at the code that @paxdiablo included in his answer.
Now to your Question. There are a number of reasons why your code could give incorrect measurements. For example:
You are not taking account of JIT compilation overheads and other JVM warmup effects.
I can see places where the JIT compiler could potentially optimize away entire loops.
The way you are measuring the time is strange. Try treating this as algebra.
((midPoint - start) - (stop - midPoint)) / count;
Now simplify ... and you get (2*midPoint - start - stop) / count, which does not correspond to any meaningful time interval.
You are only removing half of the elements from the list, so you are only measuring over the range 50,000 to 100,000 of your scaling variable. (And I expect you are then plotting against the scaling variable; i.e. you are plotting f(N + 50,000) against N.)
The time intervals you are measuring could be too small for the clock resolution on your machine. (Read the javadocs for nanoTime() to see what resolution it guarantees.)
I recommend that people wanting to avoid mistakes like the above should read:
How do I write a correct micro-benchmark in Java?
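For illustration only, here is a minimal sketch of a fairer measurement, assuming you want the worst case (always removing index 0) and accepting a crude warmup pass in place of a proper harness such as JMH:

import java.util.ArrayList;

public class RemoveTiming {
    // Time a single remove(0) on a list of the given size,
    // averaged over several repetitions of rebuilding the list.
    static double timeRemoveAtZero(int size, int reps) {
        long total = 0;
        for (int r = 0; r < reps; r++) {
            ArrayList<Integer> list = new ArrayList<>(size);
            for (int n = 0; n < size; n++)
                list.add(n);
            long start = System.nanoTime();
            list.remove(0); // worst case: shifts size-1 elements left
            total += System.nanoTime() - start;
        }
        return (double) total / reps;
    }

    public static void main(String[] args) {
        timeRemoveAtZero(100_000, 50); // crude warmup for the JIT
        for (int size = 100_000; size <= 800_000; size *= 2)
            System.out.println(size + "\t" + timeRemoveAtZero(size, 50));
    }
}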

remove(int) removes the element at the i-th INDEX, which is O(1).
You probably want remove(Object), which is O(N); you would need to call remove(Integer.valueOf(i)).
It would be more obvious if your list didn't have the elements in order.
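A small sketch (mine, not part of this answer) showing how the two overloads behave differently on the same data:

import java.util.ArrayList;
import java.util.Arrays;

public class RemoveOverloads {
    public static void main(String[] args) {
        ArrayList<Integer> a = new ArrayList<>(Arrays.asList(10, 20, 30));
        a.remove(1);                   // remove(int): deletes the element at INDEX 1
        System.out.println(a);         // [10, 30]

        ArrayList<Integer> b = new ArrayList<>(Arrays.asList(10, 20, 30));
        b.remove(Integer.valueOf(20)); // remove(Object): searches for the VALUE 20
        System.out.println(b);         // [10, 30]
    }
}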

Related

Big O notation of the following min and max recursive code in Java

So n is the length of array a, and p is an int array of length 2; both elements of p are zero. The first call is findbigO(a, n-1, p).
static void findbigO(int[] a, int i, int[] p) {
    if (i == 0) {
        p[0] = a[0];
        p[1] = a[0];
    } else {
        findbigO(a, i - 1, p);
        if (a[i] < p[0]) {
            p[0] = a[i];
        }
        if (a[i] > p[1]) {
            p[1] = a[i];
        }
    }
}
The code basically finds the max and min in an array and stores them in a different array p. I am trying to figure out the big O of this code. I think it's O(n), since the recursion is called n times, depending on the length of the array. What do you guys think?
Well, i in the first call is by definition n-1, i.e. of the same magnitude as n. Thus, for big-O-over-n notation purposes, the initial i can be treated as n.
The code itself is, other than the recursive invocation, constant time: There are no fors or whiles or any other way in which this code's # of executions is affected by anything.
The recursive call necessarily marches towards the end condition (i == 0, when no recursion occurs), and does so in O(n) time: in fact, after exactly n steps, 0 will have been reached.
Thus, we have an O(1) 'loop' being executed O(i-initial) times, where i-initial is the same magnitude as n, which combines to O(1) * O(n) which is just O(n).
To help you out and to confirm big O notation, here's the 'point' of big-O:
Make a 2D line graph. On the x-axis, put 'n'. On the y-axis, put 'time taken by the CPU'.
Then fill in this chart. It'll be messy at first (maybe in one of the runs your winamp switches songs or whatnot), but go far enough to the right and the algorithmic complexity will start being the deciding factor. It 'balances out', in other words, into a recognizable graph. What does that graph look like?
For this algorithm, a straight line that is not horizontal. In other words, it looks like y = C*x, with C being some constant. That's what O(n) means: the graph will eventually stabilize and look like y = C*x does, for some C.
O(n^2) would mean: the graph eventually stabilizes into something that looks like y = x^2. That's also why O(x^2 + x) is not a thing: y = x^2 + x, once you go far enough to the right, looks just like y = x^2 does on its right flank.
findbigO(n) = findbigO(n-1) + O(1)
findbigO(n) = (findbigO(n-2) + O(1)) + O(1)
...
findbigO(n) = findbigO(n-n) + n*O(1)
findbigO(n) = findbigO(0) + n*O(1)
findbigO(n) = O(1) + n*O(1)
findbigO(n) = O(1) + O(n)
findbigO(n) <= O(n) + O(n)
findbigO(n) <= 2*O(n)
findbigO in O(n)
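As a sanity check (my sketch, not part of the original answer), you can count the recursive calls and watch them scale linearly with n:

public class FindBigOCount {
    static long calls = 0;

    // The question's findbigO with a call counter added
    static void findbigO(int[] a, int i, int[] p) {
        calls++;
        if (i == 0) {
            p[0] = a[0];
            p[1] = a[0];
        } else {
            findbigO(a, i - 1, p);
            if (a[i] < p[0]) p[0] = a[i];
            if (a[i] > p[1]) p[1] = a[i];
        }
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 8_000; n *= 2) {
            int[] a = new int[n]; // contents don't affect the call count
            calls = 0;
            findbigO(a, n - 1, new int[2]);
            System.out.println("n = " + n + ", calls = " + calls); // calls == n
        }
    }
}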

I believe this algorithm is O(N). Am I correct?

This algorithm reverses an array of N integers. I believe this algorithm is O(N) because for each loop iteration, the four lines of code are executed once thus completing the job in 4N time.
public static void reverseTheNumbers(int[] list) {
    for (int i = 0; i < list.length / 2; i++) {
        int j = list.length - 1 - i;
        int temp = list[i];
        list[i] = list[j];
        list[j] = temp;
    }
}
There isn't such a thing as 4N time. The algorithm is linear because as you increase the size of the input the runtime of the algorithm increases proportionally. In other words if you doubled the size of list you would expect the algorithm to take twice as long.
It doesn't matter how many operations you do inside your loop - as long as they are each constant time (relative to the input) the runtime of the loop is determined simply by the number of iterations.
Put another way, these four statements are - all together - an O(1) operation.
int j = list.length - 1 - i;
int temp = list[i];
list[i] = list[j];
list[j] = temp;
There's nothing significant about the fact that this sequence of steps is expressed in four statements of Java syntax - experimenting with javap suggests these four lines compile into ~20 bytecode commands, and who knows how many processor instructions that bytecode gets converted into. The good news is Big-O notation works the same regardless of the particular syntax - a sequence of operations is O(1) or constant time if its execution time is the same regardless of the input.
Therefore you're doing an O(1) operation N times; aka O(N).
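If you want to convince yourself empirically, a quick sketch (mine, not from the answer) that counts the loop iterations shows they grow linearly, at exactly n/2:

public class ReverseCount {
    public static void main(String[] args) {
        for (int n = 1_000; n <= 8_000; n *= 2) {
            int[] list = new int[n];
            long iterations = 0;
            for (int i = 0; i < list.length / 2; i++) {
                int j = list.length - 1 - i;
                int temp = list[i];
                list[i] = list[j];
                list[j] = temp;
                iterations++; // one constant-time body per iteration
            }
            System.out.println("n = " + n + ", iterations = " + iterations); // n/2
        }
    }
}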
Yes, you are correct. The number of operations is linearly dependent on the size of the array (N), making it an O(N) algorithm.
Yes, the complexity of the algorithm is O(n).
However, the exact "time" is not 4 times the size of the array (asymptotic complexity drops constant factors anyway); we could say it is 1/2*(c1+c2+c3+c4) times the size of the array, where the 1/2 corresponds to the loop running over half the array and each c corresponds to the time needed for one operation inside the loop.
It would be 4 times the size of the array if the algorithm iterated over the whole array 4 times.

Complexity of Bubble Sort

I have seen in a lot of places that the complexity of bubble sort is O(n^2).
But how can that be so, given that the inner loop always runs n-i times?
for (int i = 0; i < toSort.length - 1; i++) {
    for (int j = 0; j < toSort.length - 1 - i; j++) {
        if (toSort[j] > toSort[j + 1]) {
            int swap = toSort[j + 1];
            toSort[j + 1] = toSort[j];
            toSort[j] = swap;
        }
    }
}
And what is the "average" value of n-i? n/2.
So it runs in O(n*n/2), which is considered O(n^2).
There are different kinds of complexity bounds. You are using big O notation, which means all cases of this function will take at most this time complexity.
As n approaches infinity this is basically n^2 time complexity in the worst case. Time complexity is not an exact art but more of a ballpark for what sort of speed you can expect from this class of algorithm, and hence you are trying to be too exact.
For example, the stated time complexity might very well be n^2 even though the exact count is closer to n*(n-1), because constant factors and lower-order terms are dropped.
Since the outer loop runs n times and for the i-th iteration the inner loop runs (n-i) times, the total number of operations is
(n-1) + (n-2) + ... + 1 = n*(n-1)/2 = O(n^2).
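As an empirical check (a sketch of mine, not from the answers), counting the inner-loop comparisons reproduces n*(n-1)/2 exactly:

public class BubbleCount {
    public static void main(String[] args) {
        for (int n = 100; n <= 800; n *= 2) {
            int[] toSort = new int[n];
            long comparisons = 0;
            for (int i = 0; i < toSort.length - 1; i++) {
                for (int j = 0; j < toSort.length - 1 - i; j++) {
                    comparisons++; // one comparison per inner iteration
                    if (toSort[j] > toSort[j + 1]) {
                        int swap = toSort[j + 1];
                        toSort[j + 1] = toSort[j];
                        toSort[j] = swap;
                    }
                }
            }
            System.out.println("n = " + n + ", comparisons = " + comparisons
                    + ", n*(n-1)/2 = " + (long) n * (n - 1) / 2);
        }
    }
}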
It's O(n^2), because it's length * length.

Finding maximum in O(logn) time?

I've always taken it for granted that iterative search is the go-to method for finding maximum values in an unsorted list.
The thought came to me rather randomly, but in a nutshell: I believe I can accomplish the task in O(logn) time with n being the input array's size.
The approach piggy-backs on merge sort: divide and conquer.
Step 1: divide the findMax() task to two sub-tasks findMax(leftHalf) and findMax(rightHalf). This division should be finished in O(logn) time.
Step 2: merge the two maximum candidates back up. Each layer in this step should take constant time O(1), and there are, per the previous step, O(logn) such layers. So it should also be done in O(1) * O(logn) = O(logn) time (pardon the abuse of notation). [Edit: this is so wrong. Each comparison is done in constant time, but there are 2^j/2 such comparisons to be done at level j, since that level has 2^j candidates.]
Thus, the whole task should be completed in O(logn) time. [Edit: actually O(n) time.]
However, when I try to time it, I get results that clearly reflect a linear O(n) running time.
size = 100000000 max = 0 time = 556
size = 200000000 max = 0 time = 1087
size = 300000000 max = 0 time = 1648
size = 400000000 max = 0 time = 1990
size = 500000000 max = 0 time = 2190
size = 600000000 max = 0 time = 2788
size = 700000000 max = 0 time = 3586
How come?
Here's the code (I left the arrays uninitialized to save on pre-processing time; the method, as far as I've tested it, accurately identifies the maximum value in unsorted arrays):
public static short findMax(short[] list) {
    return findMax(list, 0, list.length);
}

public static short findMax(short[] list, int start, int end) {
    if (end - start == 1) {
        return list[start];
    } else {
        short leftMax = findMax(list, start, start + (end - start) / 2);
        short rightMax = findMax(list, start + (end - start) / 2, end);
        return (leftMax <= rightMax) ? rightMax : leftMax;
    }
}

public static void main(String[] args) {
    for (int j = 1; j < 10; j++) {
        int size = j * 100000000; // 100mil to 900mil
        short[] x = new short[size];
        long start = System.currentTimeMillis();
        int max = findMax(x);
        long end = System.currentTimeMillis();
        System.out.println("size = " + size + "\t\t\tmax = " + max + "\t\t\t time = " + (end - start));
        System.out.println();
    }
}
You should count the number of comparisons that actually take place:
In the final step, after you find the maximum of the first n/2 numbers and last n/2 numbers, you need 1 more comparison to find the maximum of the entire set of numbers.
On the previous step you have to find the maximum of the first and second groups of n/4 numbers and the maximum of the third and fourth groups of n/4 numbers, so you have 2 comparisons.
Finally, at the end of the recursion, you have n/2 groups of 2 numbers, and you have to compare each pair, so you have n/2 comparisons.
When you sum them all you get:
1 + 2 + 4 + ... + n/2 = n-1 = O(n)
You indeed create log(n) layers.
But at the end of the day, you still go through each element of every created bucket. Therefore you go through every element. So overall you are still O(n).
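Eran's counting argument is easy to verify empirically; here is a sketch (mine, not from either answer) that instruments the question's findMax with a comparison counter, which reports exactly n-1:

public class FindMaxCount {
    static long comparisons = 0;

    // The question's findMax with a counter on the merge comparison
    static short findMax(short[] list, int start, int end) {
        if (end - start == 1)
            return list[start];
        short leftMax = findMax(list, start, start + (end - start) / 2);
        short rightMax = findMax(list, start + (end - start) / 2, end);
        comparisons++; // one comparison per merge step
        return (leftMax <= rightMax) ? rightMax : leftMax;
    }

    public static void main(String[] args) {
        for (int n = 1_000_000; n <= 8_000_000; n *= 2) {
            comparisons = 0;
            findMax(new short[n], 0, n);
            System.out.println("n = " + n + ", comparisons = " + comparisons); // n - 1
        }
    }
}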
With Eran's answer, you already know what's wrong with your reasoning.
But anyway, there is a theorem called the Master Theorem, which aids in the running time analysis of recursive functions.
It concerns recurrences of the following form:
T(n) = a*T(n/b) + O(n^d)
Where T(n) is the running time for a problem of size n.
In your case, the recurrence equation would be T(n) = 2*T(n/2) + O(1), so a = 2, b = 2, and d = 0. That is the case because, for each n-sized instance of your problem, you break it into 2 (a) subproblems of size n/2 (b) and combine them in O(1) = O(n^0).
The master theorem simply states three cases:
if a = b^d, then the total running time is O(n^d*log n)
if a < b^d, then the total running time is O(n^d)
if a > b^d, then the total running time is O(n^(log a / log b))
Your case matches the third, so the total running time is O(n^(log 2 / log 2)) = O(n)
It is a nice exercise to try to understand the reason behind these three cases. They are merely the cases for which:
1st) We do the same total amount of work at each recursion level (this is the case for mergesort), so we simply multiply the merging time, O(n^d), by the number of levels, log n.
2nd) We do less work at the second recursion level than at the first, and so on. Therefore the total work is dominated by the first recursion level (the final merge step), O(n^d).
3rd) We do more work at deeper levels (your case), so the running time is O(number of leaves in the recursion tree). In your case you have n leaves at the deepest recursion level, so O(n).
There are some short videos in a Stanford Coursera course which explain the Master Method very nicely, available at https://www.coursera.org/course/algo. I believe you can always preview the course, even if not enrolled.

Complexity of BigO notation

I've been doing some questions, but answers were not provided, so I was wondering if my answers are correct.
a) given that a[i....j] is an integer array with n elements and x is an integer
int front, back;
while (i <= j) {
    front = (i + j) / 3;
    back = 2 * (i + j) / 3;
    if (a[front] == x)
        return front;
    if (a[back] == x)
        return back;
    if (x < a[front])
        j = front - 1;
    else if (x > a[back])
        i = back + 1;
    else {
        j = back - 1;
        i = front + 1;
    }
}
My answer would be O(1) but I have a feeling I'm wrong.
B)
public static void whatIs(int n) {
    if (n > 0) {
        System.out.print(n + " ");
        whatIs(n / 2);
        whatIs(n / 2);
    }
}
ans: I'm not sure whether it is log 4n or log n, since the recursion happens twice.
A) Yes. O(1) is wrong. You are going around the loop a number of times that depends on i, j, x ... and the contents of the array. Work out how many times you go around the loop in the best and worst cases.
B) Simplify log(4*n) using log(a*b) -> log(a) + log(b) (basic high-school mathematics) and then apply the definition of big O.
But that isn't the right answer either. Once again, you should go back to first principles and count the number of times that the method gets called for a given value of the parameter n. And do a proof by induction.
Both answers are incorrect.
In the first example, on each iteration either you find the number or you shrink the length of the interval by 1/3; i.e. if the length used to be n, you make it (2/3)*n. In the worst case you find x on the last iteration, when the length of the interval is 1. So, just as with binary search, the complexity is calculated via a log: the complexity is O(log_{3/2}(n)), and this in fact is simply O(log(n)).
In the second example for a given number n you perform twice the number of operations needed for n/2. Start from n = 0 and n = 1 and use induction to prove the complexity is in fact O(n).
Hope this helps.
A) This algorithm seems similar to the Golden section search. When analyzing complexity, it's sometimes easier to imagine what would happen if we extended the data structure rather than contracting it. Think of it like this: every loop removes a third of the search interval. That means that if we know exactly how long a certain length takes, we could handle an interval 50% longer if we're allowed to loop once more, which is exponential growth. Thus, the search algorithm must have complexity O(log n).
B) Every time we add a "layer" of function calls, we need to double the number of them (since the function always calls itself twice). In other words, given a certain length and time consumption, doubling n also requires twice as many function calls in the last layer. The algorithm is O(n).
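For part B, here is a quick sketch (mine, not from the answers) that counts the calls empirically; the count grows linearly (roughly 4n), confirming O(n):

public class WhatIsCount {
    static long calls = 0;

    // Same recursion shape as whatIs, with the printing removed
    // so that only the call count is observed.
    static void whatIs(int n) {
        calls++;
        if (n > 0) {
            whatIs(n / 2);
            whatIs(n / 2);
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1_048_576; n *= 2) {
            calls = 0;
            whatIs(n);
            System.out.println("n = " + n + ", calls = " + calls); // about 4n
        }
    }
}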
