I wrote a program to test and verify the running time of insertion sort, which should be O(n^2). The output doesn't look right to me, and it doesn't vary much between runs. The other odd thing is that the second measurement is always the smallest. I expected greater variance every time I run the program, but the run times don't fluctuate as much as I would expect. I'm wondering whether the JVM or the compiler is performing some kind of optimization. I have similar code in C#; it varies more and its output is as expected. I am not expecting the running times to square every time, but I am expecting them to increase more than they do, and I certainly expect much greater variance at the last iteration.
Sample Output (it doesn't vary enough for me to include multiple outputs):
47
20 (this one is ALWAYS the lowest... it makes no sense!)
44
90
133
175
233
298
379
490
import java.util.Random;
public class SortBench {
public static void main(String args[]){
Random rand = new Random(System.currentTimeMillis());
for(int k = 100; k <= 1000; k += 100)
{
//Keep track of time
long time = 0;
//Create new arrays each time
int[] a = new int[k];
int[] b = new int[k];
int[] c = new int[k];
int[] d = new int[k];
int[] e = new int[k];
//Insert random integers into the arrays
for (int i = 0; i < a.length; i++)
{
int range = Integer.MAX_VALUE;
a[i] = rand.nextInt(range);
b[i] = rand.nextInt(range);
c[i] = rand.nextInt(range);
d[i] = rand.nextInt(range);
e[i] = rand.nextInt(range);
}
long start = System.nanoTime();
insertionSort(a);
long end = System.nanoTime();
time += end-start;
start = System.nanoTime();
insertionSort(b);
end = System.nanoTime();
time += end-start;
start = System.nanoTime();
insertionSort(c);
end = System.nanoTime();
time += end-start;
start = System.nanoTime();
insertionSort(d);
end = System.nanoTime();
time += end-start;
start = System.nanoTime();
insertionSort(e);
end = System.nanoTime();
time += end-start;
System.out.println((time/5)/1000);
}
}
static void insertionSort(int[] a)
{
int key;
int i;
for(int j = 1; j < a.length; j++)
{
key = a[j];
i = j - 1;
while(i>=0 && a[i]>key)
{
a[i + 1] = a[i];
i = i - 1;
}
a[i + 1] = key;
}
}
}
On your first iteration, you're also measuring the JIT time (or at least some JIT time - HotSpot will progressively optimize further). Run it several times first, and then start measuring. I suspect you're seeing the benefits of HotSpot as time goes on - the earlier tests are slowed down by both the time taken to JIT and the fact that it's not running as optimal code. (Compare this with .NET, where the JIT only runs once - there's no progressive optimization.)
If you can, allocate all the memory first too - and make sure nothing is garbage collected until the end. Otherwise you're including allocation and GC in your timing.
You should also consider taking more samples, with n going up another order of magnitude, to get a better idea of how the time increases. (I haven't looked at what you've done carefully enough to work out whether it really should be O(n^2).)
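The advice above can be sketched roughly as follows. This is a minimal sketch rather than a rigorous benchmark: the class name `WarmSortBench`, the warm-up count of 2,000 sorts, and the chosen sizes are assumptions made for illustration. It allocates every input before timing starts, runs untimed warm-up sorts first, and prints the ratio between successive doublings of n, which should drift towards 4 for an O(n^2) algorithm.

```java
import java.util.Random;

public class WarmSortBench {
    static void insertionSort(int[] a) {
        for (int j = 1; j < a.length; j++) {
            int key = a[j];
            int i = j - 1;
            while (i >= 0 && a[i] > key) {
                a[i + 1] = a[i];
                i--;
            }
            a[i + 1] = key;
        }
    }

    static int[] randomArray(Random rand, int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = rand.nextInt(Integer.MAX_VALUE);
        }
        return a;
    }

    /** Average time in nanoseconds to sort each pre-allocated input once. */
    static long averageSortNanos(int[][] inputs) {
        long total = 0;
        for (int[] input : inputs) {
            long start = System.nanoTime();
            insertionSort(input);
            total += System.nanoTime() - start;
        }
        return total / inputs.length;
    }

    public static void main(String[] args) {
        Random rand = new Random(42);

        // Untimed warm-up so HotSpot has a chance to compile insertionSort first
        // (2,000 runs is an assumed, not a tuned, number).
        for (int i = 0; i < 2_000; i++) {
            insertionSort(randomArray(rand, 1_000));
        }

        // Allocate and fill every input before any timing starts.
        int[] sizes = {1_000, 2_000, 4_000, 8_000, 16_000};
        int[][][] inputs = new int[sizes.length][5][];
        for (int s = 0; s < sizes.length; s++) {
            for (int r = 0; r < inputs[s].length; r++) {
                inputs[s][r] = randomArray(rand, sizes[s]);
            }
        }

        long previous = 0;
        for (int s = 0; s < sizes.length; s++) {
            long avg = averageSortNanos(inputs[s]);
            String ratio = previous == 0 ? "-" : String.format("%.2f", (double) avg / previous);
            System.out.printf("n=%,d  avg=%d us  ratio=%s%n", sizes[s], avg / 1_000, ratio);
            previous = avg;
        }
    }
}
```

Doubling n instead of adding 100 each time makes the quadratic growth easier to read off the output, since each step should cost about four times the previous one.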
Warm up the JVM's JIT optimization of your function, memory allocators, TLB, CPU frequency, and so on before the timed region.
Add some untimed calls right after seeding the RNG, before your existing timing loop.
Random rand = new Random(System.currentTimeMillis());
// warmup
for(int k = 100; k <= 10000; k += 100)
{
int[]w = new int[1000];
for (int i = 0; i < w.length; i++)
{
int range = Integer.MAX_VALUE;
w[i] = rand.nextInt(range);
insertionSort(w);
}
}
Results with warming:
4
16
27
47
68
97
126
167
201
250
Results without warming:
62
244
514
206
42
59
80
98
122
148
So I have written the insertion sort code properly to where it will successfully create arrays of 10, 1,000, 100,000 and 1,000,000 integers between 1,000 and 9,999 and complete the insertion sort algorithm just fine. However, when I attempt the last step of 10,000,000 integers, the array is created, but the code never fully completes. I have allowed it plenty of time to complete, upwards of 4 or 5 hours, to no avail. Anybody have any ideas of what the issue may be here? Is the executer having issues comprehending that many integers or what could the issue stem from? I have included a copy of the insertion algorithm that I have written.
public static void insertion(int[] a) {
int n = a.length;
for(int i = 1; i < n; i++) {
int j = i -1;
int temp = a[i];
while(j >= 0 && temp < a[j]) { // j >= 0 so that a[0] is compared as well
a[j+1] = a[j];
j--;
}
a[j+1] = temp;
}
}
Anybody have any ideas of what the issue may be here?
When you make the array 10x larger you have to wait 100x longer as this is an O(n^2) algorithm.
Is the executer having issues comprehending that many integers or what could the issue stem from?
No, the limit on array length is 2^31-1 elements, and you are a long way from that limit.
Running
import java.util.Random;
interface A {
static void main(String[] a) {
for (int i = 25_000; i <= 10_000_000; i *= 2) {
Random r = new Random();
int[] arr = new int[i];
for (int j = 0; j < i; j++)
arr[j] = r.nextInt();
long start = System.currentTimeMillis();
insertion(arr);
long time = System.currentTimeMillis() - start;
System.out.printf("Insertion sort of %,d elements took %.3f seconds%n",
i, time / 1e3);
}
}
public static void insertion(int[] a) {
int n = a.length;
for (int i = 1; i < n; i++) {
int j = i - 1;
int temp = a[i];
while (j >= 0 && temp < a[j]) { // j >= 0 so that a[0] is compared as well
a[j + 1] = a[j];
j--;
}
a[j + 1] = temp;
}
}
}
prints
Insertion sort of 25,000 elements took 0.049 seconds
Insertion sort of 50,000 elements took 0.245 seconds
Insertion sort of 100,000 elements took 1.198 seconds
Insertion sort of 200,000 elements took 4.343 seconds
Insertion sort of 400,000 elements took 19.212 seconds
Insertion sort of 800,000 elements took 71.297 seconds
So sorting 10,000,000 elements on my machine could take on the order of 4 hours, and it could take longer still, because a data set that large no longer fits in the L3 cache and has to be served from main memory, which is slower.
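The extrapolation can be checked with a little arithmetic. The sketch below assumes pure O(n^2) scaling and ignores the cache effects mentioned above; the class and method names are made up for illustration. It scales the measured 71.297 seconds for 800,000 elements up to 10,000,000 elements.

```java
public class QuadraticEstimate {
    /** Scales a measured time by (targetN / measuredN)^2, assuming pure O(n^2) growth. */
    static double estimateSeconds(long measuredN, double measuredSeconds, long targetN) {
        double ratio = (double) targetN / measuredN;
        return measuredSeconds * ratio * ratio;
    }

    public static void main(String[] args) {
        // Measurement taken from the output above: 800,000 elements in 71.297 seconds.
        double seconds = estimateSeconds(800_000, 71.297, 10_000_000);
        System.out.printf("Estimated: %.0f seconds (about %.1f hours)%n",
                seconds, seconds / 3600);
    }
}
```

The size ratio is 12.5, so the time ratio is 12.5^2 = 156.25, which gives roughly three hours before any slowdown from leaving the cache is taken into account.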
In my java program I have a for-loop looking roughly like this:
ArrayList<MyObject> myList = new ArrayList<MyObject>();
putThingsInList(myList);
for (int i = 0; i < myList.size(); i++) {
doWhatsoever();
}
Since the size of the list isn't changing, I tried to accelerate the loop by replacing the termination expression of the loop with a variable.
My idea was: Since the size of an ArrayList can possibly change while iterating it, the termination expression has to be executed each loop cycle. If I know (but the JVM doesn't), that its size will stay constant, the usage of a variable might speed things up.
ArrayList<MyObject> myList = new ArrayList<MyObject>();
putThingsInList(myList);
int myListSize = myList.size();
for (int i = 0; i < myListSize; i++) {
doWhatsoever();
}
However, this version is slower, much slower, and making myListSize final doesn't change that! I could understand it if the speed didn't change at all, because maybe the JVM found out that the size doesn't change and optimized the code. But why is it slower?
I then rewrote the program so that the size of the list changes with each cycle: if i%2==0, I remove the last element of the list, otherwise I add one element to the end of the list. So now, I guessed, the myList.size() operation has to be called on every iteration.
I don't know if that's actually correct, but the myList.size() termination expression is still faster than using a variable that remains constant all the time as the termination expression...
Any ideas why?
Edit (I'm new here, I hope this is the way, how to do it)
My whole test program looks like this:
ArrayList<Integer> myList = new ArrayList<Integer>();
for (int i = 0; i < 1000000; i++)
{
myList.add(i);
}
final long myListSize = myList.size();
long sum = 0;
long timeStarted = System.nanoTime();
for (int i = 0; i < 500; i++)
{
for (int j = 0; j < myList.size(); j++)
{
sum += j;
if (j%2==0)
{
myList.add(999999);
}
else
{
myList.remove(999999);
}
}
}
long timeNeeded = (System.nanoTime() - timeStarted)/1000000;
System.out.println(timeNeeded);
System.out.println(sum);
Performance of the posted code (average of 10 executions):
4102ms for myList.size()
4230ms for myListSize
Without the if-then-else statements (so with constant myList size)
172ms for myList.size()
329ms for myListSize
So the speed difference between the two versions is still there. In the version with the if-then-else parts, the percentage difference is of course smaller, because a lot of the time is spent on the add and remove operations of the list.
The problem is with this line:
final long myListSize = myList.size();
Change this to an int and lo and behold, running times will be identical. Why? Because comparing an int to a long for every iteration requires a widening conversion of the int, and that takes time.
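To make the difference concrete, here is a minimal sketch with two otherwise identical loops; the class and method names are hypothetical. With a long bound, javac emits an `i2l` widening instruction followed by `lcmp` for every comparison, which can be inspected with `javap -c`. With an int bound the loop uses a single int comparison.

```java
public class BoundTypeDemo {
    static long sumWithIntBound(int[] data) {
        int max = data.length;
        long sum = 0;
        for (int i = 0; i < max; i++) {   // int compared with int
            sum += data[i];
        }
        return sum;
    }

    static long sumWithLongBound(int[] data) {
        long max = data.length;           // the length is widened once here
        long sum = 0;
        for (int i = 0; i < max; i++) {   // i is widened to long for every comparison
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        // Both loops compute the same result; only the comparison differs.
        System.out.println(sumWithIntBound(data));
        System.out.println(sumWithLongBound(data));
    }
}
```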
Note that the difference also largely (but probably not completely) disappears when the code is compiled and optimised, as can be seen from the following JMH benchmark results:
# JMH 1.11.2 (released 7 days ago)
# VM version: JDK 1.8.0_51, VM 25.51-b03
# VM options: <none>
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
...
# Run complete. Total time: 00:02:01
Benchmark Mode Cnt Score Error Units
MyBenchmark.testIntLocalVariable thrpt 20 81892.018 ± 734.621 ops/s
MyBenchmark.testLongLocalVariable thrpt 20 74180.774 ± 1289.338 ops/s
MyBenchmark.testMethodInvocation thrpt 20 82732.317 ± 749.430 ops/s
And here's the benchmark code for it:
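import java.util.ArrayList;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
public class MyBenchmark {
@State(Scope.Benchmark)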
public static class Values {
private final ArrayList<Double> values;
public Values() {
this.values = new ArrayList<Double>(10000);
for (int i = 0; i < 10000; i++) {
this.values.add(Math.random());
}
}
}
@Benchmark
public double testMethodInvocation(Values v) {
double sum = 0;
for (int i = 0; i < v.values.size(); i++) {
sum += v.values.get(i);
}
return sum;
}
@Benchmark
public double testIntLocalVariable(Values v) {
double sum = 0;
int max = v.values.size();
for (int i = 0; i < max; i++) {
sum += v.values.get(i);
}
return sum;
}
@Benchmark
public double testLongLocalVariable(Values v) {
double sum = 0;
long max = v.values.size();
for (int i = 0; i < max; i++) {
sum += v.values.get(i);
}
return sum;
}
}
P.s.:
My idea was: Since the size of an ArrayList can possibly change while
iterating it, the termination expression has to be executed each loop
cycle. If I know (but the JVM doesn't), that its size will stay
constant, the usage of a variable might speed things up.
Your assumption is wrong for two reasons: first of all, the VM can easily determine via escape analysis that the list stored in myList doesn't escape the method (so it's free to allocate it on the stack for example).
More importantly, even if the list was shared between multiple threads, and therefore could potentially be modified from the outside while we run our loop, in the absence of any synchronization it is perfectly valid for the thread running our loop to pretend those changes haven't happened at all.
As always, things are not always what they seem...
First things first, ArrayList.size() doesn't get recomputed on every invocation, only when the proper mutator is invoked. So calling it frequently is quite cheap.
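As an illustration of why calling size() is cheap, here is a deliberately simplified list. It is a hypothetical class written for this example, not the JDK source: the element count lives in a field that only the mutators touch, so size() is just a field read.

```java
import java.util.Arrays;

public class TinyIntList {
    private int[] elements = new int[4];
    private int size;                       // updated only by add and removeLast

    public void add(int value) {
        if (size == elements.length) {
            // Grow the backing array when it is full.
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    public int removeLast() {
        if (size == 0) {
            throw new IllegalStateException("list is empty");
        }
        return elements[--size];
    }

    public int size() {
        return size;                        // no counting, no traversal
    }

    public static void main(String[] args) {
        TinyIntList list = new TinyIntList();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        list.removeLast();
        System.out.println(list.size());
    }
}
```

A method this small is also an easy candidate for inlining by the JIT, which is one reason a loop bound of list.size() need not be slower than a local variable.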
Which of these loops is the fastest?
// array1 and array2 are the same size.
int sum = 0;
for (int i = 0; i < array1.length; i++) {
sum += array1[i];
}
for (int i = 0; i < array2.length; i++) {
sum += array2[i];
}
or
int sum = 0;
for (int i = 0; i < array1.length; i++) {
sum += array1[i];
sum += array2[i];
}
Instinctively, you would say that the second loop is the fastest since it doesn't iterate twice. However, some optimizations actually cause the first loop to be the fastest depending, for instance, on memory walking strides that cause a lot of memory cache misses.
Side-note: the compiler optimization that merges two loops into one, as in the second version, is called loop jamming (also known as loop fusion).
This loop:
int sum = 0;
for (int i = 0; i < 1000000; i++) {
sum += list.get(i);
}
is not the same as:
// Assume that list.size() == 1000000
int sum = 0;
for (int i = 0; i < list.size(); i++) {
sum += list.get(i);
}
In the first case, the compiler knows for certain that the loop must iterate a million times and puts the constant in the Constant Pool, so certain optimizations can take place.
A closer equivalent would be:
int sum = 0;
final int listSize = list.size();
for (int i = 0; i < listSize; i++) {
sum += list.get(i);
}
but only after the JVM has figured out what the value of listSize is. The final keyword gives the compiler/run-time certain guarantees that can be exploited. If the loop runs long enough, JIT-compiling will kick in, making execution faster.
Because this sparked interest in me I decided to do a quick test:
public class fortest {
public static void main(String[] args) {
long mean = 0;
for (int cnt = 0; cnt < 100000; cnt++) {
if (mean > 0)
mean /= 2;
ArrayList<String> myList = new ArrayList<String>();
putThingsInList(myList);
long start = System.nanoTime();
int myListSize = myList.size();
for (int i = 0; i < myListSize; i++) doWhatsoever(i, myList);
long end = System.nanoTime();
mean += end - start;
}
System.out.println("Mean exec: " + mean/2);
}
private static void doWhatsoever(int i, ArrayList<String> myList) {
if (i % 2 == 0)
myList.set(i, "0");
}
private static void putThingsInList(ArrayList<String> myList) {
for (int i = 0; i < 1000; i++) myList.add(String.valueOf(i));
}
}
I do not see the kind of behavior you are seeing.
2500ns mean execution time over 100000 iterations with myList.size()
1800ns mean execution time over 100000 iterations with myListSize
I therefore suspect that the code executed inside your functions is at fault. In the above example you can sometimes see faster execution if you only fill the ArrayList once, because doWhatsoever() will only do something on the first loop. I suspect the rest is being optimized away, which significantly reduces execution time. You might have a similar case, but without seeing your code it is close to impossible to figure that out.
There is another way to speed up the code, using a for-each loop:
ArrayList<MyObject> myList = new ArrayList<MyObject>();
putThingsInList(myList);
for (MyObject ob: myList) {
doWhatsoever();
}
But I agree with @showp1984 that some other part is slowing the code.
Why is the second loop faster than the first here?
public class Test2 {
public static void main(String s[]) {
long start, end;
int[] a = new int[2500000];
int length = a.length;
start = System.nanoTime();
for (int i = 0; i < length; i++) {
a[i] += i;
}
end = System.nanoTime();
System.out.println(end - start + " nano with i < a.length ");
int[] b = new int[2500000];
start = System.nanoTime();
for (int i = b.length - 1; i >= 0; i--) {
b[i] += i;
}
end = System.nanoTime();
System.out.println(end - start + " nano with i > = 0");
}
}
Output is
6776766 nano with i < a.length
5525033 nano with i > = 0
Update: I have updated the question according to the suggestion, but I still see the difference in time. The first loop is taking more time than the second loop.
Most likely it's because you're fetching the value of a.length on each iteration in the first case, as opposed to once in the second case.
Try doing something like
int len = a.length;
and using len as the termination bound for the loop.
This could potentially reduce the time of the first loop.
If you modify your first for loop slightly, you'll get a similar time:
int alength = a.length; // pre-compute a.length
start = System.currentTimeMillis();
for (int i = 0; i < alength; i++) {
a[i] += i;
}
$ java Test
8 millis with i<a.length
6 millis with i>=0
The main reason for the difference in times is -
"... Never use System.currentTimeMillis() unless you are OK with + or - 15 ms accuracy, which is typical on most OS + JVM combinations. Use System.nanoTime() instead." – Scott Carey Found Here
Update:
I believe someone mentioned in the comments section of your question that you should also warm up the kernel you're testing before running micro benchmarks.
Rule 1: Always include a warmup phase which runs your test kernel all the way through, enough to trigger all initializations and compilations before timing phase(s). (Fewer iterations is OK on the warmup phase. The rule of thumb is several tens of thousands of inner loop iterations.)
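A warm-up phase along the lines of Rule 1 could look like the sketch below. The helper `averageNanos` and the run counts are assumptions chosen for illustration, and a harness such as JMH remains the more reliable option. The task is run untimed first, and only afterwards timed with System.nanoTime().

```java
public class WarmupTimer {
    /** Runs the task untimed warmupRuns times, then returns the average nanoseconds over timedRuns. */
    static long averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();                      // untimed: triggers class loading and JIT compilation
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }

    public static void main(String[] args) {
        int[] a = new int[2_500_000];

        Runnable forward = () -> {
            for (int i = 0; i < a.length; i++) {
                a[i] += i;
            }
        };
        Runnable backward = () -> {
            for (int i = a.length - 1; i >= 0; i--) {
                a[i] += i;
            }
        };

        // 20 warm-up runs and 20 timed runs are assumed values, not tuned ones.
        System.out.println(averageNanos(forward, 20, 20) + " ns with i < a.length");
        System.out.println(averageNanos(backward, 20, 20) + " ns with i >= 0");
    }
}
```

Because both loops are warmed up before they are timed, neither one pays the one-off cost of interpretation and compilation that the first loop pays in the original test.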
I developed the Sieve of Eratosthenes algorithm in Java and I wanted to measure its performance.
Basically I run the "core algorithm" (not the entire application) 5000 times (with a for loop) and measure its execution time.
Here it is the code I used:
int N = 100000;
int m;
long[] microseconds = new long[5000];
for (int k = 0; k < 5000; k++) {
long start = System.nanoTime();
// Core algorithm
boolean[] isPrime = new boolean[N + 1];
for (int i = 2; i <= N; i++) {
isPrime[i] = true;
}
for (int i = 2; i * i <= N; i++) {
if (isPrime[i]) {
for (int j = i; (m = i * j) <= N; j++) {
isPrime[m] = false;
}
}
}
long end = System.nanoTime();
microseconds[k] = (end - start) / 1000;
}
// Output of the execution times on file
PrintWriter writer = null;
try {
writer = new PrintWriter("ex.txt");
} catch (FileNotFoundException ex) {
Logger.getLogger(EratosthenesSieve.class.getName()).log(Level.SEVERE, null, ex);
}
// iterate through array, write each element of array to file
for (int i = 0; i < microseconds.length; i++) {
// write array element to file
writer.print(microseconds[i]);
// write separator between array elements to file
writer.print("\n");
}
// done writing, close writer
writer.close();
The result is the following:
As you can see there are some big initial spikes (7913 and 1548) and some "periodical" spikes. How can I explain these spikes? I have already disabled Internet connection (hardware board) and all the possible services running in background (Windows 7; this means no antivirus etc.). Furthermore I set -Xmx and -Xms parameters to a very large quantity of memory. So I'm basically running the application "alone".
I know it's a hard question, but some hints would be greatly appreciated.
EDIT: I have modified my algorithm based on the "beny23" suggestion and now there are no more periodical spikes. However there are some big initial spikes.
Or (with N=1000 and not anymore N=10000):
Most likely
you get spikes due to garbage collections; use -verbose:gc to see when they occur.
code runs slower when it has not been warmed up. A loop or method needs to be called 10000 times to trigger a background compilation. You can change this threshold with -XX:CompileThreshold=10000
your machine will have significant jitter due to the way the scheduler works. Unless you have your thread bound to an isolated CPU, you can expect jitter of 2-10 ms for this reason alone. http://vanillajava.blogspot.com/2013/07/micro-jitter-busy-waiting-and-binding.html
I would change your loop to avoid using *. It's fast on modern CPUs, but not as cheap as +
for (int j = i, m = i * i; m <= N; j++, m += i) {
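Put into a complete method, the additive inner loop might look like the following sketch. The class name `AdditiveSieve` is made up for illustration, the counter j from the one-liner above is dropped because m alone drives the loop, and n is assumed to be non-negative and small enough that m += i cannot overflow an int.

```java
public class AdditiveSieve {
    /** Returns a table where isPrime[k] is true exactly when k is prime, for 0 <= k <= n. */
    static boolean[] sieve(int n) {
        boolean[] isPrime = new boolean[n + 1];
        for (int i = 2; i <= n; i++) {
            isPrime[i] = true;
        }
        for (int i = 2; (long) i * i <= n; i++) {
            if (isPrime[i]) {
                // Start at i * i and step by i: one multiplication per prime,
                // additions only inside the inner loop.
                for (int m = i * i; m <= n; m += i) {
                    isPrime[m] = false;
                }
            }
        }
        return isPrime;
    }

    static int countPrimes(int n) {
        int count = 0;
        for (boolean prime : sieve(n)) {
            if (prime) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100_000)); // prints 9592
    }
}
```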
Consider the two following code samples. All benchmarking is done outside of the container being used to calculate an average of the sampled execution times. On my machine, running Windows 7 and JDK 1.6, I am seeing the average execution time in example 2 close to 1,000 times slower than that of example 1. The only explanation I can surmise is that the compiler is optimizing some code used by LinkedList to the detriment of everything else. Can someone help me understand this?
Example 1: Using Arrays
public class TimingTest
{
static long startNanos, endNanos;
static long[] samples = new long[1000];
public static void main(String[] args)
{
for (int a = 0; a < 100; a++)
{
for (int numRuns = 0; numRuns < 1000; numRuns++)
{
startNanos = System.nanoTime();
long sum = 0;
for (long i = 1; i <= 500000; i++)
{
sum += i % 13;
}
endNanos = System.nanoTime() - startNanos;
samples[numRuns] =(endNanos);
}
long avgPrim = 0L;
for (long sample : samples)
{
avgPrim += sample;
}
System.out.println("Avg: " + (avgPrim / samples.length) );
}
}
}
Example 2: Using a LinkedList
public class TimingTest
{
static long startNanos, endNanos;
static List<Long> samples = new LinkedList<Long>();
public static void main(String[] args)
{
for (int a = 0; a < 100; a++)
{
for (int numRuns = 0; numRuns < 1000; numRuns++)
{
startNanos = System.nanoTime();
long sum = 0;
int index = 0;
for (long i = 1; i <= 500000; i++)
{
sum += i % 13;
}
endNanos = System.nanoTime() - startNanos;
samples.add(endNanos);
}
long avgPrim = 0L;
for (long sample : samples)
{
avgPrim += sample;
}
System.out.println("Avg: " + (avgPrim / samples.size()));
}
}
}
Something is very wrong here: When I run the array version, I get an average execution time of 20000 nanoseconds. It is downright impossible for my 2 GHz CPU to execute 500000 loop iterations in that time, as that would imply the average loop iteration takes 20000/500000 = 0.04 ns, or 0.08 CPU cycles ...
The main reason is a bug in your timing logic: In the array version, you do
int index = 0;
for every timing, hence
samples[index++] =(endNanos);
will always assign to the first array element, leaving all others at their default value of 0. Hence when you take the average of the array, you get 1/1000 of the last sample, not the average of all samples.
Indeed, if you move the declaration of index outside the loop, no significant difference is reported between the two variants.
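The effect of the misplaced index can be reproduced without any timing at all. In this minimal sketch the class and method names are hypothetical, and the 1000 identical fake measurements of 40,000,000 ns are made-up input chosen only to make the arithmetic obvious.

```java
import java.util.Arrays;

public class IndexResetDemo {
    /** Mimics the bug: index is re-declared for every sample, so only slot 0 is ever written. */
    static long averageWithResetIndex(long[] measurements) {
        long[] samples = new long[measurements.length];
        for (long measurement : measurements) {
            int index = 0;                   // bug: reset on every iteration
            samples[index++] = measurement;
        }
        return average(samples);
    }

    /** Corrected version: index is declared once, outside the loop. */
    static long averageWithSharedIndex(long[] measurements) {
        long[] samples = new long[measurements.length];
        int index = 0;
        for (long measurement : measurements) {
            samples[index++] = measurement;
        }
        return average(samples);
    }

    /** Integer average; assumes a non-empty array. */
    static long average(long[] samples) {
        long sum = 0;
        for (long sample : samples) {
            sum += sample;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        long[] measurements = new long[1000];
        Arrays.fill(measurements, 40_000_000L);   // fake data, not real timings
        System.out.println(averageWithResetIndex(measurements));   // prints 40000
        System.out.println(averageWithSharedIndex(measurements));  // prints 40000000
    }
}
```

The buggy average is exactly 1/1000 of the last sample, which matches the factor of roughly 1,000 reported in the question.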
Here's a real run of your code (renamed classes for clarity, and cut the outside for loop in each to a < 1 for time's sake):
$ for f in *.class
do
class=$(echo $f | sed 's`\(.*\)\.class`\1`')
echo Running $class
java $class
done
Running OriginalArrayTimingTest
Avg: 18528
Running UpdatedArrayTimingTest
Avg: 41111273
Running LinkedListTimingTest
Avg: 41340483
Obviously, your original concern was caused by the typo @meriton pointed out, which you corrected in your question. We can see that, for your test case, both an array and a LinkedList behave almost identically. Generally speaking, insertions on a LinkedList are very fast. Since you updated your question with meriton's changes, but didn't update your claim that the former is dramatically faster than the latter, it's no longer clear what you're asking; however, I hope you can see now that in this case both data structures behave reasonably similarly.