Alright, so I was working on a game in Eclipse Neon and I noticed that when I added a break statement to the program, it significantly slowed the program down, from around 120 fps to 80 fps (which makes little sense). So I decided to test it out in another class and got similar results.
This is the code I ran:
public static int[] xArray = new int[100000];
public static int[] yArray = new int[10000];

public static void main(String[] args){
    long timeStart = System.currentTimeMillis();
    int numberOfLoops = 0;
    int uboveMax = 0;
    for(int x = 0; x < xArray.length; x++){
        for(int y = 0; y < yArray.length; y++){
            numberOfLoops++;
            if(y > 9000){
                uboveMax++;
                //break;
            }
        }
    }
    long timeTaken = System.currentTimeMillis();
    System.out.println("Number of Loops: " + numberOfLoops);
    System.out.println("Ubove Max: " + uboveMax);
    System.out.println("Time Taken(MS): " + (timeTaken - timeStart));
}
So when I ran the code (in Eclipse Neon 2 (4.6.2)) I got unexpected results:
With the break statement: Time Taken(MS): 344.8 (average from 5 tests)
Without the break statement: Time Taken(MS): 294.6 (average from 5 tests)
Then when I ran the code (in NetBeans IDE 8.2) I got expected results:
With the break statement: Time Taken(MS): 4.2 (average from 5 tests)
Without the break statement: Time Taken(MS): 556.8 (average from 5 tests)
Shouldn't the code (in Eclipse) with the break statement run at least as fast, if not faster? Also, what is the cause of the large discrepancy between Eclipse and NetBeans? I know they are very different programs, but don't both run the code on the same JVM? Is there something wrong with the Eclipse compiler? If anyone could provide an explanation for this occurrence that would be great, thanks!
This would be better as a comment, but it's too long. The problem with your code is that what it does is equivalent to
System.out.println("Number of Loops: " + 1000000000);
System.out.println("Ubove Max: " + 99900000);
This could be optimized down to a few cycles. It doesn't happen because you don't give the JVM enough time and/or because of OSR (see this question for a good example).
Your code (without break) could be optimized to
for(int x = 0; x < xArray.length; x++) {
    numberOfLoops += yArray.length;
    uboveMax += Math.max(0, yArray.length - 9001);
}
and further to
numberOfLoops += xArray.length * yArray.length;
uboveMax += xArray.length * Math.max(0, yArray.length - 9001);
The code using break would lead to a similar expression. If I were the JVM, I'd be upset with you wasting time on such nonsense. The JVM isn't upset, though, and if you gave it a chance, you'd see times close to zero in both cases. It's possible, but improbable, that (even given enough time) one of the cases doesn't get optimized as well as the other, but there's nothing essential about that: there are countless optimization possibilities, and there are cases where the JVM misses something.
IMHO, this example will not give you any insight into why the slowdown happens with your real code. Simplifying code in order to locate the problem is good, but you went too far. In reality, your variables do something and probably influence the later computation in such a way that breaking out of the loop causes more work to be done afterwards. That's just a guess, but what else can one do without the real code?
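For completeness, here is a minimal sketch (not part of the original question) of what "giving it a chance" could look like: the measured work is moved into its own method, the method is warmed up so the JIT can compile it, and a value is returned so the work cannot be discarded. The class name, method name, and iteration counts are illustrative only; the array sizes match the question.

public class BreakTest {
    static int[] xArray = new int[100000];   // same sizes as in the question
    static int[] yArray = new int[10000];

    static long runOnce() {
        long loops = 0, aboveMax = 0;
        for (int x = 0; x < xArray.length; x++) {
            for (int y = 0; y < yArray.length; y++) {
                loops++;
                if (y > 9000) {
                    aboveMax++;
                    //break;   // toggle to compare the two variants
                }
            }
        }
        return loops + aboveMax;             // returning a value keeps the work observable
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20; i++) {       // warm-up so the JIT compiles runOnce()
            runOnce();
        }
        long start = System.nanoTime();
        long result = runOnce();             // timed run on compiled code
        long end = System.nanoTime();
        System.out.println("Result: " + result + ", Time Taken(MS): " + (end - start) / 1_000_000.0);
    }
}

With this kind of setup, one would expect both variants (break and no break) to end up close to zero once the loop has been optimized.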
Related
I've run into a really strange bug, and I'm hoping someone here can shed some light as it's way out of my area of expertise.
First, relevant background information: I am running OS X 10.9.4 on a Late 2013 MacBook Pro Retina with a 2.4GHz Haswell CPU. I'm using JDK SE 8u5 for OS X from Oracle, and I'm running my code on the latest version of IntelliJ IDEA. This bug also seems to be specific to OS X: I already posted about it on Reddit, and other OS X users were able to recreate it, while users on Windows and Linux, including myself, had the program run as expected, with the println() version running half a second slower than the version without println().
Now for the bug: in my code, I have a println() statement which, when included, makes the program run in ~2.5 seconds. If I remove the println() statement, either by deleting it or commenting it out, the program counterintuitively takes longer, at ~9 seconds. It's extremely strange, as I/O should theoretically slow the program down, not make it faster.
For my actual code, it's my implementation of Project Euler Problem 14. Please keep in mind I'm still a student so it's not the best implementation:
public class ProjectEuler14
{
    public static void main(String[] args)
    {
        final double TIME_START = System.currentTimeMillis();
        Collatz c = new Collatz();
        int highestNumOfTerms = 0;
        int currentNumOfTerms = 0;
        int highestValue = 0; //Value which produces most number of Collatz terms

        for (double i = 1.; i <= 1000000.; i++)
        {
            currentNumOfTerms = c.startCollatz(i);
            if (currentNumOfTerms > highestNumOfTerms)
            {
                highestNumOfTerms = currentNumOfTerms;
                highestValue = (int)(i);
                System.out.println("New term: " + highestValue); //THIS IS THE OFFENDING LINE OF CODE
            }
        }

        final double TIME_STOP = System.currentTimeMillis();
        System.out.println("Highest term: " + highestValue + " with " + highestNumOfTerms + " number of terms");
        System.out.println("Completed in " + ((TIME_STOP - TIME_START)/1000) + " s");
    }
}
public class Collatz
{
    private static int numOfTerms = 0;
    private boolean isFirstRun = false;

    public int startCollatz(double n)
    {
        isFirstRun = true;
        runCollatz(n);
        return numOfTerms;
    }

    private void runCollatz(double n)
    {
        if (isFirstRun)
        {
            numOfTerms = 0;
            isFirstRun = false;
        }
        if (n == 1)
        {
            //Reached last term, does nothing and causes program to return to startCollatz()
        }
        else if (n % 2 == 0)
        {
            //Divides n by 2 following Collatz rule, running recursion
            numOfTerms = numOfTerms + 1;
            runCollatz(n / 2);
        }
        else if (n % 2 == 1)
        {
            //Multiplies n by 3 and adds one, following Collatz rule, running recursion
            numOfTerms = numOfTerms + 1;
            runCollatz((3 * n) + 1);
        }
    }
}
The line of code in question has been commented in with all caps, as it doesn't look like SO does line numbers. If you can't find it, it's within the nested if() statement in my for() loop in my main method.
I've run my code multiple times with and without that line, and I consistently get the ~2.5 sec times with println() and ~9 sec without println() stated above. I've also rebooted my laptop multiple times to make sure it wasn't specific to my current OS session, and the times stay consistent.
Since other OS X 10.9.4 users were able to reproduce the behavior, I suspect it's due to a low-level bug in the compiler, the JVM, or the OS itself. In any case, this is way outside my knowledge. It's not a critical bug, but I'm definitely interested in why this is happening and would appreciate any insight.
I did some research, and some more together with @ekabanov, and here are the findings.
The effect you are seeing only happens with Java 8 and not with Java 7.
The extra line triggers a different JIT compilation/optimisation
The assembly code of the faster version is ~3 times larger, and a quick glance shows that it did loop unrolling.
The JIT compilation log shows that the slower version successfully inlined runCollatz, while the faster one didn't, stating that the callee is too large (probably because of the unrolling).
There is a great tool that helps you analyse such situations; it is called jitwatch. If you need to go down to the assembly level, you also need the HotSpot Disassembler.
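As a rough reference (an assumption on my part, not from the original answer), a typical set of HotSpot flags for producing the logs that jitwatch consumes might look like the following; exact flags can vary by JVM version, and -XX:+PrintAssembly additionally requires the hsdis disassembler library to be installed:

java -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=hotspot.log -XX:+PrintAssembly ProjectEuler14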
I'll also post my log files. You can feed the HotSpot log files to jitwatch, and the assembly extractions are something you can diff to spot the differences.
Fast version's hotspot log file
Fast version's assembly log file
Slow version's hotspot log file
Slow version's assembly log file
In the following code:
long startingTime = System.nanoTime();
int max = (int) Math.pow(2, 19);
for(int i = 0; i < max; ){
    i++;
}
long timePass = System.nanoTime() - startingTime;
System.out.println("Time pass " + timePass / 1000000F);
I am trying to calculate how much time it takes to perform simple actions on my machine.
All the calculations up to the power of 19 increase the time it takes to run this code, but when I went above 19 (up to the max int value of 31), I was amazed to discover that it has no effect on the time it takes.
It always shows 5 milliseconds on my machine!!!
How can this be?
You have just witnessed HotSpot optimizing your entire loop into oblivion. It's smart. You need to do some real action inside the loop. I recommend introducing an int accumulator variable, doing some bitwise operations on it, and finally printing the result to ensure it's needed after the loop.
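A rough sketch of what that could look like, based on the code in the question (the particular bitwise operations are arbitrary; the only point is that the accumulator is updated every iteration and then used in the println, so the loop body cannot be eliminated):

long startingTime = System.nanoTime();
int max = (int) Math.pow(2, 19);
int acc = 1;                               // accumulator the loop must really update
for (int i = 0; i < max; i++) {
    acc ^= i;                              // arbitrary bitwise work
    acc = Integer.rotateLeft(acc, 1);
}
long timePass = System.nanoTime() - startingTime;
System.out.println("Time pass " + timePass / 1000000F + ", acc = " + acc);   // printing acc keeps it live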
On the HotSpot JVM, -XX:CompileThreshold=10000 by default. This means a loop which iterates 10K times can trigger the whole method to be optimised. In your case, you are timing how long it takes to detect and compile (in the background) your method.
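If you want to watch this happen, one option (an aside, not from the original answer) is to run with the standard HotSpot flag that logs each method as it gets compiled; YourBenchmark here stands in for whatever class you are timing:

java -XX:+PrintCompilation YourBenchmark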
Use another System.nanoTime() in the loop. No one can optimize that away:
long dummy = 0;
for(int i = 0; i < max; ){
    i++;
    dummy += System.nanoTime();
}
Don't forget to do:
System.out.println(dummy);
after the loop; using the value ensures the loop isn't optimized away.
I would like to compare the speed (if there is any difference) of the two readDataMethod() variants illustrated below.
private void readDataMethod1(List<Integer> numbers) {
    final long startTime = System.nanoTime();
    for (int i = 0; i < numbers.size(); i++) {
        numbers.get(i);
    }
    final long endTime = System.nanoTime();
    System.out.println("method 1 : " + (endTime - startTime));
}

private void readDataMethod2(List<Integer> numbers) {
    final long startTime = System.nanoTime();
    int i = numbers.size();
    while (i-- > 0) {
        numbers.get(i);
    }
    final long endTime = System.nanoTime();
    System.out.println("method 2 : " + (endTime - startTime));
}
Most of the time the result I get shows that method 2 has "lower" value.
Run    readDataMethod1    readDataMethod2
1      636331             468876
2      638256             479269
3      637485             515455
4      716786             420756
Does this test prove that the readDataMethod2 is faster than the earlier one ?
Does this test prove that the readDataMethod2 is faster than the earlier one ?
You are on the right track in that you're measuring comparative performance, rather than making assumptions.
However, there are lots of potential issues to be aware of when writing micro-benchmarks in Java. I would recommend that you read
How do I write a correct micro-benchmark in Java?
In the first one, you are calling numbers.size() for each iteration.
Try storing it in a variable, and check again.
The reason the second version runs faster is that you are calling numbers.size() on each iteration in the first one. Storing the size in a local variable first would make the two methods perform almost identically.
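As a sketch of that suggestion, the first method with the size hoisted into a local variable could look like this:

private void readDataMethod1(List<Integer> numbers) {
    final long startTime = System.nanoTime();
    final int size = numbers.size();      // evaluated once instead of every iteration
    for (int i = 0; i < size; i++) {
        numbers.get(i);
    }
    final long endTime = System.nanoTime();
    System.out.println("method 1 : " + (endTime - startTime));
}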
Does this test prove that the readDataMethod2 is faster than the earlier one ?
As #aix says, you are on the right track. However, there are a couple of specific issues with your methodology:
It doesn't look like you are "warming up" the JVM. Therefore it is conceivable that your figures could be distorted by startup effects (JIT compilation) or that none of the code has been JIT compiled.
I'd also argue that your runs are doing too little work. 500,000 nanoseconds is 0.0005 seconds, and that's not much work. The risk is that "other things" external to your application could be introducing noise into the measurements. I'd have more confidence in runs that take tens of seconds.
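A minimal sketch of both points combined, assuming numbers is a reasonably large populated list and the per-call println inside the two methods is removed so the timing isn't dominated by I/O (the repetition counts are arbitrary and would need tuning):

// warm-up: let the JIT compile both methods before any timing
for (int i = 0; i < 1000; i++) {
    readDataMethod1(numbers);
    readDataMethod2(numbers);
}
// then time many repetitions so the run lasts long enough to be meaningful
long start = System.nanoTime();
for (int i = 0; i < 10000; i++) {
    readDataMethod1(numbers);
}
System.out.println("method 1 total: " + (System.nanoTime() - start) + " ns");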
This program should produce this output:
N 10*N 100*N 1000*N
1 10 100 1000
2 20 200 2000
3 30 300 3000
4 40 400 4000
5 50 500 5000
So here's my code:
public class ex_4_21 {
    public static void main( String Args[] ){
        int process = 1;
        int process2 = 1;
        int process22 = 1;
        int process3 = 1;
        int process33 = 2;

        System.out.println("N 10*N 100*N 1000*N");
        while(process<=5){
            while(process2<=3){
                System.out.printf("%d ",process2);
                while(process22<=3){
                    process2 = process2 * 10;
                    System.out.printf("%d ",process2);
                    process22++;
                }
                process2++;
            }
            process++;
        }
    }
}
Can my code be more efficient? I am currently learning while loops, and so far this is what I've got. Can anyone make this more efficient, or give me ideas on how to make my code more efficient?
This is not homework; I am self-studying Java.
You can use a single variable n to do this.
while(n is less than the maximum value that you wish n to be)
    print n and a tab
    print n * 10 and a tab
    print n * 100 and a tab
    print n * 1000 and a new line
    n++
If the power of 10 is variable, then you can try this:
while(n is less than the maximum value that you wish n to be)
    set factor to 1
    while(factor is at most the largest power of ten you want)
        print n * factor and a tab
        factor = factor * 10
    print a newline
    n++
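Translated into Java with while loops, this might look roughly like the following sketch (the bounds 5 and 1000 come from the table in the question):

int n = 1;
while (n <= 5) {                       // one row per value of N
    int factor = 1;
    while (factor <= 1000) {           // columns: N, 10*N, 100*N, 1000*N
        System.out.print(n * factor + "\t");
        factor *= 10;
    }
    System.out.println();
    n++;
}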
If you must use a while loop
public class ex_4_21 {
    public static void main( String Args[] ){
        int process = 1;
        System.out.println("N 10*N 100*N 1000*N");
        while(process<=5){
            System.out.println(process + " " + 10*process + " " + 100*process + " " + 1000*process);
            process++;
        }
    }
}
You have one too many while loops (your "process2" while loop is unnecessary). You also appear to have some bugs related to the fact that the variables you are looping on in the inner loops are not re-initialized with each iteration.
I would also recommend against while loops for this; your example fits a for loop much better (see the sketch below). I understand you are trying to learn the looping mechanism, but part of learning should also be deciding when to use which construct. This really isn't a performance recommendation, more of an approach recommendation.
I don't have any further performance improvement suggestions for what you are trying to do. You could obviously remove loops (dropping down to a single loop, or even none), but two loops make sense for what you are doing (they let you easily add another row or column to the output with minimal changes).
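For illustration, the for-loop equivalent of the single-loop solution above might look like this (a sketch, not part of the original answer):

System.out.println("N 10*N 100*N 1000*N");
for (int n = 1; n <= 5; n++) {
    System.out.println(n + " " + 10 * n + " " + 100 * n + " " + 1000 * n);
}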
You can try loop unrolling, similar to #Vincent Ramdhanie's answer.
However, loop unrolling and threading won't produce a significant performance improvement for such a small sample. The overhead involved in creating and launching threads takes more time than a simple while loop, and the I/O overhead will take more time than the unrolled version saves. A complex program is also harder to debug and maintain than a simple one.
What you're describing is called micro-optimization. Save such optimizations for larger programs, and only when the requirements cannot be met or the customer demands it.
I have this code that is testing Calendar.getInstance().getTimeInMillis() vs System.currentTimeMillis():
long before = getTimeInMilli();
for (int i = 0; i < TIMES_TO_ITERATE; i++)
{
    long before1 = getTimeInMilli();
    doSomeReallyHardWork();
    long after1 = getTimeInMilli();
}
long after = getTimeInMilli();
System.out.println(getClass().getSimpleName() + " total is " + (after - before));
I want to make sure no JVM or compiler optimization happens, so the test will be valid and will actually show the difference.
How to be sure?
EDIT: I changed the code example so it will be more clear. What I am checking here is how much time it takes to call getTimeInMilli() in different implementations - Calendar vs System.
I think you need to disable the JIT. Add the following option to your run command:
-Djava.compiler=NONE
You want optimization to happen, because it will in real life - the test wouldn't be valid if the JVM didn't optimize in the same way that it would in the real situation you're interested in.
However, if you want to make sure that the JVM doesn't remove calls that it could potentially consider no-ops otherwise, one option is to use the result - so if you're calling System.currentTimeMillis() repeatedly, you might sum all the return values and then display the sum at the end.
Note that you may still have some bias though - for example, there may be some optimization if the JVM can cheaply determine that only a tiny amount of time has passed since the last call to System.currentTimeMillis(), so it can use a cached value. I'm not saying that's actually the case here, but it's the kind of thing you need to think about. Ultimately, benchmarks can only really test the loads you give them.
One other thing to consider: assuming you want to model a real world situation where the code is run a lot, you should run the code a lot before taking any timing - because the Hotspot JVM will optimize progressively harder, and presumably you care about the heavily-optimized version and don't want to measure the time for JITting and the "slow" versions of the code.
As Stephen mentioned, you should almost certainly take the timing outside the loop... and don't forget to actually use the results...
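A minimal sketch of the "use the result" idea applied to System.currentTimeMillis() (the iteration count is arbitrary): every return value is added to a sum that is printed afterwards, so the JIT cannot treat the calls as dead code.

long sum = 0;
long start = System.nanoTime();
for (int i = 0; i < 10000000; i++) {
    sum += System.currentTimeMillis();    // the result is consumed, so the call stays
}
long elapsed = System.nanoTime() - start;
System.out.println("sum = " + sum + ", elapsed ns = " + elapsed);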
What you are doing looks like benchmarking; you can read Robust Java benchmarking for some good background on how to do it right. In short, you don't need to turn the JIT off, because that isn't what happens on a production server; instead, you want an estimate as close as possible to the 'real' time / performance. Before measuring, you need to 'warm up' your code, which looks like this:
// warm up
for (int j = 0; j < 1000; j++) {
    for (int i = 0; i < TIMES_TO_ITERATE; i++)
    {
        long before1 = getTimeInMilli();
        doSomeReallyHardWork();
        long after1 = getTimeInMilli();
    }
}
// measure time
long before = getTimeInMilli();
for (int j = 0; j < 1000; j++) {
    for (int i = 0; i < TIMES_TO_ITERATE; i++)
    {
        long before1 = getTimeInMilli();
        doSomeReallyHardWork();
        long after1 = getTimeInMilli();
    }
}
long after = getTimeInMilli();
System.out.println( "What to expect? " + (after - before)/1000 ); // average time
When we measure the performance of our code we use this approach; it gives us more or less the real time our code needs to run. It's even better to measure the code in separate methods:
public void doIt() {
    for (int i = 0; i < TIMES_TO_ITERATE; i++)
    {
        long before1 = getTimeInMilli();
        doSomeReallyHardWork();
        long after1 = getTimeInMilli();
    }
}

// warm up
for (int j = 0; j < 1000; j++) {
    doIt();
}
// measure time
long before = getTimeInMilli();
for (int j = 0; j < 1000; j++) {
    doIt();
}
long after = getTimeInMilli();
System.out.println( "What to expect? " + (after - before)/1000 ); // average time
The second approach is more precise, but it also depends on the VM. E.g. HotSpot can perform "on-stack replacement", which means that if some part of a method is executed very often, it will be optimized by the VM and the old version of the code will be replaced with the optimized one while the method is still executing. Of course this takes extra work on the VM's side. JRockit does not do this: the optimized version of the code is only used the next time the method is entered (so no 'runtime' optimization... I mean, in my first code sample the old code would be executed the whole time, except for doSomeReallyHardWork's internals; they do not belong to this method, so optimization will work well for them).
UPDATED: code in question was edited while I was answering ;)
Sorry, but what you are trying to do makes little sense.
If you turn off JIT compilation, then you are only going to measure how long it takes to call that method with JIT compilation turned off. This is not useful information ... because it tells you little if anything about what will happen when JIT compilation is turned on [1].
The times between JIT on and off can be different by a huge factor. You are unlikely to want to run anything in production with JIT turned off.
A better approach would be to do this:
long before1 = getTimeInMilli();
for (int i = 0; i < TIMES_TO_ITERATE; i++) {
    doSomeReallyHardWork();
}
long after1 = getTimeInMilli();
... and / or use the nanosecond clock.
If you are trying to measure the time taken to call the two versions of getTimeInMillis(), then I don't understand the point of your call to doSomeReallyHardWork(). A more sensible benchmark would be this:
public long test() {
    long before1 = getTimeInMilli();
    long sum = 0;
    for (int i = 0; i < TIMES_TO_ITERATE; i++) {
        sum += getTimeInMilli();
    }
    long after1 = getTimeInMilli();
    System.out.println("Took " + (after1 - before1) + " milliseconds");
    return sum;
}
... and call that a number of times, until the times printed stabilize.
Either way, my main point still stands: turning off JIT compilation and/or optimization would mean that you were measuring something that is not useful to know, and not what you are really trying to find out. (Unless, that is, you are intending to run your application in production with the JIT turned off ... which I find hard to believe ...)
[1] I note that someone has commented that turning off JIT compilation allowed them to easily demonstrate the difference between O(1), O(N) and O(N^2) algorithms for a class. But I would counter that it is better to learn how to write a correct micro-benchmark. And for serious purposes, you need to learn how to derive the complexity of algorithms ... mathematically. Even with a perfect benchmark, you can get the wrong answer by trying to "deduce" complexity from performance measurements. (Take the behavior of HashMap, for example.)