Java: Strange runtime behaviour in main

I'm seeing (what is to me) strange runtime behaviour with the following code:
public class Main {
    private final static long ROUNDS = 1000000;
    private final static double INITIAL_NUMBER = 0.45781929d;
    private final static double DIFFERENCE = 0.1250120303d;

    public static void main(String[] args) {
        doSomething();
        doSomething();
        doSomething();
    }

    private static void doSomething() {
        long begin, end;
        double numberToConvert, difference;
        numberToConvert = INITIAL_NUMBER;
        difference = DIFFERENCE;
        begin = System.currentTimeMillis();
        for (long i = 0; i < ROUNDS; i++) {
            String s = "" + numberToConvert;
            if (i % 2 == 0) {
                numberToConvert += difference;
            } else {
                numberToConvert -= difference;
            }
        }
        end = System.currentTimeMillis();
        System.out.println("String appending conversion took " + (end - begin) + "ms.");
    }
}
I would expect the program to print out similar runtimes each time. However, the output I get always looks like this:
String appending conversion took 473ms.
String appending conversion took 362ms.
String appending conversion took 341ms.
The first call is about 30% slower than the calls afterwards. Most of the time, the second call is also slightly slower than the third call.
java/javac versions:
javac 1.7.0_09
java version "1.7.0_09"
OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
So, my question: Why does this happen?

The just-in-time (JIT) compiler profiles your code on the fly and optimizes execution. The more often a piece of code is executed, the better it gets optimized.
See, for example, this question for more info: (How) does the Java JIT compiler optimize my code?
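One way to see this for yourself is to give the JIT some throwaway iterations before timing anything. The sketch below reuses the question's loop body but returns the elapsed time; the class and method names are mine, and the exact numbers will vary by machine and VM:

```java
public class WarmupDemo {
    // Same work as doSomething() in the question, but parameterised and
    // returning the elapsed time instead of printing it.
    public static long timeConversion(long rounds) {
        double numberToConvert = 0.45781929d;
        double difference = 0.1250120303d;
        long begin = System.currentTimeMillis();
        for (long i = 0; i < rounds; i++) {
            String s = "" + numberToConvert;
            if (i % 2 == 0) {
                numberToConvert += difference;
            } else {
                numberToConvert -= difference;
            }
        }
        return System.currentTimeMillis() - begin;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            timeConversion(100_000); // warm-up: let the JIT profile and compile the hot code
        }
        for (int i = 0; i < 3; i++) {
            System.out.println("after warm-up: " + timeConversion(1_000_000) + "ms");
        }
    }
}
```

After the warm-up loop, the three timed runs should be much closer to each other than the three calls in the original program.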

It is possible that other apps you have running are affecting how much memory the JVM can allocate on your machine. Try setting the JVM's minimum and maximum heap sizes to the same value when running the java command:
java -Xms512M -Xmx512M ...
I got fairly consistent intervals when trying to run it:
String appending conversion took 1153ms.
String appending conversion took 1095ms.
String appending conversion took 1081ms.

Related

Heap memory error in Java

I'm tinkering a bit with Java but have a lot of experience in some other languages.
I have a test problem that I know the solution to (and can easily produce in Python and C++). But running the following Java code gives an
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I'm wondering if I'm making a simple mistake, I would not expect the memory footprint of this program to be very large at all:
public static void main(String[] args) {
    ArrayList<Integer> longest_sequence = new ArrayList<>();
    ArrayList<Integer> this_sequence;
    int n = 0;
    for (int i = 1; i < 1000000; i++) {
        this_sequence = new ArrayList<Integer>();
        n = i;
        this_sequence.add(n);
        while (n != 1) {
            if (n % 2 == 0) {
                n = n / 2;
            }
            else {
                n = 3*n + 1;
            }
            this_sequence.add(n);
        }
        if (this_sequence.size() > longest_sequence.size()) {
            longest_sequence = this_sequence;
        }
    }
    System.out.println(longest_sequence.get(0));
    System.out.println(longest_sequence.size());
}
To clarify a bit more:
A new list is created in each iteration of the loop. It is either kept, by assigning it to longest_sequence, or discarded and replaced by a new list instance.
I'm guessing my assumptions about that are incorrect, and instances are being preserved? The size of the lists should not be a problem (about 500 elements for the largest one).
It will fail even if you increase your heap space.
For n = 113383, the 3*n + 1 step eventually exceeds the int range; n overflows to a negative value and the while loop never terminates, so the list keeps growing until the heap is exhausted.
It works if you replace Integer with Long.
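A minimal sketch of that fix, using long arithmetic so the 3n+1 step can't overflow (the class and helper names are mine, not from the original code):

```java
public class CollatzOverflow {
    // Length of the Collatz sequence starting at n (counting n itself and the
    // final 1), using long so 3n+1 can't overflow for inputs like 113383.
    public static int sequenceLength(long n) {
        int length = 1;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(113383L)); // terminates; with int it would loop forever
        System.out.println(sequenceLength(27L));     // 112 terms, a classic long sequence
    }
}
```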
I think the easiest way to avoid this is to add memory using the -Xmx flag.
https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionX.html
Here are the official docs, if you want to read more about it.
To change the VM arguments in Eclipse, go to Window > Preferences > Java > Installed JREs, select the JRE, click Edit, and set Default VM Arguments to -Xmx1024M or any other amount of memory.
Well, it's fairly self-explanatory: you've run out of memory.
You may want to try starting it with more memory, using the -Xmx flag, e.g.
java -Xmx2048m [whatever you'd have written before]
This will use up to 2 gigs of memory.
See the non-standard options list for more details.
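If you want to confirm what heap the flag actually gave you, you can query the runtime from inside the program; a small sketch (the HeapInfo class name is made up for illustration):

```java
public class HeapInfo {
    // Converts bytes to MiB for readable output.
    public static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Run with e.g. `java -Xmx2048m HeapInfo` to see the flag take effect.
        System.out.println("max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free heap:  " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```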

In OS X, why does using println() cause my program to run faster than without println()

I've run into a really strange bug, and I'm hoping someone here can shed some light as it's way out of my area of expertise.
First, relevant background information: I am running OS X 10.9.4 on a Late 2013 MacBook Pro Retina with a 2.4GHz Haswell CPU. I'm using JDK SE 8u5 for OS X from Oracle, and I'm running my code on the latest version of IntelliJ IDEA. This bug also seems to be specific to OS X: I posted about it on Reddit, and other OS X users were able to recreate it, while users on Windows and Linux (including myself) had the program run as expected, with the println() version running half a second slower than the version without println().
Now for the bug: in my code, I have a println() statement that, when included, makes the program run in ~2.5 seconds. If I remove the println() statement, either by deleting it or commenting it out, the program counterintuitively takes longer, at ~9 seconds. It's extremely strange, as I/O should theoretically slow the program down, not speed it up.
For my actual code, it's my implementation of Project Euler Problem 14. Please keep in mind I'm still a student so it's not the best implementation:
public class ProjectEuler14
{
    public static void main(String[] args)
    {
        final double TIME_START = System.currentTimeMillis();
        Collatz c = new Collatz();
        int highestNumOfTerms = 0;
        int currentNumOfTerms = 0;
        int highestValue = 0; //Value which produces most number of Collatz terms
        for (double i = 1.; i <= 1000000.; i++)
        {
            currentNumOfTerms = c.startCollatz(i);
            if (currentNumOfTerms > highestNumOfTerms)
            {
                highestNumOfTerms = currentNumOfTerms;
                highestValue = (int)(i);
                System.out.println("New term: " + highestValue); //THIS IS THE OFFENDING LINE OF CODE
            }
        }
        final double TIME_STOP = System.currentTimeMillis();
        System.out.println("Highest term: " + highestValue + " with " + highestNumOfTerms + " number of terms");
        System.out.println("Completed in " + ((TIME_STOP - TIME_START)/1000) + " s");
    }
}
public class Collatz
{
    private static int numOfTerms = 0;
    private boolean isFirstRun = false;

    public int startCollatz(double n)
    {
        isFirstRun = true;
        runCollatz(n);
        return numOfTerms;
    }

    private void runCollatz(double n)
    {
        if (isFirstRun)
        {
            numOfTerms = 0;
            isFirstRun = false;
        }
        if (n == 1)
        {
            //Reached last term, does nothing and causes program to return to startCollatz()
        }
        else if (n % 2 == 0)
        {
            //Divides n by 2 following Collatz rule, running recursion
            numOfTerms = numOfTerms + 1;
            runCollatz(n / 2);
        }
        else if (n % 2 == 1)
        {
            //Multiplies n by 3 and adds one, following Collatz rule, running recursion
            numOfTerms = numOfTerms + 1;
            runCollatz((3 * n) + 1);
        }
    }
}
The offending line of code has been marked with an all-caps comment, since it doesn't look like SO does line numbers. If you can't find it, it's within the nested if() statement in the for() loop in my main method.
I've run my code multiple times with and without that line, and I consistently get the above stated ~2.5sec times with println() and ~9sec without println(). I've also rebooted my laptop multiple times to make sure it wasn't my current OS run and the times stay consistent.
Since other OS X 10.9.4 users were able to replicate the behaviour, I suspect it's due to a low-level bug in the compiler, JVM, or OS itself. In any case, this is way outside my knowledge. It's not a critical bug, but I definitely am interested in why this is happening and would appreciate any insight.
I did some research, and some more together with @ekabanov, and here are the findings.
The effect you are seeing only happens with Java 8, not with Java 7.
The extra line triggers a different JIT compilation/optimisation.
The assembly code of the faster version is ~3 times larger, and a quick glance shows it did loop unrolling.
The JIT compilation log shows that the slower version successfully inlined runCollatz, while the faster one didn't, stating that the callee is too large (probably because of the unrolling).
There is a great tool that helps you analyse such situations; it is called JITWatch. If you need to go down to assembly level, you also need the HotSpot disassembler.
I'll also post my log files. You can feed the HotSpot log files to JITWatch; the assembly extractions are something you can diff to spot the differences.
Fast version's hotspot log file
Fast version's assembly log file
Slow version's hotspot log file
Slow version's assembly log file

High cost of polymorphism in Java Hotspot server

When I run my timing test program in Java Hotspot client, I get consistent behavior.
However, when I run it in Hotspot server, I get unexpected result.
Essentially, the cost of polymorphism is unacceptably high in certain situations that I've tried to duplicate below.
Is this a known issue/bug with Hotspot server, or am I doing something wrong?
Test program and timings are given below:
Intel i7, Windows 8
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Mine2: 0.387028831 <--- polymorphic call with expected timing
Trivial: 1.545411765 <--- some more polymorphic calls
Mine: 0.727726371 <--- polymorphic call with unexpected timing. Should be about 0.38
Mine: 0.383132698 <--- direct call with expected timing
The situation gets worse as I add additional tests.
Timings of the tests near the end of the list are completely off.
interface canDoIsSquare {
    boolean isSquare(long x);
}

final class Trivial implements canDoIsSquare {
    @Override final public boolean isSquare(long x) {
        if (x > 0) {
            long t = (long) Math.sqrt(x);
            return t * t == x;
        }
        return x == 0;
    }

    @Override public String toString() {return "Trivial";}
}

final class Mine implements canDoIsSquare {
    @Override final public boolean isSquare(long x) {
        if (x > 0) {
            while ((x & 3) == 0)
                x >>= 2;
            if ((x & 2) != 0 || (x & 7) == 5)
                return false;
            final long t = (long) Math.sqrt(x);
            return (t * t == x);
        }
        return x == 0;
    }

    @Override public String toString() {return "Mine";}
}

final class Mine2 implements canDoIsSquare {
    @Override final public boolean isSquare(long x) {
        // just duplicated code for this test
        if (x > 0) {
            while ((x & 3) == 0)
                x >>= 2;
            if ((x & 2) != 0 || (x & 7) == 5)
                return false;
            final long t = (long) Math.sqrt(x);
            return (t * t == x);
        }
        return x == 0;
    }

    @Override final public String toString() {return "Mine2";}
}

public class IsSquared {
    static final long init = (long) (Integer.MAX_VALUE / 8)
            * (Integer.MAX_VALUE / 2) + 1L;

    static long test1(final canDoIsSquare fun) {
        long r = init;
        long startTimeNano = System.nanoTime();
        while (!fun.isSquare(r))
            ++r;
        long taskTimeNano = System.nanoTime() - startTimeNano;
        System.out.println(fun + ": " + taskTimeNano / 1e9);
        return r;
    }

    static public void main(String[] args) {
        Mine mine = new Mine();
        Trivial trivial = new Trivial();
        Mine2 mine2 = new Mine2();
        test1(mine2);
        test1(trivial);
        test1(mine);
        long r = init;
        long startTimeNano = System.nanoTime();
        while (!mine.isSquare(r))
            ++r;
        long taskTimeNano = System.nanoTime() - startTimeNano;
        System.out.println(mine + ": " + taskTimeNano / 1e9);
        System.out.println(r);
    }
}
The cost is high, indeed, but your benchmark doesn't measure anything really relevant. The JIT can optimize away most of the overhead, but you didn't give it any chance. See e.g. here.
In any case, there's no benchmark warm-up, and there's on-stack replacement (OSR) to consider.
The explanation is probably that the Server Hotspot optimizes better but slower. It assumes that it has enough time and collects the necessary stats longer. So while the Client Hotspot optimized your program, the Server Hotspot was preparing itself to produce better code.
The reason for the worsening with additional tests is that the initially monomorphic call site became bimorphic and then megamorphic.
In reality it's possible that only one of the methods ever gets called. If you want to benchmark this, you have to run each test in its own JVM. This is a real pain, but existing benchmarking frameworks do it for you.
Or you may want to measure the polymorphic case, but then you need to warm up the code with all cases first. This way you can find out which method is faster even in a single JVM (though each will be slowed down by the megamorphic call overhead).
Update
The explanation seems to be the change from monomorphic to megamorphic. When the first test was run, the JVM knew all the classes (as the instances were already created), but optimistically assumed that only Mine2 occurred at the call site. So it did a quick check (translated as a conditional branch, which was always correctly predicted and thus very fast) and called the proper method. As it later saw the other two instances being used there, it had to create a branch table for them (the branch prediction still works, but the overhead is higher).
Question
What's unclear: the JVM could hoist this test out of the loop and thus reduce its cost to nearly nothing. I can't tell why it doesn't happen.
In short, the JIT can optimise a call site with one possible target method, or two, in ways it cannot once more methods are involved. What matters is the number of possible methods which might be called at any given line, and the JIT builds up this picture over time. When a method is inlined, further optimisations are possible; but in your case the line in question in test1 accumulates more possible targets over the life of the run, and so it gets slower.
The way I get around this is to duplicate the short test code so each class is tested equally (assuming this is realistic). If your program will be multi-polymorphic when it is running, that is what you should test, because, as you can see, it changes the results.
When you run the method from a fresh loop, you see the benefit of only calling one method from that line of code.
Here is a table of the different costs you might see depending on the number of possible methods an individual line can call: http://vanillajava.blogspot.co.uk/2012/12/performance-of-inlined-virtual-method.html
Polymorphism is not designed to improve performance, and to me it is entirely reasonable that as the complexity of the polymorphism increases, it gets slower.
BTW, making methods final doesn't improve performance any more. The JIT works out whether you have called a subclass on a line-by-line basis (as discussed).
EDIT: As you can see, the client JVM doesn't optimise the code as much, as it is designed for relatively lightweight startup times. This means the client JVM is more consistent, but consistently slower. If you want the best performance, you need to consider a number of optimisation strategies, which leads to multiple possible outcomes depending on whether each optimisation is applied or not.
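A sketch of what this looks like in practice. The interface, implementations, and run() helper below are made up for illustration; the point is that warming a single call site with all implementations first makes it megamorphic before you start measuring, which is what you should do if that matches production behaviour:

```java
interface Op {
    long apply(long x);
}

public class MorphismDemo {
    public static final class Inc implements Op { public long apply(long x) { return x + 1; } }
    public static final class Dbl implements Op { public long apply(long x) { return x * 2; } }
    public static final class Neg implements Op { public long apply(long x) { return -x; } }

    // The op.apply(i) call site starts out monomorphic; once all three types
    // have passed through it, it is megamorphic and is optimised differently.
    public static long run(Op op, long n) {
        long acc = 0;
        for (long i = 0; i < n; i++) acc += op.apply(i);
        return acc;
    }

    public static void main(String[] args) {
        Op[] ops = { new Inc(), new Dbl(), new Neg() };
        for (Op op : ops) run(op, 100_000); // warm up with ALL implementations first
        for (Op op : ops) {
            long t0 = System.nanoTime();
            long r = run(op, 10_000_000);
            System.out.println(op.getClass().getSimpleName() + ": "
                    + (System.nanoTime() - t0) / 1e9 + " s (result " + r + ")");
        }
    }
}
```

To measure the monomorphic case instead, each implementation would need its own copy of the timing loop (or its own JVM), as described above.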

Does an increase in the number of comments increase the execution time?

Consider the following cases:
Case 1: (Less comments in for loop)
import java.io.IOException;

public class Stopwatch {
    private static long start;

    public static void main(String args[]) throws IOException {
        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000000; i++) {
            /**
             * Comment Line 1
             * Comment Line 2
             * Comment Line 3
             * Comment Line 4
             */
        }
        System.out.println("The time taken to execute the code is: " + (System.currentTimeMillis() - start)/1000.0);
    }
}
The time taken to execute the code is: 2.259
Case 2: (More comments in for loop)
import java.io.IOException;

public class Stopwatch {
    private static long start;

    public static void main(String args[]) throws IOException {
        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000000; i++) {
            /**
             * Comment Line 1
             * Comment Line 2
             * Comment Line 3
             * Comment Line 4
             * Comment Line 5
             * Comment Line 6
             * Comment Line 7
             * Comment Line 8
             */
        }
        System.out.println("The time taken to execute the code is: " + (System.currentTimeMillis() - start)/1000.0);
    }
}
The time taken to execute the code is: 2.279
Case 3: (No comments, empty for loop)
import java.io.IOException;

public class Stopwatch {
    private static long start;

    public static void main(String args[]) throws IOException {
        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000000; i++) {
        }
        System.out.println("The time taken to execute the code is: " + (System.currentTimeMillis() - start)/1000.0);
    }
}
The time taken to execute the code is: 2.249
Configuration: JDK 1.5, 3rd Gen i5, 4GB Ram.
Question: If we add more comments, does the program take more time to execute? Why?

Question: If we add more comments, does the program take more time to execute? Why?
No. Comments have no effect on execution.
They will slow the compiler down a tiny bit - but even that should be imperceptible unless you have a ridiculous number of comments.
The "effect" you're noticing is more to do with the way you're timing things. Using System.currentTimeMillis() for benchmarking is a bad idea; you should use System.nanoTime() instead, as it typically uses a higher-accuracy clock (suitable only for timing intervals, not for determining "wall clock" time). Additionally, a benchmark should typically run its "target" code for long enough to warm up the JIT compiler etc. before actually measuring. Then you need to consider other things which might have been running on your system at the same time. Basically, there's a lot involved in writing a good benchmark. I suggest you look at Caliper if you're going to write any significant benchmarks in the future.
You can verify that there's no difference due to the comments, though: compile your code and then run
javap -c Stopwatch
and you can look at the bytecode. You'll see there's no difference between the different versions.
No, comments are discarded by the compiler, so they cannot affect execution time. The difference you got is very small; if you run your test 10 times you'll see that the differences fall within the range of statistical error.
A computer does many tasks simultaneously. If you want to compare the performance of two pieces of code that give similar execution times, you need many experiments to prove that one of them is faster than the other.
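Following that advice, here is a sketch of running the measurement repeatedly and reporting the mean and standard deviation. The class and helper names are mine, the timing uses System.nanoTime() as recommended above, and the loop bound is smaller than in the question to keep runs short:

```java
public class RepeatedTiming {
    // Simple statistics helpers so the results are easy to check.
    public static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    public static double stddev(double[] xs) {
        double m = mean(xs);
        double var = 0;
        for (double x : xs) var += (x - m) * (x - m);
        return Math.sqrt(var / xs.length);
    }

    // One timed run of an empty-bodied loop like the question's, measured with
    // System.nanoTime(). The JIT may well eliminate the loop entirely, which is
    // itself instructive: the comment-free and commented versions behave alike.
    static double timeEmptyLoopMillis() {
        long t0 = System.nanoTime();
        for (int i = 0; i < 100_000_000; i++) {
        }
        return (System.nanoTime() - t0) / 1e6;
    }

    public static void main(String[] args) {
        int runs = 10;
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++) samples[i] = timeEmptyLoopMillis();
        System.out.printf("mean %.3f ms, stddev %.3f ms over %d runs%n",
                mean(samples), stddev(samples), runs);
    }
}
```

If the stddev is of the same order as the difference between two versions of the code, the measurements don't distinguish them.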
It may slow the compilation process down at an extremely minute level: all the compiler does is read the // or /* and skip everything after it until a line break or end-comment marker, respectively. If you want more accurate results, run each version 10 times, take the average execution time of each, and then compare.
Adding comments will increase the time taken to compile the program, but only at a minute level: when compiling, the compiler has to read each line to know whether to skip it (in the case of a comment) or translate it.

What is the quantitative overhead of making a JNI call?

Based on performance alone, approximately how many "simple" lines of java is the equivalent performance hit of making a JNI call?
Or to try to express the question in a more concrete way, if a simple java operation such as
someIntVar1 = someIntVar2 + someIntVar3;
was given a "CPU work" index of 1, what would be the typical (ballpark) "CPU work" index of the overhead of making the JNI call?
This question ignores the time taken waiting for the native code to execute. In telephonic parlance, it is strictly about the "flag fall" part of the call, not the "call rate".
The reason for asking is to have a rule of thumb for when it's worth attempting a JNI call, given that you know the native cost (from direct testing) and the Java cost of a given operation. It could help you quickly avoid the hassle of coding the JNI call only to find that the call-out overhead consumed any benefit of using native code.
Edit:
Some folks are getting hung up on variations in CPU, RAM, etc. These are all virtually irrelevant to the question; I'm asking for the cost relative to lines of Java code. If the CPU and RAM are poor, they are poor for both Java and JNI, so environmental considerations should balance out. The JVM version falls into the "irrelevant" category too.
This question isn't asking for an absolute timing in nanoseconds, but rather a ball park "work effort" in units of "lines of simple java code".
Quick profiler test yields:
Java class:
public class Main {
    private static native int zero();

    private static int testNative() {
        return Main.zero();
    }

    private static int test() {
        return 0;
    }

    public static void main(String[] args) {
        testNative();
        test();
    }

    static {
        System.loadLibrary("foo");
    }
}
C library:
#include <jni.h>
#include "Main.h"

JNIEXPORT jint JNICALL
Java_Main_zero(JNIEnv *env, jclass cls)
{
    return 0;
}
Results:
System details:
java version "1.7.0_09"
OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-1)
OpenJDK Server VM (build 23.2-b09, mixed mode)
Linux visor 3.2.0-4-686-pae #1 SMP Debian 3.2.32-1 i686 GNU/Linux
Update: Caliper micro-benchmarks for x86 (32/64 bit) and ARMv6 are as follows:
Java class:
public class Main extends SimpleBenchmark {
    private static native int zero();
    private Random random;
    private int[] primes;

    public int timeJniCall(int reps) {
        int r = 0;
        for (int i = 0; i < reps; i++) r += Main.zero();
        return r;
    }

    public int timeAddIntOperation(int reps) {
        int p = primes[random.nextInt(1) + 54]; // >= 257
        for (int i = 0; i < reps; i++) p += i;
        return p;
    }

    public long timeAddLongOperation(int reps) {
        long p = primes[random.nextInt(3) + 54]; // >= 257
        long inc = primes[random.nextInt(3) + 4]; // >= 11
        for (int i = 0; i < reps; i++) p += inc;
        return p;
    }

    @Override
    protected void setUp() throws Exception {
        random = new Random();
        primes = getPrimes(1000);
    }

    public static void main(String[] args) {
        Runner.main(Main.class, args);
    }

    public static int[] getPrimes(int limit) {
        // returns array of primes under $limit, off-topic here
    }

    static {
        System.loadLibrary("foo");
    }
}
Results (x86/i7500/Hotspot/Linux):
Scenario{benchmark=JniCall} 11.34 ns; σ=0.02 ns # 3 trials
Scenario{benchmark=AddIntOperation} 0.47 ns; σ=0.02 ns # 10 trials
Scenario{benchmark=AddLongOperation} 0.92 ns; σ=0.02 ns # 10 trials
benchmark ns linear runtime
JniCall 11.335 ==============================
AddIntOperation 0.466 =
AddLongOperation 0.921 ==
Results (amd64/phenom 960T/Hostspot/Linux):
Scenario{benchmark=JniCall} 6.66 ns; σ=0.22 ns # 10 trials
Scenario{benchmark=AddIntOperation} 0.29 ns; σ=0.00 ns # 3 trials
Scenario{benchmark=AddLongOperation} 0.26 ns; σ=0.00 ns # 3 trials
benchmark ns linear runtime
JniCall 6.657 ==============================
AddIntOperation 0.291 =
AddLongOperation 0.259 =
Results (armv6/BCM2708/Zero/Linux):
Scenario{benchmark=JniCall} 678.59 ns; σ=1.44 ns # 3 trials
Scenario{benchmark=AddIntOperation} 183.46 ns; σ=0.54 ns # 3 trials
Scenario{benchmark=AddLongOperation} 199.36 ns; σ=0.65 ns # 3 trials
benchmark ns linear runtime
JniCall 679 ==============================
AddIntOperation 183 ========
AddLongOperation 199 ========
To summarize things a bit: a JNI call seems roughly equivalent to 10-25 Java ops on typical (x86) hardware under the HotSpot VM. Unsurprisingly, under the much less optimized Zero VM the results are quite different (3-4 ops).
Thanks go to @Giovanni Azua and @Marko Topolnik for participation and hints.
So I just tested the "latency" of a JNI call to C on Windows 8.1, 64-bit, using the Eclipse Mars IDE, JDK 1.8.0_74, and the VisualVM profiler 1.3.8 with the Profile Startup add-on.
Setup: (two methods)
SOMETHING() passes arguments, does stuff, and returns arguments
NOTHING() passes in the same arguments, does nothing with them, and returns same arguments.
(each gets called 270 times)
Total run time for SOMETHING(): 6523ms
Total run time for NOTHING(): 0.102ms
Thus in my case the JNI calls are quite negligible.
You should actually test the "latency" yourself. Latency is defined in engineering as the time it takes to send a message of zero length. In this context, it would correspond to writing the smallest Java program that invokes a do_nothing empty C++ function, then computing the mean and stddev of the elapsed time over 30 measurements (do a couple of extra warm-up calls first). You might be surprised by the different average results across JDK versions and platforms.
Only doing so will give you the final answer of whether using JNI makes sense for your target environment.
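A sketch of such a latency harness. The class name and the LongSupplier-based shape are my own; in a real JNI test the supplier would wrap the do-nothing native method (not shown here, since it needs a native library):

```java
import java.util.function.LongSupplier;

public class LatencyProbe {
    // Returns { mean, stddev } in ns/call for invoking `op`, measured over
    // `batches` timed batches of `callsPerBatch` calls each, after a warm-up.
    public static double[] measure(LongSupplier op, int batches, int callsPerBatch) {
        long sink = 0;
        for (int i = 0; i < callsPerBatch; i++) sink += op.getAsLong(); // warm-up
        double[] perCall = new double[batches];
        for (int b = 0; b < batches; b++) {
            long t0 = System.nanoTime();
            for (int i = 0; i < callsPerBatch; i++) sink += op.getAsLong();
            perCall[b] = (System.nanoTime() - t0) / (double) callsPerBatch;
        }
        double mean = 0;
        for (double s : perCall) mean += s;
        mean /= batches;
        double var = 0;
        for (double s : perCall) var += (s - mean) * (s - mean);
        if (sink == Long.MIN_VALUE) System.out.println(); // keep `sink` live
        return new double[] { mean, Math.sqrt(var / batches) };
    }

    public static void main(String[] args) {
        // Pure-Java stand-in for the native call, so the harness runs anywhere.
        double[] r = measure(() -> 0L, 30, 1_000_000);
        System.out.printf("%.2f ns/call (stddev %.2f)%n", r[0], r[1]);
    }
}
```

Comparing the mean for the native wrapper against the mean for a pure-Java no-op gives the per-call overhead for your particular JDK and platform.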
