High cost of polymorphism in Java Hotspot server

When I run my timing test program in Java Hotspot client, I get consistent behavior.
However, when I run it in Hotspot server, I get unexpected results.
Essentially, the cost of polymorphism is unacceptably high in certain situations that I've tried
to reproduce below.
Is this a known issue/bug with Hotspot server, or am I doing something wrong?
The test program and timings are given below:
Intel i7, Windows 8
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Mine2: 0.387028831 <--- polymorphic call with expected timing
Trivial: 1.545411765 <--- some more polymorphic calls
Mine: 0.727726371 <--- polymorphic call with unexpected timing. Should be about 0.38
Mine: 0.383132698 <--- direct call with expected timing
The situation gets worse as I add additional tests.
Timing of the tests near the end of the list are completely off.
interface canDoIsSquare {
boolean isSquare(long x);
}
final class Trivial implements canDoIsSquare {
@Override final public boolean isSquare(long x) {
if (x > 0) {
long t = (long) Math.sqrt(x);
return t * t == x;
}
return x == 0;
}
@Override public String toString() {return "Trivial";}
}
final class Mine implements canDoIsSquare {
@Override final public boolean isSquare(long x) {
if (x > 0) {
while ((x & 3) == 0)
x >>= 2;
if ((x & 2) != 0 || (x & 7) == 5)
return false;
final long t = (long) Math.sqrt(x);
return (t * t == x);
}
return x == 0;
}
@Override public String toString() {return "Mine";}
}
final class Mine2 implements canDoIsSquare {
@Override final public boolean isSquare(long x) {
// just duplicated code for this test
if (x > 0) {
while ((x & 3) == 0)
x >>= 2;
if ((x & 2) != 0 || (x & 7) == 5)
return false;
final long t = (long) Math.sqrt(x);
return (t * t == x);
}
return x == 0;
}
@Override final public String toString() {return "Mine2";}
}
public class IsSquared {
static final long init = (long) (Integer.MAX_VALUE / 8)
* (Integer.MAX_VALUE / 2) + 1L;
static long test1(final canDoIsSquare fun) {
long r = init;
long startTimeNano = System.nanoTime();
while (!fun.isSquare(r))
++r;
long taskTimeNano = System.nanoTime() - startTimeNano;
System.out.println(fun + ": " + taskTimeNano / 1e9);
return r;
}
static public void main(String[] args) {
Mine mine = new Mine();
Trivial trivial = new Trivial();
Mine2 mine2 = new Mine2();
test1(mine2);
test1(trivial);
test1(mine);
long r = init;
long startTimeNano = System.nanoTime();
while (!mine.isSquare(r))
++r;
long taskTimeNano = System.nanoTime() - startTimeNano;
System.out.println(mine + ": " + taskTimeNano / 1e9);
System.out.println(r);
}
}

The cost is high, indeed, but your benchmark doesn't measure anything really relevant. The JIT can optimize away most of the overhead, but you didn't give it any chance.
In any case, there's no benchmark warmup, and your timings include On-Stack Replacement (OSR), where a long-running loop is compiled while it is executing.
The explanation is probably that the Server Hotspot optimizes better but slower. It assumes that it has enough time and collects the necessary stats longer. So while the Client Hotspot optimized your program, the Server Hotspot was preparing itself to produce better code.
The reason for the worsening with additional tests is that the initially monomorphic call site became bimorphic and then megamorphic.
In reality it's possible that only one of the methods gets called. If you want to benchmark this, you have to run each test in its own JVM. This is a real pain, but existing benchmarking frameworks do it for you.
Or you may want to measure the polymorphic case, but then you need to warm up the code with all cases first. This way you can find out which method is faster even in a single JVM (though each will be slowed down by the megamorphic call overhead).
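A minimal sketch of the "warm up with all cases first" approach. The two predicates are modelled on the question's Trivial and Mine classes, but are written as static methods behind java.util.function.LongPredicate instead of the canDoIsSquare interface, and the start value 1_000_000_001L is an arbitrary assumption chosen so each run is short. Timings will vary by machine and JVM; a harness such as JMH handles warm-up and forking more rigorously.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongPredicate;

class WarmupBenchmark {
    // Plain Math.sqrt check, modelled on the question's Trivial class.
    static boolean trivial(long x) {
        if (x > 0) {
            long t = (long) Math.sqrt(x);
            return t * t == x;
        }
        return x == 0;
    }

    // Residue filter before the sqrt, modelled on the question's Mine class.
    static boolean filtered(long x) {
        if (x > 0) {
            while ((x & 3) == 0) x >>= 2;
            if ((x & 2) != 0 || (x & 7) == 5) return false;
            long t = (long) Math.sqrt(x);
            return t * t == x;
        }
        return x == 0;
    }

    // The single call site (isSquare.test) shared by every implementation.
    static long firstSquareFrom(LongPredicate isSquare, long start) {
        long r = start;
        while (!isSquare.test(r)) ++r;
        return r;
    }

    public static void main(String[] args) {
        Map<String, LongPredicate> funs = new LinkedHashMap<>();
        funs.put("Trivial", WarmupBenchmark::trivial);
        funs.put("Filtered", WarmupBenchmark::filtered);
        long start = 1_000_000_001L; // hypothetical small start value

        // Warm-up: every implementation passes through the call site before any timing.
        for (int i = 0; i < 200; i++)
            for (LongPredicate f : funs.values()) firstSquareFrom(f, start);

        // Timed runs: the call-site profile no longer changes from one run to the next.
        for (Map.Entry<String, LongPredicate> e : funs.entrySet()) {
            long t0 = System.nanoTime();
            long r = firstSquareFrom(e.getValue(), start);
            System.out.println(e.getKey() + ": " + r + " in " + (System.nanoTime() - t0) / 1e9 + " s");
        }
    }
}
```

Both runs find 1000014129 (31623 squared); only the relative timings are of interest, and both are measured with the same megamorphic call site.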
Update
The explanation seems to be the change from monomorphic to megamorphic. When the first test was run, the JVM knew all the classes (as the instances had already been created), but was optimistically assuming that only Mine2 occurs at the call site. So it did a quick check (translated into a conditional branch, which was always correctly predicted and thus very fast), and called the proper method. As it later saw the other two instances being used there, it had to create a branch table for them (branch prediction still works, but the overhead is higher).
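To make the "quick check" concrete, here is a hand-written Java analogue of a monomorphic inline cache: compare the receiver's class against the one expected class, call that method directly if it matches, and otherwise fall back to ordinary interface dispatch. This is only an illustration under that assumption; HotSpot emits this logic in machine code, and what it actually does on a mismatch (deoptimize, add a second type check, or use a vtable/itable call) depends on the profile. The class names are hypothetical.

```java
class GuardedDispatch {
    interface IsSquare { boolean isSquare(long x); }

    // Plain Math.sqrt check (modelled on the question's Trivial).
    static final class Trivial implements IsSquare {
        public boolean isSquare(long x) {
            if (x > 0) {
                long t = (long) Math.sqrt(x);
                return t * t == x;
            }
            return x == 0;
        }
    }

    // Residue filter before the sqrt (modelled on the question's Mine).
    static final class Filtered implements IsSquare {
        public boolean isSquare(long x) {
            if (x > 0) {
                while ((x & 3) == 0) x >>= 2;
                if ((x & 2) != 0 || (x & 7) == 5) return false;
                long t = (long) Math.sqrt(x);
                return t * t == x;
            }
            return x == 0;
        }
    }

    // Hand-written analogue of a monomorphic inline cache for one call site.
    static boolean callSite(IsSquare fun, long x) {
        if (fun.getClass() == Trivial.class) {
            // Expected receiver: one well-predicted branch, then a direct call
            // (Trivial is final) that the JIT could inline.
            return ((Trivial) fun).isSquare(x);
        }
        // Any other receiver: ordinary interface dispatch (the slow path).
        return fun.isSquare(x);
    }

    public static void main(String[] args) {
        System.out.println(callSite(new Trivial(), 49));  // true
        System.out.println(callSite(new Filtered(), 50)); // false
    }
}
```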
Question
What's unclear: the JVM could move this test out of the loop and thus reduce its cost to nearly nothing. I can't tell why that doesn't happen.

In short, the JIT can optimise a call site with a single possible target, or with two, in ways it cannot when there are more (a megamorphic call site). The number of possible methods which might be called on any given line is what matters, and the JIT builds up this picture over time. When a method is inlined, further optimisations are possible, but in your case the call in test1 sees more and more possible targets over the life of the run, and so it gets slower.
The way I get around this is to duplicate the short test code so each class is tested equally (assuming this is realistic). If your program will be multi-polymorphic when it is running, this is what you should test to be realistic, as you can see it can change the results.
When you run the method from a fresh loop you see the benefit of only calling one method from that line of code.
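A sketch of the "duplicate the test code" workaround, using the same square predicates as above (modelled on the question's Trivial and Mine). As I understand it, HotSpot keeps a receiver-type profile per call site in the bytecode, so two textually identical loop methods each see only one implementation and neither becomes megamorphic. The names firstSquareA and firstSquareB are hypothetical; for real measurements you would still want warm-up.

```java
import java.util.function.LongPredicate;

class PerImplementationLoops {
    // Plain Math.sqrt check (modelled on the question's Trivial).
    static boolean trivial(long x) {
        if (x > 0) {
            long t = (long) Math.sqrt(x);
            return t * t == x;
        }
        return x == 0;
    }

    // Residue filter before the sqrt (modelled on the question's Mine).
    static boolean filtered(long x) {
        if (x > 0) {
            while ((x & 3) == 0) x >>= 2;
            if ((x & 2) != 0 || (x & 7) == 5) return false;
            long t = (long) Math.sqrt(x);
            return t * t == x;
        }
        return x == 0;
    }

    // Two identical copies of the search loop. Each copy has its own call site
    // for isSquare.test(...), so each one only ever sees one implementation.
    static long firstSquareA(LongPredicate isSquare, long start) {
        long r = start;
        while (!isSquare.test(r)) ++r;
        return r;
    }

    static long firstSquareB(LongPredicate isSquare, long start) {
        long r = start;
        while (!isSquare.test(r)) ++r;
        return r;
    }

    public static void main(String[] args) {
        long start = 1_000_000_001L; // hypothetical start value, kept small
        long t0 = System.nanoTime();
        long a = firstSquareA(PerImplementationLoops::trivial, start);
        long t1 = System.nanoTime();
        long b = firstSquareB(PerImplementationLoops::filtered, start);
        long t2 = System.nanoTime();
        System.out.println("trivial:  " + a + " in " + (t1 - t0) / 1e9 + " s");
        System.out.println("filtered: " + b + " in " + (t2 - t1) / 1e9 + " s");
    }
}
```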
Here is a table of different costs you might see depending on the number of possible methods any individual line can call. http://vanillajava.blogspot.co.uk/2012/12/performance-of-inlined-virtual-method.html
Polymorphism is not designed to improve performance and for me it is entirely reasonable that as the complexity of the polymorphism increases it should be slower.
BTW, making methods final doesn't improve performance any more. The JIT works out whether you have called a sub-class on a line-by-line basis (as discussed).
EDIT: As you can see, the client JVM doesn't optimise the code as much, as it is designed for relatively lightweight, fast startup. This means the client JVM is more consistent, but consistently slower. If you want the best performance, you need to consider a number of optimisation strategies, which leads to multiple possible outcomes depending on whether each optimisation is applied or not.

Related

Getting StackOverflow exception during some calculations [duplicate]

What is a StackOverflowError, what causes it, and how should I deal with them?

Parameters and local variables are allocated on the stack (with reference types, the object lives on the heap and a variable in the stack references that object on the heap). The stack typically lives at the upper end of your address space and as it is used up it heads towards the bottom of the address space (i.e. towards zero).
Your process also has a heap, which lives at the bottom end of your process. As you allocate memory, this heap can grow towards the upper end of your address space. As you can see, there is a potential for the heap to "collide" with the stack (a bit like tectonic plates!!!).
The common cause for a stack overflow is a bad recursive call. Typically, this is caused when your recursive function doesn't have the correct termination condition, so it ends up calling itself forever. Or, when the termination condition is fine, it can be caused by requiring too many recursive calls before fulfilling it.
However, with GUI programming, it's possible to generate indirect recursion. For example, your app may be handling paint messages, and, whilst processing them, it may call a function that causes the system to send another paint message. Here you've not explicitly called yourself, but the OS/VM has done it for you.
To deal with them, you'll need to examine your code. If you've got functions that call themselves, then check that you've got a terminating condition. If you have, then check that when calling the function you have modified at least one of the arguments; otherwise there'll be no visible change for the recursively called function and the terminating condition is useless. Also keep in mind that your stack can run out of space before a valid terminating condition is reached, so make sure your method can handle inputs that require many recursive calls.
If you've got no obvious recursive functions then check to see if you're calling any library functions that indirectly will cause your function to be called (like the implicit case above).
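For the indirect case (paint triggering another paint), one common defensive pattern is a re-entrancy flag. The sketch below assumes a hypothetical Widget whose repaint() calls paint() synchronously; real toolkits often queue repaint requests instead, and the better fix is usually to stop paint() from requesting a repaint at all.

```java
class ReentrancyGuardDemo {
    // Hypothetical widget: repaint() synchronously triggers paint(),
    // and paint() itself requests a repaint (indirect recursion).
    static final class Widget {
        private boolean painting = false; // true while paint() is on the stack
        int paintCount = 0;

        void repaint() {
            paint(); // assumption: a synchronous callback, not a queued event
        }

        void paint() {
            if (painting) return; // already inside paint(): ignore the nested request
            painting = true;
            try {
                paintCount++;
                repaint();        // without the guard this would recurse until the stack overflows
            } finally {
                painting = false;
            }
        }
    }

    public static void main(String[] args) {
        Widget w = new Widget();
        w.paint();
        System.out.println("paint() ran " + w.paintCount + " time(s)"); // paint() ran 1 time(s)
    }
}
```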

To describe this, first let us understand how local variables and objects are stored.
Local variables are stored on the stack.
When a function call is invoked by a Java application, a stack frame is allocated on the call stack. The stack frame contains the parameters of the invoked method, its local variables, and the return address of the method. The return address denotes the execution point from which program execution continues after the invoked method returns. If there is no space for a new stack frame, the StackOverflowError is thrown by the Java Virtual Machine (JVM).
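A small diagnostic sketch of that limit: recurse until the JVM has no room for another frame and count how many frames fit. The class and method names are hypothetical, and the number printed depends on the JVM, the -Xss setting and whether the method has been JIT-compiled. Catching StackOverflowError like this is for experiments only, not a normal error-handling strategy.

```java
class StackDepthProbe {
    private static int depth;

    // Recurses until the stack is exhausted, counting frames on the way down.
    private static void descend() {
        depth++;
        descend();
    }

    static int measureMaxDepth() {
        depth = 0;
        try {
            descend();
        } catch (StackOverflowError e) {
            // expected: there was no room for another stack frame
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before StackOverflowError: " + measureMaxDepth());
    }
}
```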
The most common case that can exhaust a Java application’s stack is recursion. In recursion, a method invokes itself during its execution. Recursion is considered a powerful general-purpose programming technique, but it must be used with caution to avoid StackOverflowError.
An example of throwing a StackOverflowError is shown below:
StackOverflowErrorExample.java:
public class StackOverflowErrorExample {
public static void recursivePrint(int num) {
System.out.println("Number: " + num);
if (num == 0)
return;
else
recursivePrint(++num);
}
public static void main(String[] args) {
StackOverflowErrorExample.recursivePrint(1);
}
}
In this example, we define a recursive method called recursivePrint that prints an integer and then calls itself with the next successive integer as an argument. The recursion ends only when we pass in 0 as a parameter. However, our example starts at 1 and keeps increasing the parameter, so 0 is never reached before the stack runs out, and the recursion never terminates normally.
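A minimal corrected sketch: count down instead of up, so every call moves the argument toward the base case. The class name FixedRecursivePrint and the int return value (the number of lines printed, added so the behaviour is easy to check) are assumptions, not part of the original example.

```java
class FixedRecursivePrint {
    // Each call passes num - 1, so num always moves toward the base case.
    static int recursivePrint(int num) {
        System.out.println("Number: " + num);
        if (num <= 0)         // <= also stops negative start values
            return 1;
        return 1 + recursivePrint(num - 1); // lines printed by this call and all deeper ones
    }

    public static void main(String[] args) {
        recursivePrint(5); // prints Number: 5 down to Number: 0
    }
}
```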
A sample execution, using the -Xss1M flag that sets the thread stack size to 1 MB, is shown below:
Number: 1
Number: 2
Number: 3
...
Number: 6262
Number: 6263
Number: 6264
Number: 6265
Number: 6266
Exception in thread "main" java.lang.StackOverflowError
at java.io.PrintStream.write(PrintStream.java:480)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.write(PrintStream.java:527)
at java.io.PrintStream.print(PrintStream.java:669)
at java.io.PrintStream.println(PrintStream.java:806)
at StackOverflowErrorExample.recursivePrint(StackOverflowErrorExample.java:4)
at StackOverflowErrorExample.recursivePrint(StackOverflowErrorExample.java:9)
at StackOverflowErrorExample.recursivePrint(StackOverflowErrorExample.java:9)
at StackOverflowErrorExample.recursivePrint(StackOverflowErrorExample.java:9)
...
Depending on the JVM’s initial configuration, the results may differ, but eventually the StackOverflowError will be thrown. This example shows how recursion can cause problems if it is not implemented with caution.
How to deal with the StackOverflowError
The simplest solution is to carefully inspect the stack trace and detect the repeating pattern of line numbers. These line numbers indicate the code being recursively called. Once you detect these lines, you must carefully inspect your code and understand why the recursion never terminates.
If you have verified that the recursion is implemented correctly, you can increase the stack’s size in order to allow a larger number of invocations. Depending on the Java Virtual Machine (JVM) installed, the default thread stack size may be either 512 KB or 1 MB. You can increase the thread stack size using the -Xss flag. This flag can be specified either via the project’s configuration or via the command line. The format of the -Xss argument is:
-Xss<size>[g|G|m|M|k|K]
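As an alternative to changing -Xss for the whole JVM, the deep recursion can be run on its own thread created with the Thread(ThreadGroup, Runnable, String, long stackSize) constructor. According to the Thread documentation, the stack size is only a hint that some platforms ignore, so treat this as a sketch; the 64 MB size and the sumTo method are hypothetical.

```java
class BigStackRunner {
    // Recursion depth grows linearly with n.
    static long sumTo(int n) {
        return n == 0 ? 0 : n + sumTo(n - 1);
    }

    // Runs sumTo(n) on a new thread that requests stackSizeBytes of stack.
    static long sumOnBigStack(int n, long stackSizeBytes) {
        long[] result = new long[1];
        Thread worker = new Thread(null, () -> result[0] = sumTo(n), "big-stack", stackSizeBytes);
        worker.start();
        try {
            worker.join(); // join() also makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // 50,000 nested calls with a 64 MB stack request
        System.out.println(sumOnBigStack(50_000, 64L * 1024 * 1024)); // 1250025000
    }
}
```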

If you have a function like:
void foo()
{
// more stuff
foo();
}
Then foo() will keep calling itself, getting deeper and deeper, and when the space used to keep track of what functions you're in is filled up, you get the stack overflow error.

Stack overflow means exactly that: a stack overflows. Usually there is one stack in the program that contains local-scope variables and the addresses to return to when execution of a routine ends. That stack tends to be a fixed memory range somewhere in memory, so the number of values it can hold is limited.
If the stack is empty you can't pop; if you do, you'll get a stack underflow error.
If the stack is full you can't push; if you do, you'll get a stack overflow error.
So a stack overflow occurs when you allocate too much on the stack, for instance in the recursion mentioned above.
Some implementations optimize away some forms of recursion, tail recursion in particular. A tail-recursive routine is one where the recursive call is the last thing the routine does. Such a call can simply be reduced to a jump.
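HotSpot does not perform this tail-call optimization, so in Java a tail-recursive method still uses one stack frame per call. A minimal sketch of doing the reduction to a jump by hand, with a hypothetical sumTail method: the arguments of the tail call become assignments to the parameters, and the call becomes another loop iteration.

```java
class TailCallByHand {
    // Tail-recursive: the recursive call is the last action, but the JVM still
    // pushes a new frame for it, so a very large n overflows the stack.
    static long sumTail(long n, long acc) {
        return n == 0 ? acc : sumTail(n - 1, acc + n);
    }

    // Same computation with the tail call replaced by a jump back to the top:
    // rebind the parameters and loop, using constant stack space.
    static long sumLoop(long n, long acc) {
        while (n != 0) {
            acc = acc + n; // what the tail call would have passed as acc (uses the old n)
            n = n - 1;     // what the tail call would have passed as n
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumTail(100, 0));         // 5050
        System.out.println(sumLoop(10_000_000L, 0)); // 50000005000000
    }
}
```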
Some implementations go so far as to implement their own stacks for recursion, so they allow the recursion to continue until the system runs out of memory.
The easiest thing you could try would be to increase your stack size if you can. If you can't do that, the second-best thing would be to look for something that clearly causes the stack overflow. Try printing something before and after the call into the routine; this helps you find the failing routine.

A stack overflow is usually caused by nesting function calls too deeply (especially easy when using recursion, i.e. a function that calls itself) or by allocating a large amount of memory on the stack where using the heap would be more appropriate.

Like you say, you need to show some code. :-)
A stack overflow error usually happens when your function calls nest too deeply. See the Stack Overflow Code Golf thread for some examples of how this happens (though in the case of that question, the answers intentionally cause stack overflow).

A StackOverflowError is a runtime error in Java.
It is thrown when the amount of call stack memory allocated by the JVM is exceeded.
A common case of a StackOverflowError being thrown is when the call stack is exhausted due to excessively deep or infinite recursion.
Example:
public class Factorial {
public static int factorial(int n){
if(n == 1){
return 1;
}
else{
return n * factorial(n-1);
}
}
public static void main(String[] args){
System.out.println("Main method started");
int result = Factorial.factorial(-1);
System.out.println("Factorial ==>"+result);
System.out.println("Main method ended");
}
}
Stack trace:
Main method started
Exception in thread "main" java.lang.StackOverflowError
at com.program.stackoverflow.Factorial.factorial(Factorial.java:9)
at com.program.stackoverflow.Factorial.factorial(Factorial.java:9)
at com.program.stackoverflow.Factorial.factorial(Factorial.java:9)
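In the above case, it can be avoided by changing the code: factorial(-1) never reaches the base case n == 1, so negative input has to be rejected (or the base case changed to n <= 1).
But if the program logic is correct and it still occurs, then your stack size needs to be increased.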
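A minimal sketch of such a change, under two assumptions that differ from the original: the recursion is replaced by a loop (so stack depth is no longer a concern), and the result type is widened to long with Math.multiplyExact so overflow is reported instead of silently wrapping. SafeFactorial is a hypothetical name.

```java
class SafeFactorial {
    // Rejects inputs that could never reach a base case instead of recursing forever.
    static long factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0, got " + n);
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, i); // throws ArithmeticException on long overflow
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Factorial ==>" + factorial(5)); // Factorial ==>120
    }
}
```

With this version, factorial(-1) fails immediately with an IllegalArgumentException that names the bad input, and factorial(21) fails with an ArithmeticException, since 21! does not fit in a long.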

StackOverflowError is to the stack as OutOfMemoryError is to the heap.
Unbounded recursive calls result in stack space being used up.
The following example produces StackOverflowError:
class StackOverflowDemo
{
public static void unboundedRecursiveCall() {
unboundedRecursiveCall();
}
public static void main(String[] args)
{
unboundedRecursiveCall();
}
}
StackOverflowError is avoidable if recursive calls are bounded to prevent the aggregate total of incomplete in-memory calls (in bytes) from exceeding the stack size (in bytes).
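One way to enforce such a bound is an explicit depth budget passed down the recursion, so excessive depth produces a clear exception long before the stack is exhausted. A sketch under assumptions: MAX_DEPTH = 1,000 is an arbitrary limit, and sumTo stands in for any recursive method.

```java
class BoundedRecursion {
    // Hypothetical budget, chosen to stay well below what a default-sized stack holds.
    static final int MAX_DEPTH = 1_000;

    static long sum(int n) {
        return sumTo(n, 0);
    }

    // Each call carries its own depth, so the number of pending frames is capped.
    private static long sumTo(int n, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalArgumentException("recursion deeper than " + MAX_DEPTH);
        }
        return n <= 0 ? 0 : n + sumTo(n - 1, depth + 1);
    }

    public static void main(String[] args) {
        System.out.println(sum(100)); // 5050
        try {
            sum(5_000);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```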

The most common cause of stack overflows is excessively deep or infinite recursion. If this is your problem, this tutorial about Java Recursion could help understand the problem.

Here is an example of a recursive algorithm for reversing a singly linked list. On a laptop (4 GB memory, Intel Core i5 2.3 GHz 64-bit CPU, Windows 7), this function runs into a StackOverflowError for a linked list of size close to 10,000.
My point is that we should use recursion judiciously, always taking into account the scale of the system.
Often recursion can be converted to an iterative program, which scales better. (An iterative version of the same algorithm is given below. It reverses a singly linked list of size 1 million in 9 milliseconds.)
private static LinkedListNode doReverseRecursively(LinkedListNode x, LinkedListNode first){
LinkedListNode second = first.next;
first.next = x;
if(second != null){
return doReverseRecursively(first, second);
}else{
return first;
}
}
public static LinkedListNode reverseRecursively(LinkedListNode head){
return doReverseRecursively(null, head);
}
Iterative Version of the Same Algorithm:
public static LinkedListNode reverseIteratively(LinkedListNode head){
return doReverseIteratively(null, head);
}
private static LinkedListNode doReverseIteratively(LinkedListNode x, LinkedListNode first) {
while (first != null) {
LinkedListNode second = first.next;
first.next = x;
x = first;
if (second == null) {
break;
} else {
first = second;
}
}
return first;
}
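The LinkedListNode class is not shown in the answer, so here is a self-contained sketch with a hypothetical minimal node (an int value and a next pointer). The iterative version is written in the common prev/current form, which performs the same pointer reversal as doReverseIteratively. The recursive version also handles an empty list.

```java
class ListReversal {
    // Hypothetical stand-in for the answer's LinkedListNode.
    static final class LinkedListNode {
        final int value;
        LinkedListNode next;
        LinkedListNode(int value) { this.value = value; }
    }

    // Recursive reversal: one stack frame per node, so very long lists can overflow.
    static LinkedListNode reverseRecursively(LinkedListNode head) {
        return head == null ? null : doReverseRecursively(null, head);
    }

    private static LinkedListNode doReverseRecursively(LinkedListNode prev, LinkedListNode first) {
        LinkedListNode second = first.next;
        first.next = prev;
        return second == null ? first : doReverseRecursively(first, second);
    }

    // Iterative reversal: constant stack usage regardless of list length.
    static LinkedListNode reverseIteratively(LinkedListNode head) {
        LinkedListNode prev = null;
        while (head != null) {
            LinkedListNode next = head.next;
            head.next = prev;
            prev = head;
            head = next;
        }
        return prev;
    }

    // Builds the list 1 -> 2 -> ... -> n.
    static LinkedListNode build(int n) {
        LinkedListNode head = null;
        for (int i = n; i >= 1; i--) {
            LinkedListNode node = new LinkedListNode(i);
            node.next = head;
            head = node;
        }
        return head;
    }

    public static void main(String[] args) {
        System.out.println(reverseRecursively(build(5)).value);         // 5
        System.out.println(reverseIteratively(build(1_000_000)).value); // 1000000
    }
}
```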

The stack has a space limit that depends on the operating system. A common default is 8 MB (on Ubuntu/Linux you can check that limit with $ ulimit -s, which reports it in KB; other operating systems have similar tools). Any program makes use of the stack at runtime, but to know exactly when it is used you need to look at the assembly language. In x86_64, for example, the stack is used to:
(1) Save the return address when making a procedure call
(2) Save local variables
(3) Save special registers to restore them later
(4) Pass arguments to a procedure call (when there are more than six)
(5) Other: random unused stack base, canary values, padding, etc.
If you don't know x86_64 (the normal case), you only need to know which constructs of the high-level language you are using compile to those actions. For example, in C:
(1) → a function call
(2) → local variables in function calls (including main)
(3) → local variables in function calls (not main)
(4) → a function call
(5) → normally a function call, it is generally irrelevant for a stack overflow.
So, in C, only local variables and function calls make use of the stack. The two (unique?) ways of making a stack overflow are:
Declaring local variables that are too large in main or in any function it calls (int array[10000][10000];)
A very deep or infinite recursion (too many function calls at the same time).
To avoid a StackOverflowError you can:
check if local variables are too big (order of 1 MB) → use the heap (malloc/calloc calls) or global variables.
check for infinite recursion → you know what to do... correct it!
check for normal too deep recursion → the easiest approach is to just change the implementation to be iterative.
Notice also that global variables, included libraries, etc. don't make use of the stack.
Only if the above does not work, change the stack size to the maximum on the specific OS. With Ubuntu for example: ulimit -s 32768 (32 MB). (This has never been the solution for any of my stack overflow errors, but I also don't have much experience.)
I have omitted special and/or non-standard cases in C (such as usage of alloca() and similar) because if you are using them you should already know exactly what you are doing.

In a crunch, the below situation will bring a stack overflow error.
public class Example3 {
public static void main(String[] args) {
main(new String[1]);
}
}

A simple Java example that causes java.lang.StackOverflowError because of a bad recursive call:
class Human {
Human(){
new Animal();
}
}
class Animal extends Human {
Animal(){
super();
}
}
public class Test01 {
public static void main(String[] args) {
new Animal();
}
}

Many answers to this question are good. However, I would like to take a slightly different approach and give some more insight into how memory works, along with a (simplified) visualization to better understand StackOverflow errors. This understanding applies not only to Java but to all processes alike.
On modern systems, every new process gets its own virtual address space (VAS). In essence, the VAS is an abstraction layer provided by the operating system on top of physical memory in order to ensure processes do not interfere with each other's memory. It's then the kernel's job to map the virtual addresses provided to the actual physical addresses.
VAS can be divided into a couple of sections:
In order to let the CPU know what it is supposed to do, machine instructions must be loaded into memory. This is usually referred to as the code or text segment and is of static size.
On top of that one can find the data segment and heap. The data segment is of fixed size and contains global or static variables.
As a program runs, it may need to allocate additional data dynamically; this is where the heap comes into play, which is why it is able to grow in size.
The stack is located on the other side of the virtual address space and (among other things) keeps track of all function calls using a LIFO data structure. Similar to the heap, a program may need additional stack space during runtime to keep track of new function calls being invoked. Since the stack is located on the other side of the VAS, it grows in the opposite direction, i.e. towards the heap.
TL;DR
This is where the StackOverflow error comes into play.
Since the stack grows down (towards the heap) it may so happen that at some point in time it cannot grow further as it would overlap with the heap address space. Once that happens the StackOverflow error occurs.
The most common reason as to why this happens is due to a bug in the program making recursive calls that do not terminate properly.
Note that on some systems the VAS may behave slightly differently and can be divided into even more segments; however, this general understanding applies to all UNIX systems.

Here's an example
public static void main(String[] args) {
System.out.println(add5(1));
}
public static int add5(int a) {
return add5(a) + 5;
}
A StackOverflowError basically occurs when you do something that calls itself and goes on forever (or until it throws a StackOverflowError).
Here, add5(a) calls itself with the same argument, which calls itself again, and so on; no base case is ever reached.

This is a typical case of java.lang.StackOverflowError: doubleValue(), floatValue(), intValue() and longValue() each call themselves recursively, with no exit.
File Rational.java
public class Rational extends Number implements Comparable<Rational> {
private int num;
private int denom;
public Rational(int num, int denom) {
this.num = num;
this.denom = denom;
}
public int compareTo(Rational r) {
if ((num / denom) - (r.num / r.denom) > 0) {
return +1;
} else if ((num / denom) - (r.num / r.denom) < 0) {
return -1;
}
return 0;
}
public Rational add(Rational r) {
return new Rational(num + r.num, denom + r.denom);
}
public Rational sub(Rational r) {
return new Rational(num - r.num, denom - r.denom);
}
public Rational mul(Rational r) {
return new Rational(num * r.num, denom * r.denom);
}
public Rational div(Rational r) {
return new Rational(num * r.denom, denom * r.num);
}
public int gcd(Rational r) {
int i = 1;
while (i != 0) {
i = denom % r.denom;
denom = r.denom;
r.denom = i;
}
return denom;
}
public String toString() {
String a = num + "/" + denom;
return a;
}
public double doubleValue() {
return (double) doubleValue();
}
public float floatValue() {
return (float) floatValue();
}
public int intValue() {
return (int) intValue();
}
public long longValue() {
return (long) longValue();
}
}
File Main.java
public class Main {
public static void main(String[] args) {
Rational a = new Rational(2, 4);
Rational b = new Rational(2, 6);
System.out.println(a + " + " + b + " = " + a.add(b));
System.out.println(a + " - " + b + " = " + a.sub(b));
System.out.println(a + " * " + b + " = " + a.mul(b));
System.out.println(a + " / " + b + " = " + a.div(b));
Rational[] arr = {new Rational(7, 1), new Rational(6, 1),
new Rational(5, 1), new Rational(4, 1),
new Rational(3, 1), new Rational(2, 1),
new Rational(1, 1), new Rational(1, 2),
new Rational(1, 3), new Rational(1, 4),
new Rational(1, 5), new Rational(1, 6),
new Rational(1, 7), new Rational(1, 8),
new Rational(1, 9), new Rational(0, 1)};
selectSort(arr);
for (int i = 0; i < arr.length - 1; ++i) {
if (arr[i].compareTo(arr[i + 1]) > 0) {
System.exit(1);
}
}
Number n = new Rational(3, 2);
System.out.println(n.doubleValue());
System.out.println(n.floatValue());
System.out.println(n.intValue());
System.out.println(n.longValue());
}
public static <T extends Comparable<? super T>> void selectSort(T[] array) {
T temp;
int mini;
for (int i = 0; i < array.length - 1; ++i) {
mini = i;
for (int j = i + 1; j < array.length; ++j) {
if (array[j].compareTo(array[mini]) < 0) {
mini = j;
}
}
if (i != mini) {
temp = array[i];
array[i] = array[mini];
array[mini] = temp;
}
}
}
}
Result
2/4 + 2/6 = 4/10
2/4 - 2/6 = 0/-2
2/4 * 2/6 = 4/24
2/4 / 2/6 = 12/8
Exception in thread "main" java.lang.StackOverflowError
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
Here is the source code of StackOverflowError in OpenJDK 7.
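A minimal corrected sketch: each conversion computes its value from num and denom instead of calling itself. The class is renamed FixedRational (hypothetical) so it stands on its own, only the Number conversions are kept, and the zero-denominator check is an addition not present in the original.

```java
class FixedRational extends Number {
    private final int num;
    private final int denom;

    FixedRational(int num, int denom) {
        if (denom == 0) throw new IllegalArgumentException("denominator must not be 0");
        this.num = num;
        this.denom = denom;
    }

    // Each conversion is computed from the fields; none of them calls itself.
    @Override public double doubleValue() { return (double) num / denom; }
    @Override public float floatValue() { return (float) doubleValue(); } // safe: doubleValue() no longer recurses
    @Override public int intValue() { return num / denom; }               // truncates toward zero
    @Override public long longValue() { return (long) num / denom; }

    @Override public String toString() { return num + "/" + denom; }

    public static void main(String[] args) {
        Number n = new FixedRational(3, 2);
        System.out.println(n.doubleValue()); // 1.5
        System.out.println(n.floatValue());  // 1.5
        System.out.println(n.intValue());    // 1
        System.out.println(n.longValue());   // 1
    }
}
```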

Eclipse android developer an internal error occurred [duplicate]

What is a StackOverflowError, what causes it, and how should I deal with them?

Parameters and local variables are allocated on the stack (with reference types, the object lives on the heap and a variable in the stack references that object on the heap). The stack typically lives at the upper end of your address space and as it is used up it heads towards the bottom of the address space (i.e. towards zero).
Your process also has a heap, which lives at the bottom end of your process. As you allocate memory, this heap can grow towards the upper end of your address space. As you can see, there is a potential for the heap to "collide" with the stack (a bit like tectonic plates!!!).
The common cause for a stack overflow is a bad recursive call. Typically, this is caused when your recursive functions doesn't have the correct termination condition, so it ends up calling itself forever. Or when the termination condition is fine, it can be caused by requiring too many recursive calls before fulfilling it.
However, with GUI programming, it's possible to generate indirect recursion. For example, your app may be handling paint messages, and, whilst processing them, it may call a function that causes the system to send another paint message. Here you've not explicitly called yourself, but the OS/VM has done it for you.
To deal with them, you'll need to examine your code. If you've got functions that call themselves then check that you've got a terminating condition. If you have, then check that when calling the function you have at least modified one of the arguments, otherwise there'll be no visible change for the recursively called function and the terminating condition is useless. Also mind that your stack space can run out of memory before reaching a valid terminating condition, thus make sure your method can handle input values requiring more recursive calls.
If you've got no obvious recursive functions then check to see if you're calling any library functions that indirectly will cause your function to be called (like the implicit case above).

To describe this, first let us understand how local variables and objects are stored.
Local variable are stored on the stack:
If you looked at the image you should be able to understand how things are working.
When a function call is invoked by a Java application, a stack frame is allocated on the call stack. The stack frame contains the parameters of the invoked method, its local parameters, and the return address of the method. The return address denotes the execution point from which, the program execution shall continue after the invoked method returns. If there is no space for a new stack frame then, the StackOverflowError is thrown by the Java Virtual Machine (JVM).
The most common case that can possibly exhaust a Java application’s stack is recursion. In recursion, a method invokes itself during its execution. Recursion is considered as a powerful general-purpose programming technique, but it must be used with caution, to avoid StackOverflowError.
An example of throwing a StackOverflowError is shown below:
StackOverflowErrorExample.java:
public class StackOverflowErrorExample {
public static void recursivePrint(int num) {
System.out.println("Number: " + num);
if (num == 0)
return;
else
recursivePrint(++num);
}
public static void main(String[] args) {
StackOverflowErrorExample.recursivePrint(1);
}
}
In this example, we define a recursive method, called recursivePrint that prints an integer and then, calls itself, with the next successive integer as an argument. The recursion ends until we pass in 0 as a parameter. However, in our example, we passed in the parameter from 1 and its increasing followers, consequently, the recursion will never terminate.
A sample execution, using the -Xss1M flag that specifies the size of the thread stack to equal to 1 MB, is shown below:
Number: 1
Number: 2
Number: 3
...
Number: 6262
Number: 6263
Number: 6264
Number: 6265
Number: 6266
Exception in thread "main" java.lang.StackOverflowError
at java.io.PrintStream.write(PrintStream.java:480)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.write(PrintStream.java:527)
at java.io.PrintStream.print(PrintStream.java:669)
at java.io.PrintStream.println(PrintStream.java:806)
at StackOverflowErrorExample.recursivePrint(StackOverflowErrorExample.java:4)
at StackOverflowErrorExample.recursivePrint(StackOverflowErrorExample.java:9)
at StackOverflowErrorExample.recursivePrint(StackOverflowErrorExample.java:9)
at StackOverflowErrorExample.recursivePrint(StackOverflowErrorExample.java:9)
...
Depending on the JVM’s initial configuration, the results may differ, but eventually the StackOverflowError shall be thrown. This example is a very good example of how recursion can cause problems, if not implemented with caution.
How to deal with the StackOverflowError
The simplest solution is to carefully inspect the stack trace and
detect the repeating pattern of line numbers. These line numbers
indicate the code being recursively called. Once you detect these
lines, you must carefully inspect your code and understand why the
recursion never terminates.
If you have verified that the recursion
is implemented correctly, you can increase the stack’s size, in
order to allow a larger number of invocations. Depending on the Java
Virtual Machine (JVM) installed, the default thread stack size may
equal to either 512 KB, or 1 MB. You can increase the thread stack
size using the -Xss flag. This flag can be specified either via the
project’s configuration, or via the command line. The format of the
-Xss argument is:
-Xss<size>[g|G|m|M|k|K]

If you have a function like:
int foo()
{
// more stuff
foo();
}
Then foo() will keep calling itself, getting deeper and deeper, and when the space used to keep track of what functions you're in is filled up, you get the stack overflow error.

Stack overflow means exactly that: a stack overflows. Usually there is one stack in the program (one per thread in a multithreaded program) that holds local-scope variables and the addresses to return to when a routine finishes. That stack tends to be a fixed memory range, so the number of values it can hold is limited.
If the stack is empty you can't pop; if you do, you get a stack underflow error.
If the stack is full you can't push; if you do, you get a stack overflow error.
So a stack overflow occurs when you allocate too much on the stack, for instance in the recursion mentioned above.
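The push/pop rules above can be illustrated with a tiny fixed-capacity stack. This is a hypothetical toy class, not how the JVM manages its call stack; it only shows where overflow and underflow happen.

```java
public class BoundedStack {
    private final int[] items;
    private int top = 0; // number of elements currently on the stack

    BoundedStack(int capacity) {
        items = new int[capacity];
    }

    // Pushing onto a full stack is the "overflow" case.
    void push(int value) {
        if (top == items.length) throw new IllegalStateException("stack overflow");
        items[top++] = value;
    }

    // Popping from an empty stack is the "underflow" case.
    int pop() {
        if (top == 0) throw new IllegalStateException("stack underflow");
        return items[--top];
    }

    public static void main(String[] args) {
        BoundedStack s = new BoundedStack(2);
        s.push(1);
        s.push(2);
        try {
            s.push(3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // stack overflow
        }
        System.out.println(s.pop() + " " + s.pop()); // 2 1
        try {
            s.pop();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // stack underflow
        }
    }
}
```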
Some implementations optimize away certain forms of recursion, tail recursion in particular. A tail-recursive routine is one where the recursive call is the very last thing the routine does. Such a call can simply be reduced to a jump.
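The HotSpot JVM does not perform tail-call elimination, so in Java the reduction to a jump has to be done by hand. Here is a minimal sketch with hypothetical method names: sumTo is tail-recursive and would still overflow for a large n, while sumToLoop is the same routine after the tail call has been turned into a loop that reassigns the parameters.

```java
public class TailCallDemo {
    // Tail-recursive: the recursive call is the last thing the method does.
    // HotSpot does not eliminate tail calls, so a large n still overflows the stack.
    static long sumTo(long n, long acc) {
        if (n == 0) return acc;
        return sumTo(n - 1, acc + n);
    }

    // The same routine with the tail call rewritten as a jump (a loop):
    // the "arguments" are reassigned and control goes back to the top.
    static long sumToLoop(long n, long acc) {
        while (n != 0) {
            acc += n; // uses the old n, like acc + n in the call above
            n -= 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(1_000, 0));          // 500500
        System.out.println(sumToLoop(10_000_000, 0)); // 50000005000000
    }
}
```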
Some implementations go as far as implementing their own stack for recursion, which lets the recursion continue until the system runs out of memory.
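You can apply the same idea yourself in Java by keeping pending work on a heap-allocated stack instead of the call stack. Below is a minimal sketch using java.util.ArrayDeque and a hypothetical Node class. It walks a degenerate tree a million levels deep, which is deep enough that a recursive traversal would likely overflow a default-size thread stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ExplicitStackDemo {
    // Minimal binary tree node (hypothetical type for this sketch).
    static final class Node {
        Node left, right;
    }

    // Counts nodes using a heap-allocated stack instead of the call stack,
    // so depth is limited by heap size rather than thread stack size.
    static int countNodes(Node root) {
        int count = 0;
        Deque<Node> pending = new ArrayDeque<>();
        if (root != null) pending.push(root);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            count++;
            if (n.left != null) pending.push(n.left);
            if (n.right != null) pending.push(n.right);
        }
        return count;
    }

    // Builds a degenerate tree (a chain of left children) with depth nodes.
    static Node chain(int depth) {
        Node root = new Node();
        Node cur = root;
        for (int i = 1; i < depth; i++) {
            cur.left = new Node();
            cur = cur.left;
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(countNodes(chain(1_000_000))); // 1000000
    }
}
```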
The easiest thing to try is increasing your stack size, if you can. If you can't, the next best thing is to look for whatever clearly causes the stack overflow: print something before and after each call into a routine. This helps you find the failing routine.

A stack overflow is usually caused by nesting function calls too deeply (especially easy with recursion, i.e. a function that calls itself) or by allocating a large amount of memory on the stack where using the heap would be more appropriate.

Like you say, you need to show some code. :-)
A stack overflow error usually happens when your function calls nest too deeply. See the Stack Overflow Code Golf thread for some examples of how this happens (though in the case of that question, the answers intentionally cause stack overflow).

A StackOverflowError is a runtime error in Java.
It is thrown when a thread exhausts the call stack memory allocated to it by the JVM.
A common cause is excessively deep or infinite recursion.
Example:
public class Factorial {
public static int factorial(int n){
if(n == 1){
return 1;
}
else{
return n * factorial(n-1);
}
}
public static void main(String[] args){
System.out.println("Main method started");
int result = Factorial.factorial(-1);
System.out.println("Factorial ==>"+result);
System.out.println("Main method ended");
}
}
Stack trace:
Main method started
Exception in thread "main" java.lang.StackOverflowError
at com.program.stackoverflow.Factorial.factorial(Factorial.java:9)
at com.program.stackoverflow.Factorial.factorial(Factorial.java:9)
at com.program.stackoverflow.Factorial.factorial(Factorial.java:9)
In the above case, it can be avoided by fixing the code: factorial(-1) never reaches the base case n == 1, so the recursion never stops.
But if the program logic is correct and the error still occurs, you need to increase the stack size.
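One way to make the example terminate is sketched below, assuming that negative input should be rejected rather than answered: guard the argument and treat both 0 and 1 as base cases. The return type is widened to long here (a change from the original) so results up to 20! fit.

```java
public class SafeFactorial {
    // Rejects inputs that can never reach the base case,
    // and treats 0 and 1 as base cases so the recursion always terminates.
    static long factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0, got " + n);
        if (n <= 1) return 1;
        return n * factorial(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
        try {
            factorial(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```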

StackOverflowError is to the stack as OutOfMemoryError is to the heap.
Unbounded recursive calls result in stack space being used up.
The following example produces StackOverflowError:
class StackOverflowDemo
{
public static void unboundedRecursiveCall() {
unboundedRecursiveCall();
}
public static void main(String[] args)
{
unboundedRecursiveCall();
}
}
StackOverflowError can be avoided by bounding recursion so that the combined size of all calls that have not yet returned never exceeds the stack size.

The most common cause of stack overflows is excessively deep or infinite recursion. If this is your problem, this tutorial about Java Recursion could help understand the problem.

Here is an example of a recursive algorithm for reversing a singly linked list. On a laptop (4 GB memory, 64-bit Intel Core i5 2.3 GHz CPU, Windows 7), this function runs into a StackOverflowError for a linked list of close to 10,000 nodes.
My point is that we should use recursion judiciously, always taking into account the scale of the system.
Recursion can often be converted to an iterative program, which scales better. (An iterative version of the same algorithm is given below. It reverses a singly linked list of 1 million nodes in 9 milliseconds.)
private static LinkedListNode doReverseRecursively(LinkedListNode x, LinkedListNode first){
LinkedListNode second = first.next;
first.next = x;
if(second != null){
return doReverseRecursively(first, second);
}else{
return first;
}
}
public static LinkedListNode reverseRecursively(LinkedListNode head){
// guard: doReverseRecursively dereferences its second argument
return head == null ? null : doReverseRecursively(null, head);
}
Iterative Version of the Same Algorithm:
public static LinkedListNode reverseIteratively(LinkedListNode head){
return doReverseIteratively(null, head);
}
private static LinkedListNode doReverseIteratively(LinkedListNode x, LinkedListNode first) {
while (first != null) {
LinkedListNode second = first.next;
first.next = x;
x = first;
if (second == null) {
break;
} else {
first = second;
}
}
return first;
}
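For a quick experiment, here is a self-contained sketch of the same idea with a minimal LinkedListNode class. The class is hypothetical, because the post doesn't show its definition. The sketch uses the common prev/current formulation rather than the helper above. It involves no recursion, so stack usage stays constant regardless of list length.

```java
public class ReverseListDemo {
    // Minimal node type assumed for this sketch.
    static final class LinkedListNode {
        int value;
        LinkedListNode next;
        LinkedListNode(int value) { this.value = value; }
    }

    // Iterative reversal: re-point each node at its predecessor.
    static LinkedListNode reverse(LinkedListNode head) {
        LinkedListNode prev = null;
        while (head != null) {
            LinkedListNode next = head.next;
            head.next = prev;
            prev = head;
            head = next;
        }
        return prev;
    }

    // Builds the list 0 -> 1 -> ... -> (n - 1).
    static LinkedListNode build(int n) {
        LinkedListNode dummy = new LinkedListNode(-1);
        LinkedListNode tail = dummy;
        for (int i = 0; i < n; i++) {
            tail.next = new LinkedListNode(i);
            tail = tail.next;
        }
        return dummy.next;
    }

    public static void main(String[] args) {
        LinkedListNode reversed = reverse(build(1_000_000));
        System.out.println(reversed.value); // 999999
    }
}
```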

The stack has a size limit that depends on the operating system. A common default is 8 MB (on Ubuntu/Linux you can check the limit with $ ulimit -s, and other OSes have similar ways to check it). Every program uses the stack at runtime, but to know exactly when it is used you need to look at the assembly. In x86_64, for example, the stack is used to:
(1) Save the return address when making a procedure call
(2) Save local variables
(3) Save special registers to restore them later
(4) Pass arguments to a procedure call (when there are more than six, the extra ones go on the stack)
(5) Other: random unused stack base, canary values, padding, etc.
If you don't know x86_64 (the usual case), you only need to know when the high-level language you are using compiles to those actions. For example, in C:
(1) → a function call
(2) → local variables in function calls (including main)
(3) → local variables in function calls (not main)
(4) → a function call
(5) → normally a function call, it is generally irrelevant for a stack overflow.
So, in C, only local variables and function calls use the stack. The two main ways of causing a stack overflow are:
Declaring very large local variables in main or in any function it calls (e.g. int array[10000][10000];)
A very deep or infinite recursion (too many function calls active at the same time).
To avoid a stack overflow you can:
check whether local variables are too big (on the order of 1 MB) → use the heap (malloc/calloc) or global variables instead.
check for infinite recursion → you know what to do: correct it!
check for legitimate but too-deep recursion → the easiest approach is to change the implementation to be iterative.
Note also that global variables, included libraries, etc. don't use the stack.
Only if the above does not work, increase the stack size limit on your OS. On Ubuntu, for example: ulimit -s 32768 (32 MB). (This has never been the solution for any of my stack overflow errors, but I also don't have much experience.)
I have omitted special and/or non-standard cases in C (such as alloca() and similar) because if you are using them you should already know exactly what you are doing.

The following will also cause a stack overflow error, because main calls itself unconditionally:
public class Example3 {
public static void main(String[] args) {
main(new String[1]);
}
}

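A simple Java example that causes java.lang.StackOverflowError because of a bad recursive call. Human's constructor creates an Animal, and Animal's constructor calls super(), which runs Human's constructor again: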
class Human {
Human(){
new Animal();
}
}
class Animal extends Human {
Animal(){
super();
}
}
public class Test01 {
public static void main(String[] args) {
new Animal();
}
}

Many answers to this question are good. However, I would like to take a slightly different approach and give some more insight into how memory works, along with a (simplified) visualization, to better understand StackOverflow errors. This understanding applies not only to Java but to all processes.
On modern systems every new process gets its own virtual address space (VAS). In essence, the VAS is an abstraction layer provided by the operating system on top of physical memory to ensure that processes do not interfere with each other's memory. It is then the kernel's job to map the virtual addresses to actual physical addresses.
VAS can be divided into a couple of sections:
In order to let the CPU know what it is supposed to do, machine instructions must be loaded into memory. This is usually referred to as the code or text segment, and it is of static size.
On top of that one can find the data segment and the heap. The data segment is of fixed size and contains global or static variables.
As a program runs, it may need to allocate additional data. This is where the heap comes into play, which is why the heap is able to grow dynamically in size.
The stack is located at the other end of the virtual address space and (among other things) keeps track of all function calls using a LIFO data structure. Like the heap, it may need additional space at runtime to keep track of newly invoked function calls. Since the stack is located at the other end of the VAS, it grows in the opposite direction, i.e. towards the heap.
TL;DR
This is where the StackOverflow error comes into play.
Since the stack grows down (towards the heap), at some point it may be unable to grow further without overlapping the heap address space, or, as with Java threads whose stacks have a fixed maximum size, it may hit its configured limit. Once that happens, the StackOverflow error occurs.
The most common reason this happens is a bug in the program that makes recursive calls which do not terminate properly.
Note that on some systems the VAS may behave slightly differently and can be divided into even more segments; however, this general picture applies to UNIX-like systems.

Here's an example
public static void main(String[] args) {
System.out.println(add5(1));
}
public static int add5(int a) {
return add5(a) + 5;
}
A StackOverflowError happens when something, most likely a method that calls itself, keeps going forever (or until the stack runs out and a StackOverflowError is thrown).
add5(a) calls itself, which calls itself again, and so on; there is no base case to stop it.

This is a typical case of java.lang.StackOverflowError: doubleValue(), floatValue(), intValue() and longValue() each call themselves recursively with no exit condition.
File Rational.java
public class Rational extends Number implements Comparable<Rational> {
private int num;
private int denom;
public Rational(int num, int denom) {
this.num = num;
this.denom = denom;
}
public int compareTo(Rational r) {
if ((num / denom) - (r.num / r.denom) > 0) {
return +1;
} else if ((num / denom) - (r.num / r.denom) < 0) {
return -1;
}
return 0;
}
public Rational add(Rational r) {
return new Rational(num + r.num, denom + r.denom);
}
public Rational sub(Rational r) {
return new Rational(num - r.num, denom - r.denom);
}
public Rational mul(Rational r) {
return new Rational(num * r.num, denom * r.denom);
}
public Rational div(Rational r) {
return new Rational(num * r.denom, denom * r.num);
}
public int gcd(Rational r) {
int i = 1;
while (i != 0) {
i = denom % r.denom;
denom = r.denom;
r.denom = i;
}
return denom;
}
public String toString() {
String a = num + "/" + denom;
return a;
}
public double doubleValue() {
return (double) doubleValue();
}
public float floatValue() {
return (float) floatValue();
}
public int intValue() {
return (int) intValue();
}
public long longValue() {
return (long) longValue();
}
}
File Main.java
public class Main {
public static void main(String[] args) {
Rational a = new Rational(2, 4);
Rational b = new Rational(2, 6);
System.out.println(a + " + " + b + " = " + a.add(b));
System.out.println(a + " - " + b + " = " + a.sub(b));
System.out.println(a + " * " + b + " = " + a.mul(b));
System.out.println(a + " / " + b + " = " + a.div(b));
Rational[] arr = {new Rational(7, 1), new Rational(6, 1),
new Rational(5, 1), new Rational(4, 1),
new Rational(3, 1), new Rational(2, 1),
new Rational(1, 1), new Rational(1, 2),
new Rational(1, 3), new Rational(1, 4),
new Rational(1, 5), new Rational(1, 6),
new Rational(1, 7), new Rational(1, 8),
new Rational(1, 9), new Rational(0, 1)};
selectSort(arr);
for (int i = 0; i < arr.length - 1; ++i) {
if (arr[i].compareTo(arr[i + 1]) > 0) {
System.exit(1);
}
}
Number n = new Rational(3, 2);
System.out.println(n.doubleValue());
System.out.println(n.floatValue());
System.out.println(n.intValue());
System.out.println(n.longValue());
}
public static <T extends Comparable<? super T>> void selectSort(T[] array) {
T temp;
int mini;
for (int i = 0; i < array.length - 1; ++i) {
mini = i;
for (int j = i + 1; j < array.length; ++j) {
if (array[j].compareTo(array[mini]) < 0) {
mini = j;
}
}
if (i != mini) {
temp = array[i];
array[i] = array[mini];
array[mini] = temp;
}
}
}
}
Result
2/4 + 2/6 = 4/10
2/4 - 2/6 = 0/-2
2/4 * 2/6 = 4/24
2/4 / 2/6 = 12/8
Exception in thread "main" java.lang.StackOverflowError
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
at com.xetrasu.Rational.doubleValue(Rational.java:64)
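The fix is to compute each conversion from num and denom instead of calling the same method again. The class below is a sketch of one way to do that. FixedRational is a hypothetical stand-in trimmed to the relevant parts, and the casts force floating-point division where it is needed.

```java
public class FixedRational extends Number {
    private final int num;
    private final int denom;

    public FixedRational(int num, int denom) {
        this.num = num;
        this.denom = denom;
    }

    // Cast before dividing so the division is done in floating point.
    @Override
    public double doubleValue() {
        return (double) num / denom;
    }

    @Override
    public float floatValue() {
        return (float) num / denom;
    }

    // Integer division truncates toward zero, like casting the double value.
    @Override
    public int intValue() {
        return num / denom;
    }

    @Override
    public long longValue() {
        return (long) num / denom;
    }

    @Override
    public String toString() {
        return num + "/" + denom;
    }

    public static void main(String[] args) {
        Number n = new FixedRational(3, 2);
        System.out.println(n.doubleValue()); // 1.5
        System.out.println(n.floatValue());  // 1.5
        System.out.println(n.intValue());    // 1
        System.out.println(n.longValue());   // 1
    }
}
```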
Here is the source code of StackOverflowError in OpenJDK 7.

Does splitting logic into multiple methods in Java slow down execution? If so, should we avoid refactoring code?

http://s24.postimg.org/9y073weid/refactor_vs_non_refactor.png
Here are the execution times in nanoseconds for refactored and non-refactored code performing a simple addition. Runs 1 to 5 are consecutive runs of the code.
My intent was just to find out whether splitting logic into multiple methods slows execution, and the result shows that yes, considerable time goes into just putting the methods on the stack.
I invite anyone who has researched this before, or who wants to investigate this area, to correct me if I am doing something wrong and to help draw some conclusive results.
In my opinion, code refactoring does help make code more structured and understandable, but in time-critical systems like real-time game engines I would prefer not to refactor.
Following was the simple code which I used:
package com.sim;
public class NonThreadedMethodCallBenchMark{
public static void main(String args[]){
NonThreadedMethodCallBenchMark testObject = new NonThreadedMethodCallBenchMark();
System.out.println("************Starting***************");
long startTime =System.nanoTime();
for(int i=0;i<900000;i++){
//testObject.method(1, 2); // Uncomment this line and comment the line below to test refactor time
//testObject.method5(1,2); // uncomment this line and comment the above line to test non refactor time
}
long endTime =System.nanoTime();
System.out.println("Total :" +(endTime-startTime)+" nanoseconds");
}
public int method(int a , int b){
return method1(a,b);
}
public int method1(int a, int b){
return method2(a,b);
}
public int method2(int a, int b){
return method3(a,b);
}
public int method3(int a, int b){
return method4(a,b);
}
public int method4(int a, int b){
return method5(a,b);
}
public int method5(int a, int b){
return a+b;
}
public void run() {
int x=method(1,2);
}
}

You should consider that code which doesn't do anything useful can be optimised away. If you are not careful you can be timing how long it takes to detect the code doesn't do anything useful, rather than run the code. If you use multiple methods, it can take longer to detect useless code and thus give different results. I would always look at the steady state performance after the code has warmed up.
For the most expensive parts of your code, small methods will be inlined so they won't make any difference to performance cost. What can happen is
smaller methods can be optimised better as complex methods can defeat optimisation tricks.
smaller methods can be eliminated as they are inlined.
If you don't ever warm up the code, it is likely to be slower. However, if code is called rarely, it is unlikely to matter (except in low-latency systems, in which case I suggest you warm up your code on startup).
If you run
System.out.println("************Starting***************");
for (int j = 0; j < 10; j++) {
long startTime = System.nanoTime();
for (int i = 0; i < 1000000; i++) {
testObject.method(1, 2);
//testObject.method5(1,2); // uncomment this line and comment the above line to test non refactor time
}
long endTime = System.nanoTime();
System.out.println("Total :" + (endTime - startTime) + " nanoseconds");
}
prints
************Starting***************
Total :8644835 nanoseconds
Total :3363047 nanoseconds
Total :52 nanoseconds
Total :30 nanoseconds
Total :30 nanoseconds
Note: the 30 nano-seconds is the time it takes to perform a System.nanoTime() call. The inner loop and the methods calls have been eliminated.
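Here is a sketch of that advice applied to the question's benchmark. It repeats the timed loop several times and feeds every result into a value that is printed, so the work is not trivially dead code. The method chain is shortened and made static for brevity (my simplification, not the original code). For serious measurements, a dedicated harness such as JMH is the better tool.

```java
public class WarmedUpCallBenchmark {
    // Shortened, static version of the question's method() -> method5() chain.
    static int method(int a, int b)  { return method1(a, b); }
    static int method1(int a, int b) { return method2(a, b); }
    static int method2(int a, int b) { return a + b; }

    // Accumulate every result so the JIT cannot simply discard the calls.
    static long runRound(int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += method(i, 1);
        }
        return sink;
    }

    public static void main(String[] args) {
        // Later rounds run JIT-compiled (and likely inlined) code;
        // compare those steady-state numbers, not the first round.
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            long sink = runRound(1_000_000);
            long elapsed = System.nanoTime() - start;
            System.out.println("round " + round + ": " + elapsed + " ns, sink=" + sink);
        }
    }
}
```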

Yes, extra method calls cost extra time (except where the compiler/JIT actually inlines them, which I think some of them do sometimes, but under circumstances that you will find hard to control).
You shouldn't worry about this except in the most expensive parts of your code, because in most cases you won't be able to see the difference. Where performance doesn't matter, you should refactor to maximize code clarity.
I suggest you refactor at will and occasionally measure performance with a profiler to find the expensive parts. Only when the profiler shows that some specific code is using a significant fraction of the time should you worry about function calls (or other speed vs. clarity tradeoffs) in that specific code. You will discover that performance bottlenecks are often in places you wouldn't have guessed.

Android: How much overhead is generated by running an empty method?

I have created a class to handle my debug outputs so that I don't need to strip out all my log outputs before release.
public class Debug {
public static void debug( String module, String message) {
if( Release.DEBUG )
Log.d(module, message);
}
}
After reading another question, I have learned that the contents of the if statement are not compiled if the constant Release.DEBUG is false.
What I want to know is how much overhead is generated by running this empty method? (Once the if clause is removed there is no code left in the method) Is it going to have any impact on my application? Obviously performance is a big issue when writing for mobile handsets =P
Thanks
Gary

Measurements done on Nexus S with Android 2.3.2:
10^6 iterations of 1000 calls to an empty static void function: 21s <==> 21ns/call
10^6 iterations of 1000 calls to an empty non-static void function: 65s <==> 65ns/call
10^6 iterations of 500 calls to an empty static void function: 3.5s <==> 7ns/call
10^6 iterations of 500 calls to an empty non-static void function: 28s <==> 56ns/call
10^6 iterations of 100 calls to an empty static void function: 2.4s <==> 24ns/call
10^6 iterations of 100 calls to an empty non-static void function: 2.9s <==> 29ns/call
control:
10^6 iterations of an empty loop: 41ms <==> 41ns/iteration
10^7 iterations of an empty loop: 560ms <==> 56ns/iteration
10^9 iterations of an empty loop: 9300ms <==> 9.3ns/iteration
I've repeated the measurements several times. No significant deviations were found.
You can see that the per-call cost can vary greatly depending on workload (possibly due to JIT compiling), but three conclusions can be drawn:
Dalvik/Java is poor at optimizing away dead code.
Static function calls can be optimized much better than non-static ones (non-static functions are virtual and need to be looked up in a virtual table).
The cost on the Nexus S is no greater than 70 ns/call (that's ~70 CPU cycles) and is comparable with the cost of one empty for-loop iteration (i.e. one increment and one condition check on a local variable).
Observe that in your case the string argument will always be evaluated. If you do string concatenation, this will involve creating intermediate strings. This will be very costly and involve a lot of gc. For example executing a function:
void empty(String string){
}
called with arguments such as
empty("Hello " + 42 + " this is a string " + count );
10^4 iterations of 100 such calls take 10 s. That is 10 µs/call, i.e. roughly 1000 times slower than just an empty call. It also produces a huge amount of GC activity. The only way to avoid this is to inline the function manually, i.e. put the if statement at the call site instead of calling the debug function. It's ugly, but it's the only way to make it work.

Unless you call this from within a deeply nested loop, I wouldn't worry about it.

A good compiler removes the entire empty method, resulting in no overhead at all. I'm not sure if the Dalvik compiler already does this, but I suspect it's likely, at least since the arrival of the Just-in-time compiler with Froyo.
See also: Inline expansion

In terms of performance, the overhead of generating the messages passed into the debug function is going to be a lot more serious, since building them is likely to allocate memory, e.g.
Debug.debug(mymodule, "My error message" + myerrorcode);
The concatenation still happens even though the message is then discarded.
Unfortunately, if your goal is performance, you really need the "if( Release.DEBUG )" around the calls to this function rather than inside the function itself, and you will see this pattern in a lot of Android code.
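The following plain-Java sketch shows the difference without any Android APIs. In it, log() is a hypothetical stand-in for Log.d, and buildMessage() is a hypothetical helper that counts how often the message string is actually built. With the guard inside debug(), every call still pays for the concatenation. With the guard at the call site, the argument is never evaluated when DEBUG is false.

```java
public class DebugGuardDemo {
    // Compile-time constant: javac drops code inside "if (DEBUG)" when this is false.
    static final boolean DEBUG = false;

    static int messagesBuilt = 0;

    // Stand-in for android.util.Log.d; this sketch just discards the message.
    static void log(String module, String message) { }

    // Guard inside the method: callers still build the argument string first.
    static void debug(String module, String message) {
        if (DEBUG) log(module, message);
    }

    // Simulates an expensive message (string concatenation) and counts each build.
    static String buildMessage(int count) {
        messagesBuilt++;
        return "Hello " + 42 + " this is a string " + count;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            debug("demo", buildMessage(i)); // argument evaluated every time
        }
        System.out.println(messagesBuilt); // 1000

        messagesBuilt = 0;
        for (int i = 0; i < 1000; i++) {
            if (DEBUG) debug("demo", buildMessage(i)); // guarded at the call site
        }
        System.out.println(messagesBuilt); // 0
    }
}
```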

This is an interesting question and I like @misiu_mp's analysis, so I thought I would update it with a 2016 test on a Nexus 7 running Android 6.0.1. Here is the test code:
public void runSpeedTest() {
long startTime;
long[] times = new long[100000];
long[] staticTimes = new long[100000];
for (int i = 0; i < times.length; i++) {
startTime = System.nanoTime();
for (int j = 0; j < 1000; j++) {
emptyMethod();
}
times[i] = (System.nanoTime() - startTime) / 1000;
startTime = System.nanoTime();
for (int j = 0; j < 1000; j++) {
emptyStaticMethod();
}
staticTimes[i] = (System.nanoTime() - startTime) / 1000;
}
long timesSum = 0;
for (int i = 0; i < times.length; i++) { timesSum += times[i]; Log.d("status", "time," + times[i]); sleep(); }
long timesStaticSum = 0;
for (int i = 0; i < staticTimes.length; i++) { timesStaticSum += staticTimes[i]; Log.d("status", "statictime," + staticTimes[i]); sleep(); }
sleep();
Log.d("status", "final speed = " + (timesSum / times.length));
Log.d("status", "final static speed = " + (timesStaticSum / times.length));
}
private void sleep() {
try {
Thread.sleep(10);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private void emptyMethod() { }
private static void emptyStaticMethod() { }
The sleep() was added to prevent overflowing the Log.d buffer.
I played around with it many times and the results were pretty consistent with @misiu_mp's:
10^5 iterations of 1000 calls to an empty static void function: 29ns/call
10^5 iterations of 1000 calls to an empty non-static void function: 34ns/call
The static method call was always slightly faster than the non-static method call, but it would appear that a) the gap has closed significantly since Android 2.3.2 and b) there's still a cost to making calls to an empty method, static or not.
Looking at a histogram of the times reveals something interesting, however. The majority of calls, whether static or not, take between 30 and 40 ns, and looking closely at the data, virtually all of them take exactly 30 ns.
Running the same code with empty loops (method calls commented out) produces an average of 8 ns; however, about 3/4 of the measured times are 0 ns while the remainder are exactly 30 ns.
I'm not sure how to account for this data, and I'm not sure that @misiu_mp's conclusions still hold. The difference between empty static and non-static methods is negligible, and the preponderance of measurements are exactly 30 ns. That said, there still appears to be some non-zero cost to running empty methods.

Can array access be optimized?

Maybe I'm being misled by my profiler (Netbeans), but I'm seeing some odd behavior, hoping maybe someone here can help me understand it.
I am working on an application, which makes heavy use of rather large hash tables (keys are longs, values are objects). The performance with the built in java hash table (HashMap specifically) was very poor, and after trying some alternatives -- Trove, Fastutils, Colt, Carrot -- I started working on my own.
The code is very basic and uses a double hashing strategy. It works well and performs better than all the other options I've tried thus far.
The catch is, according to the profiler, lookups into the hash table are the single most expensive method in the entire application -- despite the fact that other methods are called many more times, and/or do a lot more logic.
What really confuses me is the lookups are called only by one class; the calling method does the lookup and processes the results. Both are called nearly the same number of times, and the method that calls the lookup has a lot of logic in it to handle the result of the lookup, but is about 100x faster.
Below is the code for the hash lookup. It's basically just two accesses into an array (the functions that compute the hash codes, according to profiling, are virtually free). I don't understand how this bit of code can be so slow since it is just array access, and I don't see any way of making it faster.
Note that the code simply returns the bucket matching the key, the caller is expected to process the bucket. 'size' is the hash.length/2, hash1 does lookups in the first half of the hash table, hash2 does lookups in the second half. key_index is a final int field on the hash table passed into the constructor, and the values array on the Entry objects is a small array of longs usually of length 10 or less.
Any thoughts people have on this are much appreciated.
Thanks.
public final Entry get(final long theKey) {
Entry aEntry = hash[hash1(theKey, size)];
if (aEntry != null && aEntry.values[key_index] != theKey) {
aEntry = hash[hash2(theKey, size)];
if (aEntry != null && aEntry.values[key_index] != theKey) {
return null;
}
}
return aEntry;
}
Edit, the code for hash1 & hash2
private static int hash1(final long key, final int hashTableSize) {
return (int)(key&(hashTableSize-1));
}
private static int hash2(final long key, final int hashTableSize) {
return (int)(hashTableSize+((key^(key>>3))&(hashTableSize-1)));
}

Nothing in your implementation strikes me as particularly inefficient. I'll admit I don't really follow your hashing/lookup strategy, but if you say it's performant in your circumstances, I'll believe you.
The only thing that I would expect might make some difference is to move the key out of the values array of Entry.
Instead of having this:
class Entry {
long[] values;
}
//...
if ( entry.values[key_index] == key ) { //...
Try this:
class Entry {
long key;
long values[];
}
//...
if ( entry.key == key ) { //...
Instead of incurring the cost of accessing a member, plus doing bounds checking, then getting a value of the array, you should just incur the cost of accessing the member.
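Here is a self-contained sketch of that suggestion, so it can be tried outside the application. It keeps the question's hash1/hash2 functions and lookup control flow, but compares entry.key directly. The class name, the constructor, and the naive put() are hypothetical additions for this sketch, and the table size is assumed to be a power of two, as the bit-masking hash functions require.

```java
public class KeyedTableSketch {
    // Entry with the key hoisted out of the values array.
    static final class Entry {
        final long key;
        final long[] values;
        Entry(long key, long[] values) { this.key = key; this.values = values; }
    }

    private final Entry[] hash; // first half: hash1 slots, second half: hash2 slots
    private final int size;     // hash.length / 2, assumed to be a power of two

    KeyedTableSketch(int size) {
        this.size = size;
        this.hash = new Entry[size * 2];
    }

    private static int hash1(long key, int n) { return (int) (key & (n - 1)); }
    private static int hash2(long key, int n) { return (int) (n + ((key ^ (key >> 3)) & (n - 1))); }

    // Same control flow as the question's get(), but one field load
    // replaces the array load plus bounds check on each comparison.
    Entry get(long theKey) {
        Entry e = hash[hash1(theKey, size)];
        if (e != null && e.key != theKey) {
            e = hash[hash2(theKey, size)];
            if (e != null && e.key != theKey) return null;
        }
        return e;
    }

    // Naive insert for the sketch: use the hash1 slot if free (or same key),
    // otherwise the hash2 slot. No eviction or resizing.
    void put(Entry e) {
        int i1 = hash1(e.key, size);
        if (hash[i1] == null || hash[i1].key == e.key) {
            hash[i1] = e;
            return;
        }
        hash[hash2(e.key, size)] = e;
    }
}
```

Whether this is measurably faster in the real application is something only a benchmark outside the profiler can tell.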
Is there a random-access data type faster than an array?
I was interested in the answer to this question, so I set up a test environment. This is my Array interface:
interface Array {
long get(int i);
void set(int i, long v);
}
This "Array" has undefined behaviour when indices are out of bounds. I threw together the obvious implementation:
class NormalArray implements Array {
private long[] data;
public NormalArray(int size) {
data = new long[size];
}
@Override
public long get(int i) {
return data[i];
}
@Override
public void set(int i, long v) {
data[i] = v;
}
}
And then a control:
class NoOpArray implements Array {
@Override
public long get(int i) {
return 0;
}
@Override
public void set(int i, long v) {
}
}
Finally, I designed an "array" where the first 10 indices are hardcoded members. The members are set/selected through a switch:
class TenArray implements Array {
private long v0;
private long v1;
private long v2;
private long v3;
private long v4;
private long v5;
private long v6;
private long v7;
private long v8;
private long v9;
private long[] extras;
public TenArray(int size) {
if (size > 10) {
extras = new long[size - 10];
}
}
@Override
public long get(final int i) {
switch (i) {
case 0:
return v0;
case 1:
return v1;
case 2:
return v2;
case 3:
return v3;
case 4:
return v4;
case 5:
return v5;
case 6:
return v6;
case 7:
return v7;
case 8:
return v8;
case 9:
return v9;
default:
return extras[i - 10];
}
}
@Override
public void set(final int i, final long v) {
switch (i) {
case 0:
v0 = v; break;
case 1:
v1 = v; break;
case 2:
v2 = v; break;
case 3:
v3 = v; break;
case 4:
v4 = v; break;
case 5:
v5 = v; break;
case 6:
v6 = v; break;
case 7:
v7 = v; break;
case 8:
v8 = v; break;
case 9:
v9 = v; break;
default:
extras[i - 10] = v;
}
}
}
I tested it with this harness:
import java.util.Random;
public class ArrayOptimization {
public static void main(String[] args) {
int size = 10;
long[] data = new long[size];
Random r = new Random();
for ( int i = 0; i < data.length; i++ ) {
data[i] = r.nextLong();
}
Array[] a = new Array[] {
new NoOpArray(),
new NormalArray(size),
new TenArray(size)
};
for (;;) {
for ( int i = 0; i < a.length; i++ ) {
testSet(a[i], data, 10000000);
testGet(a[i], data, 10000000);
}
}
}
private static void testGet(Array a, long[] data, int iterations) {
long nanos = System.nanoTime();
for ( int i = 0; i < iterations; i++ ) {
for ( int j = 0; j < data.length; j++ ) {
data[j] = a.get(j);
}
}
long stop = System.nanoTime();
System.out.printf("%s/get took %fms%n", a.getClass().getName(),
(stop - nanos) / 1000000.0);
}
private static void testSet(Array a, long[] data, int iterations) {
long nanos = System.nanoTime();
for ( int i = 0; i < iterations; i++ ) {
for ( int j = 0; j < data.length; j++ ) {
a.set(j, data[j]);
}
}
long stop = System.nanoTime();
System.out.printf("%s/set took %fms%n", a.getClass().getName(),
(stop - nanos) / 1000000.0);
}
}
The results were somewhat surprising. TenArray performs noticeably faster than NormalArray (for sizes <= 10). Subtracting the overhead (using the NoOpArray average), TenArray takes roughly 70% of the time of the normal array. So if you know the likely maximum size of your array, I suppose it is possible to beat an array. I would guess that switch involves either less bounds checking or more efficient bounds checking than an array does.
NoOpArray/set took 953.272654ms
NoOpArray/get took 891.514622ms
NormalArray/set took 1235.694953ms
NormalArray/get took 1148.091061ms
TenArray/set took 1149.833109ms
TenArray/get took 1054.040459ms
NoOpArray/set took 948.458667ms
NoOpArray/get took 888.618223ms
NormalArray/set took 1232.554749ms
NormalArray/get took 1120.333771ms
TenArray/set took 1153.505578ms
TenArray/get took 1056.665337ms
NoOpArray/set took 955.812843ms
NoOpArray/get took 893.398847ms
NormalArray/set took 1237.358472ms
NormalArray/get took 1125.100537ms
TenArray/set took 1150.901231ms
TenArray/get took 1057.867936ms
Now whether you can in practice get speeds faster than an array I'm not sure; obviously this way you incur any overhead associated with the interface/class/methods.

Most likely you are partially misled in your interpretation of the profiler's results. Profilers notoriously overinflate the performance impact of small, frequently called methods. In your case, the profiling overhead for the get() method is probably larger than the actual processing time spent in the method itself. The situation is made worse because the instrumentation also interferes with the JIT's ability to inline methods.
As a rule of thumb: if the total processing time for a piece of work of known length increases more than two- to threefold when running under the profiler, the profiling overhead will skew your results.
To verify that your changes actually have an impact, always measure performance improvements without the profiler, too. The profiler can hint at bottlenecks, but it can also mislead you into looking at places where nothing is wrong.
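Measuring without the profiler can be as simple as the System.nanoTime() loop used in the benchmark above. The sketch below adds warm-up runs so the JIT has a chance to compile the code before timing starts. `SimpleTimer`, `bestOfMillis`, and the summing workload are hypothetical names chosen for illustration, not part of either answer. For serious measurements a dedicated harness such as JMH is more reliable than a hand-rolled loop like this.

```java
import java.util.function.LongSupplier;

public class SimpleTimer {
    // Results are written here at the end, which makes it harder for the
    // JIT to treat the timed work as dead code and remove it.
    static volatile long blackhole;

    // Runs `task` `warmups` times so the JIT can compile it, then times
    // `runs` further executions and returns the fastest one in milliseconds.
    static double bestOfMillis(LongSupplier task, int warmups, int runs) {
        long sink = 0;
        for (int i = 0; i < warmups; i++) {
            sink += task.getAsLong();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        blackhole = sink;
        return best / 1_000_000.0;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        // Hypothetical workload: sum an int array.
        double ms = bestOfMillis(() -> {
            long sum = 0;
            for (int v : data) sum += v;
            return sum;
        }, 20, 10);
        System.out.printf("best run took %.3fms%n", ms);
    }
}
```

Running the same harness once normally and once under the profiler gives the ratio that the two- to threefold rule of thumb refers to.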
Array bounds checking can have a surprisingly large impact on performance (if you do comparatively little else), but it can also be hard to separate clearly from general memory access penalties. In some simple cases the JIT may be able to eliminate the checks (there have been efforts towards bounds check elimination in Java 6), but AFAIK this is mostly limited to simple loop constructs like for(x=0; x<array.length; x++).
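To make the distinction concrete, here is a sketch of the two loop shapes. In the first, the index runs from 0 up to array.length, so a JIT can prove every access is in range. In the second, the index is read from another array, so the check on `a[...]` generally has to stay. Whether HotSpot actually removes a given check depends on the JVM version and the surrounding code. This shows the shape only, not a guarantee, and the class and method names are hypothetical.

```java
public class BoundsCheckLoops {
    // Canonical shape: i starts at 0 and is compared against a.length,
    // so the JIT can prove each a[i] is in range and may drop the check.
    static long sumSimple(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Indirect indexing: the index into `a` comes from `idx`, so the JIT
    // generally cannot prove it is in range and must keep that check.
    static long sumIndirect(int[] a, int[] idx) {
        long sum = 0;
        for (int i = 0; i < idx.length; i++) {
            sum += a[idx[i]];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] idx = {3, 0, 2};
        System.out.println(sumSimple(a));        // 10
        System.out.println(sumIndirect(a, idx)); // 4 + 1 + 3 = 8
    }
}
```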
Under some circumstances you may be able to replace array access with plain member (field) access, avoiding the bounds checks completely, but this is limited to the rare cases where you access your array exclusively through constant indices. I see no way to apply it to your problem.
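A minimal sketch of that replacement, assuming a hypothetical three-element array that is only ever indexed with the constants 0, 1, and 2. The field version has no array access at all, so there is nothing to bounds-check. The array version's accesses each carry an implicit check that the JIT may or may not manage to remove. The point-like classes are illustrative and do not come from the question.

```java
public class FieldsVsArray {
    // Array-backed: every coords[k] is an array access with an implicit
    // bounds check, even though k is always a constant.
    static final class PointArray {
        final double[] coords = new double[3];

        double lengthSquared() {
            return coords[0] * coords[0] + coords[1] * coords[1] + coords[2] * coords[2];
        }
    }

    // Field-backed: the same data as plain fields, so no array access and
    // no bounds check. Only possible because the indices were constants.
    static final class PointFields {
        double x, y, z;

        double lengthSquared() {
            return x * x + y * y + z * z;
        }
    }

    public static void main(String[] args) {
        PointArray pa = new PointArray();
        pa.coords[0] = 1; pa.coords[1] = 2; pa.coords[2] = 2;
        PointFields pf = new PointFields();
        pf.x = 1; pf.y = 2; pf.z = 2;
        System.out.println(pa.lengthSquared()); // 9.0
        System.out.println(pf.lengthSquared()); // 9.0
    }
}
```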
The change suggested by Mark Peters is most likely faster not only because it eliminates a bounds check, but also because it alters the locality properties of your data structures in a more cache-friendly way.

Many profilers tell you very confusing things, partly because of how they work, and partly because people have funny ideas about performance to begin with.
For example, you're wondering about how many times functions are called, and you're looking at code and thinking it looks like a lot of logic, therefore slow.
There's a very simple way to think about this stuff that makes it easy to understand what's going on.
First of all, think in terms of the percent of time a routine or statement is active, rather than the number of times it is called or the average length of time it takes. The reason is that the percentage is relatively unaffected by irrelevant issues like competing processes or I/O, and it saves you from having to multiply the number of calls by the average execution time and divide by the total time just to see whether it is big enough to even care about. Also, percent tells you, bottom line, how much fixing it could potentially reduce the overall execution time.
Second, what I mean by "active" is "on the stack", where the stack includes the currently running instruction and all the calls "above" it back to "call main". If a routine is responsible for 10% of the time, including routines that it calls, then during that time it is on the stack. The same is true of individual statements or even instructions. (Ignore "self time" or "exclusive time". It's a distraction.)
Profilers that put timers and counters on functions can only give you some of this information. Profilers that only sample the program counter tell you even less. What you need is something that samples the call stack and reports to you by line (not just by function) the percent of stack samples containing that line. It's also important that they sample the stack a) during I/O or other blockage, but b) not while waiting for user input.
There are profilers that can do this. I'm not sure about Java.
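For Java, a crude version of this can at least be sketched with the standard Thread.getStackTrace() API: take snapshots of a thread's stack at intervals, then report, for each source line, the fraction of snapshots in which that line appears anywhere on the stack. `StackSampler`, `sample`, `lineFractions`, and the busy-loop workload are hypothetical names for illustration. Keep in mind that on HotSpot, getStackTrace() is typically serviced at a safepoint, so the samples can be biased toward safepoint locations. It also reports a thread blocked in I/O, but it cannot tell that apart from waiting for user input. Treat this as a sketch of the idea, not a real profiler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StackSampler {
    // For each "Class.method:line", the fraction of samples whose stack
    // contains that line anywhere (not just at the top). A line that
    // appears more than once in one sample (recursion) counts once.
    static Map<String, Double> lineFractions(List<StackTraceElement[]> samples) {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] stack : samples) {
            Set<String> seen = new HashSet<>();
            for (StackTraceElement frame : stack) {
                String key = frame.getClassName() + "." + frame.getMethodName()
                        + ":" + frame.getLineNumber();
                if (seen.add(key)) {
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        Map<String, Double> fractions = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            fractions.put(e.getKey(), e.getValue() / (double) samples.size());
        }
        return fractions;
    }

    // Takes up to `count` snapshots of `target`'s stack, `intervalMs` apart.
    static List<StackTraceElement[]> sample(Thread target, int count, long intervalMs)
            throws InterruptedException {
        List<StackTraceElement[]> samples = new ArrayList<>();
        for (int i = 0; i < count && target.isAlive(); i++) {
            samples.add(target.getStackTrace());
            Thread.sleep(intervalMs);
        }
        return samples;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical workload running in its own thread.
        Thread worker = new Thread(() -> {
            double acc = 0;
            for (long i = 1; i < 2_000_000_000L; i++) acc += Math.sqrt(i);
            System.out.println(acc);
        });
        worker.setDaemon(true); // lets the JVM exit once sampling is done
        worker.start();
        List<StackTraceElement[]> samples = sample(worker, 20, 10);
        lineFractions(samples).forEach((line, f) ->
                System.out.printf("%5.1f%%  %s%n", 100 * f, line));
    }
}
```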
If you're still with me, let me throw out another ringer. You're looking for things you can optimize, right? And only things that have a large enough percent to be worth the trouble, like 10% or more? Such a line of code costing 10% is on the stack 10% of the time. That means if 20,000 samples are taken, it is on about 2,000 of them. If 20 samples are taken, it is on about 2 of them, on average. Now, you're trying to find the line, right? Does it really matter if the percent is off a little bit, as long as you find it? That's another one of those happy myths of profilers: that precision of timing matters. For finding problems worth fixing, 20,000 samples won't tell you much more than 20 samples will.
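The counts above can be made a little more precise under one assumption not stated in the text: that samples are taken at independent, random moments. The number of samples X containing a line that is on the stack a fraction p of the time is then binomially distributed:

```latex
X \sim \mathrm{Bin}(n, p), \qquad \mathbb{E}[X] = np, \qquad \sigma_X = \sqrt{np(1-p)}

% p = 0.1, n = 20:
\mathbb{E}[X] = 2, \qquad \sigma_X = \sqrt{1.8} \approx 1.34

% p = 0.1, n = 20000:
\mathbb{E}[X] = 2000, \qquad \sigma_X = \sqrt{1800} \approx 42.4
\;\Rightarrow\; \hat{p} \approx 10\% \pm 0.21\%
```

So 20,000 samples pin the percentage down far more tightly, but the extra precision only changes the estimate, not which line you end up looking at.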
So what do I do? Just take the samples by hand and study them. Code worth optimizing will simply jump out at me.
Finally, there's a big gob of good news. There are probably multiple things you could optimize. Suppose you fix a 20% problem and make it go away. Overall time shrinks to 4/5 of what it was, but the other problems aren't taking any less time, so now their percentage is 5/4 of what it was, because the denominator got smaller. Percentage-wise they got bigger, and easier to find. This effect snowballs, allowing you to really squeeze the code.
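The snowball arithmetic can be checked with a few lines of code. Assume a hypothetical profile with three independent problems costing 20%, 15%, and 10% of the original run time (these numbers are made up for illustration). After removing problems that together cost a fraction r of the original time, a remaining problem that cost s now accounts for s / (1 - r) of the new, shorter run. With r = 0.2 that is the 5/4 factor from the text.

```java
public class Snowball {
    // Share of the new run time taken by a problem that originally cost
    // `share`, after problems totalling `removedTotal` have been fixed.
    static double shareAfterRemoving(double share, double removedTotal) {
        return share / (1.0 - removedTotal);
    }

    public static void main(String[] args) {
        // Hypothetical profile: problems costing 20%, 15% and 10% of run time.
        double[] problems = {0.20, 0.15, 0.10};
        double removed = 0.0;
        for (int i = 0; i < problems.length; i++) {
            removed += problems[i]; // fix the largest remaining problem
            System.out.printf("after fixing #%d: time = %.2f of original",
                    i + 1, 1.0 - removed);
            for (int j = i + 1; j < problems.length; j++) {
                System.out.printf(", #%d now %.2f%%",
                        j + 1, 100 * shareAfterRemoving(problems[j], removed));
            }
            System.out.println();
        }
    }
}
```

After the first fix, the 10% problem has grown to 12.5% of the remaining run time without anything about it changing.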

You could try using a memoizing or caching strategy to reduce the number of actual calls. Another thing you could try, if you're very desperate, is a native array, since indexing those is very fast, and JNI shouldn't incur too much overhead if you're using parameters like longs that don't require marshalling.
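A minimal memoization sketch, assuming the expensive call is a pure function of an int argument. The question's actual get() method isn't shown here, so `Memo`, `memoize`, and the squaring function are hypothetical stand-ins. Note that a HashMap lookup (hashing plus boxing the key) is itself more expensive than a plain array read, so this only pays off when the wrapped call costs noticeably more than that. The cache is also not thread-safe as written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class Memo {
    // Wraps a pure int -> V function so each distinct argument is computed
    // only once; later calls with the same argument hit the cache.
    // Not thread-safe: use a ConcurrentHashMap if shared between threads.
    static <V> IntFunction<V> memoize(IntFunction<V> f) {
        Map<Integer, V> cache = new HashMap<>();
        return key -> cache.computeIfAbsent(key, f::apply);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical "expensive" function that counts its invocations.
        IntFunction<Long> slowSquare = n -> {
            calls[0]++;
            return (long) n * n;
        };
        IntFunction<Long> fast = memoize(slowSquare);
        for (int i = 0; i < 1000; i++) {
            fast.apply(i % 10);
        }
        System.out.println(calls[0]); // 10: one real call per distinct argument
    }
}
```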
