I've been working on a project to write a recursive function in assembly, where it will calculate the Fibonacci number. To start off I made it in Java code:
public class Fibonacci {
    public static int fibonacci(int n)
    {
        if(n <= 1)
            return n;
        return fibonacci(n-1) + fibonacci(n-2);
    }
}
This recursive function worked perfectly fine. Although when trying to implement it in assembly code I didn't get the result I expected. After troubleshooting a while, I wrote (roughly) the equivalent code in Java:
static int n;
static int rec1;
static int result;
public static int asmFibonacci(int n)
{
if(n <= 1) {
result = n;
return 0;
}
n = n-1;
asmFibonacci(n);
rec1 = result;
n = n-1;
asmFibonacci(n);
result = rec1 + result;
return 0;
}
This function produces the same wrong result as the one I had in assembly code, but I still don't understand why. What am I missing? Both functions recurse the same number of times.
Results
asmFibonacci
0
1
1
2
2
3
3
4
4
5
Fibonacci
0
1
1
2
3
5
8
13
21
34
Any help would be much appreciated.
Update
After pushing rec1(R1) onto the stack as well, I got the Fibonacci subroutine in assembly to work as expected.
main
LDR R0, = 12
LDR R1, = 0
LDR R2, = 0
BL Fibonacci
STOP B STOP
Fibonacci
PUSH {LR, R0-R1}
CMP R0, #1
BLE RETURN
ADD R0, #-1
BL Fibonacci
MOV R1, R2
ADD R0, #-1
BL Fibonacci
ADD R0, R1, R2
RETURN
MOV R2, R0
POP {R0-R1, LR}
BX LR
END
Proper recursive code won't use static storage; it's not re-entrant therefore not viable for recursion.
"Re-entrant" means that the function can be called while you're in the middle of another call to it, e.g. that evaluating Fib(3) while you're in the middle of Fib(5) doesn't mess up any variables, such as static int rec1, that Fib(5) is going to want to re-read later.
Use only local variables.
In asm, that means stack space or call-preserved registers. (Using call-preserved registers means preserving the caller's value, again on the stack).
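To make the clobbering concrete, here is a small sketch that puts the static-variable version from the question next to one that keeps the first result in a local. The names ReentrancyDemo, brokenFib and localFib are made up for this illustration; only the logic of brokenFib is taken from the question.

```java
// Illustrative sketch; ReentrancyDemo, brokenFib and localFib are hypothetical names.
final class ReentrancyDemo {
    static int rec1;    // one copy shared by every activation: not re-entrant
    static int result;

    // Same logic as asmFibonacci in the question: the second recursive call
    // overwrites rec1 before this activation gets to add it to result.
    static void brokenFib(int n) {
        if (n <= 1) {
            result = n;
            return;
        }
        brokenFib(n - 1);
        rec1 = result;
        brokenFib(n - 2);
        result = rec1 + result;
    }

    // Same shape, but the first result lives in a local, so every activation
    // has its own copy (stack space or a saved call-preserved register in asm).
    static int localFib(int n) {
        if (n <= 1) {
            return n;
        }
        int first = localFib(n - 1);
        return first + localFib(n - 2);
    }

    public static void main(String[] args) {
        for (int n = 0; n < 10; n++) {
            brokenFib(n);
            System.out.println(n + ": static=" + result + " local=" + localFib(n));
        }
    }
}
```

With the static version, every nested call that reaches n >= 2 leaves rec1 at 1, which is why the question's output only grows by one every second step.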
Also note that static int n is shadowed by your int n function arg (the way you wrote it in Java at least), so you avoided the bug of trying to use static n! static int rec2 is useless because you don't need to save it across anything.
Also, static int result; is total insanity. Recursive functions return their result, not just produce a side-effect on a global / static var!
In asm, get used to using registers; not everything should get stored and reloaded from static storage with named labels. Even a debug-mode C compiler wouldn't do that (it would use stack space for locals)
Note that naive double-recursive Fibonacci only needs one total int-sized space for saving something across a call. Across the first call, you save n. Across the second call, you need to save the result of the first call, but you're done with n. You can recycle the same call-preserved register for that.
Recursive Fibonacci is total garbage for performance, of course, and only useful as an exercise in recursion. Approximately O(Fib(n)) vs. O(n) performance for simple iterative repeating x += y; SWAP(x,y) or x+=y ; y+=x; See Assembly Language (x86): How to create a loop to calculate Fibonacci sequence for efficient loops.
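For comparison, a minimal Java sketch of the "x += y; SWAP(x,y)" loop mentioned above. The class name IterativeFib is made up, and the int result overflows past Fib(46), so treat it as an illustration rather than a library routine.

```java
// Illustrative sketch; IterativeFib is a hypothetical name.
final class IterativeFib {
    // O(n) additions. After k iterations: a == Fib(k), b == Fib(k + 1).
    static int fib(int n) {
        int a = 0; // Fib(0)
        int b = 1; // Fib(1)
        for (int i = 0; i < n; i++) {
            a += b;      // x += y
            int t = a;   // SWAP(x, y)
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(fib(12)); // 12 is the input used in the asm above
    }
}
```

In asm the two values would simply stay in two registers for the whole loop, with no stack traffic at all.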
Why can't you do this if you try to find out whether an int is between two numbers:
if(10 < x < 20)
Instead of it, you'll have to do
if(10<x && x<20)
which seems like a bit of overhead.
One problem is that a ternary relational construct would introduce serious parser problems:
<expr> ::= <expr> <rel-op> <expr> |
... |
<expr> <rel-op> <expr> <rel-op> <expr>
When you try to express a grammar with those productions using a typical parser generator, you'll find that there is a shift-reduce conflict at the point of the first <rel-op>. The parser needs to look ahead an arbitrary number of symbols to see if there is a second <rel-op> before it can decide whether the binary or ternary form has been used. In this case, you could not simply ignore the conflict because that would result in incorrect parses.
I'm not saying that this grammar is fatally ambiguous. But I think you'd need a backtracking parser to deal with it correctly. And that is a serious problem for a programming language where fast compilation is a major selling point.
Because that syntax simply isn't defined? Besides, x < y evaluates as a bool, so what does bool < int mean? It isn't really an overhead; besides, you could write a utility method if you really want - isBetween(10,x,20) - I wouldn't myself, but hey...
It's just the syntax. '<' is a binary operation, and most languages don't make it transitive. They could have made it like the way you say, but then somebody would be asking why you can't do other operations in trinary as well. "if (12 < x != 5)"?
Syntax is always a trade-off between complexity, expressiveness and readability. Different language designers make different choices. For instance, SQL has "x BETWEEN y AND z", where x, y, and z can individually or all be columns, constants, or bound variables. And I'm happy to use it in SQL, and I'm equally happy not to worry about why it's not in Java.
You could make your own
public static boolean isBetween(int a, int b, int c) {
return b > a ? c > a && c < b : c > b && c < a;
}
Edit: sorry checks if c is between a and b
The inconvenience of typing 10 < x && x < 20 is minimal compared to the increase in language complexity if one would allow 10 < x < 20, so the designers of the Java language decided against supporting it.
COBOL allows that (I am sure some other languages do as well). Java inherited most of its syntax from C, which doesn't allow it.
You are human, and therefore you understand what the term "10 < x < 20" is supposed to mean.
The computer doesn't have this intuition, so it reads it as:
"(10 < x) < 20".
For example, if x = 15, it will calculate:
(10 < x) => TRUE
"TRUE < 20" => ???
In C programming, it will be worse, since there are no True\False values.
If x = 5, the calculation will be:
10 < x => 0 (the value of False)
0 < 20 => non-0 number (True)
and therefore "10 < 5 < 20" will return True! :S
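The C behaviour described above can be imitated in Java with a helper that returns 1 or 0 the way C's relational operators do. This is only a sketch: ChainedCompareDemo, lessThan, chainedLikeC and between are made-up names, and Java itself rejects 10 < x < 20 at compile time.

```java
// Illustrative sketch; all names here are hypothetical.
final class ChainedCompareDemo {
    // C's relational operators yield the int 1 or 0; this helper mimics that.
    static int lessThan(int a, int b) {
        return a < b ? 1 : 0;
    }

    // How C evaluates 10 < x < 20: left to right, as (10 < x) < 20.
    static boolean chainedLikeC(int x) {
        return lessThan(lessThan(10, x), 20) != 0;
    }

    // What the programmer actually meant.
    static boolean between(int x) {
        return 10 < x && x < 20;
    }

    public static void main(String[] args) {
        for (int x : new int[] {5, 15, 25}) {
            System.out.println(x + ": chained=" + chainedLikeC(x) + " between=" + between(x));
        }
    }
}
```

Since the inner comparison can only be 0 or 1, and both are less than 20, the chained form is true for every x.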
simplifying:
a = 10; b = 15; c = 20
public static boolean check(int a, int b, int c) {
return a<=b && b<=c;
}
This checks if b is between a and c
Because the < operator (and most others) are binary operators (they take two arguments), and (true true) is not a valid boolean expression.
The Java language designers could have designed the language to allow syntax like the type you prefer, but (I'm guessing) they decided that it was not worth the more complex parsing rules.
One can use Range class from the Guava library:
Range.open(10, 20).contains(n)
Apache Commons Lang has a similar class as well.
if (10 < x && x < 20)
This statement will evaluate true for numbers between 10 and 20.
This is a rough equivalent to 10 < x < 20
In this code:
if (value >= x && value <= y) {
when value >= x and value <= y are as likely true as false with no particular pattern, would using the & operator be faster than using &&?
Specifically, I am thinking about how && lazily evaluates the right-hand-side expression (ie only if the LHS is true), which implies a conditional, whereas in Java & in this context guarantees strict evaluation of both (boolean) sub-expressions. The value result is the same either way.
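A small sketch of that difference in evaluation, using a right-hand side with a visible side effect. EvaluationDemo, rhs and the two counting methods are made-up names for illustration only.

```java
// Illustrative sketch; all names here are hypothetical.
final class EvaluationDemo {
    static int rhsCalls = 0;

    // Right-hand operand with a side effect, so we can see whether it ran.
    static boolean rhs() {
        rhsCalls++;
        return true;
    }

    // && evaluates rhs() only when lhs is true.
    static int callsWithConditionalAnd(boolean lhs) {
        rhsCalls = 0;
        boolean ignored = lhs && rhs();
        return rhsCalls;
    }

    // & on booleans always evaluates both operands.
    static int callsWithLogicalAnd(boolean lhs) {
        rhsCalls = 0;
        boolean ignored = lhs & rhs();
        return rhsCalls;
    }

    public static void main(String[] args) {
        System.out.println("&& with false lhs: " + callsWithConditionalAnd(false) + " call(s)");
        System.out.println("&  with false lhs: " + callsWithLogicalAnd(false) + " call(s)");
    }
}
```

For side-effect-free operands such as value <= y, the two forms can only differ in how they are compiled, not in their result.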
But whilst a >= or <= operator will use a simple comparison instruction, the && must involve a branch, and that branch is susceptible to branch prediction failure - as per this Very Famous Question: Why is it faster to process a sorted array than an unsorted array?
So, forcing the expression to have no lazy components will surely be more deterministic and not be vulnerable to prediction failure. Right?
Notes:
obviously the answer to my question would be No if the code looked like this: if(value >= x && verySlowFunction()). I am focusing on "sufficiently simple" RHS expressions.
there's a conditional branch in there anyway (the if statement). I can't quite prove to myself that that is irrelevant, and that alternative formulations might be better examples, like boolean b = value >= x && value <= y;
this all falls into the world of horrendous micro-optimizations. Yeah, I know :-) ... interesting though?
Update
Just to explain why I'm interested: I've been staring at the systems that Martin Thompson has been writing about on his Mechanical Sympathy blog, after he came and did a talk about Aeron. One of the key messages is that our hardware has all this magical stuff in it, and we software developers tragically fail to take advantage of it. Don't worry, I'm not about to go s/&&/\&/ on all my code :-) ... but there are a number of questions on this site on improving branch prediction by removing branches, and it occurred to me that the conditional boolean operators are at the core of test conditions.
Of course, @StephenC makes the fantastic point that bending your code into weird shapes can make it less easy for JITs to spot common optimizations - if not now, then in the future. And that the Very Famous Question mentioned above is special because it pushes the prediction complexity far beyond practical optimization.
I'm pretty much aware that in most (or almost all) situations, && is the clearest, simplest, fastest, best thing to do - although I'm very grateful to the people who have posted answers demonstrating this! I'm really interested to see if there are actually any cases in anyone's experience where the answer to "Can & be faster?" might be Yes...
Update 2:
(Addressing advice that the question is overly broad. I don't want to make major changes to this question because it might compromise some of the answers below, which are of exceptional quality!) Perhaps an example in the wild is called for; this is from the Guava LongMath class (thanks hugely to @maaartinus for finding this):
public static boolean isPowerOfTwo(long x) {
return x > 0 & (x & (x - 1)) == 0;
}
See that first &? And if you check the link, the next method is called lessThanBranchFree(...), which hints that we are in branch-avoidance territory - and Guava is really widely used: every cycle saved causes sea-levels to drop visibly. So let's put the question this way: is this use of & (where && would be more normal) a real optimization?
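For readers who want to poke at the expression itself, here is a sketch that puts the & form next to the && form. PowerOfTwoDemo and its method names are made up; only the expression is taken from the Guava snippet above. Both forms return the same value for every input, because the right-hand side has no side effects and cannot throw.

```java
// Illustrative sketch; PowerOfTwoDemo and its method names are hypothetical.
final class PowerOfTwoDemo {
    // The form quoted from Guava: both operands are always evaluated.
    static boolean withLogicalAnd(long x) {
        return x > 0 & (x & (x - 1)) == 0;
    }

    // The more conventional short-circuit form.
    static boolean withConditionalAnd(long x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -8, 0, 1, 6, 8, 1L << 62};
        for (long x : samples) {
            // x & (x - 1) clears the lowest set bit, so it is 0 only when
            // x has at most one bit set; x > 0 rules out 0 and negatives.
            System.out.println(x + " -> " + withLogicalAnd(x) + " / " + withConditionalAnd(x));
        }
    }
}
```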
Ok, so you want to know how it behaves at the lower level... Let's have a look at the bytecode then!
EDIT : added the generated assembly code for AMD64, at the end. Have a look for some interesting notes.
EDIT 2 (re: OP's "Update 2"): added asm code for Guava's isPowerOfTwo method as well.
Java source
I wrote these two quick methods:
public boolean AndSC(int x, int value, int y) {
return value >= x && value <= y;
}
public boolean AndNonSC(int x, int value, int y) {
return value >= x & value <= y;
}
As you can see, they are exactly the same, save for the type of AND operator.
Java bytecode
And this is the generated bytecode:
public AndSC(III)Z
L0
LINENUMBER 8 L0
ILOAD 2
ILOAD 1
IF_ICMPLT L1
ILOAD 2
ILOAD 3
IF_ICMPGT L1
L2
LINENUMBER 9 L2
ICONST_1
IRETURN
L1
LINENUMBER 11 L1
FRAME SAME
ICONST_0
IRETURN
L3
LOCALVARIABLE this Ltest/lsoto/AndTest; L0 L3 0
LOCALVARIABLE x I L0 L3 1
LOCALVARIABLE value I L0 L3 2
LOCALVARIABLE y I L0 L3 3
MAXSTACK = 2
MAXLOCALS = 4
// access flags 0x1
public AndNonSC(III)Z
L0
LINENUMBER 15 L0
ILOAD 2
ILOAD 1
IF_ICMPLT L1
ICONST_1
GOTO L2
L1
FRAME SAME
ICONST_0
L2
FRAME SAME1 I
ILOAD 2
ILOAD 3
IF_ICMPGT L3
ICONST_1
GOTO L4
L3
FRAME SAME1 I
ICONST_0
L4
FRAME FULL [test/lsoto/AndTest I I I] [I I]
IAND
IFEQ L5
L6
LINENUMBER 16 L6
ICONST_1
IRETURN
L5
LINENUMBER 18 L5
FRAME SAME
ICONST_0
IRETURN
L7
LOCALVARIABLE this Ltest/lsoto/AndTest; L0 L7 0
LOCALVARIABLE x I L0 L7 1
LOCALVARIABLE value I L0 L7 2
LOCALVARIABLE y I L0 L7 3
MAXSTACK = 3
MAXLOCALS = 4
The AndSC (&&) method generates two conditional jumps, as expected:
It loads value and x onto the stack, and jumps to L1 if value is lower. Else it keeps running the next lines.
It loads value and y onto the stack, and jumps to L1 also, if value is greater. Else it keeps running the next lines.
Which happen to be a return true in case none of the two jumps were made.
And then we have the lines marked as L1 which are a return false.
The AndNonSC (&) method, however, generates three conditional jumps!
It loads value and x onto the stack and jumps to L1 if value is lower. Because now it needs to save the result to compare it with the other part of the AND, so it has to execute either "save true" or "save false", it can't do both with the same instruction.
It loads value and y onto the stack and jumps to L1 if value is greater. Once again it needs to save true or false and that's two different lines depending on the comparison result.
Now that both comparisons are done, the code actually executes the AND operation -- and if both are true, it jumps (for a third time) to return true; or else it continues execution onto the next line to return false.
(Preliminary) Conclusion
Though I'm not very experienced with Java bytecode and I may have overlooked something, it seems to me that & will actually perform worse than && in every case: it generates more instructions to execute, including more conditional jumps to predict and possibly fail at.
A rewriting of the code to replace comparisons with arithmetical operations, as someone else proposed, might be a way to make & a better option, but at the cost of making the code much less clear.
IMHO it is not worth the hassle for 99% of the scenarios (it may be very well worth it for the 1% loops that need to be extremely optimized, though).
EDIT: AMD64 assembly
As noted in the comments, the same Java bytecode can lead to different machine code in different systems, so while the Java bytecode might give us a hint about which AND version performs better, getting the actual ASM as generated by the compiler is the only way to really find out.
I printed the AMD64 ASM instructions for both methods; below are the relevant lines (stripped entry points etc.).
NOTE: all methods compiled with java 1.8.0_91 unless otherwise stated.
Method AndSC with default options
# {method} {0x0000000016da0810} 'AndSC' '(III)Z' in 'AndTest'
...
0x0000000002923e3e: cmp %r8d,%r9d
0x0000000002923e41: movabs $0x16da0a08,%rax ; {metadata(method data for {method} {0x0000000016da0810} 'AndSC' '(III)Z' in 'AndTest')}
0x0000000002923e4b: movabs $0x108,%rsi
0x0000000002923e55: jl 0x0000000002923e65
0x0000000002923e5b: movabs $0x118,%rsi
0x0000000002923e65: mov (%rax,%rsi,1),%rbx
0x0000000002923e69: lea 0x1(%rbx),%rbx
0x0000000002923e6d: mov %rbx,(%rax,%rsi,1)
0x0000000002923e71: jl 0x0000000002923eb0 ;*if_icmplt
; - AndTest::AndSC@2 (line 22)
0x0000000002923e77: cmp %edi,%r9d
0x0000000002923e7a: movabs $0x16da0a08,%rax ; {metadata(method data for {method} {0x0000000016da0810} 'AndSC' '(III)Z' in 'AndTest')}
0x0000000002923e84: movabs $0x128,%rsi
0x0000000002923e8e: jg 0x0000000002923e9e
0x0000000002923e94: movabs $0x138,%rsi
0x0000000002923e9e: mov (%rax,%rsi,1),%rdi
0x0000000002923ea2: lea 0x1(%rdi),%rdi
0x0000000002923ea6: mov %rdi,(%rax,%rsi,1)
0x0000000002923eaa: jle 0x0000000002923ec1 ;*if_icmpgt
; - AndTest::AndSC@7 (line 22)
0x0000000002923eb0: mov $0x0,%eax
0x0000000002923eb5: add $0x30,%rsp
0x0000000002923eb9: pop %rbp
0x0000000002923eba: test %eax,-0x1c73dc0(%rip) # 0x0000000000cb0100
; {poll_return}
0x0000000002923ec0: retq ;*ireturn
; - AndTest::AndSC@13 (line 25)
0x0000000002923ec1: mov $0x1,%eax
0x0000000002923ec6: add $0x30,%rsp
0x0000000002923eca: pop %rbp
0x0000000002923ecb: test %eax,-0x1c73dd1(%rip) # 0x0000000000cb0100
; {poll_return}
0x0000000002923ed1: retq
Method AndSC with -XX:PrintAssemblyOptions=intel option
# {method} {0x00000000170a0810} 'AndSC' '(III)Z' in 'AndTest'
...
0x0000000002c26e2c: cmp r9d,r8d
0x0000000002c26e2f: jl 0x0000000002c26e36 ;*if_icmplt
0x0000000002c26e31: cmp r9d,edi
0x0000000002c26e34: jle 0x0000000002c26e44 ;*iconst_0
0x0000000002c26e36: xor eax,eax ;*synchronization entry
0x0000000002c26e38: add rsp,0x10
0x0000000002c26e3c: pop rbp
0x0000000002c26e3d: test DWORD PTR [rip+0xffffffffffce91bd],eax # 0x0000000002910000
0x0000000002c26e43: ret
0x0000000002c26e44: mov eax,0x1
0x0000000002c26e49: jmp 0x0000000002c26e38
Method AndNonSC with default options
# {method} {0x0000000016da0908} 'AndNonSC' '(III)Z' in 'AndTest'
...
0x0000000002923a78: cmp %r8d,%r9d
0x0000000002923a7b: mov $0x0,%eax
0x0000000002923a80: jl 0x0000000002923a8b
0x0000000002923a86: mov $0x1,%eax
0x0000000002923a8b: cmp %edi,%r9d
0x0000000002923a8e: mov $0x0,%esi
0x0000000002923a93: jg 0x0000000002923a9e
0x0000000002923a99: mov $0x1,%esi
0x0000000002923a9e: and %rsi,%rax
0x0000000002923aa1: cmp $0x0,%eax
0x0000000002923aa4: je 0x0000000002923abb ;*ifeq
; - AndTest::AndNonSC@21 (line 29)
0x0000000002923aaa: mov $0x1,%eax
0x0000000002923aaf: add $0x30,%rsp
0x0000000002923ab3: pop %rbp
0x0000000002923ab4: test %eax,-0x1c739ba(%rip) # 0x0000000000cb0100
; {poll_return}
0x0000000002923aba: retq ;*ireturn
; - AndTest::AndNonSC@25 (line 30)
0x0000000002923abb: mov $0x0,%eax
0x0000000002923ac0: add $0x30,%rsp
0x0000000002923ac4: pop %rbp
0x0000000002923ac5: test %eax,-0x1c739cb(%rip) # 0x0000000000cb0100
; {poll_return}
0x0000000002923acb: retq
Method AndNonSC with -XX:PrintAssemblyOptions=intel option
# {method} {0x00000000170a0908} 'AndNonSC' '(III)Z' in 'AndTest'
...
0x0000000002c270b5: cmp r9d,r8d
0x0000000002c270b8: jl 0x0000000002c270df ;*if_icmplt
0x0000000002c270ba: mov r8d,0x1 ;*iload_2
0x0000000002c270c0: cmp r9d,edi
0x0000000002c270c3: cmovg r11d,r10d
0x0000000002c270c7: and r8d,r11d
0x0000000002c270ca: test r8d,r8d
0x0000000002c270cd: setne al
0x0000000002c270d0: movzx eax,al
0x0000000002c270d3: add rsp,0x10
0x0000000002c270d7: pop rbp
0x0000000002c270d8: test DWORD PTR [rip+0xffffffffffce8f22],eax # 0x0000000002910000
0x0000000002c270de: ret
0x0000000002c270df: xor r8d,r8d
0x0000000002c270e2: jmp 0x0000000002c270c0
First of all, the generated ASM code differs depending on whether we choose the default AT&T syntax or the Intel syntax.
With AT&T syntax:
The ASM code is actually longer for the AndSC method, with every bytecode IF_ICMP* translated to two assembly jump instructions, for a total of 4 conditional jumps.
Meanwhile, for the AndNonSC method the compiler generates a more straight-forward code, where each bytecode IF_ICMP* is translated to only one assembly jump instruction, keeping the original count of 3 conditional jumps.
With Intel syntax:
The ASM code for AndSC is shorter, with just 2 conditional jumps (not counting the non-conditional jmp at the end). Actually it's just two CMP, two JL/E and a XOR/MOV depending on the result.
The ASM code for AndNonSC is now longer than the AndSC one! However, it has just 1 conditional jump (for the first comparison), using the registers to directly compare the first result with the second, without any more jumps.
Conclusion after ASM code analysis
At AMD64 machine-language level, the & operator seems to generate ASM code with fewer conditional jumps, which might be better for high prediction-failure rates (random values for example).
On the other hand, the && operator seems to generate ASM code with fewer instructions (with the -XX:PrintAssemblyOptions=intel option anyway), which might be better for really long loops with prediction-friendly inputs, where the smaller number of CPU cycles for each comparison can make a difference in the long run.
As I stated in some of the comments, this is going to vary greatly between systems, so if we're talking about branch-prediction optimization, the only real answer would be: it depends on your JVM implementation, your compiler, your CPU and your input data.
Addendum: Guava's isPowerOfTwo method
Here, Guava's developers have come up with a neat way of calculating if a given number is a power of 2:
public static boolean isPowerOfTwo(long x) {
return x > 0 & (x & (x - 1)) == 0;
}
Quoting OP:
is this use of & (where && would be more normal) a real optimization?
To find out if it is, I added two similar methods to my test class:
public boolean isPowerOfTwoAND(long x) {
return x > 0 & (x & (x - 1)) == 0;
}
public boolean isPowerOfTwoANDAND(long x) {
return x > 0 && (x & (x - 1)) == 0;
}
Intel's ASM code for Guava's version
# {method} {0x0000000017580af0} 'isPowerOfTwoAND' '(J)Z' in 'AndTest'
# this: rdx:rdx = 'AndTest'
# parm0: r8:r8 = long
...
0x0000000003103bbe: movabs rax,0x0
0x0000000003103bc8: cmp rax,r8
0x0000000003103bcb: movabs rax,0x175811f0 ; {metadata(method data for {method} {0x0000000017580af0} 'isPowerOfTwoAND' '(J)Z' in 'AndTest')}
0x0000000003103bd5: movabs rsi,0x108
0x0000000003103bdf: jge 0x0000000003103bef
0x0000000003103be5: movabs rsi,0x118
0x0000000003103bef: mov rdi,QWORD PTR [rax+rsi*1]
0x0000000003103bf3: lea rdi,[rdi+0x1]
0x0000000003103bf7: mov QWORD PTR [rax+rsi*1],rdi
0x0000000003103bfb: jge 0x0000000003103c1b ;*lcmp
0x0000000003103c01: movabs rax,0x175811f0 ; {metadata(method data for {method} {0x0000000017580af0} 'isPowerOfTwoAND' '(J)Z' in 'AndTest')}
0x0000000003103c0b: inc DWORD PTR [rax+0x128]
0x0000000003103c11: mov eax,0x1
0x0000000003103c16: jmp 0x0000000003103c20 ;*goto
0x0000000003103c1b: mov eax,0x0 ;*lload_1
0x0000000003103c20: mov rsi,r8
0x0000000003103c23: movabs r10,0x1
0x0000000003103c2d: sub rsi,r10
0x0000000003103c30: and rsi,r8
0x0000000003103c33: movabs rdi,0x0
0x0000000003103c3d: cmp rsi,rdi
0x0000000003103c40: movabs rsi,0x175811f0 ; {metadata(method data for {method} {0x0000000017580af0} 'isPowerOfTwoAND' '(J)Z' in 'AndTest')}
0x0000000003103c4a: movabs rdi,0x140
0x0000000003103c54: jne 0x0000000003103c64
0x0000000003103c5a: movabs rdi,0x150
0x0000000003103c64: mov rbx,QWORD PTR [rsi+rdi*1]
0x0000000003103c68: lea rbx,[rbx+0x1]
0x0000000003103c6c: mov QWORD PTR [rsi+rdi*1],rbx
0x0000000003103c70: jne 0x0000000003103c90 ;*lcmp
0x0000000003103c76: movabs rsi,0x175811f0 ; {metadata(method data for {method} {0x0000000017580af0} 'isPowerOfTwoAND' '(J)Z' in 'AndTest')}
0x0000000003103c80: inc DWORD PTR [rsi+0x160]
0x0000000003103c86: mov esi,0x1
0x0000000003103c8b: jmp 0x0000000003103c95 ;*goto
0x0000000003103c90: mov esi,0x0 ;*iand
0x0000000003103c95: and rsi,rax
0x0000000003103c98: and esi,0x1
0x0000000003103c9b: mov rax,rsi
0x0000000003103c9e: add rsp,0x50
0x0000000003103ca2: pop rbp
0x0000000003103ca3: test DWORD PTR [rip+0xfffffffffe44c457],eax # 0x0000000001550100
0x0000000003103ca9: ret
Intel's asm code for && version
# {method} {0x0000000017580bd0} 'isPowerOfTwoANDAND' '(J)Z' in 'AndTest'
# this: rdx:rdx = 'AndTest'
# parm0: r8:r8 = long
...
0x0000000003103438: movabs rax,0x0
0x0000000003103442: cmp rax,r8
0x0000000003103445: jge 0x0000000003103471 ;*lcmp
0x000000000310344b: mov rax,r8
0x000000000310344e: movabs r10,0x1
0x0000000003103458: sub rax,r10
0x000000000310345b: and rax,r8
0x000000000310345e: movabs rsi,0x0
0x0000000003103468: cmp rax,rsi
0x000000000310346b: je 0x000000000310347b ;*lcmp
0x0000000003103471: mov eax,0x0
0x0000000003103476: jmp 0x0000000003103480 ;*ireturn
0x000000000310347b: mov eax,0x1 ;*goto
0x0000000003103480: and eax,0x1
0x0000000003103483: add rsp,0x40
0x0000000003103487: pop rbp
0x0000000003103488: test DWORD PTR [rip+0xfffffffffe44cc72],eax # 0x0000000001550100
0x000000000310348e: ret
In this specific example, the JIT compiler generates far less assembly code for the && version than for Guava's & version (and, after yesterday's results, I was honestly surprised by this).
Compared to Guava's, the && version translates to 25% less bytecode for JIT to compile, 50% fewer assembly instructions, and only two conditional jumps (the & version has four of them).
So everything points to Guava's & method being less efficient than the more "natural" && version.
... Or is it?
As noted before, I'm running the above examples with Java 8:
C:\....>java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
But what if I switch to Java 7?
C:\....>c:\jdk1.7.0_79\bin\java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
C:\....>c:\jdk1.7.0_79\bin\java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*AndTest.isPowerOfTwoAND -XX:PrintAssemblyOptions=intel AndTestMain
.....
0x0000000002512bac: xor r10d,r10d
0x0000000002512baf: mov r11d,0x1
0x0000000002512bb5: test r8,r8
0x0000000002512bb8: jle 0x0000000002512bde ;*ifle
0x0000000002512bba: mov eax,0x1 ;*lload_1
0x0000000002512bbf: mov r9,r8
0x0000000002512bc2: dec r9
0x0000000002512bc5: and r9,r8
0x0000000002512bc8: test r9,r9
0x0000000002512bcb: cmovne r11d,r10d
0x0000000002512bcf: and eax,r11d ;*iand
0x0000000002512bd2: add rsp,0x10
0x0000000002512bd6: pop rbp
0x0000000002512bd7: test DWORD PTR [rip+0xffffffffffc0d423],eax # 0x0000000002120000
0x0000000002512bdd: ret
0x0000000002512bde: xor eax,eax
0x0000000002512be0: jmp 0x0000000002512bbf
.....
Surprise! The assembly code generated for the & method by the JIT compiler in Java 7, has only one conditional jump now, and is way shorter! Whereas the && method (you'll have to trust me on this one, I don't want to clutter the ending!) remains about the same, with its two conditional jumps and a couple less instructions, tops.
Looks like Guava's engineers knew what they were doing, after all! (if they were trying to optimize Java 7 execution time, that is ;-)
So back to OP's latest question:
is this use of & (where && would be more normal) a real optimization?
And IMHO the answer is the same, even for this (very!) specific scenario: it depends on your JVM implementation, your compiler, your CPU and your input data.
For this kind of question you should run a microbenchmark. I used JMH for this test.
The benchmarks are implemented as
// boolean logical AND
bh.consume(value >= x & y <= value);
and
// conditional AND
bh.consume(value >= x && y <= value);
and
// bitwise OR, as suggested by Joop Eggen
bh.consume(((value - x) | (y - value)) >= 0);
With values for value, x and y according to the benchmark name.
The result (five warmup and ten measurement iterations) for throughput benchmarking is:
Benchmark                             Mode   Cnt    Score    Error   Units
Benchmark.isBooleanANDBelowRange      thrpt   10  386.086 ± 17.383  ops/us
Benchmark.isBooleanANDInRange         thrpt   10  387.240 ±  7.657  ops/us
Benchmark.isBooleanANDOverRange       thrpt   10  381.847 ± 15.295  ops/us
Benchmark.isBitwiseORBelowRange       thrpt   10  384.877 ± 11.766  ops/us
Benchmark.isBitwiseORInRange          thrpt   10  380.743 ± 15.042  ops/us
Benchmark.isBitwiseOROverRange        thrpt   10  383.524 ± 16.911  ops/us
Benchmark.isConditionalANDBelowRange  thrpt   10  385.190 ± 19.600  ops/us
Benchmark.isConditionalANDInRange     thrpt   10  384.094 ± 15.417  ops/us
Benchmark.isConditionalANDOverRange   thrpt   10  380.913 ±  5.537  ops/us
The results do not differ much between the variants. As long as no performance impact is observed in that piece of code, I would not try to optimize it. Depending on where the code sits, the HotSpot compiler might decide to apply further optimizations, which the above benchmarks probably do not cover.
some references:
boolean logical AND - the result value is true if both operand values are true; otherwise, the result is false
conditional AND - is like &, but evaluates its right-hand operand only if the value of its left-hand operand is true
bitwise OR - the result value is the bitwise inclusive OR of the operand values
I'm going to come at this from a different angle.
Consider these two code fragments,
if (value >= x && value <= y) {
and
if (value >= x & value <= y) {
If we assume that value, x, y have a primitive type, then those two (partial) statements will give the same outcome for all possible input values. (If wrapper types are involved, then they are not exactly equivalent because of an implicit null test for y that might fail in the & version and not the && version.)
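The wrapper-type caveat can be seen in a few lines. This is a sketch with made-up names (UnboxingDemo, conditional, logical, logicalThrowsNpe); it shows that when y is null and the left-hand test is already false, && never unboxes y, while & does and throws.

```java
// Illustrative sketch; all names here are hypothetical.
final class UnboxingDemo {
    // y is unboxed only if value >= x is true.
    static boolean conditional(Integer x, Integer value, Integer y) {
        return value >= x && value <= y;
    }

    // y is always unboxed, so a null y always throws.
    static boolean logical(Integer x, Integer value, Integer y) {
        return value >= x & value <= y;
    }

    // Helper that reports whether the & version threw on these inputs.
    static boolean logicalThrowsNpe(Integer x, Integer value, Integer y) {
        try {
            logical(x, value, y);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(conditional(10, 5, null));      // left side false, y untouched
        System.out.println(logicalThrowsNpe(10, 5, null)); // y unboxed regardless
    }
}
```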
If the JIT compiler is doing a good job, its optimizer will be able to deduce that those two statements do the same thing:
If one is predictably faster than the other, then it should be able to use the faster version ... in the JIT compiled code.
If not, then it doesn't matter which version is used at the source code level.
Since the JIT compiler gathers path statistics before compiling, it can potentially have more information about the execution characteristics than the programmer(!).
If the current generation JIT compiler (on any given platform) doesn't optimize well enough to handle this, the next generation could well do ... depending on whether or not empirical evidence points to this being a worthwhile pattern to optimize.
Indeed, if you write your Java code in a way that optimizes for this, there is a chance that by picking the more "obscure" version of the code, you might inhibit the current or future JIT compiler's ability to optimize.
In short, I don't think you should do this kind of micro-optimization at the source code level. And if you accept this argument [1], and follow it to its logical conclusion, the question of which version is faster is ... moot [2].
1 - I do not claim this is anywhere near being a proof.
2 - Unless you are one of the tiny community of people who actually write Java JIT compilers ...
The "Very Famous Question" is interesting in two respects:
On the one hand, that is an example where the kind of optimization required to make a difference is way beyond the capability of a JIT compiler.
On the other hand, it would not necessarily be the correct thing to sort the array ... just because a sorted array can be processed faster. The cost of sorting the array, could well be (much) greater than the saving.
Using either & or && still requires a condition to be evaluated so it's unlikely it will save any processing time - it might even add to it considering you're evaluating both expressions when you only need to evaluate one.
Using & over && to save a nanosecond, if that, in some very rare situations is pointless; you've already wasted more time contemplating the difference than you would've saved using & over &&.
Edit
I got curious and decided to run some bench marks.
I made this class:
public class Main {
static int x = 22, y = 48;
public static void main(String[] args) {
runWithOneAnd(30);
runWithTwoAnds(30);
}
static void runWithOneAnd(int value){
if(value >= x & value <= y){
}
}
static void runWithTwoAnds(int value){
if(value >= x && value <= y){
}
}
}
and ran some profiling tests with NetBeans. I didn't use any print statements to save processing time, just know both evaluate to true.
[Profiler screenshots for the first, second, and third tests]
As you can see from the profiling tests, using only one & actually takes 2-3 times longer to run than using two &&. This strikes me as somewhat odd, as I expected better performance from only one &.
I'm not 100% sure why. In both cases, both expressions have to be evaluated because both are true. I suspect that the JVM does some special optimization behind the scenes to speed it up.
Moral of the story: convention is good and premature optimization is bad.
Edit 2
I redid the benchmark code with @SvetlinZarev's comments in mind and a few other improvements. Here is the modified benchmark code:
public class Main {
static int x = 22, y = 48;
public static void main(String[] args) {
oneAndBothTrue();
oneAndOneTrue();
oneAndBothFalse();
twoAndsBothTrue();
twoAndsOneTrue();
twoAndsBothFalse();
System.out.println(b);
}
static void oneAndBothTrue() {
int value = 30;
for (int i = 0; i < 2000; i++) {
if (value >= x & value <= y) {
doSomething();
}
}
}
static void oneAndOneTrue() {
int value = 60;
for (int i = 0; i < 4000; i++) {
if (value >= x & value <= y) {
doSomething();
}
}
}
static void oneAndBothFalse() {
int value = 100;
for (int i = 0; i < 4000; i++) {
if (value >= x & value <= y) {
doSomething();
}
}
}
static void twoAndsBothTrue() {
int value = 30;
for (int i = 0; i < 4000; i++) {
if (value >= x && value <= y) {
doSomething();
}
}
}
static void twoAndsOneTrue() {
int value = 60;
for (int i = 0; i < 4000; i++) {
if (value >= x && value <= y) {
doSomething();
}
}
}
static void twoAndsBothFalse() {
int value = 100;
for (int i = 0; i < 4000; i++) {
if (value >= x && value <= y) {
doSomething();
}
}
}
//I wanted to avoid print statements here as they can
//affect the benchmark results.
static StringBuilder b = new StringBuilder();
static int times = 0;
static void doSomething(){
times++;
b.append("I have run ").append(times).append(" times \n");
}
}
And here are the performance tests:
Test 1, Test 2, and Test 3: (profiler screenshots not preserved)
This takes into account different values and different conditions as well.
Using one & takes more time to run when both conditions are true, about 60% or 2 milliseconds more time. When either one or both conditions are false, then one & runs faster, but it only runs about 0.30-0.50 milliseconds faster. So & will run faster than && in most circumstances, but the performance difference is still negligible.
What you are after is something like this:
x <= value & value <= y
value - x >= 0 & y - value >= 0
((value - x) | (y - value)) >= 0 // integer bit-or
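As a rough sketch of the last form above: both differences are non-negative exactly when neither has its sign bit set, so a single bitwise OR followed by one comparison covers both bounds. The class and method names below are made up for illustration, and the sketch assumes that value - x and y - value do not overflow int.

```java
// Hypothetical demo class; the names are not from the original post.
class RangeCheckDemo {
    // Conventional short-circuit form, for comparison.
    static boolean inRangeShortCircuit(int value, int x, int y) {
        return x <= value && value <= y;
    }

    // Branch-free form: the OR of two ints is non-negative only when
    // both operands are non-negative (neither sign bit is set).
    // Assumption: value - x and y - value do not overflow int.
    static boolean inRangeBitOr(int value, int x, int y) {
        return ((value - x) | (y - value)) >= 0;
    }

    public static void main(String[] args) {
        int x = 22, y = 48;
        for (int value : new int[] {21, 22, 30, 48, 49}) {
            System.out.println(value + ": "
                    + inRangeShortCircuit(value, x, y) + " "
                    + inRangeBitOr(value, x, y));
        }
    }
}
```

Whether this is actually faster than the && form depends on the JIT and the data; it is worth measuring rather than assuming.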
Interesting, one would almost like to look at the byte code.
But hard to say. I wish this were a C question.
I was curious about the answer as well, so I wrote the following (simple) test for this:
private static final int max = 80000;
private static final int size = 100000;
private static final int x = 1500;
private static final int y = 15000;
private Random random;
@Before
public void setUp() {
this.random = new Random();
}
@After
public void tearDown() {
random = null;
}
@Test
public void testSingleOperand() {
int counter = 0;
int[] numbers = new int[size];
for (int j = 0; j < size; j++) {
numbers[j] = random.nextInt(max);
}
long start = System.nanoTime(); //start measuring after an array has been filled
for (int i = 0; i < numbers.length; i++) {
if (numbers[i] >= x & numbers[i] <= y) {
counter++;
}
}
long end = System.nanoTime();
System.out.println("Duration of single operand: " + (end - start));
}
@Test
public void testDoubleOperand() {
int counter = 0;
int[] numbers = new int[size];
for (int j = 0; j < size; j++) {
numbers[j] = random.nextInt(max);
}
long start = System.nanoTime(); //start measuring after an array has been filled
for (int i = 0; i < numbers.length; i++) {
if (numbers[i] >= x && numbers[i] <= y) {
counter++;
}
}
long end = System.nanoTime();
System.out.println("Duration of double operand: " + (end - start));
}
With the end result being that the comparison with && always wins in terms of speed, being about 1.5/2 milliseconds quicker than &.
EDIT:
As @SvetlinZarev pointed out, I was also measuring the time it took Random to get an integer. I changed it to use a pre-filled array of random numbers, which caused the duration of the single operand test to fluctuate wildly; the differences between several runs were up to 6-7ms.
The way this was explained to me is that && returns false as soon as one check in a series is false, while & evaluates every check in the series regardless of how many are false. For example:
if (x>0 && x <=10 && x
Will run faster than
if (x>0 & x <=10 & x
If x is greater than 10, because single ampersands will continue to check the rest of the conditions whereas double ampersands will break after the first non-true condition.
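The difference can be made visible by counting how many operands are actually evaluated. This is only a sketch: the check helper is hypothetical, and the third condition (x % 2 == 0) is an arbitrary stand-in, since the condition in the snippets above is cut off.

```java
// Hypothetical demo class; check() and the third condition are illustrative.
class ShortCircuitDemo {
    static int calls = 0;

    // Records that an operand was evaluated, then passes its value through.
    static boolean check(boolean result) {
        calls++;
        return result;
    }

    // Returns how many operands were evaluated with short-circuit &&.
    static int countWithDoubleAnd(int x) {
        calls = 0;
        boolean ignored = check(x > 0) && check(x <= 10) && check(x % 2 == 0);
        return calls;
    }

    // Returns how many operands were evaluated with non-short-circuit &.
    static int countWithSingleAnd(int x) {
        calls = 0;
        boolean ignored = check(x > 0) & check(x <= 10) & check(x % 2 == 0);
        return calls;
    }

    public static void main(String[] args) {
        // x = 11 fails the second check: && stops there, & keeps going.
        System.out.println(countWithDoubleAnd(11)); // prints 2
        System.out.println(countWithSingleAnd(11)); // prints 3
    }
}
```

When every operand is cheap and free of side effects, as in the range check being discussed, skipping an evaluation saves very little, which fits the small differences measured above.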
Why is the following code giving me an error?
int n = 30000; // Some number
for (int i = 0;
0 <= n ? (i < n) : (i > n);
0 <= n ? (i++) : (i--)) { // ## Error "not a statement" ##
f(i,n);
}
It's because the for loop has been defined that way in the Java Language Specification.
14.14.1 The basic for statement
BasicForStatement:
for ( ForInit ; Expression ; ForUpdate ) Statement
ForStatementNoShortIf:
for ( ForInit ; Expression ; ForUpdate ) StatementNoShortIf
ForInit:
StatementExpressionList
LocalVariableDeclaration
ForUpdate:
StatementExpressionList
StatementExpressionList:
StatementExpression
StatementExpressionList , StatementExpression
So it needs to be a StatementExpression or multiple StatementExpressions, and StatementExpression is defined as:
14.8 Expression statements
StatementExpression:
Assignment
PreIncrementExpression
PreDecrementExpression
PostIncrementExpression
PostDecrementExpression
MethodInvocation
ClassInstanceCreationExpression
0 <= n ? (i++) : (i--) is none of those, so it is not accepted. i += ((0 <= n) ? 1 : -1) is an assignment, so it works.
First of all, I would recommend against writing the code this way. The purpose of the code is "count up from zero to n if n is positive, count down from 0 to n if n is negative", but I would be inclined to instead write:
for (int i = 0; i < Math.abs(n); i += 1)
{
int argument = n < 0 ? -i : i;
f(argument, n);
}
But that does not answer your question, which is:
Why can't I use ?: operators in the 3rd argument of for loops in Java?
A for loop has the structure for ( initialization ; condition ; action ).
The purpose of an expression is to compute a value.
The purpose of a statement is to take an action.
There are some expressions which by design both compute a value and take an action. i++, i += j, new foo(), method() and so on.
It is bad style to have any other expression that both computes a value and takes an action. Such expressions are difficult to reason about.
Therefore the action of the for loop is restricted to be only those expressions which by design both compute a value and take an action.
Basically, by forbidding this code the compiler is telling you that you've made a bad stylistic choice. b?i++:i-- is a legal expression but it is really bad style because it makes what is supposed to be computing a value into producing a side effect and ignoring the value.
replace
0 <= n ? (i++) : (i--)
with
i += ((0 <= n) ? 1 : -1)
that should work
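For completeness, here is a small self-contained sketch of the loop with that replacement applied. The visit method and the list it returns are assumptions added for illustration; they stand in for the call to f(i, n) in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; visit() stands in for calling f(i, n).
class TernaryUpdateDemo {
    // Collects every value i takes while counting from 0 towards n.
    static List<Integer> visit(int n) {
        List<Integer> seen = new ArrayList<>();
        for (int i = 0;
             0 <= n ? (i < n) : (i > n);     // a ternary is fine as the condition
             i += (0 <= n) ? 1 : -1) {       // an assignment is a valid update
            seen.add(i);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(visit(3));  // prints [0, 1, 2]
        System.out.println(visit(-3)); // prints [0, -1, -2]
    }
}
```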
Your code is giving you an error mostly because you're trying to solve your problem with an invalid algorithm. The fact that the JLS doesn't allow a ternary as the update expression of a for loop doesn't help either, but the main problem is that you are missing the valid solution to the task at hand.
Let's start with a common statement, prematureOptimization == sqrt(sum(evil)) - first you should consider what you want to do, not how to do it or why the code doesn't work.
the loop should just execute n times, using i as a counter
i step should be 1 if n is >= 0, otherwise -1
(side note: if n is invariant (and it is here), using e.g. abs(n) or n < 0 in the condition is bad practice; although most compilers will try to factor the invariant out of the loop, you should usually just store the result in a temporary variable and use that in the comparison instead)
So, the code at hand should be:
void doSomething( int n ) {
if ( n >= 0 )
for( int i = 0; i < n; i++ )
f( i, n );
else
for( int i = 0; i > n; i-- )
f( i, n );
}
Factoring out invariants and separating distinct code branches are basic techniques used to increase an algorithm's efficiency (not a premature optimization, mind you); there's no faster or cleaner way to do this. Some may argue this is a case of loop unwinding; it very well would be, if not for the fact that those two loops shouldn't be wound together in the first place...
Another thing: the third part of a for loop has always been an ordinary statement. Let's try to guess why the following code doesn't compile:
0 <= n ? (i++) : (i--); // error: not a statement
... maybe because following code won't compile either?
0 <= n ? i : i; // error: not a statement
... and that's for the very same reason code below won't work in Java either?
i; // error: not a statement
Your answer is: ternary is not a statement - ternary just returns the value, and value is not a statement (at least in Java); i++ and i-- are allowed in ternary just because they return a value, but they also produce side effects here.
When you have a circular buffer represented as an array, and you need the index to wrap around (i.e., when you reach the highest possible index and increment it), is it "better" to:
return (++i == buffer.length) ? 0: i;
Or
return ++i % buffer.length;
Has using the modulo operator any drawbacks? Is it less readable than the first solution?
EDIT:
Of course it should be ++i instead of i++, changed that.
EDIT 2:
One interesting note: I found the first line of code in ArrayBlockingQueue's implementation by Doug Lea.
Update: OP has admitted in a comment that it should have been pre-increment instead. Most of the other answers missed this. There lies proof that the increment in this scenario leads to horrible readability: there's a bug, and most people couldn't see it.
The most readable version is the following:
return (i == buffer.length-1) ? 0 : i+1;
Using ++ adds an unnecessary side effect to the check (not to mention that I strongly feel that you should've used pre-increment instead).
What's the problem with the original code? Let's have a look, shall we?
return (i++ == N) ? 0 : i; // OP's original, slightly rewritten
So we know that:
i is post-incremented, so when i == N-1 before the return statement, this will return N instead of wrapping to 0 immediately
Is this intended? Most of the time, the intent is to use N as an exclusive upper bound
The variable name i suggests a local variable by naming convention, but is it really?
Need to double check if it's a field, due to side-effect
In comparison:
return (i == N-1) ? 0 : i+1; // proposed alternative
Here we know that:
i is not modified, doesn't matter if it's local variable or field
When i == N-1, the returned value is 0, which is more typical scenario
The % approach
Alternatively, you can also use the % version as follows:
return (i+1) % N;
What's the problem with %? Well, the problem is that even though most people think it's the modulo operator, it's NOT! It's the remainder operator (JLS 15.17.3). A lot of people often get this confused. Here's a classic example:
boolean isOdd(int n) {
return (n % 2) == 1; // does this work???
}
That code is broken!!! It returns false for all negative values! The problem is that -1 % 2 == -1, although mathematically -1 = 1 (mod 2).
% can be tricky, and that's why I recommend the ternary operator version instead. The most important part, though, is to remove the side-effect of the increment.
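To make the pitfall concrete, here is a short sketch contrasting the broken check with two common fixes: comparing the remainder against 0 instead of 1, and using Math.floorMod (available since Java 8), which returns a result with the sign of the divisor. The class and method names are made up for illustration.

```java
// Hypothetical demo class; the names are not from the original answer.
class RemainderDemo {
    // Broken for negative n: -3 % 2 is -1 in Java, never 1.
    static boolean isOddBroken(int n) {
        return (n % 2) == 1;
    }

    // Works for negative n: an odd number has a non-zero remainder.
    static boolean isOdd(int n) {
        return (n % 2) != 0;
    }

    // Floor modulus: the result is in [0, length) for any i when length > 0.
    static int wrap(int i, int length) {
        return Math.floorMod(i, length);
    }

    public static void main(String[] args) {
        System.out.println(isOddBroken(-3)); // prints false
        System.out.println(isOdd(-3));       // prints true
        System.out.println(-1 % 5);          // prints -1
        System.out.println(wrap(-1, 5));     // prints 4
    }
}
```

For a circular buffer index that never goes negative, % and floorMod agree, so the distinction only matters once negative values can appear.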
See also
Wikipedia: modulo operation
Don't ask me to choose between two options which both contain postincrement (*) mixed with expression evaluation. I'll say "none".
(*) Update: It was later fixed to preincrement.
Wouldn't the i++ % buffer.length version have the drawback that it keeps incrementing i, which could lead to it hitting some sort of max_int/max_long/max_whatever limit?
Also, I would split this into
i = (i++ == buffer.length) ? 0 : i;
return i;
since otherwise you'd most likely have a bug.
The first one will give you an ArrayIndexOutOfBoundsException because i is never actually reset to 0.
The second one will (probably) give you an overflow error (or related undesirable effect) when i == Integer.MAX_VALUE (which might not actually happen in your case, but isn't good practice, IMHO).
So I'd say the second one is "more correct", but I would use something like:
i = (i+1) % buffer.length;
return i;
Which I think has neither of the two problems.
I went ahead and tested everyone's code, and was sad to find that only one of the previous posts (at the time of this post's writing) works. (Which one? Try them all to find out! You might be surprised!)
public class asdf {
static int i=0;
static int[] buffer = {0,1,2};
public static final void main(String args[]){
for(int j=0; j<5; j++){
System.out.println(buffer[getIndex()]);
}
}
public static int getIndex(){
// return (++i == buffer.length) ? 0: i;
// return ++i % buffer.length;
// i = (i++ == buffer.length) ? 0 : i;
// return i;
// i++;
// if (i >= buffer.length)
// {
// i = 0;
// }
// return i;
// return (i+1 == buffer.length) ? 0 : i+1;
i = (i+1) % buffer.length;
return i;
}
}
Expected output is:
1
2
0
1
2
Apologies in advance if there's a coding error on my part and I accidentally insult someone! x.x
PS: +1 for the previous comment about not using post-increment with equality checks (I can't actually upmod posts yet =/ )
I prefer the conditional approach. Even with an unsigned type, the modulo operation has drawbacks: it has a bad side effect when the number being tested rolls back to zero.
Example:
255 % 7 == 3
So if you use a byte (unsigned char), for example, when the number rolls over after 255 (i.e. back to zero), the result will not be 4. It should be 4 (as in 256 % 7) for the index to rotate correctly. So just use a testing construct (if or the ternary operator) for correctness.
If you are after performance and the buffer length is a power of 2 (i.e. 2, 4, 8, 16, 32, 64, ...), use the & operator.
So if the buffer length is 16, use:
n & 15
If buffer length is 64, use 63:
n & 63
Those rotate correctly even if the number goes back to zero. By the way, if the length is a power of 2, even the modulo/remainder approach would fit the bill, i.e. it will rotate correctly. But I would hazard a guess that the & operation is faster than the % operation.
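A minimal Java sketch of this behaviour, using an int counter that overflows instead of an unsigned byte (an assumption made here because Java's int is signed; the helper names are hypothetical). The mask keeps rotating across the overflow, while % with a length of 7 breaks the sequence. Note that because int is signed, % can also produce negative results after the overflow even for a power-of-2 length.

```java
// Hypothetical demo class; the names are not from the original answer.
class WrapAroundDemo {
    // Assumption: length is a power of two, so length - 1 is an all-ones mask.
    static int byMask(int n, int powerOfTwoLength) {
        return n & (powerOfTwoLength - 1);
    }

    // Plain remainder; the result takes the sign of n.
    static int byRemainder(int n, int length) {
        return n % length;
    }

    public static void main(String[] args) {
        int last = Integer.MAX_VALUE;
        int next = last + 1; // overflows to Integer.MIN_VALUE

        // Mask with length 16: 15 is followed by 0, as a ring index should be.
        System.out.println(byMask(last, 16) + " -> " + byMask(next, 16));

        // Remainder with length 7: the sequence jumps and goes negative.
        System.out.println(byRemainder(last, 7) + " -> " + byRemainder(next, 7));

        // Remainder with length 16 one step later: negative for a signed int.
        System.out.println(byRemainder(next + 1, 16) + " vs " + byMask(next + 1, 16));
    }
}
```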
I think the second solution has the clear advantage that it works, whereas the first does not. The first solution will always return zero when i becomes bigger than buffer.length because i is never reset.
The modulo operator has no drawbacks.
Surely it would be more readable to use an if:
i++;
if (i >= buffer.length)
{
i = 0;
}
return i;
Depends a bit if buffer.length ever changes.
This is very subjective and depends on what your colleagues are used to seeing. I would personally prefer the first option, as it expresses explicitly what the code does, i.e. if the buffer length is reached, reset to 0. You don't have to perform any mathematical thinking or even know what the modulo does (of course you should! :))
Personally, I prefer the modulo approach. When I see modulo, I immediately think of range limiting and looping but when I see the ternary operator, I always want to think more carefully about it simply because there are more terms to look at. Readability is subjective though, as you already pointed out in your tagging, and I suspect that most people will disagree with my opinion.
However, performance is not subjective. Modulo implies a division operation, which is often slower than a comparison against zero. Obviously, this is more difficult to determine in Java since we're not compiling to native code until the jitter kicks in.
My advice would be to write whichever you feel is most appropriate (so long as it works!) and get a colleague (assuming you have one) to assess it. If they disagree, ask another colleague, then go with the majority vote. #codingbydemocracy
It is also worth noting that if the buffer length is a power of 2, then a very efficient bit manipulation will work:
idx = (idx + 1) & (length - 1)
You can use also bit manipulation:
idx = idx & ((idx-length)>>31)
But it's not faster than the if-variant on my machine.
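As a sketch of how that expression works when applied right after an increment: (idx - length) >> 31 is an arithmetic shift that yields -1 (all bits set) while idx is still below length, and 0 once idx reaches length, so the & either keeps idx or clears it. The wrapper method below is hypothetical and assumes 0 <= idx < length on entry.

```java
// Hypothetical demo class; next() wraps the expression from the answer.
class SignMaskDemo {
    // Assumption: 0 <= idx < length, so idx + 1 - length cannot overflow.
    static int next(int idx, int length) {
        idx++;
        // -1 (keep idx) while idx < length, 0 (reset) when idx == length.
        int mask = (idx - length) >> 31;
        return idx & mask;
    }

    public static void main(String[] args) {
        int idx = 0;
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < 5; j++) {
            idx = next(idx, 3);
            sb.append(idx).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints 1 2 0 1 2
    }
}
```

Unlike the & (length - 1) mask, this form does not require the length to be a power of 2.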
Here is some code to compare running time in C#:
Stopwatch sw = new Stopwatch();
long cnt = 0;
int k = 0;
int modulo = 10;
sw.Start();
k = 0;
cnt = 0;
for ( int j=0 ; j<100000000 ; j++ ) {
k = (k+1) % modulo;
cnt += k;
}
sw.Stop();
Console.WriteLine( "modulo cnt=" + cnt.ToString() + " " + sw.Elapsed.ToString() );
sw.Reset();
sw.Start();
k = 0;
cnt = 0;
for (int j = 0; j < 100000000; j++) {
if ( ++k == modulo )
k = 0;
cnt += k;
}
sw.Stop();
Console.WriteLine( "if cnt=" + cnt.ToString() + " " + sw.Elapsed.ToString() );
sw.Reset();
sw.Start();
k = 0;
cnt = 0;
for (int j = 0; j < 100000000; j++) {
++k;
k = k&((k-modulo)>>31);
cnt += k;
}
sw.Stop();
Console.WriteLine( "bit cnt=" + cnt.ToString() + " " + sw.Elapsed.ToString() );
The Output:
modulo cnt=450000000 00:00:00.6406035
if cnt=450000000 00:00:00.2058015
bit cnt=450000000 00:00:00.2182448
I prefer the modulo operator for the simple reason that it is shorter. And any programmer should be able to dream in modulo, since it is almost as common as the plus operator.