Expression evaluation in C vs Java

int y=3;
int z=(--y) + (y=10);
When executed in C, the value of z evaluates to 20, but when the same expression is executed in Java, z evaluates to 12.
Can anyone explain why this happens and what the difference is?

When executed in C, the value of z evaluates to 20
No, it does not. This is undefined behavior, so z could get any value, including 20. The program could also theoretically do anything, since the standard does not say what a program should do when it encounters undefined behavior. Read more here: Undefined, unspecified and implementation-defined behavior
As a rule of thumb, never modify a variable twice in the same expression.
The reason for the undefined behavior here is sequence points; see Why are these constructs using pre and post-increment undefined behavior? It's not an exact duplicate, but it explains things a bit more deeply.
In C, when it comes to arithmetic operators, like + and /, the order of evaluation of the operands is not specified in the standard, so if the evaluation of those has side effects, your program becomes unpredictable. Here is an example:
#include <stdio.h>

int foo(void)
{
    printf("foo()\n");
    return 0;
}

int bar(void)
{
    printf("bar()\n");
    return 0;
}

int main(void)
{
    /* Which call executes first is up to the compiler. */
    int x = foo() + bar();
}
What will this program print? Well, we don't know. I'm not entirely sure whether this snippet invokes undefined behavior or not, but regardless, the output is not predictable. I asked a question about that, Is it undefined behavior to use functions with side effects in an unspecified order?, so I'll update this answer later.
Some other operators have a specified order of evaluation (left to right), like || and &&, and this feature is used for short-circuiting. For instance, if we use the above example functions and write foo() && bar(), only the foo() function will be executed.
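Java's && behaves the same way. Here is a minimal sketch (the class and method names are mine, not from the question):

public class ShortCircuitDemo {
    static boolean foo() {
        System.out.println("foo()");
        return false;
    }

    static boolean bar() {
        System.out.println("bar()");
        return false;
    }

    public static void main(String[] args) {
        // foo() returns false, so bar() is never called:
        // only "foo()" is printed before the result.
        boolean result = foo() && bar();
        System.out.println(result);
    }
}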
I'm not very proficient in Java, but for completeness I want to mention that Java basically does not have undefined or unspecified behavior except in very special situations. Almost everything in Java is well defined. For more details, read rzwitserloot's answer.

There are 3 parts to this answer:
How this works in C (unspecified behaviour)
How this works in Java (the spec is clear on how this should be evaluated)
Why there is a difference.
For #1, you should read #klutt's fantastic answer.
For #2 and #3, you should read this answer.
How does it work in java?
Unlike C's, java's language specification is far more precise. For example, C doesn't even tell you how many bits the data type int is supposed to have, whereas the java lang spec does: 32 bits, even on 64-bit processors and a 64-bit java implementation.
The java spec clearly says that x+y is to be evaluated left-to-right (vs. C's 'in any order you please, compiler'), thus, first --y is evaluated which is clearly 2 (with the side-effect of making y 2), and then y=10 is evaluated which is clearly 10 (with the side effect of making y 10), and then 2+10 is evaluated which is clearly 12.
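A quick sketch you can run to confirm this (the class name is mine):

public class EvalOrderDemo {
    public static void main(String[] args) {
        int y = 3;
        // Left operand first: --y yields 2 (y becomes 2),
        // then y = 10 yields 10 (y becomes 10), so z = 2 + 10.
        int z = (--y) + (y = 10);
        System.out.println(z); // prints 12, guaranteed by the JLS
    }
}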
Obviously, a language like java is just better; after all, undefined behaviour is pretty much a bug by definition, whatever was wrong with the C lang spec writers to introduce this crazy stuff?
The answer is: performance.
In C, your source code is turned into machine code by the compiler, and the machine code is then interpreted by the CPU. A 2-step model.
In java, your source code is turned into bytecode by the compiler, the bytecode is then turned into machine code by the runtime, and the machine code is then interpreted by the CPU. A 3-step model.
If you want to introduce optimizations, you don't control what the CPU does, so for C there is only 1 step where it can be done: Compilation.
So C (the language) is designed to give lots of freedom to C compilers to attempt to produce optimized machine code. This is a cost/benefit scenario: At the cost of having a ton of 'undefined behaviour' in the lang spec, you get the benefit of better optimizing compilers.
In java, you get a second step, and that's where java does its optimizations: At runtime. java.exe does it to class files; javac.exe is quite 'stupid' and optimizes almost nothing. This is on purpose; at runtime you can do a better job (for example, you can use some bookkeeping to track which of two branches is more commonly taken and thus branch predict better than a C app ever could) - it also means that cost/benefit analysis now results in: The lang spec should be clear as day.
So java code is never undefined behaviour?
Not so. Java has a memory model which includes a ton of undefined behaviour:
class X { int a, b; }

public class MemoryModelDemo {
    public static void main(String[] args) {
        final X instance = new X();
        new Thread() { public void run() {
            int a = instance.a;
            int b = instance.b;
            instance.a = 5;
            instance.b = 6;
            System.out.print(a);
            System.out.print(b);
        }}.start();
        new Thread() { public void run() {
            int a = instance.a;
            int b = instance.b;
            instance.a = 1;
            instance.b = 2;
            System.out.print(a);
            System.out.print(b);
        }}.start();
    }
}
is undefined in java. It may print 0056, 0012, 0010, 0002, 5600, 0600, and many many more possibilities. Something like 5000 (which it could legally print) is hard to imagine: How can the read of a 'work' but the read of b then fail?
For the exact same reason your C code produces arbitrary answers:
Optimization.
The cost/benefit of 'hardcoding' in the spec exactly how this code would behave would have a large cost to it: you'd take away most of the room for optimization. So java paid the cost, and now has a langspec that is ambiguous whenever you modify/read the same fields from different threads without establishing so-called 'happens-before' relationships using e.g. synchronized.
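As a sketch of what establishing that ordering looks like (the method names are illustrative, not part of the original snippet), you can funnel all access to the fields through a common monitor:

class X {
    private int a, b;

    // Entering and exiting the same monitor creates happens-before
    // edges between threads, removing the ambiguity described above.
    synchronized void set(int a, int b) {
        this.a = a;
        this.b = b;
    }

    synchronized int[] get() {
        return new int[] { a, b };
    }
}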

When executed in C, the value of z evaluates to 20
It is not the truth. The compiler you use happens to evaluate it to 20. Another one can evaluate it in a completely different way: https://godbolt.org/z/GcPsKh
This kind of behaviour is called Undefined Behaviour.
In your expression you have two problems:
The order of evaluation (except for the logical operators) is not specified in C (it is Unspecified Behaviour).
The expression also modifies y twice between sequence points (Undefined Behaviour).

Related

Function call with i + 1, ++i implementation difference?

Just a quickie about Java, and let me say this first: I'm not worried about this due to performance optimization, just curious about the behind-the-scenes stuff in general. Since I found nothing on this, I assume they are equivalent in every respect, but just to be sure:
In a recursive function foo(int i), or I guess in general in any function call, is there a difference between foo(++i) and foo(i + 1)? I know the result is the same, but I thought the one with ++i might do some extra work in the background because it is an assignment? So maybe it creates an (anonymous field) = i in the background which gets incremented and then referenced?
Or does the compiler just optimize ++i away to i + 1?
My understanding is a bit fuzzy, as you see, so help would be appreciated.
EDIT to clarify:
At first this was due to a recursive function e.g.
public static int foo(int i) {
    return (i >= 4) ? 0
                    : i + foo(++i);
}
The "functions in general"-part came in the middle of writing the question and, as remarked, makes this ambiguous etc. Hope this clarifies everything.
If the answer is not about semantics but about performance at the machine level, after the IR has been optimized and translated into machine instructions, I would say: "no, unless measured and proven otherwise."
It's very unlikely that there will be a performance difference after all the optimizations are made between f(++i) and f(i+1), assuming your code is such that you can actually consider these as alternatives (assuming the state of i ceases to become relevant after the function call).
It comes down to basic hardware and compiler design: the cost of instructions for data already stored in a register, and the ease of optimizing this code to the same machine code in even a semi-competent compiler (and I'd think Java's JIT would at least be that). It is in the very basic nature of a compiler to recognize unnecessary side effects and eliminate them outright (this is actually one of the reasons why artificial micro-level benchmarks can be misleading, unless they're written very carefully in a way that prevents the optimizer from skipping certain code outright). Among the easiest side effects to eliminate is a case like this, where we're incrementing a variable, i, but not depending on the state change afterwards.
It seems unlikely to have any real impact on performance. Of course the ultimate way is to look at the final resulting machine code (not the bytecode IR but the actual final machine code) or measure and profile it. But I'd be quite shocked, to say the least, if one is faster than the other, and it would tend to make me think that the compiler is doing a pretty poor job in either instruction selection or register allocation.
That said, if one actually were (remote chance) faster than the other, I think you have it backwards. ++i would likely require less work, since it can just increment a value in a register. ++i is a unary operation on one operand, and that works well with registers, which are mutable. i + 1 is an expression that requires i to be treated as immutable and would call for a second register (but only in a really horrible kind of toy-compiler scenario that didn't optimize anything; though in that case we're also compiling this expression in the context of a function call and have to consider things like stack spills, so even a toy compiler might make these somewhat equivalent).
There is no difference in the execution of foo(++i) and foo(i + 1): the value of the parameter passed to foo is the same.
However, in the former case, the local variable i is increased by one; in the latter case, it retains its original value.
So if you make the same method call twice, there is a difference:
foo(++i);
foo(++i); // Invoke with i 1 greater than before.
is different from
foo(i + 1);
foo(i + 1); // Invoke with the same argument
As such, in general, the compiler cannot optimize anything away, because they have different semantics.
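A small sketch of that semantic difference (foo here is just a hypothetical method that prints its argument):

public class IncrementDemo {
    static void foo(int value) {
        System.out.println("foo(" + value + ")");
    }

    public static void main(String[] args) {
        int i = 0;
        foo(++i); // prints foo(1); i is now 1
        foo(++i); // prints foo(2); i is now 2

        int j = 0;
        foo(j + 1); // prints foo(1); j is still 0
        foo(j + 1); // prints foo(1) again
    }
}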
Besides the other comments about the value of i after using the prefix increment, there is a small difference in the generated bytecode. See below for the simplified results.
foo(++i) compiled to
iinc
iload_1
invokestatic
foo(i + 1) compiled to
iload_1
iconst_1
iadd
invokestatic
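(You can reproduce these listings yourself by compiling a small test class and running javap -c on it; the output above is simplified from what javap prints.)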
Yes, ++i consists of two operations: addition and assignment. Since you don't need i later, i + 1 is better to use than ++i.
If i is used after foo(++i), the value will have been changed
If you process i after foo(i + 1), the value will not have been changed
void foo(int i)
{
    // process stuff A
    foo(i + 1);
    // process stuff B: i still has its original value here
}

void bar(int i)
{
    // process stuff A
    bar(++i);
    // process stuff B: i has been incremented here
}
Depending on what is in //process stuff B, you may either want to use i + 1 and have the value of i be the same, or ++i and have i be incremented

Why does compareTo return an integer

I recently saw a discussion in an SO chat, but with no clear conclusions, so I ended up asking here.
Is this for historical reasons or for consistency with other languages? When looking at the signatures of compareTo in various languages, it returns an int.
Why doesn't it return an enum instead? For example, in C# we could do:
enum CompareResult {LessThan, Equals, GreaterThan};
and :
public CompareResult CompareTo(Employee other) {
    if (this.Salary < other.Salary) {
        return CompareResult.LessThan;
    }
    if (this.Salary == other.Salary) {
        return CompareResult.Equals;
    }
    return CompareResult.GreaterThan;
}
In Java, enums were introduced after this concept (I don't remember about C#), but it could have been solved with an extra class such as:
public final class CompareResult {
    public static final CompareResult LESS_THAN = new CompareResult();
    public static final CompareResult EQUALS = new CompareResult();
    public static final CompareResult GREATER_THAN = new CompareResult();

    private CompareResult() {}
}
and
interface Comparable<T> {
    CompareResult compareTo(T obj);
}
I'm asking this because I don't think an int represents well the semantics of the data.
For example in C#,
l.Sort(delegate(int x, int y)
{
    return Math.Min(x, y);
});
and its twin in Java 8,
l.sort(Integer::min);
both compile, because Min/min respect the contract of the comparator interface (take two ints and return an int).
Obviously the results in both cases are not the ones expected. If the return type were CompareResult, it would have caused a compile error, thus forcing you to implement a "correct" behavior (or at least making you aware of what you are doing).
A lot of semantics is lost with this return type (and it can potentially cause some bugs that are difficult to find), so why design it like this?
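To make the pitfall concrete, here is a minimal sketch of the Java 8 case above (the class name is mine):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinAsComparatorDemo {
    public static void main(String[] args) {
        List<Integer> l = new ArrayList<>(Arrays.asList(3, 1, 2));
        // Compiles, because Integer.min(int, int) matches the shape of
        // Comparator<Integer>.compare after unboxing - but it computes
        // a minimum, not a comparison.
        l.sort(Integer::min);
        System.out.println(l); // runs, but the list is not correctly sorted
    }
}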
[This answer is for C#, but it probably also applies to Java to some extent.]
This is for historical, performance and readability reasons. It potentially increases performance in two places:
Where the comparison is implemented. Often you can just return "(lhs - rhs)" (if the values are numeric types). But this can be dangerous: See below!
The calling code can use <= and >= to naturally represent the corresponding comparison. This will use a single IL (and hence processor) instruction compared to using the enum (although there is a way to avoid the overhead of the enum, as described below).
For example, we can check if a lhs value is less than or equal to a rhs value as follows:
if (lhs.CompareTo(rhs) <= 0)
...
Using an enum, that would look like this:
if (lhs.CompareTo(rhs) == CompareResult.LessThan ||
lhs.CompareTo(rhs) == CompareResult.Equals)
...
That is clearly less readable and is also inefficient since it is doing the comparison twice. You might fix the inefficiency by using a temporary result:
var compareResult = lhs.CompareTo(rhs);
if (compareResult == CompareResult.LessThan || compareResult == CompareResult.Equals)
...
It's still a lot less readable IMO - and it's still less efficient since it's doing two comparison operations instead of one (although I freely admit that it is likely that such a performance difference will rarely matter).
As raznagul points out below, you can actually do it with just one comparison:
if (lhs.CompareTo(rhs) != CompareResult.GreaterThan)
...
So you can make it fairly efficient - but of course, readability still suffers. ... != GreaterThan is not as clear as ... <=
(And if you use the enum, you can't avoid the overhead of turning the result of a comparison into an enum value, of course.)
So this is primarily done for reasons of readability, but also to some extent for reasons of efficiency.
Finally, as others have mentioned, this is also done for historical reasons. Functions like C's strcmp() and memcmp() have always returned ints.
Assembler compare instructions also tend to be used in a similar way.
For example, to compare two integers in x86 assembler, you can do something like this:
CMP AX, BX
JLE lessThanOrEqual ; jump to lessThanOrEqual if AX <= BX
or
CMP AX, BX
JG greaterThan ; jump to greaterThan if AX > BX
or
CMP AX, BX
JE equal ; jump to equal if AX == BX
You can see the obvious comparisons with the return value from CompareTo().
Addendum:
Here's an example which shows that it's not always safe to use the trick of subtracting the rhs from the lhs to get the comparison result:
int lhs = int.MaxValue - 10;
int rhs = int.MinValue + 10;
// Since lhs > rhs, we expect (lhs-rhs) to be +ve, but:
Console.WriteLine(lhs - rhs); // Prints -21: WRONG!
Obviously this is because the arithmetic has overflowed. If you had "checked" arithmetic turned on for the build, the code above would in fact throw an exception.
For this reason, the optimization of using subtraction to implement comparison is best avoided. (See comments from Eric Lippert below.)
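On the Java side of this thread, the overflow-safe replacement for the subtraction trick is the built-in Integer.compare (available since Java 7). A quick sketch:

public class SafeCompareDemo {
    public static void main(String[] args) {
        int lhs = Integer.MAX_VALUE - 10;
        int rhs = Integer.MIN_VALUE + 10;
        System.out.println(lhs - rhs);                 // -21: overflow, wrong sign
        System.out.println(Integer.compare(lhs, rhs)); // 1: correct
    }
}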
Let's stick to bare facts, with an absolute minimum of handwaving and/or unnecessary/irrelevant/implementation-dependent details.
As you already figured out yourself, compareTo is as old as Java ("Since: JDK1.0" in the Integer JavaDoc); Java 1.0 was designed to be familiar to C/C++ developers, and it mimicked a lot of their design choices, for better or worse. Also, Java has a backwards-compatibility policy; thus, once implemented in the core lib, a method is almost bound to stay there forever.
As to C/C++: strcmp/memcmp have existed for as long as string.h, so essentially as long as the C standard library, and they return exactly the same values (or rather, compareTo returns the same values as strcmp/memcmp) - see e.g. C ref - strcmp. At the time of Java's inception, going that way was the logical thing to do. There weren't any enums in Java at that time, no generics, etc. (all that came in >= 1.5).
The very decision about the return values of strcmp is quite obvious - there are 3 basic possible results of a comparison, so selecting +1 for "bigger", -1 for "smaller" and 0 for "equal" was the logical thing to do. Also, as pointed out, you can get the value easily by subtraction, and returning int allows you to use it easily in further calculations (in the traditional C type-unsafe way), while also allowing an efficient single-op implementation.
If you need/want to use your enum-based type-safe comparison interface, you're free to do so, but since the convention of strcmp returning +1/0/-1 is as old as contemporary programming, it actually does convey semantic meaning, in the same way that null can be interpreted as an unknown/invalid value, or an out-of-bounds int value (e.g. a negative number supplied for a positive-only quantity) can be interpreted as an error code. Maybe it's not the best coding practice, but it certainly has its pros, and is still commonly used, e.g. in C.
On the other hand, asking "why does the standard library of language XYZ conform to legacy standards of language ABC" is itself moot, as it can only be accurately answered by the very language designers who implemented it.
TL;DR: it's that way mainly because it was done that way in legacy versions, for legacy reasons and POLA for C programmers, and it is kept that way for backwards compatibility & POLA, again.
As a side note, I consider this question (in its current form) too broad to be answered precisely, highly opinion-based, and borderline off-topic on SO due to directly asking about Design Patterns & Language Architecture.
This practice comes from comparing integers this way, and from using a subtraction between the first non-matching characters of two strings.
Note that this practice is dangerous with things that are partially comparable, while using -1 to mean that a pair of things is incomparable: it could create a situation of a < b and b < a (which the application might use to define "incomparable"). Such a situation can lead to loops that don't terminate correctly.
An enumeration with values {lt,eq,gt,incomparable} would be more correct.
My understanding is that this is done because you can order the results (i.e., the operation is reflexive and transitive). For example, if you have three objects (A,B,C) you can compare A->B and B->C, and use the resulting values to order them properly. There is an implied assumption that if A.compareTo(B) == A.compareTo(C) then B==C.
See java's comparator documentation.
This is due to performance reasons.
If you need to compare ints, as often happens, you can simply return their difference; in fact, comparisons are often implemented as subtractions.
As an example:
public class MyComparable implements Comparable<MyComparable> {
    public int num;

    public int compareTo(MyComparable x) {
        // Beware: as noted in the answer above, this subtraction
        // can overflow for values of opposite sign and large magnitude.
        return num - x.num;
    }
}

Java not deterministic?

I've written a little predator-prey simulation in Java. Even if the rules are quite complicated and end up in a chaotic system, the techniques used are simple:
arithmetics and decisions on basic data types
no external libraries
no external systems are included
no concurrency occurs
no use of current time or date
So I thought when initializing the system with identical parameters it should output identical results, but it doesn't and I wonder why.
Some thoughts on that:
My application uses Randoms, but for this test I initialize them all with a given seed, so in my understanding they should produce the same outputs in the same order on every run.
I'm iterating through Sets, and I know that the order in which a Set is iterated isn't defined. But I don't see any reason why a Set that is filled in the same order with the same values should behave differently across several runs. Does it?
I'm using a lot of floats. Datatypes where 1 + 1 = 1.9999999999725 are always suspect to me, but even if their behavior is strange to me, it should always be the same strange. Isn't it?
Garbage Collection isn't deterministic, but as long as I don't rely on destructors I should be safe.
As said above, there is no concurrency and no datatypes depending on the actual time in use.
I can't reproduce that behavior in a simple example. But going through my code, I can't see anything that could be unpredictable. So are any of my assumptions above wrong? Any ideas what I could be missing?
Here's a test to verify my assumptions:
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class DeterminismTest {
    public static void main(String[] args) {
        Random r = new Random(1);
        Set<Float> s = new HashSet<Float>();
        for (int i = 0; i < 1000000; i++) {
            s.add(r.nextFloat());
        }
        float ret = 1;
        int cnt = 0;
        for (Float f : s) {
            float multiply = 0.3f;
            if (cnt++ % 2 == 0) {
                multiply = 0.7f;
            }
            float f2 = (f * multiply);
            ret += f2;
        }
        System.out.println(ret);
    }
}
It always results in 242455.25 for me.
You can write a deterministic program in Java. You just need to eliminate the possible sources of non-determinism.
It's hard to know what could be causing non-determinism without seeing your actual code and concrete evidence of that non-determinism.
There are any number of library methods that could potentially be sources of non-deterministic behaviour ... depending on how you use them.
For example, the value returned by Object.hashCode() (the first time it is called on an instance) is non-deterministic, and that percolates through to any library that uses hashing. It can definitely affect the order in which the elements of a HashSet or HashMap are returned when you iterate them ... if the element class doesn't override hashCode().
Random number generators may or may not be deterministic. If they are pseudo-random and they are initialized with fixed seeds, then the sequence of numbers produced by each one will be deterministic.
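For example, a minimal sketch of that fixed-seed determinism:

import java.util.Random;

public class SeededRandomDemo {
    public static void main(String[] args) {
        Random r1 = new Random(42);
        Random r2 = new Random(42);
        // Same seed, same algorithm: the two sequences are identical
        // on every run, as the java.util.Random spec requires.
        for (int i = 0; i < 5; i++) {
            System.out.println(r1.nextFloat() + " == " + r2.nextFloat());
        }
    }
}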
Floating point arithmetic should be deterministic. For any (fixed) set of inputs to an arithmetic expression, the result should always be the same. (I'm not sure that determinism of floating point arithmetic is guaranteed by the JLS, but it would be mighty strange if it happened in practice. As in ... you are running on broken hardware.)
FOLLOWUP ... on strictfp and non-determinism.
According to the JLS 15.4:
"Within an expression that is not FP-strict, some leeway is granted for an implementation to use an extended exponent range to represent intermediate results; the net effect, roughly speaking, is that a calculation might produce "the correct answer" in situations where exclusive use of the float value set or double value set might result in overflow or underflow."
This doesn't exactly say how much "leeway" the implementation has in a non-FP-strict expressions. However, I'd have thought that that leeway would not extend to allowing non-deterministic behaviour. I'd have thought that a JIT compiler on a particular platform would always generate equivalent native code for the same expression, and that code would be deterministic. (I can't see any reason for non-determinism ... unless the hardware itself has non-deterministic floating point.) The other possible source of non-determinism might be that behaviour of JIT compiled and interpreted code might be different. But frankly, I think that it would be "nuts" to allow that to happen ... and I think we'd have heard of it.
So while non-FP-strict expression evaluation could be non-deterministic in theory, I think we should discount this ... unless there is clear evidence that it happens in practice.
(Note that I'm talking about real non-determinism, not platform differences.)
I'm iterating through Sets, and I know that the order in which a Set is iterated isn't defined. But I don't see any reason why a Set that is filled in the same order with the same values should behave differently across several runs. Does it?
It can. The implementation is free to use, for example, the object's location in memory as the key into the underlying hash table. That can vary depending on when garbage collection runs.
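As a sketch of that effect (the Point class here is hypothetical and deliberately does not override hashCode(), so it inherits the identity-based Object.hashCode()):

import java.util.HashSet;
import java.util.Set;

class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    @Override public String toString() { return "(" + x + "," + y + ")"; }
}

public class IterationOrderDemo {
    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 1));
        points.add(new Point(2, 2));
        points.add(new Point(3, 3));
        // Bucket placement depends on the identity hash codes, which are
        // not guaranteed to repeat between JVM runs - so this order may
        // differ from one run to the next.
        System.out.println(points);
    }
}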

Use of uninitialized final field - with/without 'this.' qualifier

Can someone explain to me why the first of the following two samples compiles, while the second doesn't? Notice the only difference is that the first one explicitly qualifies the reference to x with 'this.', while the second doesn't. In both cases, an attempt is clearly made to use the final field x before it is initialized.
I would have thought both samples would be treated completely equally, resulting in a compilation error for both.
1)
public class Foo {
    private final int x;

    private Foo() {
        int y = 2 * this.x;
        x = 5;
    }
}
}
2)
public class Foo {
    private final int x;

    private Foo() {
        int y = 2 * x;
        x = 5;
    }
}
After a bunch of spec-reading and thought, I've concluded that:
In a Java 5 or Java 6 compiler, this is correct behavior. Chapter 16, "Definite Assignment", of The Java Language Specification, Third Edition, says:
Each local variable (§14.4) and every blank final (§4.12.4) field (§8.3.1.2) must have a definitely assigned value when any access of its value occurs. An access to its value consists of the simple name of the variable occurring anywhere in an expression except as the left-hand operand of the simple assignment operator =.
(emphasis mine). So in the expression 2 * this.x, the this.x part is not considered an "access of [x's] value" (and therefore is not subject to the rules of definite assignment), because this.x is not the simple name of the instance variable x. (N.B. the rule for when definite assignment occurs, in the paragraph after the above-quoted text, does allow something like this.x = 3, and considers x to be definitely assigned thereafter; it's only the rule for accesses that doesn't count this.x.) Note that the value of this.x in this case will be zero, per §17.5.2.
In a Java 7 compiler, this is a compiler bug, but an understandable one. Chapter 16, "Definite Assignment", of The Java Language Specification, Java SE 7 Edition, says:
Each local variable (§14.4) and every blank final field (§4.12.4, §8.3.1.2) must have a definitely assigned value when any access of its value occurs.
An access to its value consists of the simple name of the variable (or, for a field, the simple name of the field qualified by this) occurring anywhere in an expression except as the left-hand operand of the simple assignment operator = (§15.26.1).
(emphasis mine). So in the expression 2 * this.x, the this.x part should be considered an "access to [x's] value", and should give a compile error.
But you didn't ask whether the first one should compile, you asked why it does compile (in some compilers). This is necessarily speculative, but I'll make two guesses:
Most Java 7 compilers were written by modifying Java 6 compilers. Some compiler-writers may not have noticed this change. Furthermore, many Java-7 compilers and IDEs still support Java 6, and some compiler-writers may not have felt motivated to specifically reject something in Java-7 mode that they accept in Java-6 mode.
The new Java 7 behavior is strangely inconsistent. Something like (false ? null : this).x is still allowed, and for that matter, even (this).x is still allowed; it's only the specific token-sequence this plus . plus the field-name that's affected by this change. Granted, such an inconsistency already existed on the left-hand side of an assignment statement (we can write this.x = 3, but not (this).x = 3), but that's more readily understandable: it's accepting this.x = 3 as a special permitted case of the otherwise forbidden construction obj.x = 3. It makes sense to allow that. But I don't think it makes sense to reject 2 * this.x as a special forbidden case of the otherwise permitted construction 2 * obj.x, given that (1) this special forbidden case is easily worked around by adding parentheses, that (2) this special forbidden case was allowed in previous versions of the language, and that (3) we still need the special rule whereby final fields have their default values (e.g. 0 for an int) until they're initialized, both because of cases like (this).x, and because of cases like this.foo() where foo() is a method that accesses x. So some compiler-writers may not have felt motivated to make this inconsistent change.
Either of these would be surprising — I assume that compiler-writers had detailed information about every single change to the spec, and in my experience Java compilers are usually pretty good about sticking to the spec exactly (unlike some languages, where every compiler has its own dialect) — but, well, something happened, and the above are my only two guesses.
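To make that inconsistency concrete, here is a minimal sketch of the cases described above (whether the this.x line is rejected depends on the compiler and spec version, per this answer):

public class Foo {
    private final int x;

    private Foo() {
        // int y1 = 2 * x;       // simple name: definite-assignment error
        // int y2 = 2 * this.x;  // accepted under Java 5/6 rules, rejected under Java 7's
        int y3 = 2 * (this).x;   // not the token sequence "this.x", so allowed;
                                 // reads the default value 0
        x = 5;
        System.out.println(y3);  // prints 0
    }

    public static void main(String[] args) {
        new Foo();
    }
}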
When you use this in the constructor, the compiler sees x as a member attribute of the this object (default-initialized). Since x is an int, it is default-initialized to 0. This makes the compiler happy, and it works fine at run time too.
When you don't use this, the compiler uses the declaration of x directly in its analysis, and hence it complains about its initialization (a compile-time phenomenon).
So it is the use of this that makes the compiler analyze x as a member variable of an object, versus a direct attribute, resulting in the different compilation behavior.
When used as a primary expression, the keyword this denotes a value that is a reference to the object for which the instance method was invoked (§15.12), or to the object being constructed.
I think the compiler assumes that writing this.x implies 'this' exists, so a constructor has been called (and the final variable has been initialized).
But you should get a RuntimeException when trying to run it.
I assume you are referring to the behaviour in Eclipse. (As stated in a comment, compiling with javac works.)
I think this is an Eclipse problem. It has its own compiler, and its own set of rules. One of them is that you may not access a field which is not initialized, although the Java compiler would initialize variables for you.

Boolean expressions optimizations in Java

Consider the following method in Java:
public static boolean expensiveComputation() {
    for (int i = 0; i < Integer.MAX_VALUE; ++i);
    return false;
}
And the following main method:
public static void main(String[] args) {
    boolean b = false;
    if (expensiveComputation() && b) {
    }
}
Logical conjunction (which is what && denotes) is a commutative operation, so why doesn't the compiler optimize the if-statement code to the equivalent:
if (b && expensiveComputation()) {
}
which has the benefits of using short-circuit evaluation?
Moreover, does the compiler try to make other logic simplifications or permutations of booleans in order to generate faster code? If not, why not? Surely some optimizations would be very difficult, but isn't my example simple? Calling a method should always be slower than reading a boolean, right?
Thank you in advance.
It doesn't do that because expensiveComputation() may have side effects which change the state of the program. This means that the order in which the expressions in the boolean statement (expensiveComputation() and b) are evaluated matters. You wouldn't want the compiler optimizing a bug into your compiled program, would you?
For example, what if the code was like this
public static boolean expensiveComputation() {
    for (int i = 0; i < Integer.MAX_VALUE; ++i);
    b = false;
    return false;
}

public static boolean b = true;

public static void main(String[] args) {
    if (expensiveComputation() || b) {
        // do stuff
    }
}
Here, if the compiler performed your optimization, then the // do stuff would run when you wouldn't expect it to from looking at the code (because b, which is originally true, would be evaluated first, before expensiveComputation() has a chance to set it to false).
Because expensiveComputation() may have side-effects.
Since Java doesn't aim to be a functionally pure language, it doesn't inhibit programmers from writing methods that have side effects. Thus there probably isn't a lot of value in the compiler analyzing for functional purity. And optimizations like the one you posit are unlikely to be very valuable in practice, as expensiveComputation() would usually be required to execute anyway, to get the side effects.
Of course, for a programmer, it's easy to put the b first if they expect it to be false and explicitly want to avoid the expensive computation.
Actually, some compilers can optimise programs like the one you suggested; they just have to make sure that the function has no side effects. GCC has a function attribute you can annotate a function with to declare that it has no side effects, which the compiler may then use when optimizing. Java may have something similar.
A classic example is
for (ii = 0; strlen(s) > ii; ii++) { /* do something */ }
which gets optimized to
n = strlen(s);
for (ii = 0; n > ii; ii++) { /* do something */ }
by GCC with optimization level 2, at least on my machine.
The compiler will optimize this if you run the code often enough, probably by inlining the method and simplifying the resulting boolean expression (but most likely not by reordering the operands of &&).
You can benchmark this by timing a loop of say a million iterations of this code repeatedly. The first iteration or two are much slower than the following.
The version of Java I am using short-circuits on a in an expression a && b, but not on b.
That is, if a is false, b does not get evaluated; but if b was false, a was still evaluated.
I found this out when I was implementing validation on a website form: I created messages to display on the web page in a series of boolean methods.
I expected all of the incorrectly entered fields on the page to become highlighted, but because of Java's speed-hack, the code was only executed until the first incorrect field was discovered. After that, Java must have thought something like "false && anything is always false" and skipped the remaining validation methods.
I suppose, as a direct answer to your question: if you make optimisations like this, your program may run slower than it could. However, someone else's program will completely break, because they have assumed the non-optimised behaviour, like the side-effect thing mentioned in other answers.
Unfortunately, it's difficult to automate intelligent decisions like this, especially with imperative languages (C, C++, Java, Python, ... i.e. the normal languages).
