bad java compiler optimization?

I have this piece of code:
private void prepareContent() {
    log.info("do something");
    // success?
    boolean suc = false;
    suc = suc || uncompressToContent("file.tar.gz");
    suc = suc || uncompressToContent("file.tgz");
    for (int i = 0; i <= 9; i++) {
        suc = suc || uncompressToContent("dir/" + i + ".tgz");
        suc = suc || uncompressToContent("dir/" + i + ".tar.gz");
    }
    if (!suc) {
        log.error("unable to do something");
    }
}
The function returns false for "file.tar.gz" and "file.tgz".
The problem is that the call to uncompressToContent("dir/1.tgz") returns true and the code stops executing. The remaining calls are not made.
I'm not sure if this is an error in the compiler. What do you think?
Added: I forgot to mention that I need to execute all the calls to uncompressToContent and check whether any of them returns true, using as few instructions as possible.

There is no error in the compiler.
As soon as suc is set to true (i.e. by the first uncompressToContent call that returns true), all of the later expressions evaluate to true without calling uncompressToContent. This is because you are using the short-circuit boolean or ("||"), which does not evaluate its second operand if the first operand is true.
If you want all the calls to be made, use the normal or operator ("|") instead.
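As a rough illustration of the difference, the sketch below counts how many calls are made with each operator. The tryUnpack method and the file names are hypothetical stand-ins for uncompressToContent, not the asker's real code.

```java
import java.util.ArrayList;
import java.util.List;

class OrOperatorDemo {
    // Names that were actually passed to tryUnpack, in call order.
    static final List<String> attempted = new ArrayList<>();

    // Hypothetical stand-in for uncompressToContent: only "dir/1.tgz" succeeds.
    static boolean tryUnpack(String name) {
        attempted.add(name);
        return name.equals("dir/1.tgz");
    }

    // Uses ||: once suc is true, tryUnpack is no longer called.
    static int callsWithShortCircuitOr(String[] names) {
        attempted.clear();
        boolean suc = false;
        for (String name : names) {
            suc = suc || tryUnpack(name);
        }
        return attempted.size();
    }

    // Uses |: both operands are always evaluated, so every call is made.
    static int callsWithEagerOr(String[] names) {
        attempted.clear();
        boolean suc = false;
        for (String name : names) {
            suc = suc | tryUnpack(name);
        }
        return attempted.size();
    }

    public static void main(String[] args) {
        String[] names = {"file.tar.gz", "file.tgz", "dir/0.tgz", "dir/1.tgz", "dir/2.tgz"};
        System.out.println("calls with ||: " + callsWithShortCircuitOr(names));
        System.out.println("calls with | : " + callsWithEagerOr(names));
    }
}
```

With these made-up inputs, the || version stops calling tryUnpack after "dir/1.tgz" succeeds, while the | version attempts every name.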

If the uncompress method returns true if there was a successful decompression, then suc would become true the first time that this happens. Once suc is true, all the other conditions would be true as soon as suc is evaluated, so the other part of the OR would not be evaluated. Thus, no decompressions will be attempted once at least one is successful.
This is called short-circuiting. It is the correct behavior and a very useful property in most languages. It is also not a compiler optimization, since it is part of the defined behavior of the language.
Beyond this answer, there are, I think, ways to make this code more readable.
First, are you sure that you want OR rather than AND here? It seems like you want to quit as soon as one file did not decompress correctly, not stop as soon as one did decompress correctly.
Second, a better design, IMHO, would be to create a list of all the filenames you want to decompress and then do a for-each over that list to perform all the decompressions. It would make things more readable.
Third, if in most cases decompression would be successful, I think that exception handling is much better than boolean return values.
Here is how I would write something like this (and I would break it into functions):
List<String> filenames = new ArrayList<String>();
// Write one or more functions of this sort based on the semantics of your problem
this.collectFilenamesToDecompress(filenames);
try
{
    for (String filename : filenames)
    {
        uncompressFile(filename); // This will throw an exception if there is a failure
    }
} catch (Exception e)
{
    // Announce that there was an error and you stopped decompressing because there was an error.
    // Return or quit
}
// If you got here, everything is great!

This behavior is by design.
Logical operators in most languages are short-circuiting.
In the expression a || b, b will only be evaluated if a is false.
Therefore, once suc becomes true, none of the other calls to uncompressToContent will be evaluated.

I think the compiler is doing something like: suc = uncompressToContent("file.tar.gz") || uncompressToContent("file.tgz") || uncompressToContent("...") || ... So, when it finds one true value, the execution is stopped. Is this feature documented?
Yes. It is clearly documented in the Java Language Specification section 15.24, where it says this:
"The || operator is like | (§15.22.2), but evaluates its right-hand operand only if the value of its left-hand operand is false."
The JLS then goes on to explain exactly what happens in excruciating detail. Follow the link above if you are interested.
Oh yea, and in this respect the Java || operator behaves the same as in C, C++, C#, Perl and many other programming languages.

Related

Confused about ? -1 : true [duplicate]

I have been working with Java a couple of years, but up until recently I haven't run across this construct:
int count = isHere ? getHereCount(index) : getAwayCount(index);
This is probably a very simple question, but can someone explain it? How do I read it? I am pretty sure I know how it works.
if isHere is true, getHereCount() is called,
if isHere is false getAwayCount() is called.
Correct? What is this construct called?

Yes, it is a shorthand form of
int count;
if (isHere)
count = getHereCount(index);
else
count = getAwayCount(index);
It's called the conditional operator. Many people (erroneously) call it the ternary operator, because it's the only ternary (three-argument) operator in Java, C, C++, and probably many other languages. But theoretically there could be another ternary operator, whereas there can only be one conditional operator.
The official name is given in the Java Language Specification:
§15.25 Conditional Operator ? :
The conditional operator ? : uses the boolean value of one expression to decide which of two other expressions should be evaluated.
Note that both branches must lead to methods with return values:
It is a compile-time error for either the second or the third operand expression to be an invocation of a void method.
In fact, by the grammar of expression statements (§14.8), it is not permitted for a conditional expression to appear in any context where an invocation of a void method could appear.
So, if doSomething() and doSomethingElse() are void methods, you cannot compress this:
if (someBool)
doSomething();
else
doSomethingElse();
into this:
someBool ? doSomething() : doSomethingElse();
Simple words:
booleanCondition ? executeThisPartIfBooleanConditionIsTrue : executeThisPartIfBooleanConditionIsFalse
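To make the restriction on void methods concrete, here is a minimal sketch. The class name and the log field are invented for illustration; the Runnable variant is just one possible workaround, not something the specification recommends.

```java
class ConditionalVsIfDemo {
    // Records which method ran, so the effect is observable.
    static final StringBuilder log = new StringBuilder();

    // Hypothetical void methods: they cannot be operands of ?: directly.
    static void doSomething() { log.append("something;"); }
    static void doSomethingElse() { log.append("somethingElse;"); }

    // Value-returning operands are fine, because the expression yields a value.
    static String describe(boolean someBool) {
        return someBool ? "yes" : "no";
    }

    static void runWithIf(boolean someBool) {
        // someBool ? doSomething() : doSomethingElse();  // does not compile
        if (someBool) {
            doSomething();
        } else {
            doSomethingElse();
        }
    }

    // One possible workaround: choose a Runnable with ?: and then run it.
    static void runWithRunnable(boolean someBool) {
        Runnable action = someBool
                ? ConditionalVsIfDemo::doSomething
                : ConditionalVsIfDemo::doSomethingElse;
        action.run();
    }

    public static void main(String[] args) {
        runWithIf(true);
        runWithRunnable(false);
        System.out.println(log + " " + describe(true));
    }
}
```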

Others have answered this to reasonable extent, but often with the name "ternary operator".
Being the pedant that I am, I'd like to make it clear that the name of the operator is the conditional operator or "conditional operator ?:". It's a ternary operator (in that it has three operands) and it happens to be the only ternary operator in Java at the moment.
However, the spec is pretty clear that its name is the conditional operator or "conditional operator ?:" to be absolutely unambiguous. I think it's clearer to call it by that name, as it indicates the behaviour of the operator to some extent (evaluating a condition) rather than just how many operands it has.

According to the Sun Java Specification, it's called the Conditional Operator. See section 15.25. You're right as to what it does.
The conditional operator ? : uses the boolean value of one expression to decide which of two other expressions should be evaluated.
The conditional operator is syntactically right-associative (it groups right-to-left), so that a?b:c?d:e?f:g means the same as a?b:(c?d:(e?f:g)).
ConditionalExpression:
ConditionalOrExpression
ConditionalOrExpression ? Expression : ConditionalExpression
The conditional operator has three operand expressions; ? appears between the first and second expressions, and : appears between the second and third expressions.
The first expression must be of type boolean or Boolean, or a compile-time error occurs.
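The right-to-left grouping described above can be checked with a small sketch. The sign example and the class name are assumptions chosen for illustration; both methods should always agree, since the second only writes out the grouping the first relies on.

```java
class NestedConditionalDemo {
    // Unparenthesized chain: relies on ?: grouping right-to-left.
    static String sign(int num) {
        return num > 0 ? "positive" : num < 0 ? "negative" : "zero";
    }

    // The same expression with the implied grouping written out.
    static String signExplicit(int num) {
        return num > 0 ? "positive" : (num < 0 ? "negative" : "zero");
    }

    public static void main(String[] args) {
        for (int n : new int[] {-5, 0, 7}) {
            System.out.println(n + " -> " + sign(n) + " / " + signExplicit(n));
        }
    }
}
```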

condition ? truth : false;
If the condition is true then evaluate the first expression. If the condition is false, evaluate the second expression.
It is called the Conditional Operator and it is a type of Ternary Operation.

int count = isHere ? getHereCount(index) : getAwayCount(index);
means :
if (isHere) {
count = getHereCount(index);
} else {
count = getAwayCount(index);
}

Not exactly correct, to be precise:
if isHere is true, the result of getHereCount() is returned
otherwise the result of getAwayCount() is returned
That "returned" is very important. It means the methods must return a value and that value must be assigned somewhere.
Also, it's not exactly syntactically equivalent to the if-else version. For example:
String str1,str2,str3,str4;
boolean check;
//...
return str1 + (check ? str2 : str3) + str4;
Coded with if-else, this needs either a temporary variable or two separate return statements, which typically results in more bytecode.

Ternary, conditional; tomato, tomatoh. What it's really valuable for is variable initialization. If (like me) you're fond of initializing variables where they are defined, the conditional ternary operator (for it is both) permits you to do that in cases where there is conditionality about its value. Particularly notable in final fields, but useful elsewhere, too.
e.g.:
public class Foo {
final double value;
public Foo(boolean positive, double value) {
this.value = positive ? value : -value;
}
}
Without that operator - by whatever name - you would have to make the field non-final or write a function simply to initialize it. Actually, that's not right - it can still be initialized using if/else, at least in Java. But I find this cleaner.

You might be interested in a proposal for some new operators that are similar to the conditional operator. The null-safe operators will enable code like this:
String s = mayBeNull?.toString() ?: "null";
It would be especially convenient where auto-unboxing takes place.
Integer ival = ...; // may be null
int i = ival ?: -1; // no NPE from unboxing
It has been selected for further consideration under JDK 7's "Project Coin."
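Without assuming the proposed operators are available, roughly the same effect can be sketched today with the ordinary conditional operator. The class and method names below are made up for illustration.

```java
class NullDefaultDemo {
    // Plain-Java approximation of the proposed: mayBeNull?.toString() ?: "null"
    static String textOrDefault(Object mayBeNull) {
        return mayBeNull != null ? mayBeNull.toString() : "null";
    }

    // Plain-Java approximation of the proposed: ival ?: -1
    static int valueOrDefault(Integer ival) {
        // ival is only unboxed when the null check passed, so no NullPointerException.
        return ival != null ? ival : -1;
    }

    public static void main(String[] args) {
        System.out.println(textOrDefault(null));
        System.out.println(valueOrDefault(null));
        System.out.println(valueOrDefault(7));
    }
}
```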

This construct is called the ternary operator in computer science and programming. Wikipedia offers the following explanation:
In computer science, a ternary operator (sometimes incorrectly called a tertiary operator) is an operator that takes three arguments. The arguments and result can be of different types. Many programming languages that use C-like syntax feature a ternary operator, ?: , which defines a conditional expression.
This syntax is not limited to Java; it is also available in PHP and Objective-C.
The following link gives this explanation, which is quite easy to understand:
A ternary operator is some operation operating on 3 inputs. It's a shortcut for an if-else statement, and is also known as a conditional operator.
In Perl/PHP it works as: boolean_condition ? true_value : false_value
In C/C++ it works as: logical expression ? action for true : action for false
This might be readable for logical conditions that are not too complex; otherwise it is better to use an if-else block with the intended combination of conditional logic.
We can simplify if-else blocks with the ternary operator into a single line of code. For example:
if ( car.isStarted() ) {
car.goForward();
} else {
car.startTheEngine();
}
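Might be written as the following in languages that allow it (in Java this form only compiles when both methods return a value and the result is assigned or otherwise used, because a conditional expression is not a valid statement on its own):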
( car.isStarted() ) ? car.goForward() : car.startTheEngine();
So if we refer to your statement:
int count = isHere ? getHereCount(index) : getAwayCount(index);
It is actually the 100% equivalent of the following If-Else block:
int count;
if (isHere) {
count = getHereCount(index);
} else {
count = getAwayCount(index);
}
That's it!
Hope this was helpful to somebody!
Cheers!

Correct. It's called the ternary operator. Some also call it the conditional operator.

It's the ternary operator (?:)
The ternary operator is an operator that takes three arguments. The first
argument is a comparison argument, the second is the result upon a true
comparison, and the third is the result upon a false comparison.

Actually, conditional expressions can be nested to choose among more than two values (each ?: still takes exactly three operands). For instance, if we want to check whether a number is positive, negative or zero we can do this:
String m = num > 0 ? "is a POSITIVE NUMBER." : num < 0 ? "is a NEGATIVE NUMBER." : "IT's ZERO.";
which is better than using if, else if, else.

?: is Java's ternary operator.
Its syntax is:
condition ? expression1 : expression2;
Here, the condition is evaluated, and
if condition returns true, expression1 is evaluated;
if condition returns false, expression2 is evaluated.
public class Sonycode {
public static void main(String[] args) {
double marks = 90;
String result = (marks > 40) ? "passed in exam" : "failed in exam";
System.out.println("Your result is : " + result);
}
}
Output:
Your result is : passed in exam

It's the conditional operator, and it's more than just a concise way of writing if statements.
Since it is an expression that returns a value it can be used as part of other expressions.

Yes, you are correct. ?: is typically called the "ternary conditional operator", often referred to as simply "ternary operator". It is a shorthand version of the standard if/else conditional.
Ternary Conditional Operator

I happen to really like this operator, but the reader should be taken into consideration.
You always have to balance code compactness with the time spent reading it, and in that it has some pretty severe flaws.
First of all, there is the Original Asker's case. He just spent an hour posting about it and reading the responses. How much longer would it have taken the author to write every ?: as an if/then throughout the course of his entire life? Not an hour, to be sure.
Secondly, in C-like languages, you get in the habit of simply knowing that conditionals are the first thing in the line. I noticed this when I was using Ruby and came across lines like:
callMethodWhatever(Long + Expression + with + syntax) if conditional
If I was a long time Ruby user I probably wouldn't have had a problem with this line, but coming from C, when you see "callMethodWhatever" as the first thing in the line, you expect it to be executed. The ?: is less cryptic, but still unusual enough as to throw a reader off.
The advantage, however, is a really cool feeling in your tummy when you can write a 3-line if statement in the space of 1 of the lines. Can't deny that :) But honestly, it is not necessarily more readable to 90% of the people out there, simply because of its rarity.
When it is truly an assignment based on a Boolean and values I don't have a problem with it, but it can easily be abused.

Conditional expressions are in a completely different style, with no explicit if in the statement.
The syntax is:
boolean-expression ? expression1 : expression2;
The result of this conditional expression is
expression1 if boolean-expression is true;
otherwise the result is expression2.
Suppose you want to assign the larger number of variable num1 and num2 to max. You can simply write a statement using the conditional expression:
max = (num1 > num2) ? num1 : num2;
Note: The symbols ? and : appear together in a conditional expression. They form the conditional operator, which is also called a ternary operator because it uses three operands. It is the only ternary operator in Java.
cited from: Intro to Java Programming 10th edition by Y. Daniel Liang page 126 - 127

Should I separate AND with two if statements?

I have the following lines in my code:
if (command.equals("sort") && args.length == 2) {
//run some program
}
Someone suggested that I should use two separate if statements, because there's no need to evaluate any of the other conditions if the command does not equal "sort", regardless of whether or not the args length is correct.
So according to that, I need to rewrite my code to:
if (command.equals("sort")) {
if (args.length == 2) {
//run some program
}
}
I know they both do the job, but my question is which one is better and more efficient?

No, that's not true. This is called short-circuiting: if the first condition evaluates to false, the second one is not evaluated at all.

Well, && is a short-circuit operator, so both if statements are effectively the same.
So, in the first case, if command.equals("sort") returns false, the following condition will not be evaluated at all. So, in my opinion, just go with the first one. It's clearer.

As stated, short circuit will cause the program to exit the if statement the moment a condition fails, meaning any further conditions will not be evaluated, so there's no real difference in the way the two formats are evaluated.
I would like to note that code legibility is negatively affected when you have several if statements nested within each other, and that to me is the main reason not to nest. For example:
if( conditionA && conditionB && !conditionC ){
// Do Something
}
is much cleaner than:
if( conditionA ){
if( conditionB ){
if( !conditionC ){
// Do Something
}
}
}
Imagine that with 20 nested if statements? Not a common occurrence, sure, but possible.

They are the same. For your first example, Java does not evaluate the second expression if the first expression is false.

Short-circuiting, which is what && does, is better. If you check a value for null and then call a method on that object, the short-circuit operator works well: it stops condition 2 from being evaluated if condition 1 is false.
ex:
String s = null;
if (s != null && s.length() > 0)
This doesn't throw an exception, and in most cases you also save one more if check.
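A minimal sketch of that null guard, with a non-short-circuit variant for contrast. The method names are invented for illustration.

```java
class NullGuardDemo {
    // && skips s.length() when s is null, so this never throws.
    static boolean hasText(String s) {
        return s != null && s.length() > 0;
    }

    // & evaluates both operands, so s.length() is called even when s is null.
    static boolean hasTextEager(String s) {
        return s != null & s.length() > 0;
    }

    public static void main(String[] args) {
        System.out.println(hasText(null));
        try {
            hasTextEager(null);
        } catch (NullPointerException e) {
            System.out.println("& evaluated s.length() on a null reference");
        }
    }
}
```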

If the conditions are in the same order, they are exactly the same in terms of efficiency.
if (command.equals("sort") && args.length == 2)
Will drop out if command.equals("sort") returns false, and args.length will never be checked. That's the short-circuit operation of the && operator.
What it comes down to is a matter of style and readability. IMO When you start chaining too many together in a single if statement it can get hard to read.

Actually, it is called lazy evaluation: http://en.wikipedia.org/wiki/Lazy_evaluation

That's not really the question, but note that if you want both conditions to be evaluated, you can use &:
if (methodA() & methodB()) {
//
}
instead of
boolean a = methodA();
boolean b = methodB();
if (a && b) {
//
}

Yeah, their suggestions are completely right. What I would also suggest is to write the first check as:
"sort".equals(command)
It may not matter in this case, but it can in the future: put the string literal first so you never need a null check on command beforehand.
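The effect of putting the literal first can be sketched as follows. The method names are hypothetical; only String.equals is real API.

```java
class LiteralFirstEqualsDemo {
    // Safe for null: equals is called on the literal, which is never null.
    static boolean isSort(String command) {
        return "sort".equals(command);
    }

    // Throws NullPointerException when command is null.
    static boolean isSortUnsafe(String command) {
        return command.equals("sort");
    }

    public static void main(String[] args) {
        System.out.println(isSort(null));
        System.out.println(isSort("sort"));
        try {
            isSortUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("command.equals(...) failed on null");
        }
    }
}
```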

What does the keyword assert mean in Java?

I saw it somewhere in GWT code; it was something like this:
assert display instanceof Widget : "display must extend Widget";

The assert keyword, as the name implies, makes an assertion about the code. It is used to specify something that holds true all the time -- or that, at least, should be true!
The assert keyword is followed by a boolean value (true or false), or an expression, to be evaluated at runtime, that returns a boolean.
assert true;
assert 1 == 1;
If, for any reason, the boolean expression evaluates to false, then an AssertionError is thrown.
// this will throw an AssertionError:
int x = 1;
assert x == 2;
When you use it, you make a clear statement about the state of your program on a given point, which can make it easier for readers to follow through your code.
There's a programming paradigm called design by contract (or programming by contract), in which pieces of code make statements about the pre-conditions that must hold true for them to execute properly, and the post-conditions that are guaranteed to hold true after their execution. You can use the assert keyword to implement this.
For example, if you write a method that calculates the square root of a number, it will only work for numbers that are greater than or equal to zero, and the result is guaranteed to satisfy the same conditions:
public double sqrt(final double x) {
    assert x >= 0 : "Cannot calculate the square root of a negative number!";
    double result = ...;
    assert result >= 0 : "Something went wrong when calculating the square root!";
    return result;
}
The most interesting aspect of assertions is that they can be switched off: the JVM only checks them when it is started with -enableassertions (-ea), and they are disabled by default (or explicitly with -disableassertions, -da), so you won't pay a noticeable performance penalty at runtime in production. For this precise reason, it is of fundamental importance that the expression to be evaluated does not cause side effects; that is, the expression should look like a pure mathematical function. Otherwise, the behavior of your program could change when assertions are disabled.
Finally, since the assertions are compiled into the bytecode, they can be read by software that automatically generates tests that try to break your code. This can be useful for finding bugs earlier!
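As a hedged sketch of the points above: the class below uses Math.sqrt as a stand-in for the elided computation, and the assertionsEnabled helper relies on a deliberate side effect inside an assert to detect whether the JVM was started with -ea. The class and method names are assumptions made for illustration.

```java
class AssertDemo {
    // Precondition and postcondition expressed as assertions.
    static double checkedSqrt(double x) {
        assert x >= 0 : "Cannot calculate the square root of a negative number!";
        double result = Math.sqrt(x); // stand-in for the real computation
        assert result >= 0 : "Something went wrong when calculating the square root!";
        return result;
    }

    // Reports whether assertions are enabled for this class.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // Intentional side effect: the assignment only runs when assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println(checkedSqrt(9.0));
        try {
            // With -ea this throws; without it, Math.sqrt(-1.0) yields NaN.
            System.out.println(checkedSqrt(-1.0));
        } catch (AssertionError e) {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```

Running it once with java -ea AssertDemo and once without the flag shows the two behaviors side by side.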

The assert keyword was introduced in 1.4 (follow that link for a complete description). It is a shorthand to throw an exception at runtime if a condition is not satisfied.
Think of it as
assert condition : message
as
if ( ! condition ) {
throw new AssertionError ( message ) ;
}
The idea is to give developers an easy way to help users (in your case GWT API users) detect common errors and pitfalls.
When it was introduced, assert became a reserved word, and that caused a few compilation issues when old code was recompiled for 1.4, especially for JUnit test suites, where there was a much-used assert() method. JUnit reacted by replacing assert with assertTrue().

It means that if display isn't an object of type Widget, you'll get an AssertionError with the text string that follows the assertion. Assertions are helpful for debugging.

The assert keyword can be used in place of a simple user-defined exception. To define a user-defined exception, we have to create our own exception class, define the condition that causes it, and then throw it in our program.
But from Java 1.4 onwards there is the assert keyword: we only have to write assert condition. If the condition is true, execution continues with the rest of the program; if it is false (and assertions are enabled), an AssertionError is created and thrown.
So there is no need to define our own error class.

The following text (emphasis mine) explains various forms of assertions clearly:
The assertion statement has two forms.
The first, simpler form is:
assert Expression1 ;
where Expression1 is a boolean expression. When
the system runs the assertion, it evaluates Expression1 and if it is
false throws an AssertionError with no detail message.
The second form of the assertion statement is:
assert Expression1 : Expression2 ; (Your example falls here)
where: Expression1 is a boolean expression. Expression2 is an expression that
has a value. (It cannot be an invocation of a method that is declared
void.) Use this version of the assert statement to provide a detail
message for the AssertionError. The system passes the value of
Expression2 to the appropriate AssertionError constructor, which uses
the string representation of the value as the error's detail message.
Also, refer the following oracle link for detailed information:
http://docs.oracle.com/javase/7/docs/technotes/guides/language/assert.html

What is better: multiple "if" statements or one "if" with multiple conditions?

For my work I have to develop a small Java application that parses very large XML files (~300k lines) to select very specific data (using Pattern), so I'm trying to optimize it a little. I was wondering what was better between these 2 snippets:
if (boolean_condition && matcher.find(string)) {
...
}
OR
if (boolean_condition) {
if (matcher.find(string)) {
...
}
}
Other details:
These if statements are executed on each iteration inside a loop (~20k iterations)
The boolean_condition is a boolean calculated on each iteration using an external function
If the boolean is set to false, I don't need to test the regular expression for matches
Thanks for your help.

One golden rule I follow is to "Avoid Nesting" as much as I can. But if it is at the cost of making my single if condition too complex, I don't mind nesting it out.
Besides you're using the short-circuit && operator. So if the boolean is false, it won't even try matching!
So,
if (boolean_condition && matcher.find(string)) {
...
}
is the way to go!

The following two methods:
public void oneIf(boolean a, boolean b)
{
if (a && b)
{
}
}
public void twoIfs(boolean a, boolean b)
{
if (a)
{
if (b)
{
}
}
}
produce the exact same byte code for the method body so there won't be any performance difference meaning it is purely a stylistic matter which you use (personally I prefer the first style).

Both ways are OK, and the second condition won't be tested if the first one is false.
Use the one that makes the code the more readable and understandable. For just two conditions, the first way is more logical and readable. It might not be the case anymore with 5 or 6 conditions linked with &&, || and !.

I recommend extracting your expression to a semantically meaningful variable and then passing that to your evaluation. Instead of:
if (boolean_condition && matcher.find(string)) { ... }
Assign the expression to a variable, then evaluate the variable:
final boolean hasItem = boolean_condition && matcher.find(string);
if (hasItem) { ... }
With this method, you can keep even the most complex evaluations readable:
final boolean hasItem = boolean_condition && matcher.find(string);
final boolean hasOtherThing = boolean_condition || boolean_condition;
final boolean isBeforeToday = new Date(string).before(new Date());
if (hasItem && hasOtherThing && isBeforeToday) { ... }

Java uses short-circuiting for those boolean operators, so both variations are functionally identical. Therefore, if the boolean_condition is false, it will not continue on to the matching.
Ultimately, it comes down to which you find easier to read and debug, but deep nesting can become unwieldy if you end up with a massive amount of braces at the end.
One way you can improve the readability, should the condition become longer, is to simply split it onto multiple lines:
if(boolean_condition &&
matcher.find(string))
{
...
}
The only choice at that point is whether to put the && and || at the end of the previous line, or the start of the current.

I tend to see too many && and || strung together into a logic soup, and they are often the source of subtle bugs.
It is too easy to just add another && or || to what you think is the right spot and break existing logic.
Because of this, as a general rule I try not to use either of them, to avoid the temptation of adding more as requirements change.

If you want to comply with Sonar rule squid:S1066, you should collapse the if statements to avoid a warning, since it states:
Collapsible "if" statements should be merged

The first one. I try to avoid if nesting like that; I think it's poor style/ugly code, and the && will short-circuit and only test with matcher.find() if the boolean is true.

In terms of performance, they're the same.
But even if they weren't
what's almost certain to dominate the time in this code is matcher.find(string) because it's a function call.

Most would prefer the one below, because of "&&".
if (boolean_condition && matcher.find(string)) {
...
}
This is normally called "short-circuit" (or "minimal") evaluation. It means the 2nd argument (here "matcher.find(string)") is evaluated only if the 1st argument is not sufficient to determine the value of the expression. For example, if "boolean_condition" is false, then the overall condition must be false (because of the logical AND operator). The 2nd argument is then not evaluated at runtime, which reduces the running time of your code.
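To see the saving in numbers, the sketch below counts how often the regular expression is actually run. The pattern, the sample lines, and the enabled flags are all made up, since the real application's data is not shown in the question.

```java
import java.util.regex.Pattern;

class ShortCircuitRegexDemo {
    // Number of times the regular expression was actually evaluated.
    static int regexChecks = 0;

    // Hypothetical pattern standing in for the application's real one.
    static final Pattern ID = Pattern.compile("id=\"\\d+\"");

    static boolean matches(String line) {
        regexChecks++;
        return ID.matcher(line).find();
    }

    // Counts lines that pass both tests; the regex runs only when enabled[i] is true.
    static int countMatches(String[] lines, boolean[] enabled) {
        int count = 0;
        for (int i = 0; i < lines.length; i++) {
            if (enabled[i] && matches(lines[i])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {"<a id=\"1\"/>", "<b id=\"2\"/>", "<c/>"};
        boolean[] enabled = {true, false, true};
        int count = countMatches(lines, enabled);
        System.out.println("matches: " + count + ", regex evaluations: " + regexChecks);
    }
}
```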

Does Java check all arguments in "&&" (and) operator even if one of them is false?

I have such code:
if(object != null && object.field != null){
object.field = "foo";
}
Assume that object is null.
Does this code result in a NullPointerException, or will the if statement just not be executed?
If it does, how to refactor this code to be more elegant (if it is possible of course)?

&& does short circuit while & would not.
But with simple questions like this, it is best to just try it (ideone can help when you don't have access to a machine).
&& - http://ideone.com/LvV6w
& - http://ideone.com/X5PdU
Finally, the place to check for sure would be the JLS §15.23. Not the easiest thing to read; the relevant section states:
The && operator is like & (§15.22.2), but evaluates its right-hand operand only if the value of its left-hand operand is true.

Java does have short circuit evaluation, i.e. your code should be ok

One way to know it! Test it! How? Well, make a method which prints out something:
public static boolean test(int i)
{
System.out.println(i);
return false;
}
...
if (test(1) && test(2) && test(3))
{
// not reached
}
This prints:
1
So the answer to your question is "no".

Best way to find out would be try it, especially for a single line question. Would have been faster, too.
The answer is that Java will not execute the body of the "if".

This will not throw any NullPointerException. The condition is evaluated from left to right, and the moment the first false expression is found, the remaining expressions are not evaluated.

Maybe this other question helps you:
Differences in boolean operators: & vs && and | vs ||

Java has short circuit evaluation, so it will be fine.
The code looks ok to me, but do you actually need to check object.field != null? I think that test can be omitted as you never use the variable, just set it.
On a side-note, most programmers wouldn't access fields directly (object.field) but rather through getters/setters (object.setField(x);). Without any more context to go on, I can't say if this is appropriate in your case.

&& and || conditions stop at the point where they can decide whether the condition is true or false. In your case, evaluation will stop right after object != null, and I think your code is just fine for this case.

If you want all of your boolean expressions evaluated regardless of the truth value of each, then you can use & and | instead of && and ||. However make sure you use these only on boolean expressions. Unlike && and ||, & and | also have a meaning for numeric types which is completely different from their meaning for booleans.
http://ibiblio.org/java/course/week2/46.html
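A short sketch of the two meanings of & and |. The touch helper and its call counter are invented here just to make the evaluation of both operands visible.

```java
class BitwiseVsLogicalDemo {
    // Counts how many operands were actually evaluated.
    static int calls = 0;

    static boolean touch(boolean value) {
        calls++;
        return value;
    }

    public static void main(String[] args) {
        // On booleans, & is a logical AND that evaluates both operands.
        boolean both = touch(false) & touch(true);
        System.out.println(both + ", calls = " + calls); // false, calls = 2

        // On ints, & and | work bit by bit: 6 is 110 and 3 is 011 in binary.
        System.out.println(6 & 3); // 2
        System.out.println(6 | 3); // 7
    }
}
```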

Although short-circuiting would work here, there is no guarantee that you won't get the order wrong when writing another condition (as I have done many times). It would be better practice to nest those if statements and so make explicit the order in which you want the boolean checks to break:
if(object != null)
{
if(object.field != null)
{
object.field = "foo";
}
}
This does exactly the same thing, as you're essentially saying: if the first boolean check fails, don't do the second. It is also NullPointerException-safe, as object.field will not be checked unless object is not null.
Using short-circuiting on booleans can become annoying later on: when an if statement has multiple boolean conditions, it becomes trickier to debug which part short-circuited.
