Unreachable statement: while true vs if true [duplicate] - java

How should I understand this Java compiler behaviour?
while (true) return;
System.out.println("I love Java");
// Err: unreachable statement
if (true) return;
System.out.println("I hate Java");
// OK.
Thanks.
EDIT:
I figured out the point after a few minutes:
In the first case the compiler reports an error because of the infinite loop. In both cases the compiler does not consider the code inside the statement body.
EDIT II:
What impresses me about javac now is:
if (true) return; // Correct
}
while (true) return; // Correct
}
It looks like javac knows what is inside both loop and if consequent,
but when you write another command (as in the first example) you get non-equivalent behaviour (which looks like javac forgot what is inside loop/if).
public static final EDIT III:
As the result of this answer I may remark (hopefully correct):
Statements such as if (arg) { ...; return;} and while (arg) { ...; return;} are equivalent both semantically and in the generated bytecode iff arg is a non-constant (or merely effectively final) expression. If arg is a constant expression, the bytecode (and the compiler's behaviour) may differ.
Disclaimer
This question is not on unreachable statements but different handling of logically equivalent expressions such as while true return and if true return.

There are quite strict rules for when statements are reachable in Java. These rules are designed to be easy to evaluate, not to be 100% accurate. They should prevent basic programming errors. To reason about reachability in Java you are restricted to these rules; "common logic" does not apply.
So here are the rules from the Java Language Specification 14.21. Unreachable Statements
An if-then statement can complete normally iff it is reachable.
So without an else, the statements after a reachable if-then are always reachable.
A while statement can complete normally iff at least one of the following is true:
The while statement is reachable and the condition expression is not a constant expression (§15.28) with value true.
There is a reachable break statement that exits the while statement.
The condition is a constant expression "true", there is no break. Hence it does not complete normally.
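The rules quoted above can be tried out in a small class. This is only a minimal sketch; the class and method names are made up for illustration and are not part of the question. Each method compiles because the statement after the loop or the if is reachable according to the JLS rules, even where it can never execute at run time.

```java
class ReachabilityDemo {
    // Compiles: the loop body contains a reachable break, so the while
    // statement can complete normally and the return after it is reachable.
    static String whileTrueWithBreak() {
        while (true) {
            break;
        }
        return "after loop";
    }

    // Compiles: an if-then statement can complete normally whenever it is
    // reachable, so the second return is reachable for the compiler even
    // though it never executes.
    static String ifTrue() {
        if (true) {
            return "inside if";
        }
        return "after if";
    }

    // Compiles: the condition is a parameter, not a constant expression,
    // so the loop is treated as one that can complete normally.
    static String whileOnParameter(boolean flag) {
        while (flag) {
            return "inside loop";
        }
        return "after loop";
    }

    public static void main(String[] args) {
        System.out.println(whileTrueWithBreak());
        System.out.println(ifTrue());
        System.out.println(whileOnParameter(false));
    }
}
```

Replacing the break in whileTrueWithBreak with a return would leave the loop without any way to complete normally, and javac would then reject the final return as an unreachable statement.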

According to the docs:
Except for the special treatment of while, do, and for statements whose condition expression has the constant value true, the values of expressions are not taken into account in the flow analysis.

If you change your code slightly (remove the constant expression) so that it doesn't trigger javac's reachability check, it will actually produce identical bytecode for both.
static boolean flag = true;
static void twhile(){
while (flag) return;
System.out.println("Java");
}
static void tif(){
if (flag) return;
System.out.println("Java");
}
The resulting bytecode:
static void twhile();
descriptor: ()V
flags: ACC_STATIC
Code:
stack=2, locals=0, args_size=0
StackMap locals:
StackMap stack:
0: getstatic #10 // Field flag:Z
3: ifeq 7
6: return
StackMap locals:
StackMap stack:
7: getstatic #20 // Field java/lang/System.out:Ljava/io/PrintStream;
10: ldc #26 // String Java
12: invokevirtual #28 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
15: return
LineNumberTable:
line 8: 0
line 9: 7
line 10: 15
LocalVariableTable:
Start Length Slot Name Signature
StackMapTable: number_of_entries = 1
frame_type = 7 /* same */
static void tif();
descriptor: ()V
flags: ACC_STATIC
Code:
stack=2, locals=0, args_size=0
StackMap locals:
StackMap stack:
0: getstatic #10 // Field flag:Z
3: ifeq 7
6: return
StackMap locals:
StackMap stack:
7: getstatic #20 // Field java/lang/System.out:Ljava/io/PrintStream;
10: ldc #26 // String Java
12: invokevirtual #28 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
15: return
LineNumberTable:
line 12: 0
line 13: 7
line 14: 15
LocalVariableTable:
Start Length Slot Name Signature
StackMapTable: number_of_entries = 1
frame_type = 7 /* same */

Related

Problem marking the lines in the java compiler

I am a master's student and I am researching static analysis.
In one of my tests I came across a problem in marking lines in the java compiler.
I have the following java code:
226: String json = "/org/elasticsearch/index/analysis/commongrams/commongrams_query_mode.json";
227: Settings settings = Settings.settingsBuilder()
228: .loadFromStream(json, getClass().getResourceAsStream(json))
229: .put("path.home", createHome())
230: .build();
When compiling this code, and executing the command javap -p -v CLASSNAME, I get a table with the corresponding line of the source code for each instruction in the bytecode.
[Image: bytecode table]
The problem is that for the call to the .put("path.home", createHome()) method, the compiler generates basically four instructions:
19: anewarray
24: ldc - String path.home
30: invokespecial - createHome
34: invokevirtual - put
The first two are marked as line 228 (wrong) and the last two as line 229 (correct).
[Image: bytecode table]
This is the original implementation of the .put("path.home", createHome()) method:
public Builder put(Object... settings) {
if (settings.length == 1) {
// support cases where the actual type gets lost down the road...
if (settings[0] instanceof Map) {
//noinspection unchecked
return put((Map) settings[0]);
} else if (settings[0] instanceof Settings) {
return put((Settings) settings[0]);
}
}
if ((settings.length % 2) != 0) {
throw new IllegalArgumentException("array settings of key + value order doesn't hold correct number of arguments (" + settings.length + ")");
}
for (int i = 0; i < settings.length; i++) {
put(settings[i++].toString(), settings[i].toString());
}
return this;
}
I have already tried compiling the code with Oracle JDK 8 and OpenJDK 16 and got the same result with both.
I also did a test by changing the put() method so that it takes no parameters. When compiling this code the problem in marking the lines did not occur.
I wonder why the bytecode instructions map line 229, .put("path.home", createHome()), to lines other than the original one in the Java source code. Does anyone know if this is done on purpose?
This is connected to the way the line number association is stored in the class file and to the history of the javac compiler.
The line number table only contains entries associating a line number with the code location marking its beginning. So all instructions after that location are assumed to belong to the same line, up to the next location that has been explicitly mentioned in the table.
Since detailed information will take up space and the specification does not demand a particular precision for the line number table, compiler vendors made different decisions about which details to include.
In the past, i.e. up to Java 7, javac only generated line number table entries for the beginning of statements, so when I compile the following code with Java 7’s javac
String settings = new StringBuilder() // this is line 7 in my .java file
.append('a')
.append(
5
+
"".length())
.toString();
I get something like
stack=3, locals=2, args_size=1
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: bipush 97
9: invokevirtual #4 // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
12: iconst_5
13: ldc #5 // String
15: invokevirtual #6 // Method java/lang/String.length:()I
18: iadd
19: invokevirtual #7 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
22: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
25: astore_1
26: return
LineNumberTable:
line 7: 0
line 14: 26
which would cause all instructions belonging to the statement to be associated with line 7 only.
This has been considered to be too little, so starting with Java 8, javac generates additional entries for method invocations within an expression spanning multiple lines. So when I compile the same code with Java 8 or newer, I get
stack=3, locals=2, args_size=1
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: bipush 97
9: invokevirtual #4 // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
12: iconst_5
13: ldc #5 // String
15: invokevirtual #6 // Method java/lang/String.length:()I
18: iadd
19: invokevirtual #7 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
22: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
25: astore_1
26: return
LineNumberTable:
line 7: 0
line 8: 9
line 12: 15
line 9: 19
line 13: 22
line 14: 26
Note how each additional entry (compared to the Java 7 version) points to an invocation instruction, to ensure that the method invocations are associated with the correct line number. This greatly improves exception stack traces as well as step debugging.
The non-invocation instructions having no explicit entry will still get associated with their closest preceding code location that has an entry.
Therefore, the bipush 97 instruction corresponding to the 'a' constant gets associated with line 7 as only the subsequent append invocation consuming the constant has an explicit entry associating it with line 8.
The consequences for the next expression, 5 + "".length(), are even more dramatic.
The instructions for pushing the constants, iconst_5 and ldc [""], get associated with line 8, the location of the previous append invocation, whereas the iadd instruction, actually belonging to the + operator between the 5 and "" constants, gets associated with line 12, as the most recent invocation instruction that got an explicit line number is the length() invocation.
For comparison, this is how Eclipse compiles the same code:
stack=3, locals=2, args_size=1
0: new #20 // class java/lang/StringBuilder
3: dup
4: invokespecial #22 // Method java/lang/StringBuilder."<init>":()V
7: bipush 97
9: invokevirtual #23 // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
12: iconst_5
13: ldc #27 // String
15: invokevirtual #29 // Method java/lang/String.length:()I
18: iadd
19: invokevirtual #35 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
22: invokevirtual #38 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
25: astore_1
26: return
LineNumberTable:
line 6: 0
line 7: 7
line 9: 12
line 11: 13
line 9: 18
line 8: 19
line 12: 22
line 6: 25
line 13: 26
The Eclipse compiler doesn’t have javac’s history, but rather has been designed to produce line number entries for expressions in the first place. We can see that it associates the first instruction belonging to an invocation expression (not the invocation instruction) with the right line, i.e. the bipush 97 for append('a') and ldc [""] for "".length().
Further, it has additional entries for iconst_5, iadd, and astore_1, to associate them with the right lines. Of course, this higher precision also results in slightly bigger class files.
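The effect of the line number table on stack traces can be observed from Java itself. The following is a small sketch under stated assumptions: the class, the fail() helper and the method names are hypothetical and only serve to produce an exception inside a multi-line expression. The reported number depends on the compiler that produced the class file, so no particular value is claimed here.

```java
class LineNumberDemo {
    // Hypothetical helper that always fails, so that a stack trace exists.
    static String fail() {
        throw new IllegalStateException("boom");
    }

    // Returns the source line that the stack trace reports for the frame
    // of this method, i.e. for the place where fail() was called.
    static int reportedLine() {
        try {
            new StringBuilder()
                    .append('a')
                    .append(fail())
                    .toString();
            return -1; // never executed, fail() always throws
        } catch (IllegalStateException e) {
            // Frame 0 is fail() itself, frame 1 is this method.
            StackTraceElement caller = e.getStackTrace()[1];
            return caller.getLineNumber();
        }
    }

    public static void main(String[] args) {
        System.out.println("reported line: " + reportedLine());
    }
}
```

With a javac from Java 8 or newer one would expect the reported line to be the one containing .append(fail()), while a compiler that only records the start of each statement would be expected to report the line of new StringBuilder().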

How does for-each loop works internally in JAVA?

I was trying to find the working of for-each loop when I make a function call. Please see following code,
public static int [] returnArr()
{
int [] a=new int [] {1,2,3,4,5};
return a;
}
public static void main(String[] args)
{
//Version 1
for(int a : returnArr())
{
System.out.println(a);
}
//Version 2
int [] myArr=returnArr();
for(int a : myArr)
{
System.out.println(a);
}
}
In version 1, I'm calling returnArr() method in for-each loop and in version 2, I'm explicitly calling returnArr() method and assigning it to an array and then iterating through it. Result is same for both the scenarios. I would like to know which is more efficient and why.
I thought version 2 will be more efficient, as I'm not calling method in every iteration. But to my surprise, when I debugged the code using version 1, I saw the method call happened only once!
Can anyone please explain how does it actually work? Which is more efficient/better when I code for complex objects?
The Java Language Specification shows the underlying translation:
Let L1 ... Lm be the (possibly empty) sequence of labels immediately
preceding the enhanced for statement.
The enhanced for statement is equivalent to a basic for statement of
the form:
T[] #a = Expression;
L1: L2: ... Lm:
for (int #i = 0; #i < #a.length; #i++) {
{VariableModifier} TargetType Identifier = #a[#i];
Statement
}
where Expression is the right hand side of the : in an enhanced for statement (your returnArr()). In both cases, it gets evaluated only once: in version 1, as part of the enhanced for statement; in version 2, because its result is assigned to a variable which is then used in the enhanced for statement.
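The single evaluation of Expression can be checked with a call counter. This is a minimal sketch: the counter field and the sum methods are additions made for illustration, and sumDesugared is a hand-written approximation of the translation quoted above, not compiler output.

```java
class ForEachDemo {
    // Counts how often returnArr() is invoked (added for illustration).
    static int calls = 0;

    static int[] returnArr() {
        calls++;
        return new int[] {1, 2, 3, 4, 5};
    }

    // Version 1 from the question: the method call sits in the loop header.
    static int sumWithForEach() {
        int sum = 0;
        for (int a : returnArr()) {
            sum += a;
        }
        return sum;
    }

    // Hand-written equivalent of the basic for statement from the JLS.
    static int sumDesugared() {
        int sum = 0;
        int[] arr = returnArr(); // evaluated once, before the loop starts
        for (int i = 0; i < arr.length; i++) {
            int a = arr[i];
            sum += a;
        }
        return sum;
    }

    public static void main(String[] args) {
        calls = 0;
        int sum = sumWithForEach();
        System.out.println("sum=" + sum + ", calls=" + calls);
    }
}
```

Both methods call returnArr() exactly once per loop, which is why neither version of the original code has an advantage in the number of method calls.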
The compiler emits the call to returnArr() only once. This is how the language specification defines the enhanced for statement rather than an optional optimization.
Byte code :
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=6, args_size=1
** case -1 start ***
0: invokestatic #20 // Method returnArr:()[I --> called only once.
3: dup
4: astore 4
6: arraylength
7: istore_3
8: iconst_0
9: istore_2
10: goto 28
13: aload 4 --> loop start
15: iload_2
16: iaload
17: istore_1
18: getstatic #22 // Field java/lang/System.out:Ljava/io/PrintStream;
21: iload_1
22: invokevirtual #28 // Method java/io/PrintStream.println:(I)V
25: iinc 2, 1
28: iload_2
29: iload_3
30: if_icmplt 13
***case -2 start****
33: invokestatic #20 // Method returnArr:()[I
36: astore_1
37: aload_1
38: dup
39: astore 5
41: arraylength
42: istore 4
44: iconst_0
45: istore_3
46: goto 64
49: aload 5 --> loop start case 2
51: iload_3
52: iaload
53: istore_2
54: getstatic #22 // Field java/lang/System.out:Ljava/io/PrintStream;
57: iload_2
58: invokevirtual #28 // Method java/io/PrintStream.println:(I)V
61: iinc 3, 1
64: iload_3
65: iload 4
67: if_icmplt 49
70: return
Note : I am using jdk 8.
I'm not going to copy paste from the Java Language Specification, like one of the previous answers did, but instead interpret the specification in a readable format.
Consider the following code:
for (T x : expr) {
// do something with x
}
If expr evaluates to an array type like in your case, the language specification states that the loop is equivalent to:
T[] arr = expr;
for (int i = 0; i < arr.length; i++) {
T x = arr[i];
// do something with x
}
The difference only is that the variables arr and i will not be visible to your code - or the debugger, unfortunately. That's why for development, the second version might be more useful: You have the return value stored in a variable accessible by the debugger.
In your first version expr is simply the function call, while in the second version you declare another variable and assign the result of the function call to that, then use that variable as expr. I'd expect them to exhibit no measurable difference in performance, as that additional variable assignment in the second version should be optimized away by the JIT compiler, unless you also use it elsewhere.
For a list, foreach internally uses the list's iterator to traverse it, and yes, there is a difference between the two.
If you just want to traverse the list and do not have any intention to modify it, then you should use foreach; otherwise use an explicit iterator.
for (String i : list) {
System.out.println(i);
list.remove(i); // leads to ConcurrentModificationException when the loop continues
}
Iterator<String> it = list.iterator();
while (it.hasNext()) {
System.out.println(it.next());
it.remove(); // No exception
}
Also, if the list you pass to foreach is null, you will get a NullPointerException when the loop tries to call iterator() on it.
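The three cases described in this answer can be reproduced with a short, self-contained class. It is a sketch for illustration only; the class and method names are made up, and the list contents are arbitrary sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {
    // Removing through the list while foreach is iterating over it.
    static boolean removeInsideForEachFails() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                list.remove(s);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator itself is supported.
    static List<String> removeViaIterator() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        return list;
    }

    // foreach over a null reference fails before the first iteration.
    static boolean forEachOverNullFails() {
        List<String> list = null;
        try {
            for (String s : list) {
                // never reached
            }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeInsideForEachFails());
        System.out.println(removeViaIterator());
        System.out.println(forEachOverNullFails());
    }
}
```

Note that the exception in the first case is raised by the iterator when the loop asks for the next element, not by the remove call itself.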

JVM bug? Cached Object field value cause ArrayIndexOutOfBoundsException

This is kind of strange, but code speaks more than words, so look at the test to see what I'm doing. In my current setup (Java 7 update 21 on Windows 64 bit) this test fails with ArrayIndexOutOfBoundsException, but when I replace the test method code with the commented code, it works. And I wonder if there is any part of the Java specification that would explain why.
It seems to me, as "michael nesterenko" suggested, that the value of the array field is cached in the stack, before calling the method, and not updated on return from the call. I can't tell if it's a JVM bug or a documented "optimisation". No multi-threading or "magic" involved.
public class TestAIOOB {
private String[] array = new String[0];
private int grow(final String txt) {
final int index = array.length;
array = Arrays.copyOf(array, index + 1);
array[index] = txt;
return index;
}
@Test
public void testGrow() {
//final int index = grow("test");
//System.out.println(array[index]);
System.out.println(array[grow("test")]);
}
}
This is well defined by the Java Language Specification: to evaluate x[y], first x is evaluated, and then y is evaluated. In your case, x evaluates to a String[] with zero elements. Then, y modifies a member variable, and evaluates to 0. Trying to access the 0th element of the already-returned array fails. The fact that the member array changes has no bearing on the array lookup, because we're looking at the String[] that array referenced at the time we evaluated it.
This behavior is mandated by the JLS. Per 15.13.1, "An array access expression is evaluated using the following procedure: First, the array reference expression is evaluated. If this evaluation completes abruptly, then the array access completes abruptly for the same reason and the index expression is not evaluated. Otherwise, the index expression is evaluated. [...]".
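The evaluation order described above can be demonstrated without a test framework. The sketch below reuses the grow method from the question; the class name and the two wrapper methods are assumptions added for illustration.

```java
import java.util.Arrays;

class ArrayAccessOrderDemo {
    private String[] array = new String[0];

    // Same logic as in the question: replaces the field with a longer copy.
    private int grow(final String txt) {
        final int index = array.length;
        array = Arrays.copyOf(array, index + 1);
        array[index] = txt;
        return index;
    }

    // The array reference is read first (still the empty array), then
    // grow() runs, then index 0 is looked up in the old, empty array.
    String inlineAccess() {
        try {
            return array[grow("test")];
        } catch (ArrayIndexOutOfBoundsException e) {
            return "AIOOBE";
        }
    }

    // grow() runs first, so the field already refers to the new array
    // when the array reference expression is evaluated.
    String accessViaLocal() {
        final int index = grow("test");
        return array[index];
    }

    public static void main(String[] args) {
        System.out.println(new ArrayAccessOrderDemo().inlineAccess());
        System.out.println(new ArrayAccessOrderDemo().accessViaLocal());
    }
}
```

Each call in main uses a fresh instance so that the two variants start from the same empty array.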
Compare the compiled Java code by using javap -c TestAIOOB
Uncommented code:
public void testGrow();
Code:
0: getstatic #6; //Field java/lang/System.out:Ljava/io/PrintStream;
3: aload_0
4: getfield #3; //Field array:[Ljava/lang/String;
7: aload_0
8: ldc #7; //String test
10: invokespecial #8; //Method grow:(Ljava/lang/String;)I
13: aaload
14: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
17: return
Commented code:
public void testGrow();
Code:
0: aload_0
1: ldc #6; //String test
3: invokespecial #7; //Method grow:(Ljava/lang/String;)I
6: istore_1
7: getstatic #8; //Field java/lang/System.out:Ljava/io/PrintStream;
10: aload_0
11: getfield #3; //Field array:[Ljava/lang/String;
14: iload_1
15: aaload
16: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
19: return
In the first the getfield happens before the call to grow and in the second it happens after.

When Java evaluates a conjunction (<boolean exp1> && <boolean exp2>), does it eval exp2 if exp1 is false?

I'm wondering if it's guaranteed that in a Java program, the boolean expression on the right of a conjunction (exp2 above) will NOT be evaluated as long as the expression on the left (exp1) evaluated to false. I'm wondering because I have an expression like the following:
if (var != null && var.somePredicate())
// do something
If Java is not guaranteed to stop evaluating (var != null && var.somePredicate()) after it sees that var is null, then it may try to evaluate var.somePredicate() which would throw a NullPointerException.
So my question is, does Java guarantee a certain behavior when it comes to this? Or would it be safer to write
if (var != null)
{
if (var.somePredicate())
// do something
}
From the Java Language Specification, 15.23 Conditional-And Operator &&:
The && operator is like & (§15.22.2), but evaluates its right-hand operand only if the value of its left-hand operand is true.
So the language spec guarantees that the right-hand side of your expression will not be evaluated if the left hand side is false.
No, java uses Short circuit evaluation. If expr1 is false, expr2 will not be evaluated, thus your && usage is perfectly safe.
Also, if you have if (exp1 || exp2) { .. } - exp2 will not be evaluated if exp1 is true.
Let us perform our own experiment by looking directly at the opcodes that are generated from this sample code:
public class Compare {
public static void main(String... args) {
boolean t = true;
boolean f = false;
if(f && t) {
System.out.println("Both true");
}
else {
System.out.println("One false");
}
}
}
javap -v generates:
0: iconst_1
1: istore_1
2: iconst_0
3: istore_2
4: iload_2
5: ifeq 23
8: iload_1
9: ifeq 23
12: getstatic #2; //Field java/lang/System.out:Ljava/io/PrintStream;
15: ldc #3; //String No
17: invokevirtual #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
20: goto 31
23: getstatic #2; //Field java/lang/System.out:Ljava/io/PrintStream;
26: ldc #5; //String Yes
28: invokevirtual #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
31: return
The relevant opcodes for my small program are the two ifeq instructions. Each one checks whether the value on top of the stack is 0 (false) and, if it is, jumps to the given target, in this case offset 23. So if f is false, the first ifeq jumps past the second ifeq straight to the else branch, and t is never loaded.
This is called short circuit evaluation.
If you use && or ||, Java will use short-circuit evaluation (i.e. not evaluate the second expression unless it needs to).
If you use & or |, Java will always evaluate the second expression, whatever the value of the first.
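The difference between the short-circuit and the non-short-circuit operators can be made visible by counting operand evaluations. This is a minimal sketch; the touch helper, the counter and all method names are invented for this example.

```java
class ShortCircuitDemo {
    // Number of operands evaluated since the last reset.
    static int evaluations = 0;

    // Records that an operand was evaluated and passes its value through.
    static boolean touch(boolean value) {
        evaluations++;
        return value;
    }

    static int evaluationsForConditionalAnd() {
        evaluations = 0;
        boolean ignored = touch(false) && touch(true); // right side skipped
        return evaluations;
    }

    static int evaluationsForPlainAnd() {
        evaluations = 0;
        boolean ignored = touch(false) & touch(true); // both sides evaluated
        return evaluations;
    }

    static int evaluationsForConditionalOr() {
        evaluations = 0;
        boolean ignored = touch(true) || touch(false); // right side skipped
        return evaluations;
    }

    // The null check pattern from the question, using String.isEmpty().
    static boolean isPresentAndEmpty(String s) {
        return s != null && s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(evaluationsForConditionalAnd());
        System.out.println(evaluationsForPlainAnd());
        System.out.println(evaluationsForConditionalOr());
        System.out.println(isPresentAndEmpty(null));
    }
}
```

Calling isPresentAndEmpty(null) returns false without throwing, because s.isEmpty() is never reached when the left operand is false.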
That's safe, Java does short circuit evaluations.

Java 7 String switch decompiled: unexpected instruction

I have decompiled a very simple class that uses the new Java 7 String Switch feature.
The class:
public class StringSwitch {
public static void main(String[] args) {
final String color = "red";
switch (color) {
case "red":
System.out.println("IS RED!");
break;
case "black":
System.out.println("IS BLACK");
break;
case "blue":
System.out.println("IS BLUE");
break;
case "green":
System.out.println("IS GREEN");
break;
}
}
}
Running the Java 7 "javap" against this class, generates an interesting set of instructions (the complete disassembled code is available here):
public static void main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=4, args_size=1
...
12: lookupswitch { // 4
112785: 56
3027034: 84
93818879: 70
98619139: 98
default: 109
}
56: aload_2
57: ldc #2 // String red
...
110: tableswitch { // 0 to 3
0: 140
1: 151
2: 162
3: 173
default: 181
}
140: getstatic #8 // Field java/lang/System.out:Ljava/io/PrintStream;
143: ldc #9 // String IS RED!
...
181: return
The "LOOKUPSWITCH" is an instruction used when the switch case is sparse and can replace the TABLESWITCH, that is the default instruction for "switch" statements.
So, the question is, why are we seeing a "LOOKUPSWITCH" followed by a "TABLESWITCH"?
Thanks
Luciano
With strings in switch finding the correct case statement is a 2 step process.
Compute the hashcode of the switch string and look for a 'hashcode match' among the case statements, this is done via LOOKUPSWITCH. Note the large integer numbers under LOOKUPSWITCH, these are hashcodes of the strings in case statements.
Now 2 strings can have the same hashcode, however unlikely it may be. Hence the actual string comparison must still take place. Hence once the hashcode is matched, the switch string is compared with the string in the matched case statement. The instructions between LOOKUPSWITCH and TABLESWITCH do exactly this. Once the match is confirmed, the code to be executed for the matched case statement is reached via TABLESWITCH.
Also note that it is useful to specify which compiler you used - javac or ECJ (Eclipse compiler for java). Both compilers may generate the bytecode differently.
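The two steps can be written out by hand in plain Java. The sketch below is an approximation of the idea, not the exact code javac emits; the class name, the describe method and the index variable are assumptions. The hash constants are the ones shown in the lookupswitch of the question.

```java
class StringSwitchSketch {
    static String describe(String color) {
        int index = -1;
        // Step 1: narrow down by hash code (the lookupswitch), then confirm
        // the candidate with equals() because hash codes can collide.
        switch (color.hashCode()) {
            case 112785:
                if (color.equals("red")) index = 0;
                break;
            case 93818879:
                if (color.equals("black")) index = 1;
                break;
            case 3027034:
                if (color.equals("blue")) index = 2;
                break;
            case 98619139:
                if (color.equals("green")) index = 3;
                break;
        }
        // Step 2: dispatch on the dense index (the tableswitch).
        switch (index) {
            case 0: return "IS RED!";
            case 1: return "IS BLACK";
            case 2: return "IS BLUE";
            case 3: return "IS GREEN";
            default: return "no match";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("red"));
        System.out.println(describe("pink"));
    }
}
```

The first switch has sparse, large case values, which suits lookupswitch, while the second has the consecutive values 0 to 3, which suits tableswitch.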
