Can someone explain to me why the first of the following two samples compiles, while the second doesn't? Notice that the only difference is that the first one explicitly qualifies the reference to x with 'this.', while the second doesn't. In both cases, the final field x is clearly being used before it is initialized.
I would have thought both samples would be treated completely equally, resulting in a compilation error for both.
1)
public class Foo {
    private final int x;

    private Foo() {
        int y = 2 * this.x;
        x = 5;
    }
}
2)
public class Foo {
    private final int x;

    private Foo() {
        int y = 2 * x;
        x = 5;
    }
}
After a bunch of spec-reading and thought, I've concluded that:
In a Java 5 or Java 6 compiler, this is correct behavior. Chapter 16 "Definite Assignment" of The Java Language Specification, Third Edition says:
Each local variable (§14.4) and every blank final (§4.12.4) field (§8.3.1.2) must have a definitely assigned value when any access of its value occurs. An access to its value consists of the simple name of the variable occurring anywhere in an expression except as the left-hand operand of the simple assignment operator =.
(emphasis mine). So in the expression 2 * this.x, the this.x part is not considered an "access of [x's] value" (and therefore is not subject to the rules of definite assignment), because this.x is not the simple name of the instance variable x. (N.B. the rule for when definite assignment occurs, in the paragraph after the above-quoted text, does allow something like this.x = 3, and considers x to be definitely assigned thereafter; it's only the rule for accesses that doesn't count this.x.) Note that the value of this.x in this case will be zero, per §17.5.2.
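For concreteness, here is the question's first sample with a main method added, as a minimal sketch of the Java 5/6 reading (whether a given compiler accepts it in Java 7+ mode is exactly the issue discussed below):

public class Foo {
    private final int x;

    private Foo() {
        // "this.x" is not the simple name of x, so under the Java 5/6 rule it is not
        // an "access" for definite-assignment purposes; it reads the default value 0.
        int y = 2 * this.x;   // y == 0
        x = 5;                // x is definitely assigned from here on
        System.out.println(y);
    }

    public static void main(String[] args) {
        new Foo();            // prints 0; no exception at run time
    }
}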
In a Java 7 compiler, this is a compiler bug, but an understandable one. Chapter 16 "Definite Assignment" of The Java Language Specification, Java SE 7 Edition says:
Each local variable (§14.4) and every blank final field (§4.12.4, §8.3.1.2) must have a definitely assigned value when any access of its value occurs.
An access to its value consists of the simple name of the variable (or, for a field, the simple name of the field qualified by this) occurring anywhere in an expression except as the left-hand operand of the simple assignment operator = (§15.26.1).
(emphasis mine). So in the expression 2 * this.x, the this.x part should be considered an "access to [x's] value", and should give a compile error.
But you didn't ask whether the first one should compile, you asked why it does compile (in some compilers). This is necessarily speculative, but I'll make two guesses:
Most Java 7 compilers were written by modifying Java 6 compilers. Some compiler-writers may not have noticed this change. Furthermore, many Java-7 compilers and IDEs still support Java 6, and some compiler-writers may not have felt motivated to specifically reject something in Java-7 mode that they accept in Java-6 mode.
The new Java 7 behavior is strangely inconsistent. Something like (false ? null : this).x is still allowed, and for that matter, even (this).x is still allowed; it's only the specific token-sequence this plus . plus the field-name that's affected by this change. Granted, such an inconsistency already existed on the left-hand side of an assignment statement (we can write this.x = 3, but not (this).x = 3), but that's more readily understandable: it's accepting this.x = 3 as a special permitted case of the otherwise forbidden construction obj.x = 3. It makes sense to allow that. But I don't think it makes sense to reject 2 * this.x as a special forbidden case of the otherwise permitted construction 2 * obj.x, given that (1) this special forbidden case is easily worked around by adding parentheses, that (2) this special forbidden case was allowed in previous versions of the language, and that (3) we still need the special rule whereby final fields have their default values (e.g. 0 for an int) until they're initialized, both because of cases like (this).x, and because of cases like this.foo() where foo() is a method that accesses x. So some compiler-writers may not have felt motivated to make this inconsistent change.
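To make the inconsistency in that second guess concrete, here is a minimal sketch; the comments reflect what the Java 7 wording says should happen, which (as discussed) may differ from what a given compiler actually does:

public class Foo {
    private final int x;

    private Foo() {
        // int y1 = 2 * x;        // access via simple name: rejected (blank final not yet assigned)
        // int y2 = 2 * this.x;   // rejected by the Java 7 wording: "this.x" now counts as an access
        int y3 = 2 * (this).x;    // still accepted: not the literal token sequence this . x
        this.x = 5;               // allowed: this.x on the left-hand side counts as definite assignment
        System.out.println(y3);   // prints 0, since x still held its default value when read
    }
}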
Either of these would be surprising — I assume that compiler-writers had detailed information about every single change to the spec, and in my experience Java compilers are usually pretty good about sticking to the spec exactly (unlike some languages, where every compiler has its own dialect) — but, well, something happened, and the above are my only two guesses.
When you use this in the constructor, the compiler sees x as a member attribute of the current object, which has already been default-initialized. Since x is an int, its default value is 0. This satisfies the compiler, and it also works fine at run time.
When you don't use this, the compiler checks the declaration of x directly in its definite-assignment analysis, and hence complains that it may not have been initialized (a compile-time phenomenon).
So it is the use of this that makes the compiler treat x as a member of an already-existing object rather than as a not-yet-assigned variable, resulting in the different compilation behavior.
When used as a primary expression, the keyword this denotes a value that is a reference to the object for which the instance method was invoked (§15.12), or to the object being constructed.
I think the compiler assumes that writing this.x implies that this exists, so a constructor has been called (and the final variable has been initialized). Don't expect a failure at run time, though: per §17.5.2, this.x simply reads the field's default value (0) at that point.
I assume you are referring to the behaviour in Eclipse. (As stated in a comment, compiling with javac works.)
I think this is an Eclipse problem. It has its own compiler, and its own set of rules. One of them is that you may not access a field which has not been initialized, even though the Java compiler would initialize it to a default value for you.
Related
We know that if an if statement's boolean condition is a compile-time constant (or a variable holding a compile-time constant), then the compiler can resolve this constant expression and:
public void ctc1() {
    final int x = 1;
    String text;
    if (x > 0) text = "some text";
    System.out.println(text); // compiles fine, as compile-time constant is resolved during compilation phase
}
would compile and work fine, without a "variable might not have been initialized" compiler error. No initialisation of text in the "else" branch (or after the "if") is required, as the compiler "understands" that the condition always evaluates to true when it evaluates x > 0 (which ends up being 1 > 0).
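For contrast, a minimal sketch of the non-constant case (the method name is made up; the exact error wording varies slightly between compilers):

public void notConstant(int x) {   // x is a parameter now, not a compile-time constant
    String text;
    if (x > 0) text = "some text";
    System.out.println(text);      // error: variable text might not have been initialized
}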
Why, however, does the same resolution not work (or work differently) when we want to return from the method, as in:
public int ctc2() {
    final int x = 1;
    if (x > 0) return 1;
    // requires another explicit "return" for any condition other than x > 0
}
or, moreover, as:
public int ctc2() {
    final int x = 1;
    if (1 > 0) return 1;
}
?
Why can't the compiler infer/understand/resolve the absolutely identical semantics and be sure that the return is always executed, so the code is OK to compile?
In the case of initialisation in the branch guarded by a compile-time constant, the compiler can resolve the constant value(s), and as it knows they will never change, it is sure the variable is going to be initialised. So, it allows usage of the variable after the if statement.
Why does resolving constant expressions work differently for the return case, though? What is the point behind this difference/limitation?
To me, this looks like two identical logical semantics working differently.
I'm using:
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
This is a difference (mismatch? inconsistency?) between the rules around definite assignment and the particular normal completion rules of the if statement.
Specifically, the definite assignment rules say:
In all, there are four possibilities for a variable V after a statement or expression has been executed:
V is definitely assigned and is not definitely unassigned.
(The flow analysis rules prove that an assignment to V has occurred.)
...
The "flow analysis rules" are not clearly specified with regard to branch pruning, but it doesn't seem unreasonable to assume that the flow analysis is able to take into account constant values when deciding whether to follow a branch, meaning it is able to determine there is only one of the 4 states possible (definitely assigned after the if statement).
However, the reachability rules for an if statement say that:
An if-then statement can complete normally iff it is reachable.
Nothing about the expression value or flow analysis here. It's perhaps worth pointing out that this is itself different to the reachability rules for while, do and basic for loops, which do explicitly mention the case of a constant true expression. Any of these returns would be accepted in ctc2():
while (true) return 1;
do { return 1; } while (true);
for (;;) return 1;
So, the language is specified in such a way that it overlooks the fact that your if statement cannot complete normally because of a) the constant expression, b) the return statement, despite that being "obvious" to a human reader.
An example of this difference actually being desirable (or, at least, of the reachability rules being desirable) is when you have a DEBUG boolean (that is, a constant-valued flag used to trigger debug-only behaviour). You can imagine a method containing something like:
if (!DEBUG) {
    return value;
}
return otherValue;
If the "conditional" return were treated in the same way as definite assignment, at least one of the return statements unreachable.
This would be a pain for debugging-time alternate behaviour like this.
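A minimal, self-contained sketch of that pattern (class, field and method names are illustrative):

public class DebugFlagDemo {
    // Compile-time constant: a static final boolean initialized with a constant expression.
    private static final boolean DEBUG = false;

    public int pick(int value, int otherValue) {
        if (!DEBUG) {
            return value;
        }
        // Still considered reachable, because the reachability rules for "if"
        // deliberately ignore the constant condition, so this compiles.
        return otherValue;
    }
}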
Of course, one might argue that you could instead do something that isn't a compile-time constant, e.g. invoke a method. You can do that, but I would argue that disallowing the dirt-simplest approach is unnecessarily restrictive, for the sake of avoiding a pretty rare "head-scratcher" in code.
int y=3;
int z=(--y) + (y=10);
When executed in C, the value of z evaluates to 20,
but when the same expression is executed in Java, the value of z comes out as 12.
Can anyone explain why this happens and what the difference is?
When executed in C, the value of z evaluates to 20
No it does not. This is undefined behavior, so z could get any value. Including 20. The program could also theoretically do anything, since the standard does not say what the program should do when encountering undefined behavior. Read more here: Undefined, unspecified and implementation-defined behavior
As a rule of thumb, never modify a variable twice in the same expression.
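For instance, a minimal sketch of a rewrite that avoids the problem by splitting the two modifications into separate statements (shown in Java; the same statement-level split is equally valid in C):

public class SplitModifications {
    public static void main(String[] args) {
        int y = 3;
        int left = --y;        // y becomes 2; the left operand's value is 2
        y = 10;                // the second modification, now a statement of its own
        int z = left + y;      // 2 + 10 = 12, with no ordering ambiguity
        System.out.println(z); // prints 12
    }
}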
It's not a good duplicate, but this will explain things a bit deeper. The reason for undefined behavior here is sequence points. Why are these constructs using pre and post-increment undefined behavior?
In C, when it comes to arithmetic operators, like + and /, the order of evaluation of the operands is not specified in the standard, so if the evaluation of those has side effects, your program becomes unpredictable. Here is an example:
#include <stdio.h>

int foo(void)
{
    printf("foo()\n");
    return 0;
}

int bar(void)
{
    printf("bar()\n");
    return 0;
}

int main(void)
{
    int x = foo() + bar();
}
What will this program print? Well, we don't know. I'm not entirely sure whether this snippet invokes undefined behavior or not, but regardless, the output is not predictable. I asked a question about that, Is it undefined behavior to use functions with side effects in an unspecified order?, so I'll update this answer later.
Some other operators do have a specified (left-to-right) order of evaluation, like || and &&, and this feature is used for short-circuiting. For instance, if we use the above example functions and write foo() && bar(), only the foo() function will be executed.
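A minimal Java sketch of the same short-circuiting behaviour (Java's && and || work the same way here, so the point carries over):

public class ShortCircuitDemo {
    static boolean foo() { System.out.println("foo()"); return false; }
    static boolean bar() { System.out.println("bar()"); return false; }

    public static void main(String[] args) {
        boolean result = foo() && bar(); // prints only "foo()"; bar() is never called
        System.out.println(result);      // false
    }
}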
I'm not very proficient in Java, but for completeness, I want to mention that Java basically does not have undefined or unspecified behavior except in very special situations. Almost everything in Java is well defined. For more details, read rzwitserloot's answer.
There are 3 parts to this answer:
How this works in C (unspecified behaviour)
How this works in Java (the spec is clear on how this should be evaluated)
Why is there a difference.
For #1, you should read #klutt's fantastic answer.
For #2 and #3, you should read this answer.
How does it work in java?
Unlike C's, java's language specification is far more precise. For example, C doesn't even tell you how many bits the data type int is supposed to have, whereas the java lang spec does: 32 bits, even on 64-bit processors and a 64-bit java implementation.
The java spec clearly says that x+y is to be evaluated left-to-right (vs. C's 'in any order you please, compiler'), thus, first --y is evaluated which is clearly 2 (with the side-effect of making y 2), and then y=10 is evaluated which is clearly 10 (with the side effect of making y 10), and then 2+10 is evaluated which is clearly 12.
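A minimal sketch you can run to confirm that result:

public class LeftToRightDemo {
    public static void main(String[] args) {
        int y = 3;
        int z = (--y) + (y = 10); // left operand first: (--y) is 2, then (y = 10) is 10
        System.out.println(z);    // 12
        System.out.println(y);    // 10
    }
}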
Obviously, a language like java is just better; after all, undefined behaviour is pretty much a bug by definition, whatever was wrong with the C lang spec writers to introduce this crazy stuff?
The answer is: performance.
In C, your source code is turned into machine code by the compiler, and the machine code is then interpreted by the CPU. A 2-step model.
In java, your source code is turned into bytecode by the compiler, the bytecode is then turned into machine code by the runtime, and the machine code is then interpreted by the CPU. A 3-step model.
If you want to introduce optimizations, you don't control what the CPU does, so for C there is only 1 step where it can be done: Compilation.
So C (the language) is designed to give lots of freedom to C compilers to attempt to produce optimized machine code. This is a cost/benefit scenario: At the cost of having a ton of 'undefined behaviour' in the lang spec, you get the benefit of better optimizing compilers.
In java, you get a second step, and that's where java does its optimizations: At runtime. java.exe does it to class files; javac.exe is quite 'stupid' and optimizes almost nothing. This is on purpose; at runtime you can do a better job (for example, you can use some bookkeeping to track which of two branches is more commonly taken and thus branch predict better than a C app ever could) - it also means that cost/benefit analysis now results in: The lang spec should be clear as day.
So java code never has undefined behaviour?
Not so. Java has a memory model which leaves a ton of behaviour undefined:
class X { int a, b; }

X instance = new X();

new Thread() { public void run() {
    int a = instance.a;
    int b = instance.b;
    instance.a = 5;
    instance.b = 6;
    System.out.print(a);
    System.out.print(b);
}}.start();

new Thread() { public void run() {
    int a = instance.a;
    int b = instance.b;
    instance.a = 1;
    instance.b = 2;
    System.out.print(a);
    System.out.print(b);
}}.start();
is undefined in java. It may print 0056, 0012, 0010, 0002, 5600, 0600, and many many more possibilities. Something like 5000 (which it could legally print) is hard to imagine: How can the read of a 'work' but the read of b then fail?
For the exact same reason your C code produces arbitrary answers:
Optimization.
The cost/benefit of 'hardcoding' in the spec exactly how this code would behave would have a large cost to it: you'd take away most of the room for optimization. So java paid the cost and now has a lang spec that is ambiguous whenever you modify/read the same fields from different threads without establishing so-called 'happens-before' relationships using e.g. synchronized.
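As a hedged sketch of what establishing those happens-before edges can look like, here is the earlier two-thread example rewritten to use a shared lock (class and field names are illustrative, and this is only one of several possible approaches):

class X {
    int a, b;
}

public class HappensBeforeDemo {
    static final Object LOCK = new Object();
    static final X instance = new X();

    public static void main(String[] args) {
        Runnable task = () -> {
            synchronized (LOCK) {        // the lock establishes happens-before edges, so each
                int a = instance.a;      // thread sees either the initial values (0, 0) or the
                int b = instance.b;      // other thread's complete writes (5, 6), never a torn mix
                instance.a = 5;
                instance.b = 6;
                System.out.print(a);
                System.out.print(b);
            }
        };
        new Thread(task).start();
        new Thread(task).start();
    }
}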
When executed in C, the value of z evaluates to 20
That is not the case. The compiler you use happens to evaluate it to 20. Another one can evaluate it in a completely different way: https://godbolt.org/z/GcPsKh
This kind of behaviour is called undefined behaviour.
In your expression there are two problems.
The order of evaluation of operands (except for the logical operators) is not specified in C (unspecified behaviour).
This expression also has a sequence point problem (undefined behaviour).
Well, I really thought that this would work (inside a method):
var x, y = 1;
var x = 1, y = 2;
But it does not; it won't compile: "'var' is not allowed in a compound declaration".
I guess the reason for this is the usual trade-off: it is not a heavily used feature and thus was not implemented, but it could be, and maybe will be, in a future release...
Well, if you give it a manifest type:
int x, y = 1;
This declares two int variables, and initializes one of them. But local variable type inference requires an initializer to infer a type. So you're dead out of the gate.
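A minimal sketch of that point (the error messages in the comments are paraphrased and vary a little between compiler versions):

public class VarInferenceDemo {
    public static void main(String[] args) {
        int a, b = 1;       // fine: a declared (uninitialized), b initialized to 1
        var c = 1;          // fine: c is inferred as int
        // var d;           // error: cannot infer type without an initializer
        // var e, f = 1;    // error: 'var' is not allowed in a compound declaration
        System.out.println(b + c); // 2
    }
}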
But, suppose you meant to provide an initializer for both. It's "obvious" what to do when both initializers have the same type. So let's make it harder. Suppose you said:
var x = 1, y = 2.0;
What is this supposed to mean? Does this declare x as int and y as double? Or does it try to find some type that can be the type of both x and y? Whichever we decided, some people would think it should work the other way, and it would be fundamentally confusing.
And, for what benefit? The incremental syntactic cost of saying what you mean is trivial compared to the potential semantic confusion. And that's why we excluded this from the scope of type inference for locals.
You might say, then, "well, only make it work if they are the same type." We could do that, but now the boundary of when you can use inference and when not is even more complicated. And I'd be answering the same sort of "why don't you" question right now anyway ... The reality is that inference schemes always have limits; what you get to pick is the boundary. Better to pick clean, clear limits ("can use it in these contexts") than fuzzy ones.
I would like to have a clear and precise understanding of the difference between the two.
Also, is the this keyword used to reference implicitly or explicitly? This is another reason I want clarification between the two.
I assume that using the this keyword is referencing implicitly (referring to something within the class), whereas explicit would be something not belonging to the class itself, like a parameter variable being passed into a method.
Of course my assumptions could be wrong, which is why I'm here asking for clarification.
Explicit means done by the programmer.
Implicit means done by the JVM or the tool, not the programmer.
For example:
Java will provide a default constructor implicitly. Even if the programmer didn't write code for a constructor, they can call the default constructor.
Explicit is the opposite of this, i.e. the programmer has to write it.
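A small sketch of that constructor example (class names are made up):

class ImplicitCtor {
    // No constructor written: the compiler implicitly adds a no-arg default constructor.
}

class ExplicitCtor {
    int value;

    ExplicitCtor(int value) {  // written explicitly by the programmer; once it exists,
        this.value = value;    // the compiler no longer generates a default constructor
    }
}

public class ConstructorDemo {
    public static void main(String[] args) {
        new ImplicitCtor();    // calls the implicitly provided default constructor
        new ExplicitCtor(42);  // calls the explicitly written constructor
        // new ExplicitCtor(); // error: no no-arg constructor exists any more
    }
}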
You already have your answer, but I would like to add a few more points.
Implicit: what is already available in your programming language, like methods, classes, data types, etc.
- implicit code takes work off the programmer and saves development time.
- it provides optimised code, and so on.
Explicit: what is created by the programmer (you) as per their (your) requirements, like your app class, or methods like getName(), setName(), etc.
Finally, in simple terms:
pre-defined code that helps the programmer build their apps and programs is known as implicit, and code that has been written by the programmer to fulfil the requirements is known as explicit.
1: Implicit casting (widening conversion)
A value of a data type of lower size (occupying less memory) is assigned to a data type of higher size. This is done implicitly by the JVM. The lower-size value is widened to the higher size. This is also called automatic type conversion.
Examples:
int x = 10; // occupies 4 bytes
double y = x; // occupies 8 bytes
System.out.println(y); // prints 10.0
In the above code, a 4-byte int value is assigned to an 8-byte double.
Explicit casting (narrowing conversion)
A value of a data type of higher size (occupying more memory) cannot be assigned directly to a data type of lower size. This is not done implicitly by the JVM and requires explicit casting, i.e. a cast operation performed by the programmer. The higher-size value is narrowed to the lower size.
double x = 10.5; // 8 bytes
int y = x; // 4 bytes ; raises compilation error
In the above code, an 8-byte double value is narrowed to a 4-byte int value. It raises an error. Let us explicitly cast it.
double x = 10.5;
int y = (int) x;
The double x is explicitly converted to int y. The rule of thumb is that both sides of the assignment should end up as the same data type.
I'll try to provide an example of a similar functionality across different programming languages to differentiate between implicit & explicit.
Implicit: When something is available as a feature/aspect of the programming language constructs being used, and you have to do nothing but call the respective functionality directly through the API/interface.
For example, garbage collection in Java happens implicitly: the JVM does it for us at an appropriate time.
Explicit: When user/programmer intervention is required to invoke/call a specific functionality, without which the desired action won't take place.
For example, in C++, freeing memory (read: the garbage-collection equivalent) has to happen by explicitly calling the delete operator or the free function.
Hope this helps you understand the difference clearly.
This was way more complicated than I think it needed to be:
explicit = index label names (label-based indexing)
example:
df['index label name']
vs
implicit = integer position in the index (zero-based indexing)
df[0]
Why is the following giving me a "local variable is redundant error"?
public double depreciationAmount() {
    double depreciationAmount = (cost * percentDepreciated);
    return depreciationAmount;
}
Why is the following giving me a "local variable is redundant error"?
Because you can trivially write this without using a local variable.
public double depreciationAmount() {
    return cost * percentDepreciated;
}
Hence the local variable is deemed to be unnecessary / redundant by the checker.
However, I surmise that this is not a compiler error. It might be a compiler warning, or more likely it is a style checker or bug checker warning. It is something you could ignore without any risk to the correctness of your code ... as written.
Also, I would predict that once that the code has been JIT compiled (by a modern Hotspot JIT compiler ...) there would be no performance difference between the two versions.
I won't attempt to address the issue as to whether the warning is appropriate1. If you feel it is inappropriate, then "Local variable is redundant" using Java explains how to suppress it.
1 - Except to say that it is too much to expect current generation style checkers to know when so-called explaining variables are needed. First you'd need to get a statistically significant2 group of developers to agree on measurable3 criteria for when the variables are needed, and when they aren't.
2 - Yea, I know. Abuse of terminology.
3 - They must be measurable, and there needs to be consensus on what the thresholds should be if this is to be implemented by a checker.
Although not the case here, if having a redundant local variable is desired (I've had one time where this was the case - without getting into specifics), here's how to suppress this specific warning.
@SuppressWarnings("UnnecessaryLocalVariable")
public double depreciationAmount() {
    double depreciationAmount = (cost * percentDepreciated);
    return depreciationAmount;
}
You only assign the value to depreciationAmount in order to return it, when you could have just done return (cost * percentDepreciated).
Why is the following giving me a "local variable is redundant error"?
I believe this message is wrong. Your depreciationAmount variable assignment is totally fine. Moreover, I always prefer this kind of assignment before a return, because it helps to avoid confusion while debugging.
In this example, the getValue() method returns the result of an expression directly, instead of assigning it to a variable first.
Now when I use a debugger watch to find out the result of the expression, I can get confused: my program ends with the wrong result and the debugger watch values are inconsistent. It would be easy to avoid this if I had a variable assigned before the returning expression:
Integer value = 1 + getCounter();
return value;
instead of:
return 1 + getCounter();
Now I can put a breakpoint at the return statement and know what the result of the expression was before it is returned. Also, I no longer need the expression in the watch, and the code will execute correctly while debugging.
In computer programming, redundant code is source code or compiled code in a computer program that is unnecessary. In the above code, you can simply return:
(cost * percentDepreciated)