compiler optimizes away a final field added to a serialized class

compiler optimizes away a final field added to a serialized class - java

This is a strange one, am wondering if this is by design or a compiler bug. Am using Sun Java 6 on a PC, but also seen with Sun Java 5 on a linux box.
Suppose I have a file on disk containing a serialized class. I then add a final field to the class and declare and initialize the field in one statement:
public final int newField = 2;
If I read in an object that was serialized before this field was added, this new field is given a default value of 0, which is correct. However, the compiler replaces occurrences of the field by the constant "2". OTOH, if I instead initialize the new field inside of a constructor:
Data (int d) {
this.oldField = d;
this.newField = 2;
}
then the field does not get optimized away and everything works as expected.
More explicitly, here is my Main class:
import java.io.*;
public class Main {
public static void main(String[] args) {
try {
FileInputStream fis = new FileInputStream ("Data.txt");
ObjectInputStream ois = new ObjectInputStream (fis);
Data d = (Data)ois.readObject();
ois.close();
fis.close();
d.print();
}
catch (Exception e) {
e.printStackTrace();
}
}
}
and here is the Data class:
import java.io.*;
public class Data implements Serializable {
private static final long serialVersionUID = 1;
private int oldField;
private final int newField = 2;
Data(int d) {
this.oldField= d;
}
public void print() { {
System.out.println(this.oldField + " " + this.newField);
}
}
So my Data.txt file has a serialized instance of the original class with just "oldField = 1". So the expected output of running this is
1 0
but instead I get
1 2
whereas if I move the initialization into the constructor the output is correct.
Is this expected behaviour? My feeling is that the compiler should not be optimizing away fields in serialized classes, precisely for this reason. Perhaps that is asking too much. Thanks!

This is by design. If a variable is declared as final, has a primitive type and an initializer which is a constant expressions, then it is a "constant variable". This alters the semantics of initialization among other things. Specifically, the expression value is evaluated at compile time, and then embedded into the code.
Refer to JLS 4.12.4 for details.
When you put the initialization into the constructor, the variable is no longer a "constant variable" and normal initialization occurs.
By the way, I would have thought that a final variable having a different value than what the initializer said was a bad thing, not a good thing. To me, that would be highly unintuitive. At any rate, relying on this kind of behaviour does not strike me as good practice.

It's not clear why you think this is a problem. The compiler is doing exactly what it should do. The class definition says that the final field's value is 2, and that is exactly what you're getting.
The underlying reason is that when the final variable's value is declared in the initializer, the compiler knows its value immediately, so it is able to substitute the actual value as a literal wherever you use the variable. When you initialized it in constructor code, it couldn't see the value so easily so it doesn't do that optimization. It is still free to do so if it can.
In the language of the JLS #4.12.4, this is a 'constant variable', and JLS #13.1 says 'References to fields that are constant variables (§4.12.4) are resolved at compile time to the constant value that is denoted', which is exactly what is happening here.
My feeling is that the compiler should not be optimizing away fields
in serialized classes
Why should the compiler have different rules for serialized classes?

The usage of final variable always provokes compiler optimization. Its doing what it supposed to do.

Related

In Java, is there a difference between initializing an object in constructor or in declaration? [duplicate]

Is there any advantage for either approach?
Example 1:
class A {
B b = new B();
}
Example 2:
class A {
B b;
A() {
b = new B();
}
}

There is no difference - the instance variable initialization is actually put in the constructor(s) by the compiler.
The first variant is more readable.
You can't have exception handling with the first variant.
There is additionally the initialization block, which is as well put in the constructor(s) by the compiler:
{
a = new A();
}
Check Sun's explanation and advice
From this tutorial:
Field declarations, however, are not part of any method, so they cannot be executed as statements are. Instead, the Java compiler generates instance-field initialization code automatically and puts it in the constructor or constructors for the class. The initialization code is inserted into a constructor in the order it appears in the source code, which means that a field initializer can use the initial values of fields declared before it.
Additionally, you might want to lazily initialize your field. In cases when initializing a field is an expensive operation, you may initialize it as soon as it is needed:
ExpensiveObject o;
public ExpensiveObject getExpensiveObject() {
if (o == null) {
o = new ExpensiveObject();
}
return o;
}
And ultimately (as pointed out by Bill), for the sake of dependency management, it is better to avoid using the new operator anywhere within your class. Instead, using Dependency Injection is preferable - i.e. letting someone else (another class/framework) instantiate and inject the dependencies in your class.

Another option would be to use Dependency Injection.
class A{
B b;
A(B b) {
this.b = b;
}
}
This removes the responsibility of creating the B object from the constructor of A. This will make your code more testable and easier to maintain in the long run. The idea is to reduce the coupling between the two classes A and B. A benefit that this gives you is that you can now pass any object that extends B (or implements B if it is an interface) to A's constructor and it will work. One disadvantage is that you give up encapsulation of the B object, so it is exposed to the caller of the A constructor. You'll have to consider if the benefits are worth this trade-off, but in many cases they are.

I got burned in an interesting way today:
class MyClass extends FooClass {
String a = null;
public MyClass() {
super(); // Superclass calls init();
}
#Override
protected void init() {
super.init();
if (something)
a = getStringYadaYada();
}
}
See the mistake? It turns out that the a = null initializer gets called after the superclass constructor is called. Since the superclass constructor calls init(), the initialization of a is followed by the a = null initialization.

my personal "rule" (hardly ever broken) is to:
declare all variables at the start of
a block
make all variables final unless they
cannot be
declare one variable per line
never initialize a variable where
declared
only initialize something in a
constructor when it needs data from
the constructor to do the
initialization
So I would have code like:
public class X
{
public static final int USED_AS_A_CASE_LABEL = 1; // only exception - the compiler makes me
private static final int A;
private final int b;
private int c;
static
{
A = 42;
}
{
b = 7;
}
public X(final int val)
{
c = val;
}
public void foo(final boolean f)
{
final int d;
final int e;
d = 7;
// I will eat my own eyes before using ?: - personal taste.
if(f)
{
e = 1;
}
else
{
e = 2;
}
}
}
This way I am always 100% certain where to look for variables declarations (at the start of a block), and their assignments (as soon as it makes sense after the declaration). This winds up potentially being more efficient as well since you never initialize a variable with a value that is not used (for example declare and init vars and then throw an exception before half of those vars needed to have a value). You also do not wind up doing pointless initialization (like int i = 0; and then later on, before "i" is used, do i = 5;.
I value consistency very much, so following this "rule" is something I do all the time, and it makes it much easier to work with the code since you don't have to hunt around to find things.
Your mileage may vary.

Example 2 is less flexible. If you add another constructor, you need to remember to instantiate the field in that constructor as well. Just instantiate the field directly, or introduce lazy loading somewhere in a getter.
If instantiation requires more than just a simple new, use an initializer block. This will be run regardless of the constructor used. E.g.
public class A {
private Properties properties;
{
try {
properties = new Properties();
properties.load(Thread.currentThread().getContextClassLoader().getResourceAsStream("file.properties"));
} catch (IOException e) {
throw new ConfigurationException("Failed to load properties file.", e); // It's a subclass of RuntimeException.
}
}
// ...
}

Using either dependency injection or lazy initialization is always preferable, as already explained thoroughly in other answers.
When you don't want or can't use those patterns, and for primitive data types, there are three compelling reasons that I can think of why it's preferable to initialize the class attributes outside the constructor:
avoided repetition = if you have more than one constructor, or when you will need to add more, you won't have to repeat the initialization over and over in all the constructors bodies;
improved readability = you can easily tell with a glance which variables will have to be initialized from outside the class;
reduced lines of code = for every initialization done at the declaration there will be a line less in the constructor.

I take it is almost just a matter of taste, as long as initialization is simple and doesn't need any logic.
The constructor approach is a bit more fragile if you don't use an initializer block, because if you later on add a second constructor and forget to initialize b there, you'll get a null b only when using that last constructor.
See http://java.sun.com/docs/books/tutorial/java/javaOO/initial.html for more details about initialization in Java (and for explanations on initalizer blocks and other not well known initialization features).

I've not seen the following in the replies:
A possible advantage of having the initialisation at the time of declaration might be with nowadays IDE's where you can very easily jump to the declaration of a variable (mostly
Ctrl-<hover_over_the_variable>-<left_mouse_click>) from anywhere in your code. You then immediately see the value of that variable. Otherwise, you have to "search" for the place where the initialisation is done (mostly: constructor).
This advantage is of course secondary to all other logical reasonings, but for some people that "feature" might be more important.

Both of the methods are acceptable. Note that in the latter case b=new B() may not get initialized if there is another constructor present. Think of initializer code outside constructor as a common constructor and the code is executed.

I think Example 2 is preferable. I think the best practice is to declare outside the constructor and initialize in the constructor.

The second is an example of lazy initialization. First one is more simple initialization, they are essentially same.

There is one more subtle reason to initialize outside the constructor that no one has mentioned before (very specific I must say). If you are using UML tools to generate class diagrams from the code (reverse engineering), most of the tools I believe will note the initialization of Example 1 and will transfer it to a diagram (if you prefer it to show the initial values, like I do). They will not take these initial values from Example 2. Again, this is a very specific reason - if you are working with UML tools, but once I learned that, I am trying to take all my default values outside of constructor unless, as was mentioned before, there is an issue of possible exception throwing or complicated logic.

The second option is preferable as allows to use different logic in ctors for class instantiation and use ctors chaining. E.g.
class A {
int b;
// secondary ctor
A(String b) {
this(Integer.valueOf(b));
}
// primary ctor
A(int b) {
this.b = b;
}
}
So the second options is more flexible.

It's quite different actually:
The declaration happens before construction. So say if one has initialized the variable (b in this case) at both the places, the constructor's initialization will replace the one done at the class level.
So declare variables at the class level, initialize them in the constructor.

class MyClass extends FooClass {
String a = null;
public MyClass() {
super(); // Superclass calls init();
}
#Override
protected void init() {
super.init();
if (something)
a = getStringYadaYada();
}
}
Regarding the above,
String a = null;
null init could be avoided since anyway it's the default.
However, if you were to need another default value,
then, because of the uncontrolled initialization order,
I would fix as follow:
class MyClass extends FooClass
{
String a;
{
if( a==null ) a="my custom default value";
}
...

Why the function "display()" is automatically placed in constructor

Here is the following code where I have two classes practically doing nothing. When the decompiled code of the "class TestBed" is checked "int val = tb.display()" gets placed in constructor automatically. How is this happening ?
class TestBed
{
int display()
{
return 100;
}
TestBed tb;
int val = tb.display(); /* will get placed in constructor
automatically. But how? */
}
public class DeleteThis {
public static void main(String[] args) {
System.out.println("printing");
}
}
After decompiling "TestBed.class" using decompiler following code appears
/* Following is decompiled code */
class TestBed
{
TestBed tb;
int val;
TestBed()
{
val = tb.display(); /* How int val = tb.display() gets placed
in constructor automatically */
}
int display()
{
return 100;
}
}

That's because of the format of the *.class file. When you compile a *.java file into a *.class file, all instance fields that are initialized in the way your field var are (this is: T myVar = myValue) are going to be initialized in the constructor's code.
A brief and incomplete description of the class format:
A class format is made up of different kind of "structures", two of them are the field structure (where your field var is registered) and the method structure (code lives only inside this kind of structure under the attribute "code").
The field structure doesn't have room for code (the code that would initialize you var: int val = tb.display();) so it needs to be put inside the "code" attribute of the method structure that corresponds to the constructor.
Have a look at the Java Virtual Machine Specification, Ch. 4 for more details (trying to explain everything here in few words would be too complicated)

Your field val has value equals tb.display. All fields initialize with default value (0 for int) and other predefined user value at constructor. So method tb.display is called in constructor.

Run time the above code gives Null Pointer Exception since the tb variable has not been initialized.
As the initialization of Class level variables has to happen on the class instantiation, the compiler places the code in the constructor.
Try defining an explicit constructor with some argument and also try one more approach by below change
static TestBed tb;
static int val = tb.display();

I don't think there is any particular reason for this behavior.
Typically, though, variables are instantiated inside of the constructor and the decompiler is probably just using this standard. The effect of both segments of code, in this case, should be exactly the same.
Regardless, val will be set to tb.display();.

The instance variable initialization is actually put in the constructor(s) by the compiler.
See here

Your variable TestBed tb is not initialized yet you are calling , int val = tb.display(); which will throw exception. The reason you are not getting exception is because you are not instantiating the class TestBed anywhere in the main() , the moment you change code to :-
public class DeleteThis {
public static void main(String[] args) {
TestBed tb=new TestBed();
System.out.println("printing");
}
You wil get nullpointer exception
And the reason why you get that code in decompiler is because of this, reading from a java tutorial (could not find the link for javadocs source :))
The Java compiler generates instance-field initialization code automatically and puts it in the constructor or constructors for the class. The initialization code is inserted into a constructor in the order it appears in the source code.
hence the compiler breaks int val= tb.display() into
int val;
TestBed(){
val=tb.display();
}

Access to final field via getter before its initializing

First of all, this question has forward relation to these nice questions:
1) Use of uninitialized final field - with/without 'this.' qualifier
2) Why isn't a qualified static final variable allowed in a static initialization block?
But I will ask it in slightly another angle of view. Just to remember: mentioned questions above were asked about access to final fields using keyword this in Java 7.
In my question there is something similar, but not the same. Well, consider the following code:
public class TestInit {
final int a;
{
// System.out.println(a); -- compile error
System.out.println(getA());
// a = a; -- compile error
a = getA();
System.out.println(getA());
}
private int getA() {
return a;
}
public static void main(String[] args) {
new TestInit();
}
}
And output is:
0
0
As you can see there are two unclear things here:
There is another legal way to access non-initialized final field: using its getter.
Should we consider that an assigment a = getA(); as legal for blank final field and that it will always assign to it its default value like for non-final field according to JLS? In other words, should it be considered as expected behaviour?

What you're really running into is the inference ability of the compiler.
For your first compiler error, that fails because the compiler definitely knows a is not assigned. Same with the second a=a (you can't assign from a because it's definitely not assigned at that point).
The line that works a=getA() is a first, single, definite assignment of a, so that's fine (regardless where the value comes from; the implementation of getA() doesn't matter and isn't assessed at this point).
The method getA() is valid and can return a because the instance initializer has definitely assigned a value to a.
It makes sense to a human looking at it, but it's not an inference built into the compiler. Each block, independently evaluated, is valid.

scopes of variables in Java

In Java, you don't have declare a method physically before you use it. The same thing doesn't apply for variables.
Why is this the case? Is it just for "legacy" reason (ie., the Java's creators didn't feel like doing it), or is it just not possible?
Eg.,
public class Test
{
// It is OK for meth1 to invoke meth2
public void meth1() { meth2(); }
public void meth2() { }
// But why is it NOT ok for field1 to reference field2
private int field1 = field2;
private int field2 = 3;
}
If I wanted my Java compiler to support this kind of forward-reference, what is the general idea as to how to do it?
I understand there'd be issue about circular depencies, which we'd need to be careful about. But other than that, I really don't see why it should not be possible.
[Edit]Ok, here's my initial thought as to how to do this.
While analysing the code, the compiler would build a graph of dependencies for the variables in the given scope. And if it sees a loop (ie., int a = b; int b = a), then it would throw an error. If there is no loops, then there must be some optimal way to re-arrange the statements (behind the scence) such that a field will only reference fields declared before it, and so it shouuld try to figure out the order. I haven't worked out the exact algorithm, but I think it is possible. Unless someone can scientifically prove me wrong.
Recap the question:
Say I'm trying to build my own dialect of Java, which supports this sort of scoping. My main question is, could you give me some ideas as to how to do this
Thanks

According to the JLS, Section 12.4.1, initialization of class variables proceeds from top to bottom, in "textual order":
The static initializers and class variable initializers are executed in textual order, and may not refer to class variables declared in the class whose declarations appear textually after the use, even though these class variables are in scope (§8.3.2.3). This restriction is designed to detect, at compile time, most circular or otherwise malformed initializations.
So, if you make your own compiler to recognize forward class variable declarations, then it is violating the Java Language Specification.

I will give you a simple piece of code:
public class Test
{
private int foo = bar;
private int bar = foo;
}
What would you expect this to do?
I presume that designers of Java has done it so because assignment of values to instance variables must be executed in some order. In the case of Java, they are executed downwards (from up to bottom).
EDIT
What about this?
public class Test {
private int foo = quu++;
private int bar = quu++;
private int quu = 1;
}
What values would foo and bar have? Which quu++ statement would be executed first?
My point is, Java designers must have thought that it is counter intuitive to do it as you did describe in your question, i.e. unordered execution with compile time code analysis.
FINAL EDIT
Let's complicate things:
class Test {
private James james = new James(anInt);
private Jesse jesse = new Jesse(anInt);
private IntWrapper anInt = new IntWrapper();
}
class James {
public James(IntWrapper anInt) {
if(--anInt.value != 0) {
new Jesse(anInt);
}
else {
anInt.isJames = true;
}
}
}
class Jesse {
public Jesse(IntWrapper anInt) {
if(--anInt.value != 0) {
new James(anInt);
}
else {
anInt.isJames = false;
}
}
}
class IntWrapper {
public int value = 99;
public boolean isJames;
}
I am not sure what it proves regarding your question because I am not sure about your point.
There is no circular dependency here but value of an instance variable of IntWrapper, isJames, depends on the execution order and it might be difficult to detect this kind of stuff with a lexical/semantic analyzer.

How will field1 know value of field2 before it has been defined and assigned a value?

It has to do with the order of initialization. Fields are initialized top to bottom. In your example, when field1 tries to reference field2 in the initializer, the latter is yet to be initialized itself.
The rule about forward references is aimed at catching the most obvious cases of this problem. It does not catch all; for example, you can still do:
private int field1 = getField2();
private int field2 = 3;
private int getField2() { return field2; }
and get field1 intialized to zero where you might be expecting 3.

Initialization of static final fields in Java

public class Main {
static final int alex=getc();
static final int alex1=Integer.parseInt("10");
static final int alex2=getc();
public static int getc(){
return alex1;
}
public static void main(String[] args) {
final Main m = new Main();
System.out.println(alex+" "+alex1 +" "+alex2);
}
}
Can someone tell me why this prints: 0 10 10? I understand that it's a static final variable and its value shouldn't change but it`s a little difficult to understand how the compiler initializes the fields.

It's an ordering problem. Static fields are initialized in the order that they are encountered, so when you call getc() to inititalize the alex variable, alex1 hasn't been set yet. You need to put initialization of alex1 first, then you'll get the expected result.

This situation is covered by JLS 8.3.2.3 "Restrictions on the use of Fields during Initialization".
The JLS rules allows the usage in your Question, and state that the first call to getc() will return default (uninitialized) value of alex.
However, the rules disallow some uses of uninitialized variables; e.g.
int i = j + 1;
int j = i + 1;
is disallowed.
Re some of the other answers. This is not a case where the Java compiler "can't figure it out". The compiler is strictly implementing what the Java Language Specification specifies. (Or to put it another way, a compiler could be written to detect the circularity in your example and call it a compilation error. However, if it did this, it would be rejecting valid Java programs, and therefore wouldn't be a conformant Java compiler.)
In a comment you state this:
... final fields always must be initialized at compile or at runtime before the object creation.
This is not correct.
There are actually two kinds of final fields:
A so-called "constant variable" is indeed evaluated at compile time. (A constant variable is a variable "of primitive type or type String, that is final and initialized with a compile-time constant expression" - see JLS 4.12.4.). Such a field will always have been initialized by the time you access it ... modulo certain complications that are not relevant here.
Other final fields are initialized in the order specified by the JLS, and it is possible to see the field's value before it has been initialized. The restriction on final variables is that they must be initialized once and only once during class initialization (for a static) or during object initialization.
Finally, this stuff is very much "corner case" behavior. A typical well-written class won't need to
access a final field before it has been initialized.

Static final fields whose values are not compile-time constant expressions are initialized in order of declaration. Thus when alex in being initialized, alex1 is not initialized yet, so that getc() returns default values of alex1 (0).
Note that result will be different (10 10 10) in the following case:
static final int alex1 = 10;
In this case alex1 is initialized by a compile-time constant expression, therefore it's initialized from the very beginning.

There is nothing special about static fields, it just that the compiler cannot workout that you are using a method which can access a field before its initialised.
e.g.
public class Main {
private final int a;
public Main() {
System.out.println("Before a=10, a="+getA());
this.a = 10;
System.out.println("After a=10, a="+getA());
}
public int getA() {
return a;
}
public static void main(String... args) {
new Main();
}
}
prints
Before a=10, a=0
After a=10, a=10

Class variables are not necessary to initialize, they are automatically set to their default values. If primitives (like int, short...) it's 0 (zero) for Objects it's null.
Therefore alex1 is set to 0.
Method variables must be initialized, otherwise you will get an compiliation error.
For a better explanation read http://download.oracle.com/javase/tutorial/java/javaOO/classvars.html

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

compiler optimizes away a final field added to a serialized class - java

The usage of final variable always provokes compiler optimization. Its doing what it supposed to do.

Related

In Java, is there a difference between initializing an object in constructor or in declaration? [duplicate]

Why the function "display()" is automatically placed in constructor

Access to final field via getter before its initializing

scopes of variables in Java

Initialization of static final fields in Java

Categories

Resources