What properties are guaranteed by constructors in Java? - java

I used to think that, intuitively speaking, a constructor in Java is the thing that makes an object, and that nothing can touch that object until its constructor returns. However, I have been proven wrong about this over and over again:
uninitialized objects can be leaked by sharing this
uninitialized objects can be leaked by a subclass accessing it from the finalizer
uninitialized objects can be leaked to another thread before they're fully constructed
All of these facts violate my intuition of what I thought a constructor is.
I can no longer with confidence say what a constructor actually does in Java, or what it's meant to be used for. If I'm making a simple DTO with all final fields, then I can understand what the use of the constructor is, because this is exactly the same as a struct in C except it can't be modified. Other than that, I have no clue what constructors can be reliably used for in Java. Are they just a convention/syntactic sugar? (i.e If there were only factories that initialize objects for you, you would only have X x = new X(), then modify each field in x to make them have non default values - given the 3 facts above, this would be almost equivalent to how Java actually is)
I can name two properties that are actually guaranteed by constructors: If I do X x = new X(), then I know that x is an instance of X but not a subclass of X, and its final fields are fully initialized. You might be tempted to say that you know that constructor of X finished and you have a valid object, but this is untrue if you pass X to another thread - the other thread may see the uninitialized version (i.e what you just said is no different than the guarantees of calling a factory). What other properties do constructors actually guarantee?

All of these facts violate my intuition of what I thought a constructor is.
They shouldn't. A constructor does exactly what you think it does.
1: uninitialized objects can be leaked by sharing this
3: uninitialized objects can be leaked to another thread before they're fully constructed
Leaking this, starting threads in the constructor, and storing a newly constructed object where multiple threads access it without synchronization are all problems caused by the reordering of the initialization of non-final (and non-volatile) fields. But the initialization code is still run by the constructor, and the thread that constructed the object sees it fully initialized. The issue is when those changes become visible to other threads, which the language definition does not guarantee.
You might be tempted to say that you know that constructor of X finished and you have a valid object, but this is untrue if you pass X to another thread - the other thread may see the uninitialized version (i.e what you just said is no different than the guarantees of calling a factory).
This is correct. It is also correct that if you have an unsynchronized object and you mutate it in one thread, other threads may or may not see the mutation. That's the nature of threaded programming. Even constructors are not safe from the need to synchronize objects properly.
2: uninitialized objects can be leaked by a subclass accessing it from the finalizer
This document is talking about finalizers and improperly being able to access an object after it has been garbage collected. By hacking subclasses and finalizers you can generate an object that is not properly constructed but it is a major hack to do so. For me this does not somehow challenge what a constructor does. Instead it demonstrates the complexity of the modern, mature, JVM. The document also shows how you can write your code to work around this hack.
What properties are guaranteed by constructors in Java?
According to the definition, a constructor:
Allocates space for the object.
Sets all the instance variables in the object to their default values. This includes the instance variables in the object's superclasses.
Assigns the parameter variables for the object.
Processes any explicit or implicit constructor invocation (a call to this() or super() in the constructor).
Initializes variables in the class.
Executes the rest of the constructor.
In terms of your 3 issues, #1 and #3 are, again, about when the initialization of non-final and non-volatile fields is seen by threads other than the one that constructed the object. This visibility without synchronization is not guaranteed.
The #2 issue shows a mechanism where, if an exception is thrown while executing the constructor, you can override the finalize method to obtain an improperly constructed object. Constructor points 1-5 have occurred. With the hack you can bypass a portion of 6. I guess it is in the eye of the beholder whether this challenges the identity of the constructor.

From the JLS section 12.5:
12.5. Creation of New Class Instances
Just before a reference to the newly created object is returned as the result, the indicated constructor is processed to initialize the new object using the following procedure:
1. Assign the arguments for the constructor to newly created parameter variables for this constructor invocation.
2. If this constructor begins with an explicit constructor invocation (§8.8.7.1) of another constructor in the same class (using this), then evaluate the arguments and process that constructor invocation recursively using these same five steps. If that constructor invocation completes abruptly, then this procedure completes abruptly for the same reason; otherwise, continue with step 5.
3. This constructor does not begin with an explicit constructor invocation of another constructor in the same class (using this). If this constructor is for a class other than Object, then this constructor will begin with an explicit or implicit invocation of a superclass constructor (using super). Evaluate the arguments and process that superclass constructor invocation recursively using these same five steps. If that constructor invocation completes abruptly, then this procedure completes abruptly for the same reason. Otherwise, continue with step 4.
4. Execute the instance initializers and instance variable initializers for this class, assigning the values of instance variable initializers to the corresponding instance variables, in the left-to-right order in which they appear textually in the source code for the class. If execution of any of these initializers results in an exception, then no further initializers are processed and this procedure completes abruptly with that same exception. Otherwise, continue with step 5.
5. Execute the rest of the body of this constructor. If that execution completes abruptly, then this procedure completes abruptly for the same reason. Otherwise, this procedure completes normally.
Unlike C++, the Java programming language does not specify altered rules for method dispatch during the creation of a new class instance. If methods are invoked that are overridden in subclasses in the object being initialized, then these overriding methods are used, even before the new object is completely initialized.
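To make the dispatch rule quoted above concrete, here is a minimal sketch. The class names (Animal, Dog, DispatchDemo) are hypothetical and chosen only for illustration. The superclass constructor calls an overridable method, and the override runs before the subclass's field initializer has executed, so it observes the default value null.

```java
// Hypothetical classes, for illustration only.
class Animal {
    final String seenDuringConstruction;

    Animal() {
        // Dispatches to Dog.describe() even though Dog's field
        // initializers have not run yet (JLS 12.5, step 3 before step 4).
        seenDuringConstruction = describe();
    }

    String describe() {
        return "animal";
    }
}

class Dog extends Animal {
    // Non-final field: assigned only after the Animal constructor returns.
    private String name = "dog";

    @Override
    String describe() {
        return "name=" + name;
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        Dog d = new Dog();
        System.out.println(d.seenDuringConstruction); // name=null
        System.out.println(d.describe());             // name=dog
    }
}
```

This is the usual reason for the advice to call only private or final methods from a constructor.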
And from JLS 16.9:
Note that there are no rules that would allow us to conclude that V is
definitely unassigned before an instance variable initializer. We can
informally conclude that V is not definitely unassigned before any
instance variable initializer of C, but there is no need for such a
rule to be stated explicitly.
Happens before 17.4.5:
Threading 17.5.2:
A read of a final field of an object within the thread that constructs
that object is ordered with respect to the initialization of that
field within the constructor by the usual happens-before rules. If the
read occurs after the field is set in the constructor, it sees the
value the final field is assigned, otherwise it sees the default
value.

A class contains constructors that are invoked to create objects from the class blueprint.
This is what Oracle says about constructors.
Now to your point.
intuitively speaking, a constructor in Java is the thing that makes an object, and that nothing can touch that object until its constructor returns.
So according to the official documentation, your assumption is not right. Points 1 and 2 are abuses of the rules and behaviors of Java, unless you consciously want to leak your objects! Since they are also irrelevant to constructors, I will skip discussing them.
Now, about your 3rd point: in a multi-threaded environment nothing can guarantee the consistency of your code except properly synchronized blocks or atomic instructions. As object creation is neither synchronized nor atomic, there is no guarantee of consistency! There is nothing the constructor can do about it. In other words, it's not the responsibility of the constructor to make your object creation atomic.
Now, the answer to your question, "What other properties do constructors actually guarantee?", is fairly easy. Constructors are nothing but a special type of method, invoked during object creation from the blueprint of the class. So a constructor can guarantee nothing unless you give it a chance to execute consistently, like any other method. Once it has executed consistently, it can guarantee that your object is created and initialized as you instructed.

Constructors in Java are just used to initialize the state of the object being created, nothing more.

Related

What does this.getClass() return during construction?

What does Java this.getClass() return during object construction, e.g., when called from a constructor or field initializer expression? Is it safe to expect the most derived class to be returned or not?
This is one of several places where you can observe the difference between how constructing an object works in the Java language versus in the Java virtual machine.
In JVM bytecode, constructing an object of a given type is a single instruction and thus atomic. At no time is the object itself an object of a different type, like a supertype. However, this object will not yet have been "constructed". At the bytecode level, the constructor is a special method named <init>, which can set fields and otherwise manipulate the object which already exists.
The compiler assembles any actual constructor, plus the field initializers and instance initializer blocks into the <init> method. The first operation of <init> is normally to call the corresponding <init> method declared on the supertype. So the initializers get run in order from supertypes to subtypes. However, when the supertype <init> method is executing, the object itself is already of the type it will be once complete; therefore getClass() will return that type, and the value returned by getClass() will not change depending on where it is called.
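A small sketch of that behaviour, using hypothetical classes Shape and Circle: getClass() is called from a superclass field initializer and from the superclass constructor, and in both places it already reports the most derived class.

```java
// Hypothetical classes, for illustration only.
class Shape {
    // Evaluated while Shape's part of the object is being initialized.
    final Class<?> classSeenInInitializer = getClass();
    final Class<?> classSeenInConstructor;

    Shape() {
        classSeenInConstructor = getClass();
    }
}

class Circle extends Shape { }

class GetClassDemo {
    public static void main(String[] args) {
        Shape s = new Circle();
        System.out.println(s.classSeenInInitializer.getSimpleName()); // Circle
        System.out.println(s.classSeenInConstructor.getSimpleName()); // Circle
    }
}
```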
From the standpoint of the Java language, this is (mostly) a moot point, since if any of the <init> methods involved throws an exception, the object is normally immediately eligible for garbage collection. However, there are a few cases where you can cause a partially-constructed object to become visible to another thread, in which case you can observe objects with inconsistent states. You can also intentionally create an object without running its <init> by using the (intentionally undocumented) methods of sun.misc.Unsafe on a JVM that exposes that class.

creating a variable in function - multithreading environment

could you please help me?
I have a function 'f' in Java. The function works in multithreading environment
void f() {
    SomeObject someO = new SomeObject();
    function1(someO);
    // ...
    function7(someO);
}
The problem: the first thread enters the function 'f', creates a new instance of SomeObject, then calls function1, function2, etc. An instant later a second thread enters the method and creates a new instance of SomeObject while the first thread is in function4. The question is: which instance of SomeObject will the first thread be processing in the remaining functions 5, 6 and 7?
The first thread has its instance of SomeObject (aka someO) as a local variable, therefore it is local to that stack frame and hence to that thread. It will only be able to reference that instance of someO.
Any other thread calling f() will create a different instance, also named someO, and only be able to reference that copy of it.
These are the rules for a local variable. If someO were an instance variable -- i.e., declared outside f() -- then that variable could be referenced by different threads if they called f() on the same instance of whatever class holds the definition of f().
Those are the rules -- here's a more complete explanation.
Local variables are declared on the stack; that means that, for any variables declared within a method, there is space for their references in a 'stack frame' for that invocation of that method. Each time a method is invoked, there is space on the stack allocated for all the local variables in the method, and therefore their references are separate from any other invocation of that method. So if a different thread invokes the method, it gets a different stack frame for the local variables.
The same thing happens in a recursive procedure, i.e., if f() were to call itself. The local variable references would still be separate for each invocation of f(), i.e., each recursive call would have its own copy of them. Otherwise it would be very difficult to use recursion at all.
First thread will process first instance of SomeObject.
Every thread has its own stack. Whatever methods it calls and whatever local variables it creates live on that stack, and they are not affected by other threads. So in your case the processing in thread 1 is not affected by the processing in thread 2.
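The answers above can be checked with a small sketch. The extra parameters of f() and the helper runTwoThreads() are assumptions added only so the two instances can be compared afterwards; SomeObject is an empty stand-in class.

```java
// Empty stand-in for the SomeObject type from the question.
class SomeObject { }

class LocalInstanceDemo {
    static void f(SomeObject[] out, int slot) {
        SomeObject someO = new SomeObject(); // reference lives in this invocation's stack frame
        out[slot] = someO;                   // recorded only so the caller can compare later
    }

    static SomeObject[] runTwoThreads() {
        SomeObject[] out = new SomeObject[2];
        Thread t1 = new Thread(() -> f(out, 0));
        Thread t2 = new Thread(() -> f(out, 1));
        t1.start();
        t2.start();
        try {
            // join() also makes each thread's array write visible here.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        SomeObject[] out = runTwoThreads();
        System.out.println(out[0] != out[1]); // true: each call made its own instance
    }
}
```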

Java Memory Model: Is it safe to create a cyclical reference graph of final instance fields, all assigned within the same thread?

Can somebody who understands the Java Memory Model better than me confirm my understanding that the following code is correctly synchronized?
class Foo {
    private final Bar bar;
    Foo() {
        this.bar = new Bar(this);
    }
}
class Bar {
    private final Foo foo;
    Bar(Foo foo) {
        this.foo = foo;
    }
}
I understand that this code is correct but I haven't worked through the whole happens-before math. I did find two informal quotations that suggest this is lawful, though I'm a bit wary of completely relying on them:
The usage model for final fields is a simple one: Set the final fields for an object in that object's constructor; and do not write a reference to the object being constructed in a place where another thread can see it before the object's constructor is finished. If this is followed, then when the object is seen by another thread, that thread will always see the correctly constructed version of that object's final fields. It will also see versions of any object or array referenced by those final fields that are at least as up-to-date as the final fields are. [The Java® Language Specification: Java SE 7 Edition, section 17.5]
Another reference:
What does it mean for an object to be properly constructed? It simply means that no reference to the object being constructed is allowed to "escape" during construction. (See Safe Construction Techniques for examples.) In other words, do not place a reference to the object being constructed anywhere where another thread might be able to see it; do not assign it to a static field, do not register it as a listener with any other object, and so on. These tasks should be done after the constructor completes, not in the constructor. [JSR 133 (Java Memory Model) FAQ, "How do final fields work under the new JMM?"]
Yes, it is safe. Your code does not introduce a data race. Hence, it is synchronized correctly. All objects of both classes will always be visible in their fully initialized state to any thread that is accessing the objects.
For your example, this is quite straight-forward to derive formally:
For the thread that is constructing the objects, all observed field values need to be consistent with program order. For this intra-thread consistency, when constructing Bar, the handed Foo value is observed correctly and never null. (This might seem trivial but a memory model also regulates "single threaded" memory orderings.)
For any thread that is getting hold of a Foo instance, its referenced Bar value can only be read via the final field. This introduces a dereference ordering between reading of the address of the Foo object and the dereferencing of the object's field pointing to the Bar instance.
If another thread is therefore capable of observing the Foo instance altogether (in formal terms, there exists a memory chain), this thread is guaranteed to observe this Foo fully constructed, meaning that its Bar field contains a fully initialized value.
Note that it does not even matter that the Bar instance's field is itself final if the instance can only be read via Foo. Adding the modifier does not hurt and better documents the intentions, so you should add it. But, memory-model-wise, you would be okay even without it.
Note that the JSR-133 cookbook that you quoted is only describing an implementation of the memory model rather than the memory model itself. In many points, it is too strict. One day, the OpenJDK might no longer align with this implementation and rather implement a less strict model that still fulfills the formal requirements. Never code against an implementation, always code against the specification! For example, do not rely on a memory barrier being placed after the constructor, which is how HotSpot more or less implements it. These things are not guaranteed to stay and might even differ for different hardware architectures.
The quoted rule that you should never let a this reference escape from a constructor is also too narrow a view on the problem. You should not let it escape to another thread. If you were, for example, to hand it to a virtually dispatched method, you could no longer control where the instance would end up. This is therefore a very bad practice! However, constructors are not dispatched virtually and you can safely create circular references in the manner you depicted. (I assume that you are in control of Bar and its future changes. In a shared code base, you should document tightly that the constructor of Bar must not let the reference slip out.)
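One common way to follow the "register after construction" advice from the quoted FAQ is a private constructor plus a static factory. The sketch below is only an illustration; EventSource, Counter and attachTo are hypothetical names, not part of any library. It keeps this from escaping during construction, but visibility in other threads still depends on final fields or on publishing the object safely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event source; stands in for anything that stores listeners.
class EventSource {
    private final List<Runnable> listeners = new ArrayList<>();

    void register(Runnable listener) {
        listeners.add(listener);
    }

    void fire() {
        for (Runnable listener : listeners) {
            listener.run();
        }
    }
}

class Counter {
    private int events;

    private Counter() {
        // Deliberately does not hand 'this' to anyone.
    }

    static Counter attachTo(EventSource source) {
        Counter c = new Counter();   // construction completes here
        source.register(c::onEvent); // only now does the reference escape
        return c;
    }

    private void onEvent() {
        events++;
    }

    int events() {
        return events;
    }
}

class SafeConstructionDemo {
    public static void main(String[] args) {
        EventSource source = new EventSource();
        Counter c = Counter.attachTo(source);
        source.fire();
        source.fire();
        System.out.println(c.events()); // 2
    }
}
```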
Immutable Objects (with only final fields) are only "threadsafe" after they are properly constructed, meaning their constructor has completed. (The VM probably accomplishes this by a memory barrier after the constructor of such objects)
Let's see how to make your example surely unsafe:
If the Bar constructor stored a this reference where another thread could see it, this would be unsafe because Bar isn't constructed yet.
If the Bar constructor stored a foo reference where another thread could see it, this would be unsafe because foo isn't constructed yet.
If the Bar constructor read some fields of foo, then (depending on the order of initialization inside the Foo constructor) these fields could still be uninitialized. That's not a thread-safety problem, just an effect of the order of initialization. (Calling a virtual method inside a constructor has the same issues.)
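The last point can be shown in a single thread. This sketch extends the Foo and Bar classes from the question with two hypothetical fields (size and sizeSeenInConstructor) purely for illustration: Bar's constructor runs before Foo has assigned size, so it reads the default value.

```java
// Variant of the question's classes; 'size' and 'sizeSeenInConstructor'
// are hypothetical additions for this illustration.
class Foo {
    final int size;
    final Bar bar;

    Foo() {
        this.bar = new Bar(this); // Bar's constructor runs before size is assigned
        this.size = 42;
    }
}

class Bar {
    final Foo foo;
    final int sizeSeenInConstructor;

    Bar(Foo foo) {
        this.foo = foo;
        this.sizeSeenInConstructor = foo.size; // reads the default value 0
    }
}

class InitOrderPitfallDemo {
    public static void main(String[] args) {
        Foo foo = new Foo();
        System.out.println(foo.bar.sizeSeenInConstructor); // 0
        System.out.println(foo.bar.foo.size);              // 42
    }
}
```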
References to immutable Objects (only final fields) which are created by a new-expression are always safe to access (no uninitialized fields visible). But the Objects referenced in these final fields may show uninitialized values if these references were obtained by a constructor giving away its this-reference.
As Assylias already wrote: because the constructors in your example store no references where another thread could see them, your example is "threadsafe". The created Foo object can safely be given to other threads.

Difference between initialization at declaration and initialization in constructor [duplicate]

What is the difference between the following two, and which is preferable?
public class foo {
    int i = 2;
}
public class foo {
    int i;
    foo() {
        i = 2;
    }
}
In your example, there is no difference in behavioural semantics. In Java, all instance field initializers (and instance blocks) are executed after superclass initialization, and before the body of the constructor; see JLS 12.5.
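The ordering described above (superclass initialization, then field initializers and instance blocks, then the constructor body) can be observed with a small logging sketch. InitLog, Parent and Child are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records the order in which things run.
class InitLog {
    static final List<String> events = new ArrayList<>();

    static int log(String what, int value) {
        events.add(what);
        return value;
    }
}

class Parent {
    Parent() {
        InitLog.log("Parent constructor body", 0);
    }
}

class Child extends Parent {
    int i = InitLog.log("Child field initializer", 2);

    {
        InitLog.log("Child instance block", 0);
    }

    Child() {
        // The implicit super() call has already run at this point,
        // followed by the initializer and instance block above.
        InitLog.log("Child constructor body", 0);
    }
}

class InitOrderDemo {
    public static void main(String[] args) {
        new Child();
        // Prints, one per line:
        // Parent constructor body
        // Child field initializer
        // Child instance block
        // Child constructor body
        InitLog.events.forEach(System.out::println);
    }
}
```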
The difference lies in code readability and (in other examples) avoiding repetitious coding and fragility [1]. These need to be assessed on a case-by-case basis.
It is also worth noting that there are some cases where you have to initialize in the constructor; i.e. when the initialization depends on a constructor parameter.
[1] The repetitiousness and fragility issues are flip sides of the same thing. If you have multiple constructors, the "initialize in constructor" approach tends to lead to repetition. And if you add extra fields, you might need to add the initialization to all relevant constructors; i.e. fragility.
If you have two or more constructors and the initialization value differs in each of them, then you should use constructor initialization, as there is no way to do the same with member initialization.
However, if you have just one constructor, you can use member initialization for better code clarity.
In this particular case there is no difference between the two variants. The first variant is preferable, because initialization inside a constructor is usually reserved for values that come from constructor arguments.
First of all I think the second example should look like this:
public class foo {
    int i;
    foo() {
        i = 0;
    }
}
Otherwise i is just a local variable in the C'tor scope.
Second, the first example shows initialization that runs before the body of the class C'tor is executed. This is good if you want it to happen no matter which C'tor is used.
It also enables you to declare i as final (Java's counterpart of readonly).
In your first example, i is an instance variable of class foo (a better name would be Foo). It's initialised each time an instance is created, before the constructor body runs (not at class loading; that applies only to static fields).
In your second example, i is also an instance variable, but in this case it is initialised in the foo() constructor.
There is no real difference here, and especially with primitives.
However, in a multi-threaded environment, if you do intend to initialise your ivars in your constructor, and those ivars are non-primitive, you need to avoid the risk of exposing a partially constructed object. The reason for this is that constructors aren't synchronised and can't have the synchronized keyword applied (although two threads can't be constructing the same object anyway).
So, to avoid this, you should never expose this in your constructor. One way of exposing it is to call non-final methods. Doing so, say calling an abstract method, allows some unknown code to do something with your unfinished object. Obviously, this can't be done if you initialise in your declaration.
p.s. I thought there was something on this in Effective Java but couldn't find anything.

Uninitialized variables and members in Java

Consider this:
public class TestClass {
    private String a;
    private String b;
    public TestClass()
    {
        a = "initialized";
    }
    public void doSomething()
    {
        String c;
        a.notify(); // This is fine
        b.notify(); // This is fine - but will end in an exception
        c.notify(); // "Local variable c may not have been initialised"
    }
}
I don't get it. "b" is never initialized but will give the same run-time error as "c", which is a compile-time error. Why the difference between local variables and members?
Edit: making the members private was my initial intention, and the question still stands...
The language defines it this way.
Instance variables of object type default to being initialized to null.
Local variables of object type are not initialized by default and it's a compile time error to access an undefined variable.
See section 4.12.5 for SE7 (same section still as of SE14)
http://docs.oracle.com/javase/specs/jls/se7/html/jls-4.html#jls-4.12.5
Here's the deal. When you call
TestClass tc = new TestClass();
the new command performs four important tasks:
Allocates memory on the heap for the new object.
Initializes the instance fields to their default values (numerics to 0, boolean to false, object references to null).
Calls the constructor (which may reassign the fields, or may not).
Returns a reference to the new object.
So your fields 'a' and 'b' are both initialized to null, and 'a' is reassigned in the constructor. This process does not apply to method calls, so local variable 'c' is never initialized.
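The four steps above can be observed directly. In this sketch the class Defaults and its extra fields count and flag are hypothetical; only a and b mirror the question.

```java
// Hypothetical class mirroring fields 'a' and 'b' from the question,
// plus two primitives to show their defaults.
class Defaults {
    String a;      // reassigned in the constructor
    String b;      // never assigned: keeps the default null
    int count;     // keeps the default 0
    boolean flag;  // keeps the default false

    Defaults() {
        a = "initialized";
    }
}

class DefaultsDemo {
    public static void main(String[] args) {
        Defaults d = new Defaults();
        System.out.println(d.a);     // initialized
        System.out.println(d.b);     // null
        System.out.println(d.count); // 0
        System.out.println(d.flag);  // false
    }
}
```

A local variable has no such default step, which is why the compiler rejects reading it before assignment.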
For the gravely insomniac, read this.
The rules for definite assignment are quite difficult (read chapter 16 of JLS 3rd Ed). It's not practical to enforce definite assignment on fields. As it stands, it's even possible to observe final fields before they are initialised.
The compiler can figure out that c will never be set. The b variable could be set by someone else after the constructor is called, but before doSomething(). Make b private and the compiler may be able to help.
The compiler can tell from the code for doSomething() that c is declared there and never initialized. Because it is local, there is no possibility that it is initialized elsewhere.
It can't tell when or where you are going to call doSomething(). b is a public member. It is entirely possible that you would initialize it in other code before calling the method.
Member-variables are initialized to null or to their default primitive values, if they are primitives.
Local variables are UNDEFINED and are not initialized and you are responsible for setting the initial value. The compiler prevents you from using them.
Therefore, b is initialized when the class TestClass is instantiated while c is undefined.
Note: null is different from undefined.
You've actually identified one of the bigger holes in Java's system of generally attempting to find errors at edit/compile time rather than run time because--as the accepted answer said--it's difficult to tell if b is initialized or not.
There are a few patterns to work around this flaw. First is "final by default". If your members were final, you would have to fill them in within the constructor, and the compiler would use path analysis to ensure that every possible path assigns the finals. (You could still assign null, which would defeat the purpose, but at least you would be forced to recognize that you were doing it intentionally.)
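A minimal sketch of the "final by default" pattern follows. The class name Message is hypothetical, and the use of Objects.requireNonNull is an added assumption that goes slightly beyond what the answer describes: it rejects null at construction time instead of letting it surface later.

```java
import java.util.Objects;

// Hypothetical class, for illustration only.
class Message {
    private final String a;
    private final String b;

    Message(String a, String b) {
        // Deleting either assignment is a compile-time error:
        // a final field must be definitely assigned on every path.
        this.a = Objects.requireNonNull(a, "a");
        this.b = Objects.requireNonNull(b, "b");
    }

    String a() {
        return a;
    }

    String b() {
        return b;
    }
}

class FinalByDefaultDemo {
    public static void main(String[] args) {
        Message ok = new Message("first", "second");
        System.out.println(ok.a() + " " + ok.b()); // first second
        try {
            new Message("first", null);
        } catch (NullPointerException e) {
            System.out.println("rejected null"); // rejected null
        }
    }
}
```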
A second approach is strict null checking. You can turn it on in the Eclipse settings, either per project or in the default properties. I believe it would force you to null-check b before calling b.notify(). This can quickly get out of hand, so it tends to go with a set of annotations to make things simpler:
The annotations might have different names, but in concept, once you turn on strict null checking and the annotations, the types of variables are "Nullable" and "NotNull". If you try to place a Nullable into a NotNull variable you must check it for null first. Parameters and return types are also annotated, so you don't have to check for null every single time you assign to a NotNull variable.
There is also a "NotNullByDefault" package-level annotation that will make the editor assume that no variable can ever have a null value unless you tag it Nullable.
These annotations mostly apply at the editor level (you can turn them on within Eclipse and probably other editors), which is why they aren't necessarily standardized. (At least they weren't the last time I checked; Java 8 might have some annotations I haven't found yet.)
