Object creation (state initialisation) and thread safety

Object creation (state initialisation) and thread safety - java

I am look into the book "Java Concurrency in Practice" and found really hard to believe below quoted statement (But unfortunately it make sense).
http://www.informit.com/store/java-concurrency-in-practice-9780321349606
Just wanted to get clear about this 100%
public class Holder {
private int n;
public Holder(int n) { this.n = n; }
public void assertSanity() {
if (n != n)
throw new AssertionError("This statement is false.");
}
}
While it may seem that field values set in a constructor are the first
values written to those fields and therefore that there are no "older"
values to see as stale values, the Object constructor first
writes the default values to all fields before subclass
constructors run. It is therefore Possible to see the default value
for a field as a stale value
Regarding bolded statement in above,
I am aware that the behaviour BUT now it is clear that this calling hierarchy of constructors is NOT guarantee to be ATOMIC (calling super constructors in single synchronised block that is guarded by a lock), but what would be the solution? imagine a class hierarchy that has more than one level (even it is not recommended, lets assume as it is possible). The above code snippest is a kind of a prototype that we see everyday in most of the projects.

You misread the book. It explicitely says:
The problem here is not the Holder class itself, but that the Holder is not properly published.
So the above construct if fine. What's not fine is to improperly publish such an object to other threads. The book explains that in details.

When creating a new object things happen sequentially. I don't know the precise order, but it's something like: allocate the space and initialize it to zeroes, then set the fields that get constant values, then set the fields that get calculated values, then run the constructor code. And, of course, it's got to initialize the subclasses in there somewhere.
So if you try to work with an object that is still being constructed, you can see odd, invalid values in the fields. This doesn't usually happen, but ways to do it:
Reference a field that doesn't yet have a value during an assignment to another field.
Reference a value in the constructor that doesn't get assigned till later in the constructor.
Reference a field in an object in a field in an object that was just read from an ObjectInputStream. (OIS often takes a long time to put values in objects it's read.)
Before Java 5, something like:
public volatile MyClass myObject;
...
myObject = new MyClass( 10 );
could make trouble because another thread could grab the reference to myObject before the MyClass constructor was finished and it would see bad values (zero instead of 10, in this case) inside the object. With Java 5, the JVM is not allowed to make myObject non-null until the constructor is finished.
And today you can still set myObject to this within the constructor and accomplish the same thing.
If you're clever, you can also get hold of Class fields before they've been initialized.
In your code example, (n != n) would be true if something changed the value between the two reads of n. I guess the point is n starts as zero, get's set to something else by the constructor, and assertSanity is called during the construction. In this case, n is not volatile so I don't think the assert will ever be triggered. Make it volatile and it will happen once every million times or so if you time everything precisely right. In real life this kind of problem happens just often enough to wreak havoc but rarely enough that you can't reproduce it.

I guess theoretically it is possible. It is similar to double checked locking problem.
public class Test {
static Holder holder;
static void test() {
if (holder == null) {
holder = new Holder(1);
}
holder.assertSanity();
}
...
If test() is called by 2 threads, thread-2 might see the holder in a state when initialization is still in progress so n != n may happen to be true. Here is bytecode for n != n:
ALOAD 0
GETFIELD x/Holder.n : I
ALOAD 0
GETFIELD x/Holder.n : I
IF_ICMPEQ L1
as you can see JVM loads field n to operand stack twice. So it may happen that the first var gets value before init and the seccond after init

the comment:
the Object constructor first writes the default values to all fields
before subclass constructors run
seems wrong. My prior experience is that the default values for a class are set before its constructor is run. that is a super class will see its init-ed variables set before its constructor runs and does things. This was root of bug a friend looked at where a base class was calling a method during construction that the super class implemented and set a reference that was defined with initialization to null in the super class. the item would be there until entry to the constructor at which time the init set it to null value.
references to the object are not available to another thread (assuming none generated in the constructor) until it completes construction and object reference is returned.

Related

Partial constructed objects in the Java Memory Model

I came across the following code in an article somewhere on the Internet:
public class MyInt {
private int x;
public MyInt(int y) {
this.x = y;
}
public int getValue() {
return this.x;
}
}
The article states that
Constructors are not treated special by the compiler (JIT, CPU etc) so it is allowed to reorder instructions from the constructor and instructions that come after the constructor.
Also, this JSR-133 article about the Java Memory Model states that
A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object’s final fields.
The abovementioned MyInt instance seems immutable (except that the class is not marked final) and thread-safe, but the articles state it is not. They state that it's not guaranteed that x always has the correct value upon read.
But I thought that
only the thread that creates an object should have access to it while it is being constructed
and the Java Tutorials seem so support that.
My question is: does it mean that, with the current JMM, a thread can have access to a partially constructed object due to instruction reordering? And if yes, how? And does that mean that the statement from the Java Tutorials is simply not true?

That article is saying that if you have code like
foo = new MyInt(7);
in a class that has a field
MyInt foo;
then the instructions that amount to
(reference to new object).x = 7;
foo = (reference to new object);
could be swapped over as some kind of optimisation. This will never change the behaviour of the thread that's running this code, but it's possible that some other thread will be able to read foo after the line
foo = (reference to new object);
but before the line
(reference to new object).x = 7;
in which case it would see foo.x as 0, not 7. That is to say, that other thread could run
int bar = someObject.getFoo().getValue();
and end up with bar equal to 0.
I've never seen anything like this happen in the wild, but the author seems to know what he's talking about.

Instruction reordering alone can not lead to another thread seeing a partially constructed object. By definition, the JVM is only allowed to reorder things if they don't affect a correctly synchronized program's behaviour.
It's unsafe publishing of the object reference that enables bad things to happen. Here's a particularly poor attempt at a singleton for example:
public class BadSingleton {
public static BadSingleton theInstance;
private int foo;
public BadSingleton() {
this.foo = 42;
if (theInstance == null) {
theInstance = this;
}
}
}
Here you accidentally publish the reference to the object being constructed in a static field. This would not necessarily be a problem until the JVM decides to reorder things and places this.foo = 42 after the assignment to theInstance. So the two things together conspire to break your invariants and allow another thread to see a BadSingleton.theInstance with its foo field uninitialised.
Another frequent source of accidental publication is calling overrideable methods from the constructor. This does not always lead to accidental publication, but the potential is there, hence it should be avoided.
only the thread that creates an object should have access to it while it is being constructed
And does that mean that the statement from the Java Tutorials is
simply not true?
Yes and no. It depends on how we interpret the word should. There is no guarantee that in every possible case another thread won't see a partially constructed object. But it's true in the sense that you should write code that doesn't allow it to happen.

Java concurrency - inline initialized non-final field and safe publication

Assume I have the following class:
public abstract class Test {
private List<String> strings = new ArrayList<>();
// to be called concurrently
public int getStringsSize() {
return strings.size();
}
}
Is there a chance that getStringsSize() will throw NPE for some subclass of Test? Or what else can go wrong? As far as I understand there're no Java Memory Model guarantees about such case.

The point is: when a subclass object gets instantiated, the super constructor and super field init statements will be executed before any subclass code is executed. See here or there for more information. In other words: before a client that creates a subclass via new gets access to the newly created object, it is guaranteed that those init statements were executed.
And as we are talking about a single private field that only lives in the super class, there is no chance for that method call to produce an NPE
Unless you have some other method that resets the field to null. Which you can avoid by using the final keyword for that field (which is good practice anyway: make it your default to put final on fields).
Finally; for completeness: it is possible to write super/subclass code that fails when doing new on the subclass because of "something not initialized yet" - like when your super class constructor calls a method that is overridden in the subclass. But that is like: bad practice (as it can lead to such bizarre errors). So don't even think about doing that.

No. As long as the strings field is not reassigned (which can never be done by a subclass since it is private), then it will never be accessed in a null state, no matter how many threads try to access it. And as such, strings.size() can never throw a NullPointerException.

getStringSize() would only throw NPE if you set strings to null manually, EG in a setStrings() method. getStringSize() should also be set final.
The initializer runs at instantiation time, that is after the super(...) constructor, but before the first instruction of this(...) constructor.

Making a new Object in java [duplicate]

This question already has answers here:
Should I instantiate instance variables on declaration or in the constructor?
(15 answers)
Closed 9 years ago.
When I use Java based on my C++ knowledge, I love to initialize variable using the following way.
public class ME {
private int i;
public ME() {
this.i = 100;
}
}
After some time, I change the habit to
public class ME {
private int i = 100;
public ME() {
}
}
I came across others source code, some are using 1st convention, others are using 2nd convention.
May I know which convention do you all recommend, and why?

I find the second style (declaration + initialization in one go) superior. Reasons:
It makes it clear at a glance how the variable is initialized. Typically, when reading a program and coming across a variable, you'll first go to its declaration (often automatic in IDEs). With style 2, you see the default value right away. With style 1, you need to look at the constructor as well.
If you have more than one constructor, you don't have to repeat the initializations (and you cannot forget them).
Of course, if the initialization value is different in different constructors (or even calculated in the constructor), you must do it in the constructor.

I have the practice (habit) of almost always initializing in the contructor for two reasons, one in my opinion it adds to readablitiy (cleaner), and two there is more logic control in the constructor than in one line. Even if initially the instance variable doesn't require logic, having it in the constructor gives more flexibility to add logic in the future if needed.
As to the concern mentioned above about multiple constructors, that's easily solved by having one no-arg constructor that initializes all the instance variables that are initilized the same for all constructors and then each constructor calls this() at the first line. That solves your reduncancy issues.

I tend to use the second one to avoid a complicated constructor (or a useless one), also I don't really consider this as an initialization (even if it is an initialization), but more like giving a default value.
For example in your second snippet, you can remove the constructor and have a clearer code.

If you initialize in the top or in constructor it doesn't make much difference .But in some case initializing in constructor makes sense.
class String
{
char[] arr/*=char [20]*/; //Here initializing char[] over here will not make sense.
String()
{
this.arr=new char[0];
}
String(char[] arr)
{
this.arr=arr;
}
}
So depending on the situation sometime you will have to initialize in the top and sometimes in a constructor.
FYI other option's for initialization without using a constructor :
class Foo
{
int i;
static int k;
//instance initializer block
{
//run's every time a new object is created
i=20;
}
//static initializer block
static{
//run's only one time when the class is loaded
k=18;
}
}

The only problem I see with the first method is if you are planning to add more constructors. Then you will be repeating code and maintainability would suffer.

I recommend initializing variables in constructors. That's why they exist: to ensure your objects are constructed (initialized) properly.
Either way will work, and it's a matter of style, but I prefer constructors for member initialization.

Both the options can be correct depending on your situation.
A very simple example would be: If you have multiple constructors all of which initialize the variable the same way(int x=2 for each one of them). It makes sense to initialize the variable at declaration to avoid redundancy.
It also makes sense to consider final variables in such a situation. If you know what value a final variable will have at declaration, it makes sense to initialize it outside the constructors. However, if you want the users of your class to initialize the final variable through a constructor, delay the initialization until the constructor.

One thing, regardless of how you initialize the field, use of the final qualifier, if possible, will ensure the visibility of the field's value in a multi-threaded environment.

I think both are correct programming wise,
But i think your first option is more correct in an object oriented way, because in the constructor is when the object is created, and it is when the variable should initialized.
I think it is the "by the book" convention, but it is open for discussion.
Wikipedia

It can depend on what your are initialising, for example you cannot just use field initialisation if a checked exception is involved. For example, the following:
public class Foo {
FileInputStream fis = new FileInputStream("/tmp"); // throws FileNotFoundException
}
Will cause a compile-time error unless you also include a constructor declaring that checked exception, or extend a superclass which does, e.g.:
public Foo() throws FileNotFoundException {}

I would say, it depends on the default. For example
public Bar
{
ArrayList<Foo> foos;
}
I would make a new ArrayList outside of the constructor, if I always assume foos can not be null. If Bar is a valid object, not caring if foos is null or not, I would put it in the constructor.
You might disagree and say that it's the constructors job to put the object in a valid state. However, if clearly all the constructors should do exactly the same thing(initialize foos), why duplicate that code?

Shared instance variable vs local variable

Is there a reason to prefer using shared instance variable in class vs. local variable and have methods return the instance to it? Or is either one a bad practice?
import package.AClass;
public class foo {
private AClass aVar = new AClass();
// ... Constructor
public AClass returnAClassSetted() {
doStuff(aVar);
return avar;
}
private void doStuff(AClass a) {
aVar = a.setSomething("");
}
}
vs.
import package.AClass;
public class foo {
// ... Constructor
public AClass returnAClassSetted() {
AClass aVar = new AClass();
aVar = doStuff();
return aVar;
}
private AClass doStuff() {
AClass aVar1 = new AClass();
aVar1.setSomething("");
return aVar1;
}
}
First one makes more sense to me in so many ways but I often see code that does the second. Thanks!

Instance variables are shared by all methods in the class. When one method changes the data, another method can be affected by it. It means that you can't understand any one method on its own since it is affected by the code in the other methods in the class. The order in which methods are called can affect the outcome. The methods may not be reentrant. That means that if the method is called, again, before it finishes its execution (say it calls a method that then calls it, or fires an event which then a listener calls the method) then it may fail or behave incorrectly since the data is shared. If that wasn't enough potential problems, when you have multithreading, the data could be changed while you are using it causing inconsistent and hard to reproduce bugs (race conditions).
Using local variables keeps the scope minimized to the smallest amount of code that needs it. This makes it easier to understand, and to debug. It avoids race conditions. It is easier to ensure the method is reentrant. It is a good practice to minimize the scope of data.

Your class name should have been Foo.
The two versions you have are not the same, and it should depend on your use case.
The first version returns the same AClass object when different callers call returnAClassSetted() method using the same Foo object. If one of them changes the state of the returned AClass object, all of them will get see the change. Your Foo class is effectively a Singleton.
The second version returns a new AClass object every time a caller calls returnAClassSetted() method using either the same or different Foo object. Your Foo class is effectively a Builder.
Also, if you want the second version, remove the AClass aVar = new AClass(); and just use AClass aVar = doStuff();. Because you are throwing away the first AClass object created by new AClass();

It's not a yes/no question. It basically depends on the situation and your needs. Declaring the variable in the smallest scope as possible is considered the best practice. However there may be some cases (like in this one) where, depending on the task, it's better to declare it inside/outside the methods. If you declare them outside it will be one instance, and it will be two on the other hand.

Instance properties represent the state of a specific instance of that Class. It might make more sense to think about a concrete example. If the class is Engine, one of the properties that might represent the state of the Engine might be
private boolean running;
... so given an instance of Engine, you could call engine.isRunning() to check the state.
If a given property is not part of the state (or composition) of your Class, then it might be best suited to be a local variable within a method, as implementation detail.

In Instance variables values given are default values means null so if it's an object reference, 0 if it's and int.
Local variables usually don't get default values, and therefore need to be explicitly initialized and the compiler generates an error if you fail to do so.
Further,
Local variables are only visible in the method or block in which they are declared whereas the instance variable can be seen by all methods in the class.

Uninitialized variables and members in Java

Consider this:
public class TestClass {
private String a;
private String b;
public TestClass()
{
a = "initialized";
}
public void doSomething()
{
String c;
a.notify(); // This is fine
b.notify(); // This is fine - but will end in an exception
c.notify(); // "Local variable c may not have been initialised"
}
}
I don't get it. "b" is never initialized but will give the same run-time error as "c", which is a compile-time error. Why the difference between local variables and members?
Edit: making the members private was my initial intention, and the question still stands...

The language defines it this way.
Instance variables of object type default to being initialized to null.
Local variables of object type are not initialized by default and it's a compile time error to access an undefined variable.
See section 4.12.5 for SE7 (same section still as of SE14)
http://docs.oracle.com/javase/specs/jls/se7/html/jls-4.html#jls-4.12.5

Here's the deal. When you call
TestClass tc = new TestClass();
the new command performs four important tasks:
Allocates memory on the heap for the new object.
Initiates the class fields to their default values (numerics to 0, boolean to false, objects to null).
Calls the constructor (which may re-initiate the fields, or may not).
Returns a reference to the new object.
So your fields 'a' and 'b' are both initiated to null, and 'a' is re-initiated in the constructor. This process is not relevant for method calling, so local variable 'c' is never initialized.
For the gravely insomniac, read this.

The rules for definite assignment are quite difficult (read chapter 16 of JLS 3rd Ed). It's not practical to enforce definite assignment on fields. As it stands, it's even possible to observe final fields before they are initialised.

The compiler can figure out that c will never be set. The b variable could be set by someone else after the constructor is called, but before doSomething(). Make b private and the compiler may be able to help.

The compiler can tell from the code for doSomething() that c is declared there and never initialized. Because it is local, there is no possibility that it is initialized elsewhere.
It can't tell when or where you are going to call doSomething(). b is a public member. It is entirely possible that you would initialize it in other code before calling the method.

Member-variables are initialized to null or to their default primitive values, if they are primitives.
Local variables are UNDEFINED and are not initialized and you are responsible for setting the initial value. The compiler prevents you from using them.
Therefore, b is initialized when the class TestClass is instantiated while c is undefined.
Note: null is different from undefined.

You've actually identified one of the bigger holes in Java's system of generally attempting to find errors at edit/compile time rather than run time because--as the accepted answer said--it's difficult to tell if b is initialized or not.
There are a few patterns to work around this flaw. First is "Final by default". If your members were final, you would have to fill them in with the constructor--and it would use path-analysis to ensure that every possible path fills in the finals (You could still assign it "Null" which would defeat the purpose but at least you would be forced to recognize that you were doing it intentionally).
A second approach is strict null checking. You can turn it on in eclipse settings either by project or in default properties. I believe it would force you to null-check your b.notify() before you call it. This can quickly get out of hand so it tends to go with a set of annotations to make things simpler:
The annotations might have different names but in concept once you turn on strict null checking and the annotations the types of variables are "nullable" and "NotNull". If you try to place a Nullable into a not-null variable you must check it for null first. Parameters and return types are also annotated so you don't have to check for null every single time you assign to a not-null variable.
There is also a "NotNullByDefault" package level annotation that will make the editor assume that no variable can ever have a null value unless you tag it Nullable.
These annotations mostly apply at the editor level--You can turn them on within eclipse and probably other editors--which is why they aren't necessarily standardized. (At least last time I check, Java 8 might have some annotations I haven't found yet)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.