What does this.getClass() return during construction? - java

What does Java this.getClass() return during object construction, e.g., when called from a constructor or field initializer expression? Is it safe to expect the most derived class to be returned or not?

This is one of several places where you can observe the difference between how constructing an object works in the Java language versus in the Java virtual machine.
In JVM bytecode, constructing an object of a given type is a single instruction and thus atomic. At no time is the object itself an object of a different type, like a supertype. However, this object will not yet have been "constructed". At the bytecode level, the constructor is a special method named <init>, which can set fields and otherwise manipulate the object which already exists.
The compiler assembles any actual constructor, plus the field initializers and instance initializer blocks into the <init> method. The first operation of <init> is normally to call the corresponding <init> method declared on the supertype. So the initializers get run in order from supertypes to subtypes. However, when the supertype <init> method is executing, the object itself is already of the type it will be once complete; therefore getClass() will return that type, and the value returned by getClass() will not change depending on where it is called.
From the standpoint of the Java language, this is (mostly) a moot point, since if any of the <init> methods involved throws an exception, the object is normally immediately eligible for garbage collection. However, there are a few cases where you can cause a partially-constructed object to become visible to another thread, in which case you can observe objects with inconsistent states. You can also intentionally create an object without running its <init> by using the (intentionally undocumented) methods of sun.misc.Unsafe on a JVM that exposes that class.

Related

Is an object created when you pass a method reference in a method

As far as I know, when you define a method in a function, an object is instantiated:
myList.stream().map(x->x.getName().replaceAll('a','b')).toList();
Or the equivalent
Function<MyObject,String> myFunc = x -> {return x.getName().replaceAll('a','b');}
myList.stream().map(myFunc).toList();
x->x.getName().replaceAll('a','b') is created as a functional interface object (and requires memory allocation, a new somewhere/somehow, right?).
However, if I pass an already existing method as a parameter, is anything created?
class A{
public list<String> transform(List<String> input){
return input.stream().filter(this::myFilter).filter(A::staticFilter).toList();
}
public boolean myFilter(String s){ // Whatever }
public static boolean staticFilter(String s) { // whatever }
}
What happens here:
Is myFilter "wrapped" in a functional interface? (is it the same for a static method reference?)
Is there something specific that happens at bytecode level which is not "clear" on language level (like method pointer or something?).
From JavaDoc Api
Note that instances of functional interfaces can be created with
lambda expressions, method references, or constructor references.
As to if the lambda expression will create an instance in heap or not, you can follow this thread where the top comment from #Brian Goetz might be helpful.
About lambda expressions:
Also as indicated here in Java Specifications for Run-Time Evaluation of Lambda Expressions
These rules are meant to offer flexibility to implementations of the
Java programming language, in that:
A new object need not be allocated on every evaluation.
Objects produced by different lambda expressions need not belong to
different classes (if the bodies are identical, for example).
Every object produced by evaluation need not belong to the same class
(captured local variables might be inlined, for example).
If an "existing instance" is available, it need not have been created
at a previous lambda evaluation (it might have been allocated during
the enclosing class's initialization, for example).
So to your question:
x->x.getName().replaceAll('a','b') is created as a functional
interface object (and requires memory allocation, a new
somewhere/somehow, right?).
The answer is some times yes, some times no. Not always the same case.
About method reference expressions:
Evaluation of a method reference expression produces an instance of a
functional interface type (§9.8). Method reference evaluation does not
cause the execution of the corresponding method; instead, this may
occur at a later time when an appropriate method of the functional
interface is invoked.
Based on what is written here for Run-Time Evaluation of Method References
The timing of method reference expression evaluation is more complex
than that of lambda expressions (§15.27.4). When a method reference
expression has an expression (rather than a type) preceding the ::
separator, that subexpression is evaluated immediately. The result of
evaluation is stored until the method of the corresponding functional
interface type is invoked; at that point, the result is used as the
target reference for the invocation. This means the expression
preceding the :: separator is evaluated only when the program
encounters the method reference expression, and is not re-evaluated on
subsequent invocations on the functional interface type.
I would assume that a functional interface type is created but not each time with each invocation. It should as well be cached and optimized for the less amount of evaluations.
Well, the compiler has a lot of leeway in how it actually implements the code you write, but generally .map() takes a Function Object so whatever expression you put in the parentheses will produce an object.
That does not mean, however, that a new Object is created every time. In your lambda example, the lambda function doesn't reference anything defined in an enclosing method scope, so a single Function object can be created and reused for all calls.
Similarly, the A::staticFilter reference only needs to produce one Function.
The object created by this::myFilter, however, needs to have a reference to this (unless the compiler can determine that it doesn't!), and so you will certainly get a new Function created inside each call to transform.

Are there programs for the JVM that cannot be decompiled to valid Java? [duplicate]

Are there currently (Java 6) things you can do in Java bytecode that you can't do from within the Java language?
I know both are Turing complete, so read "can do" as "can do significantly faster/better, or just in a different way".
I'm thinking of extra bytecodes like invokedynamic, which can't be generated using Java, except that specific one is for a future version.
After working with Java byte code for quite a while and doing some additional research on this matter, here is a summary of my findings:
Execute code in a constructor before calling a super constructor or auxiliary constructor
In the Java programming language (JPL), a constructor's first statement must be an invocation of a super constructor or another constructor of the same class. This is not true for Java byte code (JBC). Within byte code, it is absolutely legitimate to execute any code before a constructor, as long as:
Another compatible constructor is called at some time after this code block.
This call is not within a conditional statement.
Before this constructor call, no field of the constructed instance is read and none of its methods is invoked. This implies the next item.
Set instance fields before calling a super constructor or auxiliary constructor
As mentioned before, it is perfectly legal to set a field value of an instance before calling another constructor. There even exists a legacy hack which makes it able to exploit this "feature" in Java versions before 6:
class Foo {
public String s;
public Foo() {
System.out.println(s);
}
}
class Bar extends Foo {
public Bar() {
this(s = "Hello World!");
}
private Bar(String helper) {
super();
}
}
This way, a field could be set before the super constructor is invoked which is however not longer possible. In JBC, this behavior can still be implemented.
Branch a super constructor call
In Java, it is not possible to define a constructor call like
class Foo {
Foo() { }
Foo(Void v) { }
}
class Bar() {
if(System.currentTimeMillis() % 2 == 0) {
super();
} else {
super(null);
}
}
Until Java 7u23, the HotSpot VM's verifier did however miss this check which is why it was possible. This was used by several code generation tools as a sort of a hack but it is not longer legal to implement a class like this.
The latter was merely a bug in this compiler version. In newer compiler versions, this is again possible.
Define a class without any constructor
The Java compiler will always implement at least one constructor for any class. In Java byte code, this is not required. This allows the creation of classes that cannot be constructed even when using reflection. However, using sun.misc.Unsafe still allows for the creation of such instances.
Define methods with identical signature but with different return type
In the JPL, a method is identified as unique by its name and its raw parameter types. In JBC, the raw return type is additionally considered.
Define fields that do not differ by name but only by type
A class file can contain several fields of the same name as long as they declare a different field type. The JVM always refers to a field as a tuple of name and type.
Throw undeclared checked exceptions without catching them
The Java runtime and the Java byte code are not aware of the concept of checked exceptions. It is only the Java compiler that verifies that checked exceptions are always either caught or declared if they are thrown.
Use dynamic method invocation outside of lambda expressions
The so-called dynamic method invocation can be used for anything, not only for Java's lambda expressions. Using this feature allows for example to switch out execution logic at runtime. Many dynamic programming languages that boil down to JBC improved their performance by using this instruction. In Java byte code, you could also emulate lambda expressions in Java 7 where the compiler did not yet allow for any use of dynamic method invocation while the JVM already understood the instruction.
Use identifiers that are not normally considered legal
Ever fancied using spaces and a line break in your method's name? Create your own JBC and good luck for code review. The only illegal characters for identifiers are ., ;, [ and /. Additionally, methods that are not named <init> or <clinit> cannot contain < and >.
Reassign final parameters or the this reference
final parameters do not exist in JBC and can consequently be reassigned. Any parameter, including the this reference is only stored in a simple array within the JVM what allows to reassign the this reference at index 0 within a single method frame.
Reassign final fields
As long as a final field is assigned within a constructor, it is legal to reassign this value or even not assign a value at all. Therefore, the following two constructors are legal:
class Foo {
final int bar;
Foo() { } // bar == 0
Foo(Void v) { // bar == 2
bar = 1;
bar = 2;
}
}
For static final fields, it is even allowed to reassign the fields outside of
the class initializer.
Treat constructors and the class initializer as if they were methods
This is more of a conceptional feature but constructors are not treated any differently within JBC than normal methods. It is only the JVM's verifier that assures that constructors call another legal constructor. Other than that, it is merely a Java naming convention that constructors must be called <init> and that the class initializer is called <clinit>. Besides this difference, the representation of methods and constructors is identical. As Holger pointed out in a comment, you can even define constructors with return types other than void or a class initializer with arguments, even though it is not possible to call these methods.
Create asymmetric records*.
When creating a record
record Foo(Object bar) { }
javac will generate a class file with a single field named bar, an accessor method named bar() and a constructor taking a single Object. Additionally, a record attribute for bar is added. By manually generating a record, it is possible to create, a different constructor shape, to skip the field and to implement the accessor differently. At the same time, it is still possible to make the reflection API believe that the class represents an actual record.
Call any super method (until Java 1.1)
However, this is only possible for Java versions 1 and 1.1. In JBC, methods are always dispatched on an explicit target type. This means that for
class Foo {
void baz() { System.out.println("Foo"); }
}
class Bar extends Foo {
#Override
void baz() { System.out.println("Bar"); }
}
class Qux extends Bar {
#Override
void baz() { System.out.println("Qux"); }
}
it was possible to implement Qux#baz to invoke Foo#baz while jumping over Bar#baz. While it is still possible to define an explicit invocation to call another super method implementation than that of the direct super class, this does no longer have any effect in Java versions after 1.1. In Java 1.1, this behavior was controlled by setting the ACC_SUPER flag which would enable the same behavior that only calls the direct super class's implementation.
Define a non-virtual call of a method that is declared in the same class
In Java, it is not possible to define a class
class Foo {
void foo() {
bar();
}
void bar() { }
}
class Bar extends Foo {
#Override void bar() {
throw new RuntimeException();
}
}
The above code will always result in a RuntimeException when foo is invoked on an instance of Bar. It is not possible to define the Foo::foo method to invoke its own bar method which is defined in Foo. As bar is a non-private instance method, the call is always virtual. With byte code, one can however define the invocation to use the INVOKESPECIAL opcode which directly links the bar method call in Foo::foo to Foo's version. This opcode is normally used to implement super method invocations but you can reuse the opcode to implement the described behavior.
Fine-grain type annotations
In Java, annotations are applied according to their #Target that the annotations declares. Using byte code manipulation, it is possible to define annotations independently of this control. Also, it is for example possible to annotate a parameter type without annotating the parameter even if the #Target annotation applies to both elements.
Define any attribute for a type or its members
Within the Java language, it is only possible to define annotations for fields, methods or classes. In JBC, you can basically embed any information into the Java classes. In order to make use of this information, you can however no longer rely on the Java class loading mechanism but you need to extract the meta information by yourself.
Overflow and implicitly assign byte, short, char and boolean values
The latter primitive types are not normally known in JBC but are only defined for array types or for field and method descriptors. Within byte code instructions, all of the named types take the space 32 bit which allows to represent them as int. Officially, only the int, float, long and double types exist within byte code which all need explicit conversion by the rule of the JVM's verifier.
Not release a monitor
A synchronized block is actually made up of two statements, one to acquire and one to release a monitor. In JBC, you can acquire one without releasing it.
Note: In recent implementations of HotSpot, this instead leads to an IllegalMonitorStateException at the end of a method or to an implicit release if the method is terminated by an exception itself.
Add more than one return statement to a type initializer
In Java, even a trivial type initializer such as
class Foo {
static {
return;
}
}
is illegal. In byte code, the type initializer is treated just as any other method, i.e. return statements can be defined anywhere.
Create irreducible loops
The Java compiler converts loops to goto statements in Java byte code. Such statements can be used to create irreducible loops, which the Java compiler never does.
Define a recursive catch block
In Java byte code, you can define a block:
try {
throw new Exception();
} catch (Exception e) {
<goto on exception>
throw Exception();
}
A similar statement is created implicitly when using a synchronized block in Java where any exception while releasing a monitor returns to the instruction for releasing this monitor. Normally, no exception should occur on such an instruction but if it would (e.g. the deprecated ThreadDeath), the monitor would still be released.
Call any default method
The Java compiler requires several conditions to be fulfilled in order to allow a default method's invocation:
The method must be the most specific one (must not be overridden by a sub interface that is implemented by any type, including super types).
The default method's interface type must be implemented directly by the class that is calling the default method. However, if interface B extends interface A but does not override a method in A, the method can still be invoked.
For Java byte code, only the second condition counts. The first one is however irrelevant.
Invoke a super method on an instance that is not this
The Java compiler only allows to invoke a super (or interface default) method on instances of this. In byte code, it is however also possible to invoke the super method on an instance of the same type similar to the following:
class Foo {
void m(Foo f) {
f.super.toString(); // calls Object::toString
}
public String toString() {
return "foo";
}
}
Access synthetic members
In Java byte code, it is possible to access synthetic members directly. For example, consider how in the following example the outer instance of another Bar instance is accessed:
class Foo {
class Bar {
void bar(Bar bar) {
Foo foo = bar.Foo.this;
}
}
}
This is generally true for any synthetic field, class or method.
Define out-of-sync generic type information
While the Java runtime does not process generic types (after the Java compiler applies type erasure), this information is still attcheched to a compiled class as meta information and made accessible via the reflection API.
The verifier does not check the consistency of these meta data String-encoded values. It is therefore possible to define information on generic types that does not match the erasure. As a concequence, the following assertings can be true:
Method method = ...
assertTrue(method.getParameterTypes() != method.getGenericParameterTypes());
Field field = ...
assertTrue(field.getFieldType() == String.class);
assertTrue(field.getGenericFieldType() == Integer.class);
Also, the signature can be defined as invalid such that a runtime exception is thrown. This exception is thrown when the information is accessed for the first time as it is evaluated lazily. (Similar to annotation values with an error.)
Append parameter meta information only for certain methods
The Java compiler allows for embedding parameter name and modifier information when compiling a class with the parameter flag enabled. In the Java class file format, this information is however stored per-method what makes it possible to only embed such method information for certain methods.
Mess things up and hard-crash your JVM
As an example, in Java byte code, you can define to invoke any method on any type. Usually, the verifier will complain if a type does not known of such a method. However, if you invoke an unknown method on an array, I found a bug in some JVM version where the verifier will miss this and your JVM will finish off once the instruction is invoked. This is hardly a feature though, but it is technically something that is not possible with javac compiled Java. Java has some sort of double validation. The first validation is applied by the Java compiler, the second one by the JVM when a class is loaded. By skipping the compiler, you might find a weak spot in the verifier's validation. This is rather a general statement than a feature, though.
Annotate a constructor's receiver type when there is no outer class
Since Java 8, non-static methods and constructors of inner classes can declare a receiver type and annotate these types. Constructors of top-level classes cannot annotate their receiver type as they most not declare one.
class Foo {
class Bar {
Bar(#TypeAnnotation Foo Foo.this) { }
}
Foo() { } // Must not declare a receiver type
}
Since Foo.class.getDeclaredConstructor().getAnnotatedReceiverType() does however return an AnnotatedType representing Foo, it is possible to include type annotations for Foo's constructor directly in the class file where these annotations are later read by the reflection API.
Use unused / legacy byte code instructions
Since others named it, I will include it as well. Java was formerly making use of subroutines by the JSR and RET statements. JBC even knew its own type of a return address for this purpose. However, the use of subroutines did overcomplicate static code analysis which is why these instructions are not longer used. Instead, the Java compiler will duplicate code it compiles. However, this basically creates identical logic which is why I do not really consider it to achieve something different. Similarly, you could for example add the NOOP byte code instruction which is not used by the Java compiler either but this would not really allow you to achieve something new either. As pointed out in the context, these mentioned "feature instructions" are now removed from the set of legal opcodes which does render them even less of a feature.
As far as I know there are no major features in the bytecodes supported by Java 6 that are not also accessible from Java source code. The main reason for this is obviously that the Java bytecode was designed with the Java language in mind.
There are some features that are not produced by modern Java compilers, however:
The ACC_SUPER flag:
This is a flag that can be set on a class and specifies how a specific corner case of the invokespecial bytecode is handled for this class. It is set by all modern Java compilers (where "modern" is >= Java 1.1, if I remember correctly) and only ancient Java compilers produced class files where this was un-set. This flag exists only for backwards-compatibility reasons. Note that starting with Java 7u51, ACC_SUPER is ignored completely due to security reasons.
The jsr/ret bytecodes.
These bytecodes were used to implement sub-routines (mostly for implementing finally blocks). They are no longer produced since Java 6. The reason for their deprecation is that they complicate static verification a lot for no great gain (i.e. code that uses can almost always be re-implemented with normal jumps with very little overhead).
Having two methods in a class that only differ in return type.
The Java language specification does not allow two methods in the same class when they differ only in their return type (i.e. same name, same argument list, ...). The JVM specification however, has no such restriction, so a class file can contain two such methods, there's just no way to produce such a class file using the normal Java compiler. There's a nice example/explanation in this answer.
Here are some features that can be done in Java bytecode but not in Java source code:
Throwing a checked exception from a method without declaring that the method throws it. The checked and unchecked exceptions are a thing which is checked only by the Java compiler, not the JVM. Because of this for example Scala can throw checked exceptions from methods without declaring them. Though with Java generics there is a workaround called sneaky throw.
Having two methods in a class that only differ in return type, as already mentioned in Joachim's answer: The Java language specification does not allow two methods in the same class when they differ only in their return type (i.e. same name, same argument list, ...). The JVM specification however, has no such restriction, so a class file can contain two such methods, there's just no way to produce such a class file using the normal Java compiler. There's a nice example/explanation in this answer.
GOTO can be used with labels to create your own control structures (other than for while etc)
You can override the this local variable inside a method
Combining both of these you can create create tail call optimised bytecode (I do this in JCompilo)
As a related point you can get parameter name for methods if compiled with debug (Paranamer does this by reading the bytecode
Maybe section 7A in this document is of interest, although it's about bytecode pitfalls rather than bytecode features.
In Java language the first statement in a constructor must be a call to the super class constructor. Bytecode does not have this limitation, instead the rule is that the super class constructor or another constructor in the same class must be called for the object before accessing the members. This should allow more freedom such as:
Create an instance of another object, store it in a local variable (or stack) and pass it as a parameter to super class constructor while still keeping the reference in that variable for other use.
Call different other constructors based on a condition. This should be possible: How to call a different constructor conditionally in Java?
I have not tested these, so please correct me if I'm wrong.
Something you can do with byte code, rather than plain Java code, is generate code which can loaded and run without a compiler. Many systems have JRE rather than JDK and if you want to generate code dynamically it may be better, if not easier, to generate byte code instead of Java code has to be compiled before it can be used.
I wrote a bytecode optimizer when I was a I-Play, (it was designed to reduce the code size for J2ME applications). One feature I added was the ability to use inline bytecode (similar to inline assembly language in C++). I managed to reduce the size of a function that was part of a library method by using the DUP instruction, since I need the value twice. I also had zero byte instructions (if you are calling a method that takes a char and you want to pass an int, that you know does not need to be cast I added int2char(var) to replace char(var) and it would remove the i2c instruction to reduce the size of the code. I also made it do float a = 2.3; float b = 3.4; float c = a + b; and that would be converted to fixed point (faster, and also some J2ME did not support floating point).
In Java, if you attempt to override a public method with a protected method (or any other reduction in access), you get an error: "attempting to assign weaker access privileges". If you do it with JVM bytecode, the verifier is fine with it, and you can call these methods via the parent class as if they were public.

Constructor bytecode

The ASM guide talks about constructors:
package pkg;
public class Bean {
private int f;
public int getF() {
return this.f;
}
public void setF(int f) {
this.f = f;
}
}
The Bean class also has a default public constructor which is
generated by the compiler, since no explicit constructor was defined
by the programmer. This default public constructor is generated as
Bean() { super(); }. The bytecode of this constructor is the
following:
ALOAD 0
INVOKESPECIAL java/lang/Object <init> ()V
RETURN
The first instruction pushes this on the operand stack. The second
instruction pops this value from the stack, and calls the <init>
method defined in the Object class. This corresponds to the super()
call, i.e. a call to the constructor of the super class, Object. You
can see here that constructors are named differently in compiled and
source classes: in compiled classes they are always named <init>,
while in source classes they have the name of the class in which they
are defined. Finally the last instruction returns to the caller.
How is the value of this already known to the JVM before the first instruction of the constructor?
At the JVM level, first the object is allocated, uninitialized, then the constructor is invoked on that object. The constructor is more-or-less an instance method executed on the uninitialized object.
Even in the Java language, this exists and has all its fields at the first line of the constructor.
The first thing to understand, is how object instantiation work on the bytecode level.
As explained in JVMS, §3.8. Working with Class Instances:
Java Virtual Machine class instances are created using the Java Virtual Machine's new instruction. Recall that at the level of the Java Virtual Machine, a constructor appears as a method with the compiler-supplied name <init>. This specially named method is known as the instance initialization method (§2.9). Multiple instance initialization methods, corresponding to multiple constructors, may exist for a given class. Once the class instance has been created and its instance variables, including those of the class and all of its superclasses, have been initialized to their default values, an instance initialization method of the new class instance is invoked. For example:
Object create() {
return new Object();
}
compiles to:
Method java.lang.Object create()
0 new #1 // Class java.lang.Object
3 dup
4 invokespecial #4 // Method java.lang.Object.<init>()V
7 areturn
So the constructor invocation via invokespecial shares the behavior of passing this as the first argument with invokevirtual.
However, it must be emphasized that a reference to an uninitialized reference is treated specially, as you are not allowed to use it before the constructor (or the super constructor when you’re inside the constructor) has been invoked. This is enforced by the verifier.
JVMS, §4.10.2.4. Instance Initialization Methods and Newly Created Objects:
… The instance initialization method (§2.9) for class myClass sees the new uninitialized object as its this argument in local variable 0. Before that method invokes another instance initialization method of myClass or its direct superclass on this, the only operation the method can perform on this is assigning fields declared within myClass.
When doing dataflow analysis on instance methods, the verifier initializes local variable 0 to contain an object of the current class, or, for instance initialization methods, local variable 0 contains a special type indicating an uninitialized object. After an appropriate instance initialization method is invoked (from the current class or its direct superclass) on this object, all occurrences of this special type on the verifier's model of the operand stack and in the local variable array are replaced by the current class type. The verifier rejects code that uses the new object before it has been initialized or that initializes the object more than once. In addition, it ensures that every normal return of the method has invoked an instance initialization method either in the class of this method or in the direct superclass.
Similarly, a special type is created and pushed on the verifier's model of the operand stack as the result of the Java Virtual Machine instruction new. The special type indicates the instruction by which the class instance was created and the type of the uninitialized class instance created. When an instance initialization method declared in the class of the uninitialized class instance is invoked on that class instance, all occurrences of the special type are replaced by the intended type of the class instance. This change in type may propagate to subsequent instructions as the dataflow analysis proceeds.
So code creating an object via the new instruction can’t use it in any way before the constructor has been called, whereas a constructor’s code can only assign fields before another (this(…) or super(…)) constructor has been called (an opportunity used by inner classes to initialize their outer instance reference as a first action), but still can’t do anything else with their uninitialized this.
It’s also not allowed for a constructor to return when this is still in the uninitialized state. Hence, the automatically generated constructor bears the required minimum, invoking the super constructor and returning (there is no implicit return on the byte code level).
It’s generally recommended to read The Java® Virtual Machine Specification (resp. its Java 11 version) alongside to any ASM specific documentation or tutorials.

What properties are guaranteed by constructors in Java?

I used to think that, intuitively speaking, a constructor in Java is the thing that makes an object, and that nothing can touch that object until its constructor returns. However, I have been proven wrong about this over and over again:
uninitialized objects can be leaked by sharing this
uninitialized objects can be leaked by a subclass accessing it from the finalizer
uninitialized objects can be leaked to another thread before they're fully constructed
All of these facts violate my intuition of what I thought a constructor is.
I can no longer with confidence say what a constructor actually does in Java, or what it's meant to be used for. If I'm making a simple DTO with all final fields, then I can understand what the use of the constructor is, because this is exactly the same as a struct in C except it can't be modified. Other than that, I have no clue what constructors can be reliably used for in Java. Are they just a convention/syntactic sugar? (i.e If there were only factories that initialize objects for you, you would only have X x = new X(), then modify each field in x to make them have non default values - given the 3 facts above, this would be almost equivalent to how Java actually is)
I can name two properties that are actually guaranteed by constructors: If I do X x = new X(), then I know that x is an instance of X but not a subclass of X, and its final fields are fully initialized. You might be tempted to say that you know that constructor of X finished and you have a valid object, but this is untrue if you pass X to another thread - the other thread may see the uninitialized version (i.e what you just said is no different than the guarantees of calling a factory). What other properties do constructors actually guarantee?
All of these facts violate my intuition of what I thought a constructor is.
They shouldn't. A constructor does exactly what you think it does.
1: uninitialized objects can be leaked by sharing this
3: uninitialized objects can be leaked to another thread before they're fully constructed
The problem with the leaking of this, starting threads in the constructor, and storing a newly constructed object where multiple threads access it without synchronization are all problems around the reordering of the initialization of non-final (and non-volatile) fields. But the initialization code is still done by the constructor. The thread that constructed the object sees the object fully. This is about when those changes are visible in other threads which is not guaranteed by the language definition.
You might be tempted to say that you know that constructor of X finished and you have a valid object, but this is untrue if you pass X to another thread - the other thread may see the uninitialized version (i.e what you just said is no different than the guarantees of calling a factory).
This is correct. It is also correct that if you have an unsynchronized object and you mutate it in one thread, other threads may or may not see the mutation. That's the nature of threaded programming. Even constructors are not safe from the need to synchronize objects properly.
2: uninitialized objects can be leaked by a subclass accessing it from the finalizer
This document is talking about finalizers and improperly being able to access an object after it has been garbage collected. By hacking subclasses and finalizers you can generate an object that is not properly constructed but it is a major hack to do so. For me this does not somehow challenge what a constructor does. Instead it demonstrates the complexity of the modern, mature, JVM. The document also shows how you can write your code to work around this hack.
What properties are guaranteed by constructors in Java?
According to the definition, a constructor:
Allocates space for the object.
Sets all the instance variables in the object to their default values. This includes the instance variables in the object's superclasses.
Assigns the parameter variables for the object.
Processes any explicit or implicit constructor invocation (a call to this() or super() in the constructor).
Initializes variables in the class.
Executes the rest of the constructor.
In terms of your 3 issues, #1 and #3 are, again, about when the initialization of non-final and non-volatile fields are seen by threads other than the one that constructed the object. This visibility without synchronization is not guaranteed.
The #2 issue shows a mechanism where if an exception is thrown while executing the constructor, you can override the finalize method to obtain and improperly constructed object. Constructor points 1-5 have occurred. With the hack you can bypass a portion of 6. I guess it is in the eye of the beholder if this challenges the identity of the constructor.
From the JLS section 12.5:
12.5. Creation of New Class Instances
Just before a reference to the newly created object is returned as the
result, the indicated constructor is processed to initialize the new
object using the following procedure:
Assign the arguments for the constructor to newly created parameter variables for this constructor invocation.
If this constructor begins with an explicit constructor invocation (§8.8.7.1) of another constructor in the same class (using this), then
evaluate the arguments and process that constructor invocation
recursively using these same five steps. If that constructor
invocation completes abruptly, then this procedure completes abruptly
for the same reason; otherwise, continue with step 5.
This constructor does not begin with an explicit constructor invocation of another constructor in the same class (using this). If
this constructor is for a class other than Object, then this
constructor will begin with an explicit or implicit invocation of a
superclass constructor (using super). Evaluate the arguments and
process that superclass constructor invocation recursively using these
same five steps. If that constructor invocation completes abruptly,
then this procedure completes abruptly for the same reason. Otherwise,
continue with step 4.
Execute the instance initializers and instance variable initializers for this class, assigning the values of instance variable
initializers to the corresponding instance variables, in the
left-to-right order in which they appear textually in the source code
for the class. If execution of any of these initializers results in an
exception, then no further initializers are processed and this
procedure completes abruptly with that same exception. Otherwise,
continue with step 5.
Execute the rest of the body of this constructor. If that execution completes abruptly, then this procedure completes abruptly
for the same reason. Otherwise, this procedure completes normally.
**
Unlike C++, the Java programming language does not specify altered rules for method >dispatch during the creation of a new class instance. If methods are invoked that are >overridden in subclasses in the object being initialized, then these overriding methods >are used, even before the new object is completely initialized.
And from JLS 16.9:
Note that there are no rules that would allow us to conclude that V is
definitely unassigned before an instance variable initializer. We can
informally conclude that V is not definitely unassigned before any
instance variable initializer of C, but there is no need for such a
rule to be stated explicitly.
Happens before 17.4.5:
Threading 17.5.2:
A read of a final field of an object within the thread that constructs
that object is ordered with respect to the initialization of that
field within the constructor by the usual happens-before rules. If the
read occurs after the field is set in the constructor, it sees the
value the final field is assigned, otherwise it sees the default
value.
A class contains constructors that are invoked to create objects from the class blueprint.
This is what Oracle says about constructors.
Now to your point.
intuitively speaking, a constructor in Java is the thing that makes an object, and that nothing can touch that object until its constructor returns.
So according to the official documentation, your assumption is not right. And the point 1 and 2 are the abuse of the rules and behaviors of Java, unless you consciously want to leak your objects! As also being irrelevant to Constructor, I will skip discussing these points.
Now if we talk about your 3rd point, in multi-threaded environment there is nothing that can guarantee you about the consistency of your code, unless "properly synchronized blocks" or "the atomic instructions". As object creation is not a synchronized nor an atomic instruction, there is no guarantee of being consistent! There is nothing the Constructor can do with it. In other words its not the responsibility of the Constructor to make your object creation atomic.
Now, the answer to your question, "What other properties do constructors actually guarantee?" is somewhat easy. Constructors are merely nothing but special type of methods, that are invoked during object creation from the blue print of the class. So it can guarantee nothing, unless you give it a chance to be executed consistently like any other methods. It after being consistently executed it can guarantee you that, your object is created and initialized as you wanted and instructed in it.
constructors is java are just used to initialize the state of the object created..nothing more.

Why doesn't Java have initializer lists like in C++?

In C++, you can use an initializer list to initialize the class's fields before the constructor begins running. For example:
Foo::Foo(string s, double d, int n) : name(s), weight(d), age(n) {
// Empty; already handled!
}
I am curious why Java does not have a similar feature. According to Core Java: Volume 1:
C++ uses this special syntax to call field constructors. In Java, there is no need for it because objects have no subobjects, only pointers to other objects.
Here are my questions:
What do they mean by "because objects have no subobjects?" I don't understand what a subobject is (I tried looking it up); do they mean an instantiation of a subclass which extends a superclass?
As for why Java does not have initializer lists like C++, I would assume that the reason is because all fields are already initialized by default in Java and also because Java uses the super keyword to call the super(or base in C++ lingo)-class constructor. Is this correct?
In C++, initializer lists are necessary because of a few language features that are either not present in Java or work differently in Java:
const: In C++, you can define a fields that are marked const that cannot be assigned to and must be initialized in the initializer list. Java does have final fields, but you can assign to final fields in the body of a constructor. In C++, assigning to a const field in the constructor is illegal.
References: In C++, references (as opposed to pointers) must be initialized to bind to some object. It is illegal to create a reference without an initializer. In C++, the way that you specify this is with the initializer list, since if you were to refer to the reference in the body of the constructor without first initializing it you would be using an uninitialized reference. In Java, object references behave like C++ pointers and can be assigned to after created. They just default to null otherwise.
Direct subobjects. In C++, an object can contain object directly as fields, whereas in Java objects can only hold references to those objects. That is, in C++, if you declare an object that has a string as a member, the storage space for that string is built directly into the space for the object itself, while in Java you just get space for a reference to some other String object stored elsewhere. Consequently, C++ needs to provide a way for you to give those subobjects initial values, since otherwise they'd just stay uninitialized. By default it uses the default constructor for those types, but if you want to use a different constructor or no default constructor is available the initializer list gives you a way to bypass this. In Java, you don't need to worry about this because the references will default to null, and you can then assign them to refer to the objects you actually want them to refer to. If you want to use a non-default constructor, then you don't need any special syntax for it; just set the reference to a new object initialized via the appropriate constructor.
In the few cases where Java might want initializer lists (for example, to call superclass constructors or give default values to its fields), this is handled through two other language features: the super keyword to invoke superclass constructors, and the fact that Java objects can give their fields default values at the point at which they're declared. Since C++ has multiple inheritance, just having a single super keyword wouldn't unambiguously refer to a single base class, and prior to C++11 C++ didn't support default initializers in a class and had to rely on initializer lists.
C++
There is a difference between
ClassType t(initialization arguments);
and
ClassType * pt;
The latter doesn't need to be initialized (set to NULL). The former does. Think of it as an integer. You can't have an int without a value, BUT you can have an int pointer without a value.
So when you have:
class ClassType
{
OtherClass value;
OtherClass * reference;
};
Then the declaration:
ClassType object;
automatically creates an instance of OtherClass in value. Therefore, if OtherClass has initialization, it must be done in the ClassType constructor. However, reference is just a pointer (address in memory) and can remain uninitialized. If you want an instance of OtherClass you must use
object.reference = new OtherClass(initialization arguments);
Java
There is only
class ClassType
{
OtherClass reference;
}
This is equivalent to a pointer in C++. In this case when you do:
ClassType object = new ClassType();
You don't automatically create an instance of OtherClass. Therefore, you don't have to initialize anything in the constructor unless you want to. When you want an object of OtherClass you can use
object.reference = new OtherClass();
Because Java does not need them to allow initialization of fields whose type has no zero-value.
In C++
class C {
D d;
}
without a member initializer for d, D::D() will be called which makes it impossible to initialize the field if there is no zero-type for D. This can happen when D::D() is explicitly declared private.
In Java, there is a known zero-value for all reference types, null, so a field can always be initialized.
Java also does a bunch of work to make sure* that all final fields are initialized before first use and before the constructor ends, so while Java has a requirement like C++'s const field initialization requirement, it just overloads this.fieldName = <expression> in the constructor body to mean field initialization.
: modulo exceptions thrown in the ctor, overridden method calls from the base class, etc.

Categories