It is known that using sun.misc.Unsafe#allocateInstance one can create an object without calling any class constructors.
Is it possible to do the opposite: given an existing instance, invoke a constructor on it?
Clarification: this is not the question about something I'd do in production code. I'm curious about JVM internals and crazy things that can still be done. Answers specific to some JVM version are welcome.
JVMS §2.9 forbids invocation of constructor on already initialized objects:
Instance initialization methods may be invoked only within the Java
Virtual Machine by the invokespecial instruction, and
they may be invoked only on uninitialized class instances.
However, it is still technically possible to invoke constructor on initialized object with JNI. CallVoidMethod function does not make difference between <init> and ordinary Java methods. Moreover, JNI specification hints that CallVoidMethod may be used to call a constructor, though it does not say whether an instance has to be initialized or not:
When these functions are used to call private methods and constructors, the method ID must be derived from the real class of obj, not from one of its superclasses.
I've verified that the following code works both in JDK 8 and JDK 9. JNI allows you to do unsafe things, but you should not rely on this in production applications.
ConstructorInvoker.java
public class ConstructorInvoker {
static {
System.loadLibrary("constructorInvoker");
}
public static native void invoke(Object instance);
}
constructorInvoker.c
#include <jni.h>
JNIEXPORT void JNICALL
Java_ConstructorInvoker_invoke(JNIEnv* env, jclass self, jobject instance) {
jclass cls = (*env)->GetObjectClass(env, instance);
jmethodID constructor = (*env)->GetMethodID(env, cls, "<init>", "()V");
(*env)->CallVoidMethod(env, instance, constructor);
}
TestObject.java
public class TestObject {
int x;
public TestObject() {
System.out.println("Constructor called");
x++;
}
public static void main(String[] args) {
TestObject obj = new TestObject();
System.out.println("x = " + obj.x); // x = 1
ConstructorInvoker.invoke(obj);
System.out.println("x = " + obj.x); // x = 2
}
}
It seems that with some (very dubious) tricks this is possible, even without going through a custom native library, by (ab)using method handles.
This method essentially tricks the JVM into thinking it is currently invoking a regular method instead of a constructor.
I just have to add a mandatory "this is probably not a good idea", but this is the only way I found for doing this. I also can't attest to how this behaves on different JVMs.
Prerequisites
To do this, an instance of sun.misc.Unsafe is needed. I will not go into detail about how to obtain this here since you already seem to have one, but this guide explains the process.
Step 1: Obtaining a trusted MethodHandles.Lookup
Next, a java.lang.invoke.MethodHandles$Lookup is needed to get the actual method handle for the constructor.
This class has a permission system which works through the allowedModes property in Lookup, which is set to a bunch of Flags. There is a special TRUSTED flag that circumvents all permission checks.
Unfortunately, the allowedModes field is filtered from reflection, so we cannot simply bypass the permissions by setting that value through reflection.
Even though reflecion filters can be circumvented aswell, there is a simpler way: Lookup contains a static field IMPL_LOOKUP, which holds a Lookup with those TRUSTED permissions. We can get this instance by using reflection and Unsafe:
var field = MethodHandles.Lookup.class.getDeclaredField("IMPL_LOOKUP");
var fieldOffset = unsafe.staticFieldOffset(field);
var lookup = (MethodHandles.Lookup) unsafe.getObject(MethodHandles.Lookup.class, fieldOffset);
We use Unsafe here instead of setAccessible and get, because going through reflection will cause issues with the module system in the newer java versions.
Step 2: Finding the constructor
Now we can get a MethodHandle for the constructor we want to invoke. We do this by using the Lookup we just obtained, just like a Lookup would be used normally.
var type = MethodType.methodType(Void.TYPE, <your constructor argument types>);
var constructor = lookup.findConstructor(<your class>, type);
Step 3: Getting the MemberName
While the signature of findConstructor only specifies that it returns a MethodHandle, it actuall returns a java.lang.invoke.DirectMethodHandle$Constructor. This type declares a initMethod field, which contains the java.lang.invoke.MemberName referencing our constructor. The MemberName type is not accessible from the outside, so all interaction with it happens through Unsafe.
We can obtain this MemberName in the same way we also obtained the Lookup:
var constructorClass = Class.forName("java.lang.invoke.DirectMethodHandle$Constructor");
val initMethodField = constructorClass.getDeclaredField("initMethod");
val initMethodFieldOffset = unsafe.objectFieldOffset(initMethodField);
var initMemberName = unsafe.getObject(constructor, initMethodFieldOffset)
Step 4: Tricking Java
The next step is the important part. While there are no physical barriers from the JVM that prevent you from invoking a constructor like any other method, MethodHandle has some checks in place to ensure that you are not doing something fishy.
Most of the checks are circumvented by using the TRUSTED Lookup, and there remains one final check:
The MemberName instance contains a bunch of flags that, among other things, tell the system what kind of member the MemberName is referring to. These flags are checked.
To circumvent this, we can simply change the flags using Unsafe:
var memberNameClass = Class.forName("java.lang.invoke.MemberName");
var flagsField = memberNameClass.getDeclaredField("flags");
var flagsFieldOffset = unsafe.objectFieldOffset(flagsField);
var flags = unsafe.getInt(initMemberName, flagsFieldOffset);
flags &= ~0x00020000; // remove "is constructor"
flags |= 0x00010000; // add "is (non-constructor) method"
unsafe.putInt(initMemberName, flagsFieldOffset, flags);
The values for the flags come from java.lang.invoke.MethodHandleNatives.Constants#MN_IS_METHOD and java.lang.invoke.MethodHandleNatives.Constants#MN_IS_CONSTRUCTOR.
Step 5: Obtaining a REF_invokeVirtual method handle
Now that we have a totally legit method that is not at all a constructor, we just need to obtain a regular method handle for invoking it. Luckly, MethodHandles.Lookup.class has a private method for turning a MemberName into a (Direct)MethodHandle for all kinds of invocations: getDirectMethod.
Ironically, we actually call this method using our all-powerful lookup.
First, we obtain the MethodHandle for getDirectMethod:
var getDirectMethodMethodHandle = lookup.findVirtual(
MethodHandles.Lookup.class,
"getDirectMethod",
MethodType.methodType(
MethodHandle.class,
byte.class,
Class.class,
memberNameClass,
MethodHandles.Lookup.class
)
);
we can now use this with our lookup, to obtain a MethodHandle for our MemberName:
var handle = (MethodHandle) getDirectMethod.invoke(lookup, (byte) 5, Test.class, member, lookup);
The (byte) 5 argument stands for "invoke virtual", and comes from java.lang.invoke.MethodHandleNatives.Constants#REF_invokeVirtual.
Step 6: Profit?
We can now use this handle like a regular MethodHandle, to invoke the constructor on any existing instance of that class:
handle.invoke(<instance>, <constructor arguments...>);
With this handle, the constructor can also be called multiple times, and the instance doesn't actually have to come from Unsafe#allocateInstance - an instance that was created just by using new works aswell.
A constructor is not an instance method, so no you can't invoke a constructor on an instance.
If you look at the reflection library, you'll see that the return type of Class.getConstructor() is Constructor, which doesn't have any methods that can accept a instance - its only relevant method is newInstance(), which doesn't accept a target instance; it creates one.
On the other hand, the return type of Class.getMethod() is Method, whose first parameter is the instance.
A Constructor is not a Method.
In the JVM spec for invokespecial:
An invokespecial instruction is type safe iff all of the following are true:
... (Stuff about non-init methods)
MethodName is <init>.
Descriptor specifies a void return type.
One can validly pop types matching the argument types given in Descriptor and an uninitialized type, UninitializedArg, off the incoming operand stack, yielding OperandStack.
...
If you've already initialized the instance, it's not an uninitialized type, so this will fail.
Note that other invoke* instructions (invokevirtual, invokeinterface, invokestatic, invokedynamic) explicitly preclude invocation of <init> methods, so invokespecial is the only way to invoke them.
From JLS Sec 8.8
Constructors are invoked by class instance creation expressions (§15.9), by the conversions and concatenations caused by the string concatenation operator +(§15.18.1), and by explicit constructor invocations from other constructors (§8.8.7).
...
Constructors are never invoked by method invocation expressions (§15.12).
So no, it's not possible.
If there is some common action you want to take in the constructor and elsewhere, put it into a method, and invoke that from the constructor.
Related
Are there currently (Java 6) things you can do in Java bytecode that you can't do from within the Java language?
I know both are Turing complete, so read "can do" as "can do significantly faster/better, or just in a different way".
I'm thinking of extra bytecodes like invokedynamic, which can't be generated using Java, except that specific one is for a future version.
After working with Java byte code for quite a while and doing some additional research on this matter, here is a summary of my findings:
Execute code in a constructor before calling a super constructor or auxiliary constructor
In the Java programming language (JPL), a constructor's first statement must be an invocation of a super constructor or another constructor of the same class. This is not true for Java byte code (JBC). Within byte code, it is absolutely legitimate to execute any code before a constructor, as long as:
Another compatible constructor is called at some time after this code block.
This call is not within a conditional statement.
Before this constructor call, no field of the constructed instance is read and none of its methods is invoked. This implies the next item.
Set instance fields before calling a super constructor or auxiliary constructor
As mentioned before, it is perfectly legal to set a field value of an instance before calling another constructor. There even exists a legacy hack which makes it able to exploit this "feature" in Java versions before 6:
class Foo {
public String s;
public Foo() {
System.out.println(s);
}
}
class Bar extends Foo {
public Bar() {
this(s = "Hello World!");
}
private Bar(String helper) {
super();
}
}
This way, a field could be set before the super constructor is invoked which is however not longer possible. In JBC, this behavior can still be implemented.
Branch a super constructor call
In Java, it is not possible to define a constructor call like
class Foo {
Foo() { }
Foo(Void v) { }
}
class Bar() {
if(System.currentTimeMillis() % 2 == 0) {
super();
} else {
super(null);
}
}
Until Java 7u23, the HotSpot VM's verifier did however miss this check which is why it was possible. This was used by several code generation tools as a sort of a hack but it is not longer legal to implement a class like this.
The latter was merely a bug in this compiler version. In newer compiler versions, this is again possible.
Define a class without any constructor
The Java compiler will always implement at least one constructor for any class. In Java byte code, this is not required. This allows the creation of classes that cannot be constructed even when using reflection. However, using sun.misc.Unsafe still allows for the creation of such instances.
Define methods with identical signature but with different return type
In the JPL, a method is identified as unique by its name and its raw parameter types. In JBC, the raw return type is additionally considered.
Define fields that do not differ by name but only by type
A class file can contain several fields of the same name as long as they declare a different field type. The JVM always refers to a field as a tuple of name and type.
Throw undeclared checked exceptions without catching them
The Java runtime and the Java byte code are not aware of the concept of checked exceptions. It is only the Java compiler that verifies that checked exceptions are always either caught or declared if they are thrown.
Use dynamic method invocation outside of lambda expressions
The so-called dynamic method invocation can be used for anything, not only for Java's lambda expressions. Using this feature allows for example to switch out execution logic at runtime. Many dynamic programming languages that boil down to JBC improved their performance by using this instruction. In Java byte code, you could also emulate lambda expressions in Java 7 where the compiler did not yet allow for any use of dynamic method invocation while the JVM already understood the instruction.
Use identifiers that are not normally considered legal
Ever fancied using spaces and a line break in your method's name? Create your own JBC and good luck for code review. The only illegal characters for identifiers are ., ;, [ and /. Additionally, methods that are not named <init> or <clinit> cannot contain < and >.
Reassign final parameters or the this reference
final parameters do not exist in JBC and can consequently be reassigned. Any parameter, including the this reference is only stored in a simple array within the JVM what allows to reassign the this reference at index 0 within a single method frame.
Reassign final fields
As long as a final field is assigned within a constructor, it is legal to reassign this value or even not assign a value at all. Therefore, the following two constructors are legal:
class Foo {
final int bar;
Foo() { } // bar == 0
Foo(Void v) { // bar == 2
bar = 1;
bar = 2;
}
}
For static final fields, it is even allowed to reassign the fields outside of
the class initializer.
Treat constructors and the class initializer as if they were methods
This is more of a conceptional feature but constructors are not treated any differently within JBC than normal methods. It is only the JVM's verifier that assures that constructors call another legal constructor. Other than that, it is merely a Java naming convention that constructors must be called <init> and that the class initializer is called <clinit>. Besides this difference, the representation of methods and constructors is identical. As Holger pointed out in a comment, you can even define constructors with return types other than void or a class initializer with arguments, even though it is not possible to call these methods.
Create asymmetric records*.
When creating a record
record Foo(Object bar) { }
javac will generate a class file with a single field named bar, an accessor method named bar() and a constructor taking a single Object. Additionally, a record attribute for bar is added. By manually generating a record, it is possible to create, a different constructor shape, to skip the field and to implement the accessor differently. At the same time, it is still possible to make the reflection API believe that the class represents an actual record.
Call any super method (until Java 1.1)
However, this is only possible for Java versions 1 and 1.1. In JBC, methods are always dispatched on an explicit target type. This means that for
class Foo {
void baz() { System.out.println("Foo"); }
}
class Bar extends Foo {
#Override
void baz() { System.out.println("Bar"); }
}
class Qux extends Bar {
#Override
void baz() { System.out.println("Qux"); }
}
it was possible to implement Qux#baz to invoke Foo#baz while jumping over Bar#baz. While it is still possible to define an explicit invocation to call another super method implementation than that of the direct super class, this does no longer have any effect in Java versions after 1.1. In Java 1.1, this behavior was controlled by setting the ACC_SUPER flag which would enable the same behavior that only calls the direct super class's implementation.
Define a non-virtual call of a method that is declared in the same class
In Java, it is not possible to define a class
class Foo {
void foo() {
bar();
}
void bar() { }
}
class Bar extends Foo {
#Override void bar() {
throw new RuntimeException();
}
}
The above code will always result in a RuntimeException when foo is invoked on an instance of Bar. It is not possible to define the Foo::foo method to invoke its own bar method which is defined in Foo. As bar is a non-private instance method, the call is always virtual. With byte code, one can however define the invocation to use the INVOKESPECIAL opcode which directly links the bar method call in Foo::foo to Foo's version. This opcode is normally used to implement super method invocations but you can reuse the opcode to implement the described behavior.
Fine-grain type annotations
In Java, annotations are applied according to their #Target that the annotations declares. Using byte code manipulation, it is possible to define annotations independently of this control. Also, it is for example possible to annotate a parameter type without annotating the parameter even if the #Target annotation applies to both elements.
Define any attribute for a type or its members
Within the Java language, it is only possible to define annotations for fields, methods or classes. In JBC, you can basically embed any information into the Java classes. In order to make use of this information, you can however no longer rely on the Java class loading mechanism but you need to extract the meta information by yourself.
Overflow and implicitly assign byte, short, char and boolean values
The latter primitive types are not normally known in JBC but are only defined for array types or for field and method descriptors. Within byte code instructions, all of the named types take the space 32 bit which allows to represent them as int. Officially, only the int, float, long and double types exist within byte code which all need explicit conversion by the rule of the JVM's verifier.
Not release a monitor
A synchronized block is actually made up of two statements, one to acquire and one to release a monitor. In JBC, you can acquire one without releasing it.
Note: In recent implementations of HotSpot, this instead leads to an IllegalMonitorStateException at the end of a method or to an implicit release if the method is terminated by an exception itself.
Add more than one return statement to a type initializer
In Java, even a trivial type initializer such as
class Foo {
static {
return;
}
}
is illegal. In byte code, the type initializer is treated just as any other method, i.e. return statements can be defined anywhere.
Create irreducible loops
The Java compiler converts loops to goto statements in Java byte code. Such statements can be used to create irreducible loops, which the Java compiler never does.
Define a recursive catch block
In Java byte code, you can define a block:
try {
throw new Exception();
} catch (Exception e) {
<goto on exception>
throw Exception();
}
A similar statement is created implicitly when using a synchronized block in Java where any exception while releasing a monitor returns to the instruction for releasing this monitor. Normally, no exception should occur on such an instruction but if it would (e.g. the deprecated ThreadDeath), the monitor would still be released.
Call any default method
The Java compiler requires several conditions to be fulfilled in order to allow a default method's invocation:
The method must be the most specific one (must not be overridden by a sub interface that is implemented by any type, including super types).
The default method's interface type must be implemented directly by the class that is calling the default method. However, if interface B extends interface A but does not override a method in A, the method can still be invoked.
For Java byte code, only the second condition counts. The first one is however irrelevant.
Invoke a super method on an instance that is not this
The Java compiler only allows to invoke a super (or interface default) method on instances of this. In byte code, it is however also possible to invoke the super method on an instance of the same type similar to the following:
class Foo {
void m(Foo f) {
f.super.toString(); // calls Object::toString
}
public String toString() {
return "foo";
}
}
Access synthetic members
In Java byte code, it is possible to access synthetic members directly. For example, consider how in the following example the outer instance of another Bar instance is accessed:
class Foo {
class Bar {
void bar(Bar bar) {
Foo foo = bar.Foo.this;
}
}
}
This is generally true for any synthetic field, class or method.
Define out-of-sync generic type information
While the Java runtime does not process generic types (after the Java compiler applies type erasure), this information is still attcheched to a compiled class as meta information and made accessible via the reflection API.
The verifier does not check the consistency of these meta data String-encoded values. It is therefore possible to define information on generic types that does not match the erasure. As a concequence, the following assertings can be true:
Method method = ...
assertTrue(method.getParameterTypes() != method.getGenericParameterTypes());
Field field = ...
assertTrue(field.getFieldType() == String.class);
assertTrue(field.getGenericFieldType() == Integer.class);
Also, the signature can be defined as invalid such that a runtime exception is thrown. This exception is thrown when the information is accessed for the first time as it is evaluated lazily. (Similar to annotation values with an error.)
Append parameter meta information only for certain methods
The Java compiler allows for embedding parameter name and modifier information when compiling a class with the parameter flag enabled. In the Java class file format, this information is however stored per-method what makes it possible to only embed such method information for certain methods.
Mess things up and hard-crash your JVM
As an example, in Java byte code, you can define to invoke any method on any type. Usually, the verifier will complain if a type does not known of such a method. However, if you invoke an unknown method on an array, I found a bug in some JVM version where the verifier will miss this and your JVM will finish off once the instruction is invoked. This is hardly a feature though, but it is technically something that is not possible with javac compiled Java. Java has some sort of double validation. The first validation is applied by the Java compiler, the second one by the JVM when a class is loaded. By skipping the compiler, you might find a weak spot in the verifier's validation. This is rather a general statement than a feature, though.
Annotate a constructor's receiver type when there is no outer class
Since Java 8, non-static methods and constructors of inner classes can declare a receiver type and annotate these types. Constructors of top-level classes cannot annotate their receiver type as they most not declare one.
class Foo {
class Bar {
Bar(#TypeAnnotation Foo Foo.this) { }
}
Foo() { } // Must not declare a receiver type
}
Since Foo.class.getDeclaredConstructor().getAnnotatedReceiverType() does however return an AnnotatedType representing Foo, it is possible to include type annotations for Foo's constructor directly in the class file where these annotations are later read by the reflection API.
Use unused / legacy byte code instructions
Since others named it, I will include it as well. Java was formerly making use of subroutines by the JSR and RET statements. JBC even knew its own type of a return address for this purpose. However, the use of subroutines did overcomplicate static code analysis which is why these instructions are not longer used. Instead, the Java compiler will duplicate code it compiles. However, this basically creates identical logic which is why I do not really consider it to achieve something different. Similarly, you could for example add the NOOP byte code instruction which is not used by the Java compiler either but this would not really allow you to achieve something new either. As pointed out in the context, these mentioned "feature instructions" are now removed from the set of legal opcodes which does render them even less of a feature.
As far as I know there are no major features in the bytecodes supported by Java 6 that are not also accessible from Java source code. The main reason for this is obviously that the Java bytecode was designed with the Java language in mind.
There are some features that are not produced by modern Java compilers, however:
The ACC_SUPER flag:
This is a flag that can be set on a class and specifies how a specific corner case of the invokespecial bytecode is handled for this class. It is set by all modern Java compilers (where "modern" is >= Java 1.1, if I remember correctly) and only ancient Java compilers produced class files where this was un-set. This flag exists only for backwards-compatibility reasons. Note that starting with Java 7u51, ACC_SUPER is ignored completely due to security reasons.
The jsr/ret bytecodes.
These bytecodes were used to implement sub-routines (mostly for implementing finally blocks). They are no longer produced since Java 6. The reason for their deprecation is that they complicate static verification a lot for no great gain (i.e. code that uses can almost always be re-implemented with normal jumps with very little overhead).
Having two methods in a class that only differ in return type.
The Java language specification does not allow two methods in the same class when they differ only in their return type (i.e. same name, same argument list, ...). The JVM specification however, has no such restriction, so a class file can contain two such methods, there's just no way to produce such a class file using the normal Java compiler. There's a nice example/explanation in this answer.
Here are some features that can be done in Java bytecode but not in Java source code:
Throwing a checked exception from a method without declaring that the method throws it. The checked and unchecked exceptions are a thing which is checked only by the Java compiler, not the JVM. Because of this for example Scala can throw checked exceptions from methods without declaring them. Though with Java generics there is a workaround called sneaky throw.
Having two methods in a class that only differ in return type, as already mentioned in Joachim's answer: The Java language specification does not allow two methods in the same class when they differ only in their return type (i.e. same name, same argument list, ...). The JVM specification however, has no such restriction, so a class file can contain two such methods, there's just no way to produce such a class file using the normal Java compiler. There's a nice example/explanation in this answer.
GOTO can be used with labels to create your own control structures (other than for while etc)
You can override the this local variable inside a method
Combining both of these you can create create tail call optimised bytecode (I do this in JCompilo)
As a related point you can get parameter name for methods if compiled with debug (Paranamer does this by reading the bytecode
Maybe section 7A in this document is of interest, although it's about bytecode pitfalls rather than bytecode features.
In Java language the first statement in a constructor must be a call to the super class constructor. Bytecode does not have this limitation, instead the rule is that the super class constructor or another constructor in the same class must be called for the object before accessing the members. This should allow more freedom such as:
Create an instance of another object, store it in a local variable (or stack) and pass it as a parameter to super class constructor while still keeping the reference in that variable for other use.
Call different other constructors based on a condition. This should be possible: How to call a different constructor conditionally in Java?
I have not tested these, so please correct me if I'm wrong.
Something you can do with byte code, rather than plain Java code, is generate code which can loaded and run without a compiler. Many systems have JRE rather than JDK and if you want to generate code dynamically it may be better, if not easier, to generate byte code instead of Java code has to be compiled before it can be used.
I wrote a bytecode optimizer when I was a I-Play, (it was designed to reduce the code size for J2ME applications). One feature I added was the ability to use inline bytecode (similar to inline assembly language in C++). I managed to reduce the size of a function that was part of a library method by using the DUP instruction, since I need the value twice. I also had zero byte instructions (if you are calling a method that takes a char and you want to pass an int, that you know does not need to be cast I added int2char(var) to replace char(var) and it would remove the i2c instruction to reduce the size of the code. I also made it do float a = 2.3; float b = 3.4; float c = a + b; and that would be converted to fixed point (faster, and also some J2ME did not support floating point).
In Java, if you attempt to override a public method with a protected method (or any other reduction in access), you get an error: "attempting to assign weaker access privileges". If you do it with JVM bytecode, the verifier is fine with it, and you can call these methods via the parent class as if they were public.
I can't understand where the final keyword is really handy when it is used on method parameters.
If we exclude the usage of anonymous classes, readability and intent declaration then it seems almost worthless to me.
Enforcing that some data remains constant is not as strong as it seems.
If the parameter is a primitive then it will have no effect since the parameter is passed to the method as a value and changing it will have no effect outside the scope.
If we are passing a parameter by reference, then the reference itself is a local variable and if the reference is changed from within the method, that would not have any effect from outside of the method scope.
Consider the simple test example below.
This test passes although the method changed the value of the reference given to it, it has no effect.
public void testNullify() {
Collection<Integer> c = new ArrayList<Integer>();
nullify(c);
assertNotNull(c);
final Collection<Integer> c1 = c;
assertTrue(c1.equals(c));
change(c);
assertTrue(c1.equals(c));
}
private void change(Collection<Integer> c) {
c = new ArrayList<Integer>();
}
public void nullify(Collection<?> t) {
t = null;
}
Stop a Variable’s Reassignment
While these answers are intellectually interesting, I've not read the short simple answer:
Use the keyword final when you want the compiler to prevent a
variable from being re-assigned to a different object.
Whether the variable is a static variable, member variable, local variable, or argument/parameter variable, the effect is entirely the same.
Example
Let’s see the effect in action.
Consider this simple method, where the two variables (arg and x) can both be re-assigned different objects.
// Example use of this method:
// this.doSomething( "tiger" );
void doSomething( String arg ) {
String x = arg; // Both variables now point to the same String object.
x = "elephant"; // This variable now points to a different String object.
arg = "giraffe"; // Ditto. Now neither variable points to the original passed String.
}
Mark the local variable as final. This results in a compiler error.
void doSomething( String arg ) {
final String x = arg; // Mark variable as 'final'.
x = "elephant"; // Compiler error: The final local variable x cannot be assigned.
arg = "giraffe";
}
Instead, let’s mark the parameter variable as final. This too results in a compiler error.
void doSomething( final String arg ) { // Mark argument as 'final'.
String x = arg;
x = "elephant";
arg = "giraffe"; // Compiler error: The passed argument variable arg cannot be re-assigned to another object.
}
Moral of the story:
If you want to ensure a variable always points to the same object,
mark the variable final.
Never Reassign Arguments
As good programming practice (in any language), you should never re-assign a parameter/argument variable to an object other than the object passed by the calling method. In the examples above, one should never write the line arg = . Since humans make mistakes, and programmers are human, let’s ask the compiler to assist us. Mark every parameter/argument variable as 'final' so that the compiler may find and flag any such re-assignments.
In Retrospect
As noted in other answers…
Given Java's original design goal of helping programmers to avoid dumb mistakes such as reading past the end of an array, Java should have been designed to automatically enforce all parameter/argument variables as 'final'. In other words, Arguments should not be variables. But hindsight is 20/20 vision, and the Java designers had their hands full at the time.
So, always add final to all arguments?
Should we add final to each and every method parameter being declared?
In theory, yes.
In practice, no.➥ Add final only when the method’s code is long or complicated, where the argument may be mistaken for a local or member variable and possibly re-assigned.
If you buy into the practice of never re-assigning an argument, you will be inclined to add a final to each. But this is tedious and makes the declaration a bit harder to read.
For short simple code where the argument is obviously an argument, and not a local variable nor a member variable, I do not bother adding the final. If the code is quite obvious, with no chance of me nor any other programmer doing maintenance or refactoring accidentally mistaking the argument variable as something other than an argument, then don’t bother. In my own work, I add final only in longer or more involved code where an argument might mistaken for a local or member variable.
#Another case added for the completeness
public class MyClass {
private int x;
//getters and setters
}
void doSomething( final MyClass arg ) { // Mark argument as 'final'.
arg = new MyClass(); // Compiler error: The passed argument variable arg cannot be re-assigned to another object.
arg.setX(20); // allowed
// We can re-assign properties of argument which is marked as final
}
record
Java 16 brings the new records feature. A record is a very brief way to define a class whose central purpose is to merely carry data, immutably and transparently.
You simply declare the class name along with the names and types of its member fields. The compiler implicitly provides the constructor, getters, equals & hashCode, and toString.
The fields are read-only, with no setters. So a record is one case where there is no need to mark the arguments final. They are already effectively final. Indeed, the compiler forbids using final when declaring the fields of a record.
public record Employee( String name , LocalDate whenHired ) // 🡄 Marking `final` here is *not* allowed.
{
}
If you provide an optional constructor, there you can mark final.
public record Employee(String name , LocalDate whenHired) // 🡄 Marking `final` here is *not* allowed.
{
public Employee ( final String name , final LocalDate whenHired ) // 🡄 Marking `final` here *is* allowed.
{
this.name = name;
whenHired = LocalDate.MIN; // 🡄 Compiler error, because of `final`.
this.whenHired = whenHired;
}
}
Sometimes it's nice to be explicit (for readability) that the variable doesn't change. Here's a simple example where using final can save some possible headaches:
public void setTest(String test) {
test = test;
}
If you forget the 'this' keyword on a setter, then the variable you want to set doesn't get set. However, if you used the final keyword on the parameter, then the bug would be caught at compile time.
Yes, excluding anonymous classes, readability and intent declaration it's almost worthless. Are those three things worthless though?
Personally I tend not to use final for local variables and parameters unless I'm using the variable in an anonymous inner class, but I can certainly see the point of those who want to make it clear that the parameter value itself won't change (even if the object it refers to changes its contents). For those who find that adds to readability, I think it's an entirely reasonable thing to do.
Your point would be more important if anyone were actually claiming that it did keep data constant in a way that it doesn't - but I can't remember seeing any such claims. Are you suggesting there's a significant body of developers suggesting that final has more effect than it really does?
EDIT: I should really have summed all of this up with a Monty Python reference; the question seems somewhat similar to asking "What have the Romans ever done for us?"
Let me explain a bit about the one case where you have to use final, which Jon already mentioned:
If you create an anonymous inner class in your method and use a local variable (such as a method parameter) inside that class, then the compiler forces you to make the parameter final:
public Iterator<Integer> createIntegerIterator(final int from, final int to)
{
return new Iterator<Integer>(){
int index = from;
public Integer next()
{
return index++;
}
public boolean hasNext()
{
return index <= to;
}
// remove method omitted
};
}
Here the from and to parameters need to be final so they can be used inside the anonymous class.
The reason for that requirement is this: Local variables live on the stack, therefore they exist only while the method is executed. However, the anonymous class instance is returned from the method, so it may live for much longer. You can't preserve the stack, because it is needed for subsequent method calls.
So what Java does instead is to put copies of those local variables as hidden instance variables into the anonymous class (you can see them if you examine the byte code). But if they were not final, one might expect the anonymous class and the method seeing changes the other one makes to the variable. In order to maintain the illusion that there is only one variable rather than two copies, it has to be final.
I use final all the time on parameters.
Does it add that much? Not really.
Would I turn it off? No.
The reason: I found 3 bugs where people had written sloppy code and failed to set a member variable in accessors. All bugs proved difficult to find.
I'd like to see this made the default in a future version of Java. The pass by value/reference thing trips up an awful lot of junior programmers.
One more thing.. my methods tend to have a low number of parameters so the extra text on a method declaration isn't an issue.
Using final in a method parameter has nothing to do with what happens to the argument on the caller side. It is only meant to mark it as not changing inside that method. As I try to adopt a more functional programming style, I kind of see the value in that.
Personally I don't use final on method parameters, because it adds too much clutter to parameter lists.
I prefer to enforce that method parameters are not changed through something like Checkstyle.
For local variables I use final whenever possible, I even let Eclipse do that automatically in my setup for personal projects.
I would certainly like something stronger like C/C++ const.
Since Java passes copies of arguments I feel the relevance of final is rather limited. I guess the habit comes from the C++ era where you could prohibit reference content from being changed by doing a const char const *. I feel this kind of stuff makes you believe the developer is inherently stupid as f*** and needs to be protected against truly every character he types. In all humbleness may I say, I write very few bugs even though I omit final (unless I don't want someone to override my methods and classes). Maybe I'm just an old-school dev.
Short answer: final helps a tiny bit but... use defensive programming on the client side instead.
Indeed, the problem with final is that it only enforces the reference is unchanged, gleefully allowing the referenced object members to be mutated, unbeknownst to the caller. Hence the best practice in this regard is defensive programming on the caller side, creating deeply immutable instances or deep copies of objects that are in danger of being mugged by unscrupulous APIs.
I never use final in a parameter list, it just adds clutter like previous respondents have said. Also in Eclipse you can set parameter assignment to generate an error so using final in a parameter list seems pretty redundant to me.
Interestingly when I enabled the Eclipse setting for parameter assignment generating an error on it caught this code (this is just how I remember the flow, not the actual code. ) :-
private String getString(String A, int i, String B, String C)
{
if (i > 0)
A += B;
if (i > 100)
A += C;
return A;
}
Playing devil's advocate, what exactly is wrong with doing this?
One additional reason to add final to parameter declarations is that it helps to identify variables that need to be renamed as part of a "Extract Method" refactoring. I have found that adding final to each parameter prior to starting a large method refactoring quickly tells me if there are any issues I need to address before continuing.
However, I generally remove them as superfluous at the end of the refactoring.
Follow up by Michel's post. I made myself another example to explain it. I hope it could help.
public static void main(String[] args){
MyParam myParam = thisIsWhy(new MyObj());
myParam.setArgNewName();
System.out.println(myParam.showObjName());
}
public static MyParam thisIsWhy(final MyObj obj){
MyParam myParam = new MyParam() {
#Override
public void setArgNewName() {
obj.name = "afterSet";
}
#Override
public String showObjName(){
return obj.name;
}
};
return myParam;
}
public static class MyObj{
String name = "beforeSet";
public MyObj() {
}
}
public abstract static class MyParam{
public abstract void setArgNewName();
public abstract String showObjName();
}
From the code above, in the method thisIsWhy(), we actually didn't assign the [argument MyObj obj] to a real reference in MyParam. In instead, we just use the [argument MyObj obj] in the method inside MyParam.
But after we finish the method thisIsWhy(), should the argument(object) MyObj still exist?
Seems like it should, because we can see in main we still call the method showObjName() and it needs to reach obj. MyParam will still use/reaches the method argument even the method already returned!
How Java really achieve this is to generate a copy also is a hidden reference of the argument MyObj obj inside the MyParam object ( but it's not a formal field in MyParam so that we can't see it )
As we call "showObjName", it will use that reference to get the corresponding value.
But if we didn't put the argument final, which leads a situation we can reassign a new memory(object) to the argument MyObj obj.
Technically there's no clash at all! If we are allowed to do that, below will be the situation:
We now have a hidden [MyObj obj] point to a [Memory A in heap] now live in MyParam object.
We also have another [MyObj obj] which is the argument point to a [Memory B in heap] now live in thisIsWhy method.
No clash, but "CONFUSING!!" Because they are all using the same "reference name" which is "obj".
To avoid this, set it as "final" to avoid programmer do the "mistake-prone" code.
I use Byte Buddy (v0.5.2) to dynamically create a "subclass" of an interface (actually, I want to create a class that implements that interface). All methods invoked on an instance of this class should be redirected to another (interceptor) class.
I used the following code (with "TestInterface" being an interface that declares exactly one method "sayHello"):
final Interceptor interceptor = new Interceptor();
Class<?> clazz = new ByteBuddy()
.subclass(TestInterface.class)
.method(any()).intercept(MethodDelegation.to(interceptor))
.make()
.load(TestInterface.class.getClassLoader(), ClassLoadingStrategy.Default.INJECTION)
.getLoaded();
TestInterface instance = (TestInterface) clazz.newInstance();
instance.sayHello();
The interceptor class looks like this:
public class Interceptor {
public Object intercept(#Origin MethodHandle method, #AllArguments Object[] args) throws Throwable {
...
}
}
However, when I try to call the "sayHello" method (last line of my code example), I get an "IncompatibleClassChangeError". The stack trace is as follows:
Exception in thread "main" java.lang.IllegalAccessError: no such method: byteuddytest.TestInterface.sayHello()void/invokeVirtual
at java.lang.invoke.MethodHandleNatives.linkMethodHandleConstant(MethodHandleNatives.java:448)
at bytebuddytest.TestInterface$ByteBuddy$0E9xusGs.sayHello(Unknown Source)
at bytebuddytest.Main.main(Main.java:32)
Caused by: java.lang.IncompatibleClassChangeError: Found interface bytebuddytest.TestInterface, but class was expected
at java.lang.invoke.MethodHandleNatives.resolve(Native Method)
at java.lang.invoke.MemberName$Factory.resolve(MemberName.java:965)
at java.lang.invoke.MemberName$Factory.resolveOrFail(MemberName.java:990)
at java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:1387)
at java.lang.invoke.MethodHandles$Lookup.linkMethodHandleConstant(MethodHandles.java:1732)
at java.lang.invoke.MethodHandleNatives.linkMethodHandleConstant(MethodHandleNatives.java:442)
... 2 more
The problem seems to be related to the use of the "MethodHandle" parameter in my interceptor method. When I change the type to "Method", everything works fine. But according to the docs, "MethodHandle" should be preferred to "Method" because of performance reasons.
Is the error caused by a bug in Byte Buddy, or should I actually use a "Method" parameter in this case?
Use a Method parameter and enable caching. That should solve most of your performance issues, if you have any in the first place.
See javadoc for #Origin:
public abstract boolean cacheMethod
If this value is set to true and the annotated parameter is a Method type, the value that is assigned to this parameter is cached in a static field. Otherwise, the instance is looked up from its defining Class on every invocation of the intercepted method.
Method look-ups are normally cached by its defining Class what makes a repeated look-up of a method little expensive. However, because Method instances are mutable by their AccessibleObject contact, any looked-up instance needs to be copied by its defining Class before exposing it. This can cause performance deficits when a method is for example called repeatedly in a loop. By enabling the method cache, this performance penalty can be avoided by caching a single Method instance for any intercepted method as a static field in the instrumented type.
See the answer of Jeor which is totally correct (you should mark it as accepted). Just two remarks that do not fit into a comment:
You should of course only use a MethodHandle instead of a Method if the former allows you what you do. Invoking MethodHandles implies some JVM magic. Handles are resolved with a polymorphic signature by a JVM, i.e. their arguments must not be boxed as the JVM will simply replace the call site with a method call. In your case, this does therefore not work. The advantage of a method handle is however that it can be stored in the constant pool of a class. It is a native concept that can be accessed by a byte code instruction. Compared to that, a Method reference needs to be produced explicitly.
You should therefore rather cache the Method instance (which is mutable!). Also, note that you are currently also intercepting the methods of Object. You can clean up your code a bit by:
Class<? extends TestInterface> clazz = new ByteBuddy()
.subclass(TestInterface.class)
.method(isDeclaredBy(TestInterface.class))
.intercept(MethodDelegation.to(interceptor))
.make()
.load(TestInterface.class.getClassLoader(),
ClassLoadingStrategy.Default.INJECTION)
.getLoaded();
TestInterface instance = clazz.newInstance();
I can't understand where the final keyword is really handy when it is used on method parameters.
If we exclude the usage of anonymous classes, readability and intent declaration then it seems almost worthless to me.
Enforcing that some data remains constant is not as strong as it seems.
If the parameter is a primitive then it will have no effect since the parameter is passed to the method as a value and changing it will have no effect outside the scope.
If we are passing a parameter by reference, then the reference itself is a local variable and if the reference is changed from within the method, that would not have any effect from outside of the method scope.
Consider the simple test example below.
This test passes although the method changed the value of the reference given to it, it has no effect.
public void testNullify() {
Collection<Integer> c = new ArrayList<Integer>();
nullify(c);
assertNotNull(c);
final Collection<Integer> c1 = c;
assertTrue(c1.equals(c));
change(c);
assertTrue(c1.equals(c));
}
private void change(Collection<Integer> c) {
c = new ArrayList<Integer>();
}
public void nullify(Collection<?> t) {
t = null;
}
Stop a Variable’s Reassignment
While these answers are intellectually interesting, I've not read the short simple answer:
Use the keyword final when you want the compiler to prevent a
variable from being re-assigned to a different object.
Whether the variable is a static variable, member variable, local variable, or argument/parameter variable, the effect is entirely the same.
Example
Let’s see the effect in action.
Consider this simple method, where the two variables (arg and x) can both be re-assigned different objects.
// Example use of this method:
// this.doSomething( "tiger" );
void doSomething( String arg ) {
String x = arg; // Both variables now point to the same String object.
x = "elephant"; // This variable now points to a different String object.
arg = "giraffe"; // Ditto. Now neither variable points to the original passed String.
}
Mark the local variable as final. This results in a compiler error.
void doSomething( String arg ) {
final String x = arg; // Mark variable as 'final'.
x = "elephant"; // Compiler error: The final local variable x cannot be assigned.
arg = "giraffe";
}
Instead, let’s mark the parameter variable as final. This too results in a compiler error.
void doSomething( final String arg ) { // Mark argument as 'final'.
String x = arg;
x = "elephant";
arg = "giraffe"; // Compiler error: The passed argument variable arg cannot be re-assigned to another object.
}
Moral of the story:
If you want to ensure a variable always points to the same object,
mark the variable final.
Never Reassign Arguments
As good programming practice (in any language), you should never re-assign a parameter/argument variable to an object other than the object passed by the calling method. In the examples above, one should never write the line arg = . Since humans make mistakes, and programmers are human, let’s ask the compiler to assist us. Mark every parameter/argument variable as 'final' so that the compiler may find and flag any such re-assignments.
In Retrospect
As noted in other answers…
Given Java's original design goal of helping programmers to avoid dumb mistakes such as reading past the end of an array, Java should have been designed to automatically enforce all parameter/argument variables as 'final'. In other words, Arguments should not be variables. But hindsight is 20/20 vision, and the Java designers had their hands full at the time.
So, always add final to all arguments?
Should we add final to each and every method parameter being declared?
In theory, yes.
In practice, no.➥ Add final only when the method’s code is long or complicated, where the argument may be mistaken for a local or member variable and possibly re-assigned.
If you buy into the practice of never re-assigning an argument, you will be inclined to add a final to each. But this is tedious and makes the declaration a bit harder to read.
For short simple code where the argument is obviously an argument, and not a local variable nor a member variable, I do not bother adding the final. If the code is quite obvious, with no chance of me nor any other programmer doing maintenance or refactoring accidentally mistaking the argument variable as something other than an argument, then don’t bother. In my own work, I add final only in longer or more involved code where an argument might mistaken for a local or member variable.
#Another case added for the completeness
public class MyClass {
private int x;
//getters and setters
}
void doSomething( final MyClass arg ) { // Mark argument as 'final'.
arg = new MyClass(); // Compiler error: The passed argument variable arg cannot be re-assigned to another object.
arg.setX(20); // allowed
// We can re-assign properties of argument which is marked as final
}
record
Java 16 brings the new records feature. A record is a very brief way to define a class whose central purpose is to merely carry data, immutably and transparently.
You simply declare the class name along with the names and types of its member fields. The compiler implicitly provides the constructor, getters, equals & hashCode, and toString.
The fields are read-only, with no setters. So a record is one case where there is no need to mark the arguments final. They are already effectively final. Indeed, the compiler forbids using final when declaring the fields of a record.
public record Employee( String name , LocalDate whenHired ) // 🡄 Marking `final` here is *not* allowed.
{
}
If you provide an optional constructor, there you can mark final.
public record Employee(String name , LocalDate whenHired) // 🡄 Marking `final` here is *not* allowed.
{
public Employee ( final String name , final LocalDate whenHired ) // 🡄 Marking `final` here *is* allowed.
{
this.name = name;
whenHired = LocalDate.MIN; // 🡄 Compiler error, because of `final`.
this.whenHired = whenHired;
}
}
Sometimes it's nice to be explicit (for readability) that the variable doesn't change. Here's a simple example where using final can save some possible headaches:
public void setTest(String test) {
test = test;
}
If you forget the 'this' keyword on a setter, then the variable you want to set doesn't get set. However, if you used the final keyword on the parameter, then the bug would be caught at compile time.
Yes, excluding anonymous classes, readability and intent declaration it's almost worthless. Are those three things worthless though?
Personally I tend not to use final for local variables and parameters unless I'm using the variable in an anonymous inner class, but I can certainly see the point of those who want to make it clear that the parameter value itself won't change (even if the object it refers to changes its contents). For those who find that adds to readability, I think it's an entirely reasonable thing to do.
Your point would be more important if anyone were actually claiming that it did keep data constant in a way that it doesn't - but I can't remember seeing any such claims. Are you suggesting there's a significant body of developers suggesting that final has more effect than it really does?
EDIT: I should really have summed all of this up with a Monty Python reference; the question seems somewhat similar to asking "What have the Romans ever done for us?"
Let me explain a bit about the one case where you have to use final, which Jon already mentioned:
If you create an anonymous inner class in your method and use a local variable (such as a method parameter) inside that class, then the compiler forces you to make the parameter final:
public Iterator<Integer> createIntegerIterator(final int from, final int to)
{
return new Iterator<Integer>(){
int index = from;
public Integer next()
{
return index++;
}
public boolean hasNext()
{
return index <= to;
}
// remove method omitted
};
}
Here the from and to parameters need to be final so they can be used inside the anonymous class.
The reason for that requirement is this: Local variables live on the stack, therefore they exist only while the method is executed. However, the anonymous class instance is returned from the method, so it may live for much longer. You can't preserve the stack, because it is needed for subsequent method calls.
So what Java does instead is to put copies of those local variables as hidden instance variables into the anonymous class (you can see them if you examine the byte code). But if they were not final, one might expect the anonymous class and the method seeing changes the other one makes to the variable. In order to maintain the illusion that there is only one variable rather than two copies, it has to be final.
I use final all the time on parameters.
Does it add that much? Not really.
Would I turn it off? No.
The reason: I found 3 bugs where people had written sloppy code and failed to set a member variable in accessors. All bugs proved difficult to find.
I'd like to see this made the default in a future version of Java. The pass by value/reference thing trips up an awful lot of junior programmers.
One more thing.. my methods tend to have a low number of parameters so the extra text on a method declaration isn't an issue.
Using final in a method parameter has nothing to do with what happens to the argument on the caller side. It is only meant to mark it as not changing inside that method. As I try to adopt a more functional programming style, I kind of see the value in that.
Personally I don't use final on method parameters, because it adds too much clutter to parameter lists.
I prefer to enforce that method parameters are not changed through something like Checkstyle.
For local variables I use final whenever possible, I even let Eclipse do that automatically in my setup for personal projects.
I would certainly like something stronger like C/C++ const.
Since Java passes copies of arguments I feel the relevance of final is rather limited. I guess the habit comes from the C++ era where you could prohibit reference content from being changed by doing a const char const *. I feel this kind of stuff makes you believe the developer is inherently stupid as f*** and needs to be protected against truly every character he types. In all humbleness may I say, I write very few bugs even though I omit final (unless I don't want someone to override my methods and classes). Maybe I'm just an old-school dev.
Short answer: final helps a tiny bit but... use defensive programming on the client side instead.
Indeed, the problem with final is that it only enforces the reference is unchanged, gleefully allowing the referenced object members to be mutated, unbeknownst to the caller. Hence the best practice in this regard is defensive programming on the caller side, creating deeply immutable instances or deep copies of objects that are in danger of being mugged by unscrupulous APIs.
I never use final in a parameter list, it just adds clutter like previous respondents have said. Also in Eclipse you can set parameter assignment to generate an error so using final in a parameter list seems pretty redundant to me.
Interestingly when I enabled the Eclipse setting for parameter assignment generating an error on it caught this code (this is just how I remember the flow, not the actual code. ) :-
private String getString(String A, int i, String B, String C)
{
if (i > 0)
A += B;
if (i > 100)
A += C;
return A;
}
Playing devil's advocate, what exactly is wrong with doing this?
One additional reason to add final to parameter declarations is that it helps to identify variables that need to be renamed as part of a "Extract Method" refactoring. I have found that adding final to each parameter prior to starting a large method refactoring quickly tells me if there are any issues I need to address before continuing.
However, I generally remove them as superfluous at the end of the refactoring.
Follow up by Michel's post. I made myself another example to explain it. I hope it could help.
public static void main(String[] args){
MyParam myParam = thisIsWhy(new MyObj());
myParam.setArgNewName();
System.out.println(myParam.showObjName());
}
public static MyParam thisIsWhy(final MyObj obj){
MyParam myParam = new MyParam() {
#Override
public void setArgNewName() {
obj.name = "afterSet";
}
#Override
public String showObjName(){
return obj.name;
}
};
return myParam;
}
public static class MyObj{
String name = "beforeSet";
public MyObj() {
}
}
public abstract static class MyParam{
public abstract void setArgNewName();
public abstract String showObjName();
}
From the code above, in the method thisIsWhy(), we actually didn't assign the [argument MyObj obj] to a real reference in MyParam. In instead, we just use the [argument MyObj obj] in the method inside MyParam.
But after we finish the method thisIsWhy(), should the argument(object) MyObj still exist?
Seems like it should, because we can see in main we still call the method showObjName() and it needs to reach obj. MyParam will still use/reaches the method argument even the method already returned!
How Java really achieve this is to generate a copy also is a hidden reference of the argument MyObj obj inside the MyParam object ( but it's not a formal field in MyParam so that we can't see it )
As we call "showObjName", it will use that reference to get the corresponding value.
But if we didn't put the argument final, which leads a situation we can reassign a new memory(object) to the argument MyObj obj.
Technically there's no clash at all! If we are allowed to do that, below will be the situation:
We now have a hidden [MyObj obj] point to a [Memory A in heap] now live in MyParam object.
We also have another [MyObj obj] which is the argument point to a [Memory B in heap] now live in thisIsWhy method.
No clash, but "CONFUSING!!" Because they are all using the same "reference name" which is "obj".
To avoid this, set it as "final" to avoid programmer do the "mistake-prone" code.
I know that calling a virtual method from a base class constructor can be dangerous since the child class might not be in a valid state. (at least in C#)
My question is what if the virtual method is the one who initializes the state of the object ? Is it good practice or should it be a two step process, first to create the object and then to load the state ?
First option: (using the constructor to initialize the state)
public class BaseObject {
public BaseObject(XElement definition) {
this.LoadState(definition);
}
protected abstract LoadState(XElement definition);
}
Second option: (using a two step process)
public class BaseObject {
public void LoadState(XElement definition) {
this.LoadStateCore(definition);
}
protected abstract LoadStateCore(XElement definition);
}
In the first method the consumer of the code can create and initialize the object with one statement:
// The base class will call the virtual method to load the state.
ChildObject o = new ChildObject(definition)
In the second method the consumer will have to create the object and then load the state:
ChildObject o = new ChildObject();
o.LoadState(definition);
(This answer applies to C# and Java. I believe C++ works differently on this matter.)
Calling a virtual method in a constructor is indeed dangerous, but sometimes it can end up with the cleanest code.
I would try to avoid it where possible, but without bending the design hugely. (For instance, the "initialize later" option prohibits immutability.) If you do use a virtual method in the constructor, document it very strongly. So long as everyone involved is aware of what it's doing, it shouldn't cause too many problems. I would try to limit the visibility though, as you've done in your first example.
EDIT: One thing which is important here is that there's a difference between C# and Java in order of initialization. If you have a class such as:
public class Child : Parent
{
private int foo = 10;
protected override void ShowFoo()
{
Console.WriteLine(foo);
}
}
where the Parent constructor calls ShowFoo, in C# it will display 10. The equivalent program in Java would display 0.
In C++, calling a virtual method in a base class constructor will simply call the method as if the derived class doesn't exist yet (because it doesn't). So that means that the call is resolved at compile time to whatever method it should call in the base class (or classes it derived from).
Tested with GCC, it allows you to call a pure virtual function from a constructor, but it gives a warning, and results in a link time error. It appears that this behavior is undefined by the standard:
"Member functions can be called from a constructor (or destructor) of an abstract class; the effect of making a virtual call (class.virtual) to a pure virtual function directly or indirectly for the object being created (or destroyed) from such a constructor (or destructor) is undefined."
With C++ the virtual methods are routed through the vtable for the class that is being constructed. So in your example it would generate a pure virtual method exception since whilst BaseObject is being constructed there simply is no LoadStateCore method to invoke.
If the function is not abstract, but simply does nothing then you will often get the programmer scratching their head for a while trying to remember why it is that the function doesn't actually get called.
For this reason you simply can't do it this way in C++ ...
For C++ the base constructor is called before the derived constructor, which means that the virtual table (which holds the addresses of the derived class's overridden virtual functions) does not yet exist. For this reason, it is considered a VERY dangerous thing to do (especially if the functions are pure virtual in the base class...this will cause a pure-virtual exception).
There are two ways around this:
Do a two-step process of construction + initialization
Move the virtual functions to an internal class that you can more closely control (can make use of the above approach, see example for details)
An example of (1) is:
class base
{
public:
base()
{
// only initialize base's members
}
virtual ~base()
{
// only release base's members
}
virtual bool initialize(/* whatever goes here */) = 0;
};
class derived : public base
{
public:
derived ()
{
// only initialize derived 's members
}
virtual ~derived ()
{
// only release derived 's members
}
virtual bool initialize(/* whatever goes here */)
{
// do your further initialization here
// return success/failure
}
};
An example of (2) is:
class accessible
{
private:
class accessible_impl
{
protected:
accessible_impl()
{
// only initialize accessible_impl's members
}
public:
static accessible_impl* create_impl(/* params for this factory func */);
virtual ~accessible_impl()
{
// only release accessible_impl's members
}
virtual bool initialize(/* whatever goes here */) = 0;
};
accessible_impl* m_impl;
public:
accessible()
{
m_impl = accessible_impl::create_impl(/* params to determine the exact type needed */);
if (m_impl)
{
m_impl->initialize(/* ... */); // add any initialization checking you need
}
}
virtual ~accessible()
{
if (m_impl)
{
delete m_impl;
}
}
/* Other functionality of accessible, which may or may not use the impl class */
};
Approach (2) uses the Factory pattern to provide the appropriate implementation for the accessible class (which will provide the same interface as your base class). One of the main benefits here is that you get initialization during construction of accessible that is able to make use of virtual members of accessible_impl safely.
For C++, section 12.7, paragraph 3 of the Standard covers this case.
To summarize, this is legal. It will resolve to the correct function to the type of the constructor being run. Therefore, adapting your example to C++ syntax, you'd be calling BaseObject::LoadState(). You can't get to ChildObject::LoadState(), and trying to do so by specifying the class as well as the function results in undefined behavior.
Constructors of abstract classes are covered in section 10.4, paragraph 6. In brief, they may call member functions, but calling a pure virtual function in the constructor is undefined behavior. Don't do that.
If you have a class as shown in your post, which takes an XElement in the constructor, then the only place that XElement could have come from is the derived class. So why not just load the state in the derived class which already has the XElement.
Either your example is missing some fundamental information which changes the situation, or there's simply no need to chain back up to the derived class with the information from the base class, because it has just told you that exact information.
i.e.
public class BaseClass
{
public BaseClass(XElement defintion)
{
// base class loads state here
}
}
public class DerivedClass : BaseClass
{
public DerivedClass (XElement defintion)
: base(definition)
{
// derived class loads state here
}
}
Then your code's really simple, and you don't have any of the virtual method call problems.
For C++, read Scott Meyer's corresponding article :
Never Call Virtual Functions during Construction or Destruction
ps: pay attention to this exception in the article:
The problem would almost certainly
become apparent before runtime,
because the logTransaction function is
pure virtual in Transaction. Unless it
had been defined (unlikely, but
possible) the program wouldn't link: the linker would be unable to find the necessary implementation of Transaction::logTransaction.
Usually you can get around these issues by having a greedier base constructor. In your example, you're passing an XElement to LoadState. If you allow the state to be directly set in your base constructor, then your child class can parse the XElement prior to calling your constructor.
public abstract class BaseObject {
public BaseObject(int state1, string state2, /* blah, blah */) {
this.State1 = state1;
this.State2 = state2;
/* blah, blah */
}
}
public class ChildObject : BaseObject {
public ChildObject(XElement definition) :
base(int.Parse(definition["state1"]), definition["state2"], /* blah, blah */) {
}
}
If the child class needs to do a good bit of work, it can offload to a static method.
In C++ it is perfectly safe to call virtual functions from within the base-class - as long as they are non-pure - with some restrictions. However, you shouldn't do it. Better initialize objects using non-virtual functions, which are explicitly marked as being such initialization functions using comments and an appropriate name (like initialize). If it is even declared as pure-virtual in the class calling it, the behavior is undefined.
The version that's called is the one of the class calling it from within the constructor, and not some overrider in some derived class. This hasn't got much to-do with virtual function tables, but more with the fact that the override of that function might belong to a class that's not yet initialized. So this is forbidden.
In C# and Java, that's not a problem, because there is no such thing as a default-initialization that's done just before entering the constructor's body. In C#, the only things that are done outside the body is calling base-class or sibling constructors i believe. In C++, however, initializations done to members of derived classes by the overrider of that function would be undone when constructing those members while processing the constructor initializer list just before entering the constructors body of the derived class.
Edit: Because of a comment, i think a bit of clarification is needed. Here's an (contrived) example, let's assume it would be allowed to call virtuals, and the call would result in an activation of the final overrider:
struct base {
base() { init(); }
virtual void init() = 0;
};
struct derived : base {
derived() {
// we would expect str to be "called it", but actually the
// default constructor of it initialized it to an empty string
}
virtual void init() {
// note, str not yet constructed, but we can't know this, because
// we could have called from derived's constructors body too
str = "called it";
}
private:
string str;
};
That problem can indeed be solved by changing the C++ Standard and allowing it - adjusting the definition of constructors, object lifetime and whatnot. Rules would have to be made to define what str = ...; means for a not-yet constructed object. And note how the effect of it then depends on who called init. The feature we get does not justify the problems we have to solve then. So C++ simply forbids dynamic dispatch while the object is being constructed.