When & how should constructors enforce limits on instance variables? - java

I'm new to programming and am learning Java as my first oo language, by working through Introduction to Programming Using Java by David J. Eck and reading forum posts when stuck.
My question could be considered a follow-up to Java Class Constructor Parameters with range limits which deals with limiting int arguments to an Hour class' constructor to 0 through 23.
The answers to the above question mentioned throwing either InstantiationException or IllegalArgumentException, but were unclear about which is better style.
Also, when, if ever, is the overhead associated with the validation code justified?

Of the two, only IllegalArgumentException is the correct one to throw. Its documentation reads:
Thrown to indicate that a method has been passed an illegal or inappropriate argument.
An InstantiationException is for a different purpose.
Thrown when an application tries to create an instance of a class using the newInstance method in class Class, but the specified class object cannot be instantiated. The instantiation can fail for a variety of reasons including but not limited to:
the class object represents an abstract class, an interface, an array class, a primitive type, or void
the class has no nullary constructor
An InstantiationException has to do with a reflection call failing to call a constructor, but an IllegalArgumentException means that the constructor (or method) was called successfully, but the block of code determined that an argument was inappropriate.
It is always best to have a little overhead to validate the arguments coming in to your constructor (and method). A program or class that isn't working properly is worse than a program that works properly and might be negligibly slower.
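As a minimal sketch of the validation discussed above, assuming a hypothetical Hour class like the one in the linked question (the class name, the accessor and the message text are illustrative, not taken from that question):

```java
// Hypothetical Hour class: the constructor rejects values outside 0..23.
final class Hour {
    private final int value;

    Hour(int value) {
        if (value < 0 || value > 23) {
            // The caller passed an inappropriate argument, so
            // IllegalArgumentException describes the failure.
            throw new IllegalArgumentException(
                    "hour must be between 0 and 23, got " + value);
        }
        this.value = value;
    }

    int getValue() {
        return value;
    }
}
```

With this check in place, any Hour instance that exists is known to hold a valid value, so code receiving one does not need to validate it again.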

About the overhead
I don't think there is a hard rule, but in general, enforcing such limits in the constructor makes sense if your object needs an external parameter to be meaningful, just like the java.awt.Color object that @TNT mentioned in the comments.
It could also make sense if you have a stateful object whose state can only be set through the constructor's parameters, with no setters for those same parameters.
About the Exception
I can't do a better job than @rgettman did ;-) His answer makes complete sense to me.

Generally, I like to be certain that an object passed into my code will be valid. If the class itself already enforces that, I can have that certainty without having to check again. Based on that, I'd say it's a good idea to carefully validate your constructor arguments and enforce your invariants at least if your class and the constructor are public, or accessible by a large body of code.
If you have a small package and a class that is only ever used within that limited context, or even a private inner class, it's ok to be more relaxed about validation.
I'd use an IllegalArgumentException or something derived from that for validation, because that type makes it clearer what happened and that the fault is with the caller.

Related

Can I insert instructions in constructors before calling this() / super() and before initialising any final fields?

Preface
I have been experimenting with ByteBuddy and ASM, but I am still a beginner in ASM and between beginner and advanced in ByteBuddy. This question is about ByteBuddy and about JVM bytecode limitations in general.
Situation
I had the idea of creating global mocks for testing by instrumenting constructors in such a way that instructions like these are inserted at the beginning of each constructor:
if (GlobalMockRegistry.isMock(getClass()))
    return;
FYI, the GlobalMockRegistry basically wraps a Set<Class<?>> and if that set contains a certain class, then isMock(Class<?> clazz) would return true. The advantage of that concept is that I can (de)activate global mocking for each class during runtime because if multiple tests run in the same JVM process, one test might need a certain global mock, the next one might not.
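The registry described here might look roughly like the sketch below. Only the class name, the wrapped Set<Class<?>> and isMock come from the description; the activate and deactivate methods and the choice of a concurrent set are assumptions made for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a registry that wraps a Set<Class<?>>.
final class GlobalMockRegistry {
    // Concurrent set (an assumption), so tests on different threads can toggle mocks.
    private static final Set<Class<?>> MOCKED = ConcurrentHashMap.newKeySet();

    private GlobalMockRegistry() {
        // Static utility class, not meant to be instantiated.
    }

    // Hypothetical helper: turn global mocking on for a class.
    static void activate(Class<?> clazz) {
        MOCKED.add(clazz);
    }

    // Hypothetical helper: turn global mocking off again.
    static void deactivate(Class<?> clazz) {
        MOCKED.remove(clazz);
    }

    // Returns true if the given class is currently registered as a global mock.
    static boolean isMock(Class<?> clazz) {
        return MOCKED.contains(clazz);
    }
}
```

A test would call activate before exercising the code under test and deactivate afterwards, so that the next test in the same JVM sees the original behaviour.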
What the if(...) return; instructions above want to achieve is that if mocking is active, the constructor should not do anything:
no this() or super() calls, → update: impossible
no field initialisations, → update: possible
no other side effects. → update: might be possible, see my update below
The result would be an object with uninitialised fields that did not create any (possibly expensive) side effects such as resource allocation (database connection, file creation, you name it). Why would I want that? Could I not just create an instance with Objenesis and be happy? Not if I want a global mock, i.e. mock objects I cannot inject because they are created somewhere inside methods or field initialisers I do not have control over. Please do not worry about what method calls on such an object would do if its instance fields are not properly initialised. Just assume I have instrumented the methods to return stub results, too. I know how to do that already; in the context of this question, the only problem is constructors.
Questions / problems
Now if I try to simulate the desired result in Java source code, I meet the following limitations:
I cannot insert any code before this() or super(). I could mitigate that by also instrumenting the super class hierarchy with the same if(...) return;, but would like to know if I could in theory use ASM to insert my code before this() or super() using a method visitor. Or would the byte code of the instrumented class somehow be verified during loading or retransformation and then rejected because the byte code is "illegal"? I would like to know before I start learning ASM because I want to avoid wasting time for an idea which is not feasible.
If the class contains final instance fields, I also cannot enter a return before all of those fields have been initialised in the constructor. That might happen at the very end of a complex constructor which performs lots of side effects before actually initialising the last field. So the question is similar to the previous one: Can I use ASM to insert my if(...) return; before any fields (including final ones) are initialised and produce a valid class which I could not produce using javac and will not be rejected when loaded or retransformed?
BTW, if it is relevant, we are talking about Java 8+, i.e. at the time of writing this that would be Java versions 8 to 14.
If anything about this question is unclear, please do not hesitate to ask follow-up questions, so I can improve it.
Update after discussing Antimony's answer
I think the following approach could work: it still calls the constructor chain but avoids any side effects, resulting in a newly initialised instance with all fields empty (null, 0, false):
In order to avoid calling this.getClass(), I need to hard-code the mock target's class name directly into all constructors up the parent chain. I.e. if two "global mock" target classes have the same parent class(es), multiple of the following if blocks would be woven into each corresponding parent class, one for each hard-coded child class name.
In order to avoid any side effects from objects being created or methods being called, I need to call a super constructor myself, using null/zero/false values for each argument. That would not matter because the next parent class up the chain would have a similar code block so that the arguments given do not matter anyway.
// Avoid accessing 'this.getClass()'
if (GlobalMockRegistry.isMock(Sub.class)) {
    // Identify and call any parent class constructor, ideally a default constructor.
    // If none exists, call another one using default values like null, 0, false.
    // In the class derived from Object, just call 'Object.<init>'.
    super(null, 0, false);
    return;
}
// Here follows the original byte code, i.e. the normal super/this call and
// everything else the original constructor does.
Note to myself: Antimony's answer explains "uninitialised this" very nicely. Another related answer can be found here.
Next update after evaluating my new idea
I managed to validate my new idea with a proof of concept. As my JVM byte code knowledge is too limited and I am not used to the way of thinking it requires (stack frames, local variable tables, "reverse" logic of first pushing/popping variables, then applying an operation on them, not being able to easily debug), I just implemented it in Javassist instead of ASM, which in comparison was a breeze after failing miserably with ASM after hours of trial & error.
I can take it from here and I want to thank user Antimony for his very instructive answer + comments. I do know that theoretically the same solution could be implemented using ASM, but it would be exceedingly difficult in comparison because its API is too low level for the task at hand. ByteBuddy's API is too high level, Javassist was just right for me in order to get quick results (and easily maintainable Java code) in this case.
Yes and no. Java bytecode is much less restrictive than Java (source) in this regard. You can put any bytecode you want before the constructor call, as long as you don't actually access the uninitialized object. (The only operations allowed on an uninitialized this value are calling a constructor, setting private fields declared in the same class, and comparing it against null).
Bytecode is also more flexible in where and how you make the constructor call. For example, you can call one of two different constructors in an if statement, or you can wrap the super constructor call in a "try block", both things that are impossible at the Java language level.
Apart from not accessing the uninitialized this value, the only restriction* is that the object has to be definitely initialized along any path that returns from the constructor call. This means the only way to avoid initializing the object is to throw an exception. While being much laxer than Java itself, the rules for Java bytecode were still very deliberately constructed so it is impossible to observe uninitialized objects. In general, Java bytecode is still required to be memory safe and type safe, just with a much looser type system than Java itself. Historically, Java applets were designed to run untrusted code in the JVM, so any method of bypassing these restrictions was a security vulnerability.
* The above is talking about traditional bytecode verification, as that is what I am most familiar with. I believe stackmap verification behaves similarly though, barring implementation bugs in some versions of Java.
P.S. Technically, Java can have code execute before the constructor call. If you pass arguments to the constructor, those expressions are evaluated first, and hence the ability to place bytecode before the constructor call is required in order to compile Java code. Likewise, the ability to set private fields declared in the same class is used to set synthetic variables that arise from the compilation of nested classes.
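The P.S. above can be observed from plain Java source. The sketch below uses hypothetical classes (Parent, Child, and a shared LOG list) to record the order of events when an argument expression is passed to super(...):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes that record the order in which code runs.
class Parent {
    static final List<String> LOG = new ArrayList<>();

    Parent(String label) {
        LOG.add("Parent constructor: " + label);
    }
}

class Child extends Parent {
    Child() {
        // The argument expression is evaluated before Parent's constructor runs.
        super(describe());
        LOG.add("Child constructor body");
    }

    // Static, because instance members cannot be used before super(...) completes.
    private static String describe() {
        LOG.add("argument evaluated");
        return "from Child";
    }
}
```

After new Child(), the log holds "argument evaluated" first, then the Parent entry, then the Child entry, which shows that compiled Java already contains instructions ahead of the superclass constructor call.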
If the class contains final instance fields, I also cannot enter a return before all of those fields have been initialised in the constructor.
This, however, is eminently possible. The only restriction is that you call some constructor or superconstructor on the uninitialized this value. (Since all constructors recursively have this restriction, this will ultimately result in java.lang.Object's constructor being called). However, the JVM doesn't care what happens after that. In particular, it only cares that the fields have some well typed value, even if it is the default value (null for objects, 0 for ints, etc.) So there is no need to execute the field initializers to give them a meaningful value.
Is there any other way to get the type to be instantiated other than this.getClass() from a super class constructor?
Not as far as I am aware. There's no special opcode for magically getting the Class associated with a given value. Foo.class is just syntactic sugar which is handled by the Java compiler.

Under what circumstances should my Java class have a constructor (and not rely on the default constructor)?

I went through a coding problem in a course I'm taking, and I didn't realize that I needed to include my own constructor until I saw the instructor's solution. This has happened a few times throughout the course: I don't expect that I need a constructor, but it turns out I do need one, according to the answer given (below is one of the answers given to me).
I'm wondering now: do I need to make my own constructors when I need to pass parameters and/or I need additional functionality inside the constructor? Are there other situations when relying on the default constructor would be problematic?
private MenuIterator() {
    menuIterator = menu.iterator();
    calculateNumMenuItems();
}
You need a constructor exactly when you need to perform some sort of setup for your class and field initialization isn't enough. Your described constructor makes no sense because there's no way for your constructor to get menu (and the private modifier prevents you from calling new MenuIterator() in the usual fashion).
The answer given by chrylis is correct. You may also find this discussion of default constructors useful: Java default constructor. Essentially, if you provide any constructor at all (even a no-arg constructor), you will no longer be provided with a default constructor.
If you need to do anything other than call the class' superclass constructor, you will need to supply your own constructor.
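To illustrate the point about the default constructor disappearing, here is a small sketch with two hypothetical classes; the names WithDefault and WithoutDefault are made up for this example.

```java
// No constructor declared: the compiler supplies a no-arg default constructor,
// so "new WithDefault()" compiles.
class WithDefault {
}

// One constructor declared: the compiler no longer supplies a default one,
// so "new WithoutDefault()" would be a compile-time error.
class WithoutDefault {
    final int size;

    WithoutDefault(int size) {
        this.size = size;
    }
}
```

Reflection confirms this: getDeclaredConstructors() reports a single zero-parameter constructor for the first class and a single one-parameter constructor for the second.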
Maybe slightly advanced: in addition to what @chrylis said, you also need an explicit constructor if you need the constructor to be anything other than public. This is the case if you want the clients of your class to obtain an instance through a static factory method and not use the constructor directly. The Singleton pattern is just one of many uses of a static method for obtaining an instance.
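A minimal sketch of the static factory idea, using a hypothetical Temperature class (the class and its method names are illustrative only):

```java
// Hypothetical value class whose instances are obtained through static factories.
final class Temperature {
    private final double kelvin;

    // Private: clients cannot write "new Temperature(...)" themselves.
    private Temperature(double kelvin) {
        this.kelvin = kelvin;
    }

    // Named factories make the unit of the argument explicit.
    static Temperature ofKelvin(double kelvin) {
        return new Temperature(kelvin);
    }

    static Temperature ofCelsius(double celsius) {
        return new Temperature(celsius + 273.15);
    }

    double kelvin() {
        return kelvin;
    }
}
```

Because the constructor must be private here, it has to be written out explicitly; the compiler-supplied default constructor would have the same access level as the class instead.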
I wouldn’t worry too much. Even though your instructor has a fine solution with a constructor, it could well be that you have a fine solution without a constructor. Programming problems can always be solved in more than one way.
Links
Java Constructors vs Static Factory Methods
Singleton pattern

Private constructor defined in FileNotFoundException?

Randomly I came across this site: http://resources.mpi-inf.mpg.de/d5/teaching/ss05/is05/javadoc/java/io/FileNotFoundException.html
The class FileNotFoundException has three defined constructors:
FileNotFoundException()
Constructs a FileNotFoundException with null as its error detail message.
FileNotFoundException(String s)
Constructs a FileNotFoundException with the specified detail message.
private FileNotFoundException(String path, String reason)
Constructs a FileNotFoundException with a detail message consisting of the given pathname string followed by the given reason string.
But the last constructor is defined as private?
Again, here: http://www.docjar.com/html/api/java/io/FileNotFoundException.java.html we can see the full class definition. There is no other code, so the singleton pattern is obviously not used here, nor can we see why instantiating the class from outside should be prevented, nor is it a factory method, a static (utility class) method, or a constants-only class.
I am a C# developer, so I might not be aware of some of what is going on here, but I would still like to know why it is defined as private, what it is used for, and whether there is an example or use case for that last constructor.
The comment mentions:
This private constructor is invoked only by native I/O methods.
Can anybody explain this in a bit more detail?
Keep in mind: a lot of the JVM's libraries are written in Java, like that exception. But when interacting with "the rest of the world", sooner or later Java alone is no longer enough: native C/C++ code is needed in order to make real system calls.
Meaning: certain operations related to file I/O can't be completely implemented in Java, so native code (compiled binaries) comes in. Of course, such a call can fail as well, and then one needs a means to communicate that to the Java side. In other words, an exception needs to be thrown.
Given the comment that you are quoting, this seems pretty straightforward: when certain I/O-related native operations fail, they use that private constructor to create the exception that is then thrown at "you". And yes, native methods can call private methods!
Edit: but when looking at the implementation, there is really nothing specific about that constructor. One could easily construct such an exception using the exact same message that this private ctor would create.
private FileNotFoundException(String path, String reason) {
    super(path + ((reason == null)
                  ? ""
                  : " (" + reason + ")"));
}
So, my personal guess: this could even be some "leftover", something that had a certain meaning 15 years ago but has no "real meaning" any more. Or, even simpler, it is a convenience method allowing native code to pass either a null or a non-null reason string.
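To illustrate the remark that the same message can be built with the public API, here is a sketch that mirrors the message format of the private constructor quoted above. The helper class FileErrors and its method notFound are hypothetical names, not part of the JDK.

```java
import java.io.FileNotFoundException;

// Hypothetical helper that reproduces the private constructor's message format
// using only the public FileNotFoundException(String) constructor.
final class FileErrors {
    private FileErrors() {
        // Static utility class, not meant to be instantiated.
    }

    static FileNotFoundException notFound(String path, String reason) {
        // Same concatenation as in the private constructor: the reason is
        // appended in parentheses only when it is non-null.
        String message = path + (reason == null ? "" : " (" + reason + ")");
        return new FileNotFoundException(message);
    }
}
```

For example, FileErrors.notFound("a.txt", "Access is denied") yields an exception whose message is "a.txt (Access is denied)", while a null reason yields just "a.txt".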
The constructor in question is private so that no other class can use it to initialize an instance. It could, in principle, be used by the class itself -- that sort of thing is not unusual when one constructor is intended to be invoked by another, or by a factory method.
In this case, however, the documentation presents a different reason, which you in fact quoted:
This private constructor is invoked only by native I/O methods.
That seems clear enough to me, but I suppose your confusion may revolve around details of Java access control -- in particular, that it does not apply to native methods. Thus the native methods by which various I/O functionalities are implemented can instantiate FileNotFoundException via the private constructor, regardless of which class they belong to.

Virtual Mechanism in C++ and Java [duplicate]

In Java:
class Base {
    public Base() { System.out.println("Base::Base()"); virt(); }
    void virt() { System.out.println("Base::virt()"); }
}
class Derived extends Base {
    public Derived() { System.out.println("Derived::Derived()"); virt(); }
    void virt() { System.out.println("Derived::virt()"); }
}
public class Main {
    public static void main(String[] args) {
        new Derived();
    }
}
This will output
Base::Base()
Derived::virt()
Derived::Derived()
Derived::virt()
However, in C++ the result is different:
Base::Base()
Base::virt() // ← Not Derived::virt()
Derived::Derived()
Derived::virt()
(See http://www.parashift.com/c++-faq-lite/calling-virtuals-from-ctors.html for C++ code)
What causes such a difference between Java and C++? Is it the time when vtable is initialized?
EDIT: I do understand Java and C++ mechanisms. What I want to know is the insights behind this design decision.
Both approaches clearly have disadvantages:
In Java, the call goes to a method which cannot use this properly because its members haven’t been initialised yet.
In C++, an unintuitive method (i.e. not the one in the derived class) is called if you don’t know how C++ constructs classes.
Why each language does what it does is an open question but both probably claim to be the “safer” option: C++’s way prevents the use of uninitialised members; Java’s approach allows polymorphic semantics (to some extent) inside a class’ constructor (which is a perfectly valid use-case).
Well you have already linked to the FAQ's discussion, but that’s mainly problem-oriented, not going into the rationales, the why.
In short, it’s for type safety.
This is one of the few cases where C++ beats Java and C# on type safety. ;-)
When you create a class A, in C++ you can let each A constructor initialize the new instance so that all common assumptions about its state, called the class invariant, hold. For example, part of a class invariant can be that a pointer member points to some dynamically allocated memory. When each publicly available method preserves the class invariant, then it’s guaranteed to hold also on entry to each method, which greatly simplifies things – at least for a well-chosen class invariant!
No further checking is then necessary in each method.
In contrast, using two-phase initialization such as in Microsoft's MFC and ATL libraries you can never be quite sure whether everything has been properly initialized when a method (non-static member function) is called. This is very similar to Java and C#, except that in those languages the lack of class invariant guarantees comes from these languages merely enabling but not actively supporting the concept of a class invariant. In short, Java and C# virtual methods called from a base class constructor can be called down on a derived instance that has not yet been initialized, where the (derived) class invariant has not yet been established!
So, this C++ language support for class invariants is really great, helping do away with a lot of checking and a lot of frustrating perplexing bugs.
However, it makes it a bit difficult to do derived-class-specific initialization in a base class constructor, e.g. doing general things in a topmost GUI Widget class’ constructor.
The FAQ item “Okay, but is there a way to simulate that behavior as if dynamic binding worked on the this object within my base class's constructor?” goes a little into that.
For a more full treatment of the most common case, see also my blog article “How to avoid post-construction by using Parts Factories”.
Regardless of how it's implemented, it's a difference in what the language definition says should happen. Java allows you to call functions on a derived object that hasn't been fully initialized (it has been zero-initialized, but its constructor has not run). C++ doesn't allow that; until the derived class's constructor has run, there is no derived class.
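The Java side of this can be made concrete with a short sketch. The classes Shape and NamedShape are hypothetical; the point is that the overridden method runs while the subclass field still holds its zero-initialized default.

```java
// Hypothetical base class that calls an overridable method from its constructor.
class Shape {
    final String descriptionAtConstruction;

    Shape() {
        // Dispatches to the subclass override, even though the subclass part
        // of the object has not been initialized yet.
        descriptionAtConstruction = describe();
    }

    String describe() {
        return "shape";
    }
}

class NamedShape extends Shape {
    // This initializer runs only after Shape() has returned.
    private String name = "circle";

    @Override
    String describe() {
        return "shape named " + name;
    }
}
```

For a new NamedShape(), descriptionAtConstruction is "shape named null", because name was still null when the base constructor ran, whereas calling describe() afterwards returns "shape named circle".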
Hopefully this will help:
When your line new Derived() executes, the first thing that happens is the memory allocation. The program will allocate a chunk of memory big enough to hold the members of both Base and Derived. At this point, there is no object. It's just uninitialized memory.
When Base's constructor has completed, the memory will contain an object of type Base, and the class invariant for Base should hold. There is still no Derived object in that memory.
During the construction of Base, the Base object is in a partially-constructed state, but the language rules trust you enough to let you call your own member functions on a partially-constructed object. The Derived object isn't partially constructed. It doesn't exist.
Your call to the virtual function ends up calling the base class's version because at that point in time, Base is the most derived type of the object. If it were to call Derived::virt, it would be invoking a member function of Derived with a this-pointer that is not of type Derived, breaking type safety.
Logically, a class is something that gets constructed, has functions called on it, and then gets destroyed. You can't call member functions on an object that hasn't been constructed, and you can't call member functions on an object after it's been destroyed. This is fairly fundamental to OOP, the C++ language rules are just helping you avoid doing things that break this model.
In Java, method invocation is based on the runtime type of the object, which is why it behaves like that (I don't know much about C++).
Here your object is of type Derived, so the JVM invokes the method on the Derived object.
If I understand the virtual concept correctly, the equivalent in Java is abstract; your code right now is not really virtual code in Java terms.
Happy to update my answer if something is wrong.
Actually I want to know what's the insight behind this design decision
It may be that in Java, every type derives from Object, every Object is some kind of leaf type, and there's a single JVM in which all objects are constructed.
In C++, many types aren't virtual at all. Furthermore, in C++ the base class and the subclass can be compiled to machine code separately, so the base class does what it does without knowing whether it's a superclass of something else.
Constructors are not polymorphic in case of both C++ and Java languages, whereas a method could be polymorphic in both languages. This means, when a polymorphic method appears inside a constructor, the designers would be left with two choices.
Either strictly conform to the semantics of a non-polymorphic constructor and thus treat any polymorphic method invoked within a constructor as non-polymorphic. This is how C++ does it§.
Or compromise the strict semantics of a non-polymorphic constructor and adhere to the strict semantics of a polymorphic method. Thus polymorphic methods called from constructors are always polymorphic. This is how Java does it.
Since neither strategy offers any real benefit over the other, and yet the Java way reduces a lot of overhead (no need to differentiate polymorphism based on the context of constructors), and since Java was designed after C++, I would presume that the designers of Java opted for the second option because of its lower implementation overhead.
Added on 21-Dec-2016
§Lest the statement “method invoked within a constructor as non-polymorphic...This is how C++ does” might be confusing without careful scrutiny of the context, I’m adding a formalization to precisely qualify what I meant.
If class C has a direct definition of some virtual function F and its ctor has an invocation to F, then any (indirect) invocation of C’s ctor on an instance of child class T will not influence the choice of F; and in fact, C::F will always be invoked from C’s ctor. In this sense, invocation of virtual F is less-polymorphic (compared to say, Java which will choose F based on T)
Further, it is important to note that, if C inherits the definition of F from some parent P and has not overridden F, then C’s ctor will invoke P::F and even this, IMHO, can be determined statically.

What is the proper way to declare class objects?

This is just a quick question to settle a dispute that I stumbled on a while back (sorry I don't have the link).
Here is how I have been declaring objects:
class Foo {
    private Bar aBar = new Bar();
    ...
}
Now the dispute that I found says that this is bad Java. I have no idea why he would say that, but he was quite adamant. What he proposed was that all objects should be declared in the class body, but not instantiated until the constructor. Can anyone shed light on this for me? Is it indeed better to instantiate objects in the constructor?
TFYT
~Aedon
Edit 1:
I know that I used the word dispute, but I do not intend for this to be argumentative.
In most cases it doesn't matter. My rule of thumb is:
If you're going to use the same expression to initialize the variable in all constructors, and it doesn't rely on any parameters, do it at the point of declaration.
Otherwise, you're pretty much forced to do it in the constructor anyway.
Reasoning: by initializing at the point of declaration, it's clear that the value is going to be assigned the same way regardless of the constructor and parameters. It also keeps your constructors simpler, and free of duplication.
Caveat: Don't also assign the value in a constructor, as otherwise that invalidates the previous clarity :)
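The rule of thumb above could look like this in practice. Playlist is a hypothetical class invented for the example: one field uses the same parameter-free expression for every constructor, the other depends on constructor input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class illustrating where each kind of field is initialized.
class Playlist {
    // Same expression for every constructor and no parameters involved:
    // initialize at the point of declaration.
    private final List<String> tracks = new ArrayList<>();

    // Depends on what the caller passes in: initialize in the constructor.
    private final String title;

    Playlist() {
        this.title = "untitled";
    }

    Playlist(String title) {
        this.title = title;
    }

    void add(String track) {
        tracks.add(track);
    }

    int size() {
        return tracks.size();
    }

    String title() {
        return title;
    }
}
```

Neither constructor mentions tracks, yet both produce an instance with an empty, usable list, which keeps the constructors short and free of duplication.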
I suggest you ask your colleague (or whatever) for concrete reasons for his claims that your current code is "bad". I'm sure there are valid alternative points of view, but if he can't provide any reasons, then there's no reason to pay attention IMO.
Another quick note - I'm assuming that none of the initializers need to do any significant work. If they do, that could be a point of confusion, especially if exceptions are thrown. In general, I don't like my constructors doing a lot of work.
By assigning properties in the constructor, it becomes immediately clear what code will run when you instantiate your class.
If you assign inside a field declaration, people reading the class constructor won't realize that the field is set elsewhere.
The contract of a constructor is to create an instance that is semantically valid. That is all fields are properly initialized to reasonable values and so on. For this reason, initializing everything in the constructor helps to clarify what makes a valid instance of your class. In addition, mechanisms like constructor chaining can be used to avoid repeating the same code when you have multiple constructors.
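As a sketch of the constructor chaining mentioned above, the hypothetical Rect class below funnels every constructor into a single one that does all the initialization and checking:

```java
// Hypothetical class: all constructors delegate to the most specific one.
class Rect {
    private final int width;
    private final int height;

    // Unit square by default.
    Rect() {
        this(1, 1);
    }

    // Square with the given side length.
    Rect(int side) {
        this(side, side);
    }

    // The only constructor that assigns fields and validates arguments.
    Rect(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("width and height must be positive");
        }
        this.width = width;
        this.height = height;
    }

    int area() {
        return width * height;
    }
}
```

Someone reading the last constructor sees everything that makes an instance valid, and the other two constructors cannot drift out of sync with it.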
However, that is just textbook-like theory, and in real life you sometimes do the more expedient thing. Since it makes almost no difference whether you instantiate objects at the point of declaration or not, there is no need for strong positions that lead to disputes.
