Java: combined instanceof and cast?

(Please, no advice that I should abstract X more and add another method to it.)
In C++, when I have a variable x of type X* and I want to do something specific if it is also of type Y* (Y being a subclass of X), I am writing this:
if(Y* y = dynamic_cast<Y*>(x)) {
    // now do sth with y
}
The same thing does not seem to be possible in Java (or is it?).
I have read this Java code instead:
if(x instanceof Y) {
    Y y = (Y) x;
    // ...
}
Sometimes you don't have a variable x but a more complex expression instead, and then, just because of this issue, you need a dummy variable in Java:
X x = something();
if(x instanceof Y) {
    Y y = (Y) x;
    // ...
}
// x not needed here anymore
(A common case is that something() is iterator.next(). And there you see that you cannot really just call it twice. You really need the dummy variable.)
You don't really need x at all here -- you just have it because you cannot do the instanceof check together with the cast in one step. Compare that again to the quite common C++ code:
if(Y* y = dynamic_cast<Y*>( something() )) {
    // ...
}
Because of this, I have introduced a castOrNull function which makes it possible to avoid the dummy variable x. I can write this now:
Y y = castOrNull( something(), Y.class );
if(y != null) {
    // ...
}
Implementation of castOrNull:
public static <T> T castOrNull(Object obj, Class<T> clazz) {
    try {
        return clazz.cast(obj);
    } catch (ClassCastException exc) {
        return null;
    }
}
Now, I was told that using this castOrNull function in that way is an evil thing to do. Why is that? (Or to put the question more generally: would you agree and also think this is evil? If yes, why so? Or do you think this is a valid (maybe rare) use case?)
As said, I don't want a discussion whether the usage of such downcast is a good idea at all. But let me clarify shortly why I sometimes use it:
Sometimes I get into the case where I have to choose between adding another new method for a very specific thing (which will only apply for one single subclass in one specific case) or using such an instanceof check. Basically, I have the choice between adding a function doSomethingVeryVerySpecificIfIAmY() or doing the instanceof check. And in such cases, I feel that the latter is cleaner.
Sometimes I have a collection of some interface / base class, and for all entries of type Y, I want to do something and then remove them from the collection. (E.g. I had the case where I had a tree structure and I wanted to delete all children which are empty leaves.)

Starting with Java 14 you should be able to do the instanceof check and the cast at the same time (pattern matching for instanceof). See https://openjdk.java.net/jeps/305.
Code example:
if (obj instanceof String s) {
    // can use s here
} else {
    // can't use s here
}
The variable s in the above example is defined if instanceof evaluates to true. The scope of the variable depends on the context. See the link above for more examples.
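Applied to the something() example from the question, a quick sketch (assuming something() returns an X and Y extends X; pattern matching was a preview feature in Java 14/15 and became final in Java 16):
// The pattern variable y is bound only when the test succeeds,
// so no dummy variable and no separate cast are needed.
if (something() instanceof Y y) {
    // use y directly
}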

Now, I was told that using this castOrNull function in that way is an evil thing to do. Why is that?
I can think of a couple of reasons:
It is an obscure and tricky way of doing something very simple. Obscure and tricky code is hard to read, hard to maintain, a potential source of errors (when someone doesn't understand it) and therefore evil.
The obscure and tricky way that the castOrNull method works most likely cannot be optimized by the JIT compiler. You'll end up with at least 3 extra method calls, plus lots of extra code to do the type check and cast reflectively. Unnecessary use of reflection is evil.
(By contrast, the simple way (with instanceof followed by a class cast) uses specific bytecodes for instanceof and class casting. The bytecode sequences can almost certainly be optimized so that there is no more than one null check and no more than one test of the object's type in the native code. This is a common pattern that should be easy for the JIT compiler to detect and optimize.)
Of course, "evil" is just another way of saying that you REALLY shouldn't do this.
Neither of your two added examples makes the use of a castOrNull method either necessary or desirable. IMO, the "simple way" is better from both the readability and performance perspectives.

In most well-written/well-designed Java code, the use of instanceof and casts rarely happens. With the addition of generics, many casts (and thus instanceof checks) are no longer needed. They do, on occasion, still occur.
The castOrNull method is evil in that you are making Java code look "unnatural". The biggest problem when changing from one language to another is adopting the conventions of the new language. Temporary variables are just fine in Java. In fact all your method is doing is really hiding the temporary variable.
If you are finding that you are writing a lot of casts you should examine your code and see why and look for ways to remove them. For example, in the case that you mention adding a "getNumberOfChildren" method would allow you to check if a node is empty and thus able to prune it without casting (that is a guess, it might not work for you in this case).
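As a rough sketch of that suggestion (Node, getChildren() and getNumberOfChildren() are hypothetical names, and getChildren() is assumed to return a mutable List):
// Delete all children that are empty leaves, without any instanceof or cast.
void pruneEmptyLeaves(Node node) {
    node.getChildren().removeIf(child -> child.getNumberOfChildren() == 0);
    node.getChildren().forEach(this::pruneEmptyLeaves); // recurse into the remaining children
}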
Generally speaking casts are "evil" in Java because they are usually not needed. Your method is more "evil" because it is not written in the way most people would expect Java to be written.
That being said, if you want to do it, go for it. It isn't actually "evil", just not the "right" way to do it in Java.

IMHO your castOrNull is not evil, just pointless. You seem to be obsessed with getting rid of a temporary variable and one line of code, while to me the bigger question is why you need so many downcasts in your code. In OO this is almost always a symptom of suboptimal design. And I would prefer solving the root cause instead of treating the symptom.

I don't know exactly why the person said that it was evil. However, one possibility for their reasoning is that you are catching an exception after the fact rather than checking before you cast. Here is a way to do the check first:
public static <T> T castOrNull(Object obj, Class<T> clazz) {
    // Guard against null first; obj.getClass() would throw a NullPointerException otherwise.
    if (obj != null && clazz.isAssignableFrom(obj.getClass())) {
        return clazz.cast(obj);
    } else {
        return null;
    }
}

Java Exceptions are slow. If you're trying to optimize your performance by avoiding a double cast, you're shooting yourself in the foot by using exceptions in lieu of logic. Never rely on catching an exception for something you could reasonably check for and correct for (exactly what you're doing).
How slow are Java exceptions?

Related

Using optionals, is it possible to return early on "ifPresent" without adding a separate if-else statement?

public Void traverseQuickestRoute(){ // Void return-type from interface
    findShortCutThroughWoods()
            .map(WoodsShortCut::getTerrainDifficulty)
            .ifPresent(this::walkThroughForestPath); // return in this case
    if(isBikePresent()){
        return cycleQuickestRoute();
    }
    ....
}
Is there a way to exit the method at the ifPresent?
In case it is not possible, for other people with similar use-cases: I see two alternatives
Optional<MappedRoute> woodsShortCut = findShortCutThroughWoods();
if(woodsShortCut.isPresent()){
    TerrainDifficulty terrainDifficulty = woodsShortCut.get().getTerrainDifficulty();
    return walkThroughForestPath(terrainDifficulty);
}
This feels uglier than it needs to be and mixes if/else with functional programming.
A chain of orElseGet(...) throughout the method does not look as nice, but is also a possibility.
return is a control statement. Neither lambdas (arrow notation) nor method refs (WoodsShortCut::getTerrainDifficulty) support the idea of control statements that move control to outside of themselves.
Thus, the answer is a rather trivial: Nope.
You have to think of the stream 'pipeline' as the thing you're working on. So, the question could be said differently: Can I instead change this code so that I can modify how this one pipeline operation works (everything starting at findShortCut() to the semicolon at the end of all the method invokes you do on the stream/optional), and then make this one pipeline operation the whole method.
Thus, the answer is: orElseGet is probably it.
Disappointing, perhaps. 'Functional' does not strike me as the right answer here. The problem is, there are things for/if/while loops can do that 'functional' cannot do. So, if you are faced with a problem that is simpler to tackle using a thing that for/if/while is good at but functional is bad at, then it is probably a better plan to just use for/if/while.
One of the core limitations of lambdas concerns transparency. Lambdas are non-transparent with regard to these 3 things:
Checked exception throwing. try { list.forEach(x -> { throw new IOException(); }); } catch (IOException e) {} isn't legal, even though your human brain can trivially tell it should be fine.
(Mutable) local variables. int x = 5; list.forEach(y -> x += y); does not work. Often there are ways around this (list.stream().mapToInt(Integer::intValue).sum() in this example; see the sketch after this list), but not always.
Control flow. list.forEach(y -> {if (y < 0) return y;}); does not work; you cannot break out of the loop or return a value from the surrounding method from inside the lambda.
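For the mutable-local case in particular, a small sketch of the usual workarounds (assuming list is a List<Integer>):
// Workaround 1: express the mutation as a stream reduction.
int sum1 = list.stream().mapToInt(Integer::intValue).sum();

// Workaround 2: hide the mutation behind an effectively final holder.
int[] holder = { 0 };
list.forEach(y -> holder[0] += y);
int sum2 = holder[0];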
So, keep in mind, you really have only 2 options:
Continually retrain yourself to not think in terms of such control flow. You find orElseGet 'not as nice'. I concur, but if you really want to blanket-apply functional style to as many places as you possibly can, then the whole notion of control flow out of a lambda must not be your go-to plan. If you keep thinking 'this code is not particularly nice because it would be simpler if I could flow control out of it', you're going to be depressed all day programming in this style. The day you never even think about it anymore is the day you have succeeded in retraining yourself to 'think more functionally', so to speak.
Stop thinking that 'functional is always better'. Given that there are so many situations where the downsides are so significant, perhaps it is not a good idea to presuppose that the lambda/method-ref based solution must somehow be superior. Apply what seems correct. That should often be "Actually, just a plain old for loop is fine. Better than fine; it's the right, most elegant[1] answer here".
[1] "This code is elegant" is, of course, a non-falsifiable statement. It's like saying "The Mona Lisa is a pretty painting". You can't make a logical argument to prove this and it is insanity to try. "This code is elegant" boils down to saying "I think it is prettier", it cannot boil down to an objective fact. That also means in team situations there's no point in debating such things. Either everybody gets to decide what 'elegant' is (hold a poll, maybe?), or you install a dictator that decrees what elegance is. If you want to fix that and have meaningful debate, the term 'elegant' needs to be defined in terms of objective, falsifiable statements. I would posit that things like:
in face of expectable future change requests, this style is easier to modify
A casual glance at code leaves a first impression. Whichever style has the property that this first impression is accurate - is better (in other words, code that confuses or misleads the casual glancer is bad). Said even more differently: Code that really needs comments to avoid confusion is worse than code that is self-evident.
this code looks familiar to a wide array of java programmers
this code consists of fewer AST nodes (the more accurate form of 'fewer lines = better')
this code has simpler semantic hierarchy (i.e. fewer indents)
Those are the kinds of things that should define 'elegance'. Under almost all of those definitions, 'an if statement' is as good or better in this specific case!
For example:
public Void traverseQuickestRoute() {
    return findShortCutThroughWoods()
            .map(WoodsShortCut::getTerrainDifficulty)
            .map(this::walkThroughForestPath) // must return a value (e.g. Void) for map to compile
            .orElseGet(() -> isBikePresent() ? cycleQuickestRoute() : null);
}
There is Optional#ifPresentOrElse with an extra Runnable for the else case, available since Java 9.
public Void traverseQuickestRoute() { // Void return-type from interface
    findShortCutThroughWoods()
            .map(WoodsShortCut::getTerrainDifficulty)
            .ifPresentOrElse(this::walkThroughForestPath,
                             this::alternative);
    return null;
}

private void alternative() {
    if (isBikePresent()) {
        cycleQuickestRoute(); // its return value is not needed here; the interface method above returns null
        return;
    }
    ...
}
I would split the method as above, though for short code an inline () -> { ... } lambda might be readable.

Static default method for uninitialized classes

Sometimes it would be convenient to have an easy way of doing the following:
Foo a = dosomething();
if (a != null){
    if (a.isValid()){
        ...
    }
}
My idea was to have some kind of static "default" methods for uninitialized variables, like this:
class Foo{
    public boolean isValid(){
        return true;
    }
    // hypothetical: Java does not currently allow a static method with the same
    // signature as an instance method in the same class
    public static boolean isValid(){
        return false;
    }
}
And now I could do this…
Foo a = dosomething();
if (a.isValid()){
    // In our example case -> variable is initialized and the "normal" method gets called
}else{
    // In our example case -> variable is null
}
So, if a == null, the static "default" method of our class gets called; otherwise, the method of our object gets called.
Is there some keyword I'm missing to do exactly this, or is there a reason why this is not already implemented in programming languages like Java/C#?
Note: this example is not very exciting even if it worked; however, there are examples where this would be - indeed - very nice.
It's very slightly odd; ordinarily, x.foo() runs the foo() method as defined by the object that the x reference is pointing to. What you propose is a fallback mechanism where, if x is null (is referencing nothing), we don't look at the object that x is pointing to (there's nothing it's pointing at; hence, that is impossible), but instead we look at the type of x, the variable itself, and ask that type: Hey, can you give me the default impl of foo()?
The core problem is that you're assigning a definition to null that it just doesn't have. Your idea requires a redefinition of what null means which means the entire community needs to go back to school. I think the current definition of null in the java community is some nebulous ill defined cloud of confusion, so this is probably a good idea, but it is a huge commitment, and it is extremely easy for the OpenJDK team to dictate a direction and for the community to just ignore it. The OpenJDK team should be very hesitant in trying to 'solve' this problem by introducing a language feature, and they are.
Let's talk about the definitions of null that make sense, which definition of null your idea specifically caters to (to the detriment of the other interpretations!), and how catering to that specific idea is already easy to do in current Java, i.e. what you propose sounds outright daft to me, in that it's just unnecessary and forces an opinion of what null means down everybody's throats for no reason.
Not applicable / undefined / unset
This definition of null is exactly how SQL defines it, and it has the following properties:
There is no default implementation available. By definition! How can one define what the size is of, say, an unset list? You can't say 0. You have no idea what the list is supposed to be. The very point is that interaction with an unset/not-applicable/unknown value should immediately lead to a result that represents either [A] the programmer messed up, the fact that they think they can interact with this value means they programmed a bug - they made an assumption about the state of the system which does not hold, or [B] that the unset nature is infectious: The operation returns the notion 'unknown / unset / not applicable' as result.
SQL chose the B route: Any interaction with NULL in SQL land is infectious. For example, even NULL = NULL in SQL is NULL, not FALSE. It also means that all booleans in SQL are tri-state, but this actually 'works', in that one can honestly fathom this notion. If I ask you: Hey, are the lights on?, then there are 3 reasonable answers: Yes, No, and I can't tell you right now; I don't know.
In my opinion, java as a language is meant for this definition as well, but has mostly chosen the [A] route: Throw an NPE to let everybody know: There is a bug, and to let the programmer get to the relevant line extremely quickly. NPEs are easy to solve, which is why I don't get why everybody hates NPEs. I love NPEs. So much better than some default behaviour that is usually but not always what I intended (objectively speaking, it is better to have 50 bugs that each take 3 minutes to solve, than one bug that takes an entire working day, by a large margin!) – this definition 'works' with the language:
Uninitialized fields, and uninitialized values in an array begin as null, and in the absence of further information, treating it as unset is correct.
They are, in fact, infectiously erroneous: Virtually all attempts to interact with them result in an exception, except ==, but that is intentional, for the same reason that in SQL IS NULL will return TRUE or FALSE and not NULL: Now we're actually talking about the pointer nature of the object itself ("foo" == "foo" can be false if the 2 strings aren't the same ref: Clearly == in java between objects is about the references themselves and not about the objects referenced).
A key aspect to this is that null has absolutely no semantic meaning, at all. Its lack of semantic meaning is the point. In other words, null doesn't mean that a value is short or long or blank or indicative of anything in particular. The only thing it does mean is that it means nothing. You can't derive any information from it. Hence, foo.size() is not 0 when foo is unset/unknown - the question 'what is the size of the object foo is pointing at' is unanswerable, in this definition, and thus NPE is exactly right.
Your idea would hurt this interpretation - it would confound matters by giving answers to unanswerable questions.
Sentinel / 'empty'
null is sometimes used as a value that does have semantic meaning. Something specific. For example, if you ever wrote this, you're using this interpretation:
if (x == null || x.isEmpty()) return false;
Here you've assigned a semantic meaning to null - the same meaning you assigned to an empty string. This is common in java and presumably stems from some bass ackwards notion of performance. For example, in the eclipse ecj java parser system, all empty arrays are done with null pointers. For example, the definition of a method has a field Argument[] arguments (for the method parameters; using argument is the slightly wrong word, but it is used to store the param definitions); however, for methods with zero parameters, the semantically correct choice is obviously new Argument[0]. However, that is NOT what ecj fills the Abstract Syntax Tree with, and if you are hacking around on the ecj code and assign new Argument[0] to this, other code will mess up as it just wasn't written to deal with this.
This is in my opinion bad use of null, but is quite common. And, in ecj's defense, it is about 4 times faster than javac, so I don't think it's fair to cast aspersions at their seemingly deplorably outdated code practices. If it's stupid and it works it isn't stupid, right? ecj also has a better track record than javac (going mostly by personal experience; I've found 3 bugs in ecj over the years and 12 in javac).
This kind of null does get a lot better if we implement your idea.
The better solution
What ecj should have done, get the best of both worlds: Make a public constant for it! new Argument[0], the object, is entirely immutable. You need to make a single instance, once, ever, for an entire JVM run. The JVM itself does this; try it: List.of() returns the 'singleton empty list'. So does Collections.emptyList() for the old timers in the crowd. All lists 'made' with Collections.emptyList() are actually just refs to the same singleton 'empty list' object. This works because the lists these methods make are entirely immutable.
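A minimal sketch of that 'public constant' pattern; Argument stands in for any immutable element type and NO_ARGUMENTS is a hypothetical name:
// One shared, immutable sentinel for the whole JVM run, instead of null.
public final class Arguments {
    public static final Argument[] NO_ARGUMENTS = new Argument[0];

    private Arguments() { } // no instances; this class only hosts the constant
}

// Usage: methodDeclaration.arguments = Arguments.NO_ARGUMENTS; // never null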
The same can and generally should apply to you!
If you ever write this:
if (x == null || x.isEmpty())
then you messed up if we go by the first definition of null, and you're simply writing needlessly wordy, but correct, code if we go by the second definition. You've come up with a solution to address this, but there's a much, much better one!
Find the place where x got its value, and address the boneheaded code that decided to return null instead of "". You should in fact emphatically NOT be adding null checks to your code, because it's far too easy to get into this mode where you almost always do it, and therefore you rarely actually have null refs, but it's just swiss cheese laid on top of each other: There may still be holes, and then you get NPEs. Better to never check so you get NPEs very quickly in the development process - somebody returned null where they should be returning "" instead.
Sometimes the code that made the bad null ref is out of your control. In that case, do the same thing you should always do when working with badly designed APIs: Fix it ASAP. Write a wrapper if you have to. But if you can commit a fix, do that instead. This may require making such an object.
Sentinels are awesome
Sometimes sentinel objects (objects that 'stand in' for this default / blank take, such as "" for strings, List.of() for lists, etc.) can be a bit more fancy than this. For example, one can imagine using LocalDate.of(1800, 1, 1) as a sentinel for a missing birthdate, but do note that this particular instance is not a great idea: it does crazy stuff. For example, if you write code to determine the age of a person, it starts giving completely wrong answers and will say someone is 212 years old all of a sudden. That is significantly worse than throwing an exception: with the exception you know you have a bug sooner, and you get a stacktrace that lets you find it in literally 500 milliseconds (just click the line, voila, that is the exact line you need to look at right now to fix the problem).
But you could make a LocalDate object that does some things (such as: It CAN print itself; sentinel.toString() doesn't throw NPE but prints something like 'unset date'), but for other things it will throw an exception. For example, .getYear() would throw.
You can also make more than one sentinel. If you want a sentinel that means 'far future', that's trivially made (LocalDate.of(9999, 12, 31) is pretty good already), and you can also have one as 'for as long as anyone remembers', e.g. 'distant past'. That's cool, and not something your proposal could ever do!
You will have to deal with the consequences though. In some small ways the java ecosystem's definitions don't mesh with this, and null would perhaps have been a better stand-in. For example, the equals contract clearly states that a.equals(a) must always hold, and yet, just like in SQL NULL = NULL isn't TRUE, you probably don't want missingDate.equals(missingDate) to be true; that's conflating the meta with the value: You can't actually tell me that 2 missing dates are equal. By definition: The dates are missing. You do not know if they are equal or not. It is not an answerable question. And yet we can't implement the equals method of missingDate as return false; (or, better yet, as you also can't really know they aren't equal either, throw an exception) as that breaks the contract (equals methods must have the identity property and must not throw, as per their own javadoc, so we can't do either of those things).
Dealing with null better
There are a few things that make dealing with null a lot easier:
Annotations: APIs can and should be very clear in communicating when their methods can return null and what that means. Annotations that turn that documentation into compiler-checked documentation are awesome. Your IDE can start warning you, as you type, that null may occur and what that means, and will say so in auto-complete dialogs too. And it's all entirely backwards compatible in all senses of the word: No need to start considering giant swaths of the java ecosystem as 'obsolete' (unlike Optional, which mostly sucks).
Optional, except this is a non-solution. The type isn't orthogonal: you can't write a method that takes a List<MaybeOptionalorNot<String>> that works on both List<String> and List<Optional<String>>, even though a method that checks the 'is it some or is it none?' state of all list members and doesn't add anything (except maybe shuffle things around) would work equally well on both lists, and yet you just can't write it. This is bad, and it means all usages of Optional must be 'unrolled' on the spot, and e.g. Optional<X> should show up pretty much never ever as a parameter type or field type. Only as a return type, and even that is dubious - I'd just stick to what Optional was made for: as the return type of Stream terminal operations.
Adopting it also isn't backwards compatible. For example, hashMap.get(key) should, in all possible interpretations of what Optional is for, obviously return an Optional<V>, but it doesn't, and it never will, because java doesn't break backwards compatibility lightly and breaking that is obviously far too heavy an impact. The only real solution is to introduce java.util2 and a complete incompatible redesign of the collections API, which is splitting the java ecosystem in twain. Ask the python community (python2 vs. python3) how well that goes.
Use sentinels, use them heavily, make them available. If I were designing LocalDate, I'd have created LocalDate.FAR_FUTURE and LocalDate.DISTANT_PAST (but let it be clear that I think Stephen Colebourne, who designed JSR310, is perhaps the best API designer out there. But nothing is so perfect that it can't be complained about, right?)
Use API calls that allow defaulting. Map has this.
Do NOT write this code:
String phoneNr = phoneNumbers.get(userId);
if (phoneNr == null) return "Unknown phone number";
return phoneNr;
But DO write this:
return phoneNumbers.getOrDefault(userId, "Unknown phone number");
Don't write:
Map<Course, List<Student>> participants;

void enrollStudent(Student student) {
    List<Student> participating = participants.get(econ101);
    if (participating == null) {
        participating = new ArrayList<Student>();
        participants.put(econ101, participating);
    }
    participating.add(student);
}
instead write:
Map<Course, List<Student>> participants;

void enrollStudent(Student student) {
    participants.computeIfAbsent(econ101,
            k -> new ArrayList<Student>())
        .add(student);
}
and, crucially, if you are writing APIs, ensure things like getOrDefault, computeIfAbsent, etc. are available so that the users of your API don't have to deal with null nearly as much.
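For illustration, a tiny hypothetical API (PhoneBook is not a real class) that bakes the defaulting in, so callers never see null:
import java.util.HashMap;
import java.util.Map;

public final class PhoneBook {
    private final Map<String, String> numbersByUserId = new HashMap<>();

    public void put(String userId, String phoneNr) {
        numbersByUserId.put(userId, phoneNr);
    }

    // Callers supply their own fallback and never have to null-check.
    public String numberOrDefault(String userId, String fallback) {
        return numbersByUserId.getOrDefault(userId, fallback);
    }
}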
You can write a static test() method like this:
static <T> boolean test(T object, Predicate<T> validation) {
    return object != null && validation.test(object);
}
and
static class Foo {
    public boolean isValid() {
        return true;
    }
}

static Foo dosomething() {
    return new Foo();
}

public static void main(String[] args) {
    Foo a = dosomething();
    if (test(a, Foo::isValid))
        System.out.println("OK");
    else
        System.out.println("NG");
}
output:
OK
If dosomething() returns null, it prints NG
Not exactly, but take a look at Optional:
Optional.ofNullable(dosomething())
.filter(Foo::isValid)
.ifPresent(a -> ...);
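If the else branch matters too (as in the question), Java 9's ifPresentOrElse covers both cases; a sketch reusing the Foo and dosomething() names from above:
Optional.ofNullable(dosomething())
        .filter(Foo::isValid)
        .ifPresentOrElse(
                a -> System.out.println("valid"),          // a is the non-null, valid Foo
                () -> System.out.println("null or invalid"));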

Why is "new String();" a statement but "new int[0];" not?

I just randomly tried seeing if new String(); would compile, and it did (because according to Oracle's Java documentation on "Expressions, Statements, and Blocks", one of the valid statement types is "object creation").
However, new int[0]; gives me a "not a statement" error.
What's wrong with this? Aren't I creating an array object with new int[0]?
EDIT:
To clarify this question, the following code:
class Test {
    void foo() {
        new int[0];
        new String();
    }
}
causes a compiler error on new int[0];, whereas new String(); on its own is fine. Why is one not acceptable while the other is fine?
The reason is a somewhat overengineered spec.
The idea behind expressions not being valid statements is that they accomplish nothing whatsoever. 5 + 2; does nothing on its own. You must assign it to something, or pass it to something, otherwise why write it?
There are exceptions, however: Expressions which, on their own, will (or possibly will) have side effects. For example, whilst this is illegal:
void foo(int a) {
    a + 1;
}
This is not:
void foo(int a) {
    a++;
}
That is because, on its own, a++ is not completely useless, it actually changes things (a is modified by doing this). Effectively, 'ignoring the value' (you do nothing with a + 1 in that first snippet) is acceptable if the act of producing the value on its own causes other stuff to happen: After all, maybe that is what you were after all along.
For that reason, invoking methods is also a legit expression statement, and in fact it is quite common that you invoke methods (even ones that don't return void), ignoring the return value. For void methods it's the only legal way to invoke them, even.
Constructors are technically methods and can have side effects. It is extremely unlikely, and quite bad code style, if this method:
void doStuff() {
    new Something();
}
is 'sensible' code, but it could in theory be written, bad as it may be: The constructor of the Something class may do something useful and perhaps that's all you want to do here: Make that constructor run, do the useful thing, and then take the created object and immediately toss it in the garbage. Weird, but, okay. You're the programmer.
Contrast with:
new Something[10];
This is different: The compiler knows what the array 'constructor' does. And what it does is nothing useful - it creates an object and returns a reference to the object, and that is all that happens. If you then instantly toss the reference in the garbage, then the entire operation was a complete waste of time, and surely you did not intend to do nothing useful with such a bizarre statement, so the compiler designers thought it best to just straight up disallow you from writing it.
This 'oh dear that code makes no sense therefore I shall not compile it' is very limited and mostly an obsolete aspect of the original compiler spec; it's never been updated, and it is not a good way to trust that code is sensible. There are all sorts of linter tools out there that go vastly further in finding code that just cannot be right, so if you care about that sort of thing, invest in learning those.
Nevertheless, the java 1.0 spec had this stuff baked in and there is no particularly good reason to drop this aspect of the java spec, therefore, it remains, and constructing a new array is not a valid ExpressionStatement.
As JLS §14.8 states, specifically, a ClassInstanceCreationExpression is in the list of valid expression statements. Following that term to the definition of ClassInstanceCreationExpression, you'll find that it specifically refers to invoking constructors, and not to array construction.
Thus, the JLS is specific and requires this behaviour. javac is simply following the spec.
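To make the distinction concrete, a small sketch of what this part of the spec accepts and rejects:
class Demo {
    void demo() {
        new String();     // compiles: class instance creation is a valid expression statement
        // new int[0];    // rejected: array creation is not in the ExpressionStatement list
        "Hello".length(); // compiles: a method invocation may stand alone, return value ignored
    }
}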

Pros and cons of casting vs. providing a method that returns the required type (Java)

I'm doing a bit of playing about to learn a framework I'm contributing to, and an interesting question came up. EDIT: I'm doing some basic filters in the Okapi Framework, as described in this guide, note that the filter must return different event types to be useful, and that resources must be used by reference (as the same resource may be used in other filters later). Here's the code I'm working with:
while (filter.hasNext()) {
    Event event = filter.next();
    if (event.isTextUnit()) {
        TextUnit tu = (TextUnit) event.getResource();
        if (tu.isTranslatable()) {
            // do something with it
        }
    }
}
Note the cast of the resource to a TextUnit object on line 4. This works; I know it's a TextUnit because events that are isTextUnit() will always have a TextUnit resource. However, an alternative would be to add an asTextUnit() method to the IResource interface that returns the resource as a TextUnit (as well as equivalent methods for each common resource type), so that the line would become:
TextUnit tu = event.getResource().asTextUnit();
Another approach might be providing a static casting method in TextUnit itself, along the lines of:
TextUnit tu = TextUnit.fromResource(event.getResource());
My question is: what are some arguments for doing it one way or the other? Are there performance differences?
The main advantage I can think of with asTextUnit() (or .fromResource) is that more appropriate exceptions could be thrown if someone tries to get a resource as the wrong type (i.e. with a message like "Cannot get this RawDocument type resource as a TextUnit - use asRawDocument()" or "The resource is not a TextUnit").
The main disadvantages I can think of with .asTextUnit() is that each resource type would then have to implement all the methods (most of which will just throw an exception), and if another major resource type is added there would be some refactoring to add the new method to every resource type (although there's no reason the .asSomething() methods would have to be defined for every possible type, the less common resources could just be cast, although this would lead to inconsistency of approach). This wouldn't be a problem with .fromResource() since it's just one method per type, and could be added or not per type depending on preference.
If the aim is to test an object's type and cast it, then I don't see any value in creating / using custom isXyz and asXyz methods. You just end up with a bunch of extra methods that make little difference to code readability.
Re: your point about appropriate exception messages, I would say that it is most likely not worth it. It is reasonable to assume that not having a TextUnit when a TextUnit is expected is a symptom of a bug somewhere. IMO, it is not worthwhile trying to provide "user friendly" diagnostics for bugs. The person that the information is aimed at is a Java programmer, and for that person the default message and stacktrace for a regular ClassCastException (and the source code) provide all of the information required. (Translating it into pretty language adds no real value.)
On the flip-side, the performance differences between the two forms are not likely to be significant. But consider this:
if (x instanceof Y) {
    ((Y) x).someYMethod();
}
versus
if (x.isY()) {
    x.asY().someYMethod();
}

boolean isY(X x) { return x instanceof Y; }
Y asY(X x) { return (Y) x; }
The optimizer might be able to do a better job of the first compared with the second.
It might not inline the method calls in the second case, especially if it is changed to use instanceof and throw a custom exception.
It is less likely to figure out that only one type test is really required in the second case. (It might not in the first case either ... but it is more likely to.)
But either way, the performance difference is going to be small.
Summary, the fancy methods are not really worth the effort, though they don't do any real harm.
Now if the isXyz or asXyz methods were testing the state of the object (not just the object's Java type), or if the asXyz was returning a wrapper, then the answers would be different ...
You could also just go
if (event.getResource() instanceof TextUnit) {
    // ...
}
and save yourself the trouble.
To answer your question regarding whether to go asTextUnit() vs. TextUnit.fromResource, the performance difference would depend upon how you actually implement these methods.
In the case of the static converter you would have to create and return a new object of type TextUnit. However, in the case of the member function you could simply return this, suitably cast, or you could create and return a new object - it depends upon your use case.
Either way, it seems like instanceof is probably the cleanest approach here.
What if your filter were extended - or wrapped - to return only text unit events? In fact, what if it returned only the resources of text unit events? Then your loop would be much simpler. I would think the clean way to do this would be a second filter, which simply returned just the text unit events, followed by, let's say, an Extractor, which returned the properly cast resource.
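A rough sketch of that extractor idea, reusing the Event/TextUnit names from the question (the wrapper method and the filter's IFilter parameter type are assumptions, not a documented part of the framework):
// Collect only the TextUnit resources so the calling loop needs no instanceof or cast.
List<TextUnit> extractTextUnits(IFilter filter) {
    List<TextUnit> units = new ArrayList<TextUnit>();
    while (filter.hasNext()) {
        Event event = filter.next();
        if (event.isTextUnit()) {
            units.add((TextUnit) event.getResource());
        }
    }
    return units;
}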
If you have a common base class, you can have a single asX method there for every derived class, and needn't refactor all derived classes:
abstract class Base {
    // ClassCastException is unchecked, so no throws clause is needed
    A asA () { throw new ClassCastException ("not an A"); }
    B asB () { throw new ClassCastException ("not a B"); }
    C asC () { throw new ClassCastException ("not a C"); }
    // much more ...
}

class A extends Base {
    A asA () { /* hard work */ return new A (); }
    // no asB, asC required
}

class B extends Base {
    B asB () { /* hard work */ return new B (); }
    // no asA, asC required
}

// and so on.
This looks pretty clever. For a new class N, just add a new method to Base, and all derived classes get it. Only N needs to implement asN.
But it smells.
Why should a B have a method asA if it will always fail? That's not good design. The exceptions in the base class are cheap as long as they aren't triggered; only thrown exceptions might be costly.
Yes, there are differences. Creating new immutable elements is better than casting. Pass all serializable data (non-transient, non-computable data) to a Builder and build the appropriate class.

Java: using a RuntimeException to escape from a Visitor

I am being powerfully tempted to use an unchecked exception as a short-circuit control-flow construct in a Java program. I hope somebody here can advise me on a better, cleaner way to handle this problem.
The idea is that I want to cut short the recursive exploration of sub-trees by a visitor without having to check a "stop" flag in every method call. Specifically, I'm building a control-flow graph using a visitor over the abstract syntax tree. A return statement in the AST should stop exploration of the sub-tree and send the visitor back to the nearest enclosing if/then or loop block.
The Visitor superclass (from the XTC library) defines
Object dispatch(Node n)
which calls back via reflection methods of the form
Object visitNodeSubtype(Node n)
dispatch is not declared to throw any exceptions, so I declared a private class that extends RuntimeException
private static class ReturnException extends RuntimeException {
}
Now, the visitor method for a return statement looks like
Object visitReturnStatement(Node n) {
    // handle return value assignment...
    // add flow edge to exit node...
    throw new ReturnException();
}
and every compound statement needs to handle the ReturnException
Object visitIfElseStatement(Node n) {
    Node test = n.getChild(0);
    Node ifPart = n.getChild(1);
    Node elsePart = n.getChild(2);
    // add flow edges to if/else...
    try { dispatch(ifPart); } catch (ReturnException e) { }
    try { dispatch(elsePart); } catch (ReturnException e) { }
    return null; // declared to return Object, so something must be returned
}
This all works fine, except:
I may forget to catch a ReturnException somewhere and the compiler won't warn me.
I feel dirty.
Is there a better way to do this? Is there a Java pattern I am unaware of to implement this kind of non-local flow-of-control?
[UPDATE] This specific example turns out to be somewhat invalid: the Visitor superclass catches and wraps exceptions (even RuntimeExceptions), so the exception throwing doesn't really help. I've implemented the suggestion to return an enum type from visitReturnStatement. Luckily, this only needs to be checked in a small number of places (e.g., visitCompoundStatement), so it's actually a bit less hassle than throwing exceptions.
In general, I think this is still a valid question. Though perhaps, if you are not tied to a third-party library, the entire problem can be avoided with sensible design.
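A hedged sketch of that enum-returning approach; FlowResult and getChildCount() are hypothetical names, and the methods keep the Object return type that the reflective dispatch expects:
enum FlowResult { CONTINUE, RETURNED }

Object visitReturnStatement(Node n) {
    // handle return value assignment, add flow edge to exit node...
    return FlowResult.RETURNED;
}

Object visitCompoundStatement(Node n) {
    for (int i = 0; i < n.getChildCount(); i++) {
        if (dispatch(n.getChild(i)) == FlowResult.RETURNED) {
            return FlowResult.RETURNED; // a return statement ends this block's flow
        }
    }
    return FlowResult.CONTINUE;
}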
I think this is a reasonable approach for a few reasons:
You are using a 3rd party and are unable to add the checked exception
Checking return values everywhere in a large set of visitors when it's only necessary in a few is an unnecessary burden
Also, there are those that have argued that unchecked exceptions aren't all that bad. Your usage reminds me of Eclipse's OperationCanceledException which is used to blow out of long-running background tasks.
It's not perfect, but, if well documented, it seems ok to me.
Throwing a runtime exception as control logic is definitely a bad idea. The reason you feel dirty is that you're bypassing the type system, i.e. the return type of your methods is a lie.
You have several options that are considerably more clean.
1. The Exceptions Functor
A good technique to use when you're restricted in the exceptions you may throw: if you can't throw a checked exception, return an object that will throw a checked exception. java.util.concurrent.Callable is an instance of this functor, for example.
See here for a detailed explanation of this technique.
For example, instead of this:
public Something visit(Node n) {
    if (n.something())
        return new Something();
    else
        throw new Error("Remember to catch me!");
}
Do this:
public Callable<Something> visit(final Node n) {
    return new Callable<Something>() {
        public Something call() throws Exception {
            if (n.something())
                return new Something();
            else
                throw new Exception("Unforgettable!");
        }
    };
}
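On the calling side, a sketch of how this might be consumed; the checked exception only surfaces when call() is invoked, outside the restricted dispatch method:
Callable<Something> result = visitor.visit(node);
try {
    Something s = result.call();
    // use s
} catch (Exception e) {
    // handle the short-circuit / failure case here, where checked exceptions are allowed
}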
2. Disjoint Union (a.k.a. The Either Bifunctor)
This technique lets you return one of two different types from the same method. It's a little bit like the Tuple<A, B> technique that most people are familiar with for returning more than one value from a method. However, instead of returning values of both types A and B, this involves returning a single value of either type A or B.
For example, given an enumeration Fail, which could enumerate applicable error codes, the example becomes...
public Either<Fail, Something> visit(final Node n) {
    if (n.something())
        return Either.<Fail, Something>right(new Something());
    else
        return Either.<Fail, Something>left(Fail.DONE);
}
Making the call is now much cleaner because you don't need try/catch:
Either<Fail, Something> x = node.dispatch(visitor);
for (Something s : x.rightProjection()) {
    // Do something with Something
}
for (Fail f : x.leftProjection()) {
    // Handle failure
}
The Either class is not very difficult to write, but a full-featured implementation is provided by the Functional Java library.
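For illustration, a stripped-down sketch of such a type (the Functional Java version additionally offers projections, map, bind, and so on):
// Minimal disjoint union: a value is either a 'left' L (e.g. a failure)
// or a 'right' R (e.g. a result), never both.
abstract class Either<L, R> {
    static <L, R> Either<L, R> left(L value)  { return new Left<>(value); }
    static <L, R> Either<L, R> right(R value) { return new Right<>(value); }

    abstract boolean isRight();
    abstract L leftValue();  // only valid when !isRight()
    abstract R rightValue(); // only valid when isRight()

    private static final class Left<L, R> extends Either<L, R> {
        private final L value;
        Left(L value) { this.value = value; }
        boolean isRight() { return false; }
        L leftValue()     { return value; }
        R rightValue()    { throw new IllegalStateException("no right value"); }
    }

    private static final class Right<L, R> extends Either<L, R> {
        private final R value;
        Right(R value) { this.value = value; }
        boolean isRight() { return true; }
        L leftValue()     { throw new IllegalStateException("no left value"); }
        R rightValue()    { return value; }
    }
}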
3. The Option Monad
A little bit like a type-safe null, this is a good technique to use when you do not want to return a value for some inputs, but you don't need exceptions or error codes. Commonly, people will return what's called a "sentinel value", but Option is considerably cleaner.
You now have...
public Option<Something> visit(final Node n) {
    if (n.something())
        return Option.some(new Something());
    else
        return Option.<Something>none();
}
The call is nice and clean:
Option<Something> s = node.dispatch(visitor);
if (s.isSome()) {
    Something x = s.some();
    // Do something with x.
}
else {
    // Handle None.
}
And the fact that it's a monad lets you chain calls without handling the special None value:
public Option<Something> visit(final Node n) {
    return dispatch(getIfPart(n)).orElse(dispatch(getElsePart(n)));
}
The Option class is even easier to write than Either, but again, a full-featured implementation is provided by the Functional Java library.
See here for a detailed discussion of Option and Either.
Is there a reason you aren't just returning a value? Such as NULL, if you really want to return nothing? That would be a lot simpler, and wouldn't risk throwing an unchecked runtime exception.
I see the following options for you:
Go ahead and define that RuntimeException subclass. Check for serious problems by catching your exception in the most general call to dispatch and reporting that one if it gets that far.
Have the node processing code return a special object if it thinks searching should end abruptly. This still forces you to check return values instead of catching exceptions, but you might like the look of the code better that way.
If the tree walk is to be stopped by some external factor, do it all inside a subthread, and set a synchronized field in that object in order to tell the thread to stop prematurely.
Why are you returning a value from your visitor? The appropriate method of the visitor is called by the classes that are being visited. All work done is encapsulated within the visitor class itself; it should return nothing and handle its own errors. The only obligation required of the calling class is to call the appropriate visitXXX method, nothing more. (This assumes you are using overloaded methods as in your example, as opposed to overriding the same visit() method for each type.)
The visited class should not be changed by the visitor or have to have any knowledge of what it does, other than it allows the visit to happen. Returning a value or throwing an exception would violate this.
Visitor Pattern
Do you have to use Visitor from XTC? It's a pretty trivial interface, and you could implement your own which can throw checked ReturnException, which you would not forget to catch where needed.
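A minimal sketch of that suggestion (names are hypothetical): make the exception checked and declare it on your own dispatch, so a forgotten catch becomes a compile error.
class ReturnException extends Exception { }

interface FlowVisitor {
    Object dispatch(Node n) throws ReturnException; // callers must now handle it
}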
I've not used the XTC library you mention. How does it supply the complementary part of the visitor pattern - the accept(visitor) method on nodes? Even if this is a reflection based dispatcher, there must still be something that handles recursion down the syntax tree?
If this structural iteration code is readily accessible, and you're not already using the return value from your visitXxx(node) methods, could you exploit a simple enumerated return value, or even a boolean flag, telling accept(visitor) not to recurse into child nodes?
If:
accept(visitor) isn't explicitly implemented by nodes (there's some field or accessor reflection going on, or nodes just implement a child-getting interface for some standard control-flow logic, or for any other reason...), and
you don't want to mess with the structural iterating part of the library, or it's not available, or it's not worth the effort...
then as a last resort I guess that exceptions might be your only option whilst still using the vanilla XTC library.
An interesting problem though, and I can understand why exception-based control flow makes you feel dirty...
