Lambda expression vs method reference - java

IntelliJ keeps proposing me to replace my lambda expressions with method references.
Is there any objective difference between both of them?

Let me offer some perspective on why we added this feature to the language, when clearly we didn't strictly need to (all method refs can be expressed as lambdas).
Note that there is no right answer. Anyone who says "always use a method ref instead of a lambda" or "always use a lambda instead of a method ref" should be ignored.
This question is very similar in spirit to "when should I use a named class vs an anonymous class"? And the answer is the same: when you find it more readable. There are certainly cases that are definitely one or definitely the other but there's a host of grey in the middle, and judgment must be used.
The theory behind method refs is simple: names matter. If a method has a name, then referring to it by name, rather than by an imperative bag of code that ultimately just turns around and invokes it, is often (but not always!) more clear and readable.
The arguments about performance or about counting characters are mostly red herrings, and you should ignore them. The goal is writing code that is crystal clear what it does. Very often (but not always!) method refs win on this metric, so we included them as an option, to be used in those cases.
A key consideration about whether method refs clarify or obfuscate intent is whether it is obvious from context what is the shape of the function being represented. In some cases (e.g., map(Person::getLastName)), it's quite clear from the context that a function that maps one thing to another is required, and in cases like this, method references shine. In others, using a method ref requires the reader to wonder about what kind of function is being described; this is a warning sign that a lambda might be more readable, even if it is longer.
Finally, what we've found is that most people at first steer away from method refs because they feel even newer and weirder than lambdas, and so initially find them "less readable", but over time, when they get used to the syntax, generally change their behavior and gravitate towards method references when they can. So be aware that your own subjective initial "less readable" reaction almost certainly entails some aspect of familiarity bias, and you should give yourself a chance to get comfortable with both before rendering a stylistic opinion.
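For a concrete illustration of the map(Person::getLastName) case, here is a small hedged sketch; the Person class and the lastNames helper are invented purely for the example:

import java.util.List;
import java.util.stream.Collectors;

class MethodRefReadability {
    // Hypothetical Person type, assumed only for illustration.
    static class Person {
        private final String lastName;
        Person(String lastName) { this.lastName = lastName; }
        String getLastName() { return lastName; }
    }

    static List<String> lastNames(List<Person> people) {
        // The method reference reads as "map each person to their last name";
        // the equivalent lambda, p -> p.getLastName(), says the same thing more verbosely.
        return people.stream()
                     .map(Person::getLastName)
                     .collect(Collectors.toList());
    }
}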

Long lambda expressions consisting of several statements may reduce the readability of your code. In such a case, extracting those statements into a method and referencing it may be a better choice.
The other reason may be reusability. Instead of copy-pasting a lambda expression of a few statements, you can extract it into a method and call that method from different places in your code.
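As a hedged sketch of that refactoring (the NameFormatter class and the normalize method are invented names, not from the question):

import java.util.List;
import java.util.stream.Collectors;

class NameFormatter {
    // Before: a multi-statement lambda inline in the stream pipeline.
    static List<String> formatInline(List<String> names) {
        return names.stream()
                    .map(name -> {
                        String trimmed = name.trim();
                        if (trimmed.isEmpty()) {
                            return "(unknown)";
                        }
                        return trimmed.substring(0, 1).toUpperCase() + trimmed.substring(1);
                    })
                    .collect(Collectors.toList());
    }

    // After: the same logic extracted into a named method...
    private static String normalize(String name) {
        String trimmed = name.trim();
        if (trimmed.isEmpty()) {
            return "(unknown)";
        }
        return trimmed.substring(0, 1).toUpperCase() + trimmed.substring(1);
    }

    // ...and referenced by name, here and anywhere else it is needed.
    static List<String> formatExtracted(List<String> names) {
        return names.stream()
                    .map(NameFormatter::normalize)
                    .collect(Collectors.toList());
    }
}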

As user stuchl4n3k noted in comments on the question, an exception may occur.
Let's consider a variable field that is initially null; then:
field = null;
runThisLater(()->field.method());
field = new SomeObject();
will not crash, while
field = null;
runThisLater(field::method);
field = new SomeObject();
will crash with java.lang.NullPointerException: Attempt to invoke virtual method 'java.lang.Class java.lang.Object.getClass()' at the line containing the method reference, at least on Android.
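Here is a minimal, self-contained sketch that reproduces the difference on plain Java; the names field, runThisLater, and the deferred list are stand-ins invented for the example, not a real API:

import java.util.ArrayList;
import java.util.List;

public class BoundMethodRefNpe {
    static Object field;
    static final List<Runnable> deferred = new ArrayList<>();

    // Stand-in for any "run this later" API: just queue the task.
    static void runThisLater(Runnable task) {
        deferred.add(task);
    }

    public static void main(String[] args) {
        field = null;
        runThisLater(() -> System.out.println(field.hashCode())); // fine: field is read only when the lambda runs
        field = new Object();
        deferred.forEach(Runnable::run);                          // works, field is non-null by now

        field = null;
        runThisLater(field::hashCode); // throws NullPointerException here, while the method reference is evaluated
    }
}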
Today's IntelliJ notes "may change semantics" when suggesting this refactoring.
This happens when you take a reference to an instance method of a particular object. Why?
Let's check the first two paragraphs of JLS 15.13.3, Run-Time Evaluation of Method References:
At run time, evaluation of a method reference expression is similar to evaluation of a class instance creation expression, insofar as normal completion produces a reference to an object. Evaluation of a method reference expression is distinct from invocation of the method itself.
First, if the method reference expression begins with an ExpressionName or a Primary, this subexpression is evaluated. If the subexpression evaluates to null, a NullPointerException is raised, and the method reference expression completes abruptly. If the subexpression completes abruptly, the method reference expression completes abruptly for the same reason.
In the case of a lambda expression (I'm not entirely sure of the details), the functional interface type is derived at compile time from the method declaration. This is a simplification of what exactly happens, but let's assume the method runThisLater has been declared as, e.g., void runThisLater(SamType obj), where SamType is some functional interface. Then runThisLater(() -> field.method()); translates into something like:
runThisLater(new SamType() {
    public void doSomething() {
        field.method();
    }
});
Additional info:
15.27.4. Run-Time Evaluation of Lambda Expressions
Translation of Lambda Expressions
State of the Lambda, version 3, where SAM was mentioned.
State of the Lambda, final.

While it is true that all method references can be expressed as lambdas, there is a potential difference in semantics when side effects are involved. #areacode's example, throwing an NPE in one case but not in the other, makes the involved side effect very explicit. However, there is a more subtle case you could run into when working with CompletableFuture:
Let's simulate a task that takes a while (2 seconds) to complete via the following helper function slow:
private static <T> Supplier<T> slow(T s) {
    return () -> {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {}
        return s;
    };
}
Then
var result =
    CompletableFuture.supplyAsync(slow(Function.identity()))
        .thenCompose(supplyAsync(slow("foo"))::thenApply);
This effectively runs both async tasks in parallel, allowing the future to complete after roughly 2 seconds.
On the other hand, if we refactor the ::thenApply method reference into a lambda, both async tasks run sequentially, one after the other, and the future only completes after about 4 seconds.
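For comparison, a hedged sketch of the lambda-based refactoring, reusing the slow helper above and assuming the same static import of CompletableFuture.supplyAsync as in the original snippet:

// The method reference evaluates supplyAsync(slow("foo")) eagerly, so both tasks
// start immediately and overlap (~2 s). The lambda below defers that call until
// the first future has completed, so the tasks run back to back (~4 s).
var sequential =
    CompletableFuture.supplyAsync(slow(Function.identity()))
        .thenCompose(f -> supplyAsync(slow("foo")).thenApply(f));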
Side note: while the example seems contrived, it does come up when you try to regain the applicative instance hidden in the future.

Related

Why is "new String();" a statement but "new int[0];" not?

I just randomly tried seeing if new String(); would compile, and it did (according to Oracle's Java documentation on "Expressions, Statements, and Blocks", one of the valid statement types is "object creation").
However, new int[0]; gives me a "not a statement" error.
What's wrong with this? Aren't I creating an array object with new int[0]?
EDIT:
To clarify this question, the following code:
class Test {
    void foo() {
        new int[0];
        new String();
    }
}
causes a compiler error on new int[0];, whereas new String(); on its own is fine. Why is one not acceptable and the other one is fine?
The reason is a somewhat overengineered spec.
The idea behind expressions not being valid statements is that they accomplish nothing whatsoever. 5 + 2; does nothing on its own. You must assign it to something, or pass it to something, otherwise why write it?
There are exceptions, however: Expressions which, on their own, will (or possibly will) have side effects. For example, whilst this is illegal:
void foo(int a) {
    a + 1;
}
This is not:
void foo(int a) {
    a++;
}
That is because, on its own, a++ is not completely useless, it actually changes things (a is modified by doing this). Effectively, 'ignoring the value' (you do nothing with a + 1 in that first snippet) is acceptable if the act of producing the value on its own causes other stuff to happen: After all, maybe that is what you were after all along.
For that reason, invoking a method is also a legit ExpressionStatement, and in fact it is quite common to invoke methods (even ones that don't return void) and ignore the return value. For void methods it's the only legal way to invoke them, even.
Constructors are technically methods and can have side effects. It is extremely unlikely, and quite bad code style, if this method:
void doStuff() {
    new Something();
}
is 'sensible' code, but it could in theory be written, bad as it may be: The constructor of the Something class may do something useful and perhaps that's all you want to do here: Make that constructor run, do the useful thing, and then take the created object and immediately toss it in the garbage. Weird, but, okay. You're the programmer.
Contrast with:
new Something[10];
This is different: The compiler knows what the array 'constructor' does. And what it does is nothing useful - it creates an object and returns a reference to the object, and that is all that happens. If you then instantly toss the reference in the garbage, then the entire operation was a complete waste of time, and surely you did not intend to do nothing useful with such a bizarre statement, so the compiler designers thought it best to just straight up disallow you from writing it.
This 'oh dear, that code makes no sense, therefore I shall not compile it' check is very limited and mostly an obsolete aspect of the original compiler spec; it has never been updated, and it is not a good way to ensure code is sensible. There are all sorts of linter tools out there that go vastly further in finding code that just cannot be right, so if you care about that sort of thing, invest in learning those.
Nevertheless, the Java 1.0 spec had this stuff baked in, and there is no particularly good reason to drop this aspect of the Java spec; therefore it remains, and constructing a new array is not a valid ExpressionStatement.
As JLS §14.8 states, a ClassInstanceCreationExpression is specifically in the list of valid ExpressionStatements. Follow that term to the definition of ClassInstanceCreationExpression and you'll find that it specifically refers to invoking constructors, not to array creation.
Thus, the JLS is specific and requires this behaviour. javac is simply following the spec.
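To make this concrete, here is a small sketch of which expressions the compiler accepts as statements; the class and variable names are invented, and the rejected lines are left commented out because they would not compile:

import java.util.ArrayList;
import java.util.List;

class ExpressionStatementDemo {
    void demo() {
        List<String> list = new ArrayList<>();

        new String();   // compiles: a class instance creation is a valid ExpressionStatement
        list.add("x");  // compiles: a method invocation is valid even when the result is ignored

        int a = 0;
        a++;            // compiles: increments/decrements are valid ExpressionStatements

        // new int[0];  // error: "not a statement" - array creation is not an ExpressionStatement
        // a + 1;       // error: "not a statement" - a bare arithmetic expression has no effect
    }
}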

Where do void function/property calls execute to?

Here's code:
fun main(args: Array<String>) {
    val items = listOf(1, 2, 3, 4)
    items.first()
    items.last()
    items.filter { it % 2 == 0 }
}
I have some extension methods like first() and last() - but they aren't doing anything (not being assigned to a variable or anything). Does this mean the compiler just skips over them and doesn't do anything?
but they aren't doing anything (not being assigned to a variable or anything)
So far as the compiler knows, they could have side effects (e.g., printing something or setting a field), and in that case they'd have to be executed. If they were inline functions, the compiler could perhaps eliminate them after inlining, as Josh's answer mentions. But they aren't, so the compiler can't rely on their definitions (as opposed to their signatures): at runtime there could be a different JAR containing these methods and defining them with side effects.
But the JIT will very likely inline them and then eliminate them if you run this code enough times; just not immediately.
In principle there could be contracts declaring these methods to be pure and then the compiler could eliminate them. But current contracts don't support this, as far as I know.
The methods get called because you invoked them; you just didn't store the results in a reference variable. If I'm not wrong, the results would still be created on the heap (and be immediately eligible for garbage collection), but without any variable referencing them.
What you're referring to is called dead code elimination. Here is one related post that addresses a similar question.


Why can I not reference the variable from within a lambda in this case?

I have got the following code, which is somewhat abstracted from a real implementation I had in a Java program:
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = bufferedReader.readLine()) != null) {
    String lineReference = line;
    runLater(() -> consumeString(lineReference));
}
Here I need to use a reference copy for the lambda expression; when I try to use line directly I get:
Local variables referenced from a lambda expression must be final or effectively final
It seems rather awkward to me, as all I do to fix it is obtain a new reference to the object; this is something the compiler could also figure out by itself.
So I would say line is effectively final here, as it only gets the assignment in the loop and nowhere else.
Could anyone shed some more light on this and explain why exactly it is needed here and why the compiler cannot fix it?
So I would say line is effectively final here, as it only gets the assignment in the loop and nowhere else.
No, it's not final because during the variable's lifetime it is getting assigned a new value on every loop iteration. This is the complete opposite of final.
I get: 'Local variables referenced from a lambda expression must be final or effectively final'. It seems rather awkward to me.
Consider this: You're passing the lambda to runLater(...). When the lambda finally executes, which value of line should it use? The value it had when the lambda was created, or the value it had when the lambda executed?
The rule is that lambdas (appear to) use the current value at time of lambda execution. They do not (appear to) create a copy of the variable. Now, how is this rule implemented in practice?
If line is a static field, it's easy because there is no state for the lambda to capture. The lambda can read the current value of the field whenever it needs to, just as any other code can.
If line is an instance field, that's also fairly easy. The lambda can capture the reference to the object in a private hidden field in each lambda object, and access the line field through that.
If line is a local variable within a method (as it is in your example), this is suddenly not easy. At an implementation level, the lambda expression is in a completely different method, and there is no easy way for outside code to share access to the variable which only exists within the one method.
To enable access to the local variable, the compiler would have to box the variable into some hidden, mutable holder object (such as a 1-element array) so that the holder object could be referenced from both the enclosing method and the lambda, giving them both access to the variable within.
Although that solution would technically work, the behavior it achieves would be undesirable for a bundle of reasons. Allocating the holder object would give local variables an unnatural performance characteristic which would not be obvious from reading the code. (Merely defining a lambda that used a local variable would make the variable slower throughout the method.) Worse than that, it would introduce subtle race conditions into otherwise simple code, depending on when the lambda is executed. In your example, by the time the lambda executes, any number of loop iterations could have happened, or the method might have returned, so the line variable could have any value or no defined value, and almost certainly wouldn't have the value you wanted. So in practice you'd still need the separate, unchanging lineReference variable! The only difference is that the compiler wouldn't require you do to that, so it would allow you to write broken code. Since the lambda could ultimately execute on a different thread, this would also introduce subtle concurrency and thread visibility complexity to local variables, which would require the language to allow the volatile modifier on local variables, and other bother.
So, for the lambda to see the current changing values of local variables would introduce a lot of fuss (and no advantages since you can do the mutable holder trick manually if you ever need to). Instead, the language says no to the whole kerfuffle by simply demanding that the variable be final (or effectively final). That way, the lambda can capture the value of the local variable at lambda creation time, and it doesn't need to worry about detecting changes because it knows there can't be any.
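For completeness, here is a minimal sketch of doing that mutable holder trick manually, using a one-element array as the holder (the names are invented for the example):

public class MutableHolderDemo {
    public static void main(String[] args) {
        // The array reference itself is effectively final, so the lambda may capture it,
        // but its single element can still be reassigned after the lambda is created.
        String[] holder = new String[1];
        Runnable r = () -> System.out.println(holder[0]);

        holder[0] = "first";
        r.run();   // prints "first"
        holder[0] = "second";
        r.run();   // prints "second" - the lambda sees the holder's current contents
    }
}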
This is something the compiler could also figure out by itself
It did figure it out, which is why it disallows it. The lineReference variable is of absolutely no benefit to the compiler, which could easily capture the current value of line for use in the lambda at each lambda object's creation time. But since the lambda wouldn't detect changes to the variable (which would be impractical and undesirable for the reasons explained above), the subtle difference between capture of fields and capture of locals would be confusing. The "final or effectively final" rule is for the programmer's benefit: it prevents you from wondering why changes to a variable don't appear within a lambda, by preventing you from changing them at all. Here's an example of what would happen without that rule:
String field = "A";

void foo() {
    String local = "A";
    Runnable r = () -> System.out.println(field + local);
    field = "B";
    local = "B";
    r.run(); // output: "BA"
}
That confusion goes away if any local variables referenced within the lambda are (effectively) final.
In your code, lineReference is effectively final. Its value is assigned exactly once during its lifetime, before it goes out of scope at the end of each loop iteration, which is why you can use it in the lambda.
There is an alternative arrangement of your loop possible by declaring line inside the loop body:
for (;;) {
    String line = bufferedReader.readLine();
    if (line == null) break;
    runLater(() -> consumeString(line));
}
This is allowed because line now goes out of scope at the end of each loop iteration. Each iteration effectively has a fresh variable, assigned exactly once. (However, at a low level the variable is still stored in the same CPU register, so it's not like it has to be repeatedly "created" and "destroyed". What I mean is, there is happily no extra cost to declaring variables inside a loop like this, so it's fine.)
Note: All this is not unique to lambdas. It also applies identically to any classes declared lexically inside the method, from which lambdas inherited the rules.
Note 2: It could be argued that lambdas would be simpler if they followed the rule of always capturing the values of variables they use at lambda creation time. Then there would be no difference in behavior between fields and locals, and no need for the "final or effectively final" rule because it would be well-established that lambdas don't see changes made after lambda creation time. But this rule would have its own uglinesses. As one example, for an instance field x accessed within a lambda, there would be a difference between the behavior of reading x (capturing final value of x) and this.x (capturing final value of this, seeing its field x changing). Language design is hard.
If you used line instead of lineReference in the lambda expression, you would be passing your runLater method a lambda expression that executes consumeString on the String referred to by line.
But line keeps changing as you assign new lines to it. When you finally execute the method of the functional interface instance produced by the lambda expression, only then would it read the current value of line and use it in the call to consumeString. At that point the value of line would not be the same as it was when you passed the lambda expression to the runLater method.

Is repeatedly instantiating an anonymous class wasteful?

I had a remark about a piece of code in the style of:
Iterable<String> upperCaseNames = Iterables.transform(
    lowerCaseNames, new Function<String, String>() {
        public String apply(String input) {
            return input.toUpperCase();
        }
    });
The person said that every time I go through this code, I instantiate this anonymous Function class, and that I should rather have a single instance in, say, a static variable:
static Function<String, String> toUpperCaseFn =
    new Function<String, String>() {
        public String apply(String input) {
            return input.toUpperCase();
        }
    };
...
Iterable<String> upperCaseNames =
    Iterables.transform(lowerCaseNames, toUpperCaseFn);
On a very superficial level, this somehow makes sense; instantiating a class multiple times has to waste memory or something, right?
On the other hand, people instantiate anonymous classes in the middle of the code like there's no tomorrow, and it would be trivial for the compiler to optimize this away.
Is this a valid concern?
Fun fact about HotSpot JVM optimizations: if you instantiate an object that isn't passed outside of the current method, the JIT compiler can optimize the allocation based on escape analysis.
Usually, stack allocation is associated with languages that expose the memory model, like C++. You don't have to delete stack variables in C++ because they're automatically deallocated when the scope is exited. This is contrary to heap allocation, which requires you to delete the pointer when you're done with it.
In the HotSpot JVM, the code is analyzed by the JIT to decide if an object can "escape" the thread. There are three levels of escape:
No escape - the object is only used within the method/scope it is created, and the object can't be accessed outside the thread.
Local/Arg escape - the object is returned by the method that creates it or passed to a method that it calls, but none of those methods will put that object somewhere that it can be accessed outside of the thread.
Global escape - the object is put somewhere that it can be accessed in another thread.
This basically is analogous to the questions, 1) do I pass it to another method or return it, and 2) do I associate it with something attached to a GC root like a ClassLoader or something stored in a static field?
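As a rough, hedged illustration of the three levels in code (the class and fields are invented; whether HotSpot actually applies a given optimization also depends on inlining and the JIT tier):

class EscapeLevels {
    static Object globalSink;   // reachable from any thread via the class

    int noEscape() {
        // Never leaves this method: "no escape" - a candidate for scalar replacement.
        StringBuilder sb = new StringBuilder();
        sb.append(42);
        return sb.length();
    }

    StringBuilder argEscape() {
        // Returned to the caller: "local/arg escape" - locks on it can still be elided.
        return new StringBuilder();
    }

    void globalEscape() {
        // Stored where another thread could see it: "global escape" - no escape optimizations.
        globalSink = new StringBuilder();
    }
}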
In your particular case, the anonymous object will be tagged as "local escape", which only means that any locks (read: use of synchronized) on the object will be optimized away. (Why synchronize on something that won't ever be used in another thread?) This is different from "no escape", which will do allocation on the stack. It's important to note that this "allocation" isn't the same as heap allocation. What it really does is allocates space on the stack for all the variables inside the non-escaping object. If you have 3 fields, int, String, and MyObject inside the no-escape object, then three stack variables will be allocated: an int, a String reference, and a MyObject reference – the MyObject instance itself will still be stored in heap unless it is also analyzed to have "no escape". The object allocation is then optimized away and constructors/methods will run using the local stack variables instead of heap variables.
That being said, it sounds like premature optimization to me. Unless the code is later proven to be slow and is causing performance problems, you shouldn't do anything to reduce its readability. To me, this code is pretty readable, I'd leave it alone. This is totally subjective, of course, but "performance" is not a good reason to change code unless it has something to do with its actual running time. Usually, premature optimization results in code that's harder to maintain with minimal performance benefits.
Java 8+ and Lambdas
If allocating anonymous instances still bothers you, I recommend switching to lambdas for single abstract method (SAM) types. Lambda evaluation is performed using invokedynamic, and for a non-capturing lambda the implementation ends up creating only a single instance on the first invocation (a lambda that captures variables still allocates per evaluation). More details can be found in my answer here and this answer here. For non-SAM types, you will still need to allocate an anonymous instance. The performance impact here will be negligible in most use cases, but IMO, it's more readable this way.
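Applied to the original snippet, the Java 8+ version would look roughly like this, assuming a reasonably recent Guava where Iterables.transform accepts any functional-interface Function (lowerCaseNames is the same variable as above):

// Lambda form - no explicit anonymous class in the source:
Iterable<String> upperCaseNames =
    Iterables.transform(lowerCaseNames, input -> input.toUpperCase());

// Or, equivalently, a method reference:
Iterable<String> upperCaseNames2 =
    Iterables.transform(lowerCaseNames, String::toUpperCase);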
References
Escape analysis (wikipedia.org)
HotSpot escape analysis 14 | 11 | 8 (oracle.com)
What is a 'SAM type' in Java? (stackoverflow.com)
Why are Java 8 lambdas invoked using invokedynamic? (stackoverflow.com)
Short answer: No - don't worry.
Long answer: it depends how frequently you're instantiating it. If in a frequently-called tight loop, maybe - though note that when the function is applied it calls String.toUpperCase() once for every item in an Iterable - each call presumably creates a new String, which will create far more GC churn.
"Premature optimization is the root of all evil" - Knuth
Found this thread: Java anonymous class efficiency implications - you may find it interesting.
I did some micro-benchmarking. The micro-benchmark compared: instantiating a (static inner) class per loop iteration, instantiating a (static inner) class once and using it in the loop, and the two similar variants using anonymous classes. In the micro-benchmark the compiler seemed to hoist the anonymous class out of the loop and, as predicted, promoted the anonymous class to an inner class of the caller. This meant all four methods were indistinguishable in speed. I also compared them to an outside class and, again, same speed. The variant with anonymous classes probably took ~128 bits more space.
You can check out my micro-benchmark at http://jdmaguire.ca/Code/Comparing.java and http://jdmaguire.ca/Code/OutsideComp.java. I ran this for various values of wordLen, sortTimes, and listLen. As well, the JVM is slow to warm up, so I shuffled the method calls around. Please don't judge me for the awful non-commented code; I program better than that in real life. And micro-benchmarking is almost as evil and useless as premature optimization.
