Optional doesn't play well with Java Stream?

I'm teaching classes on the new Java constructs. I've already introduced my students to Optional<T>. Imagining that there is a Point.getQuadrant() which returns an Optional<Quadrant> (because some points lie in no quadrant at all), we can add the quadrant to a Set<Quadrant> if the point lies in a positive X quadrant, like this:
Set<Quadrant> quadrants = ...;
Optional<Point> point = ...;
point
.filter(p -> p.getX() > 0)
.flatMap(Point::getQuadrant)
.ifPresent(quadrants::add);
(That's off the top of my head; let me know if I made a mistake.)
My next step was to tell my students, "Wouldn't it be neat if you could do the same internal filtering and decisions using functions, but with multiple objects? Well that's what streams are for!" It seemed the perfect segue. I was almost ready to write out the same form but with a list of points, using streams:
Set<Point> points = ...;
Set<Quadrant> quadrants = points.stream()
.filter(p -> p.getX() > 0)
.flatMap(Point::getQuadrant)
.collect(Collectors.toSet());
But wait! Will Stream.flatMap(Point::getQuadrant) correctly unravel the Optional<Quadrant>? It appears not...
I had already read Using Java 8's Optional with Stream::flatMap , but I had thought that what was discussed there was some esoteric side case with no real-life relevance. But now that I'm playing it out, I see that this is directly relevant to most anything we do.
Put simply, if Java now expects us to use Optional<T> as the new optional-return idiom, and if Java now encourages us to use streams for processing our objects, wouldn't we expect to encounter mapping to an optional value all over the place? Do we really have to jump through a series of .filter(Optional::isPresent).map(Optional::get) hoops as a workaround until Java 9 gets here years from now?
I'm hoping I just misinterpreted the other question, that I'm just misunderstanding something simple, and that someone can set me straight. Surely Java 8 doesn't have this big of a blind spot, does it?

I’m not sure what your question is aiming at. The answer to the linked question already states that this is a flaw in Java 8 that is addressed in Java 9, so there’s no sense in asking whether this really is a flaw in Java 8.
Besides that, the answer also mentions .flatMap(o -> o.isPresent()? Stream.of(o.get()): Stream.empty()) for converting the optional, rather than your .filter(Optional::isPresent) .map(Optional::get). Nevertheless, you can do it even simpler:
Instead of
.flatMap(Point::getQuadrant)
you can write (flatMap is documented to treat a null mapped stream as an empty stream)
.flatMap(p -> p.getQuadrant().map(Stream::of).orElse(null))
as an alternative to Java 9’s
.flatMap(p -> p.getQuadrant().stream())
Note that
.map(Point::getQuadrant).flatMap(Optional::stream)
isn’t much better than the alternatives, unless you have an irrational affinity for method references.
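For readers who want to try this out, here is a minimal, self-contained sketch of the Java 8 workaround. The Point and Quadrant types are hypothetical stand-ins for the ones in the question (only getX() and getQuadrant() are taken from it), and orElseGet(Stream::empty) is used instead of orElse(null) purely to avoid passing null around; inside flatMap both should behave the same.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class QuadrantDemo {
    // Hypothetical stand-ins for the types mentioned in the question.
    enum Quadrant { I, II, III, IV }

    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        int getX() { return x; }

        // Points on an axis lie in no quadrant at all.
        Optional<Quadrant> getQuadrant() {
            if (x == 0 || y == 0) return Optional.empty();
            if (x > 0) return Optional.of(y > 0 ? Quadrant.I : Quadrant.IV);
            return Optional.of(y > 0 ? Quadrant.II : Quadrant.III);
        }
    }

    // Java 8: turn each Optional into a stream of zero or one element.
    static Set<Quadrant> positiveXQuadrants(List<Point> points) {
        return points.stream()
                .filter(p -> p.getX() > 0)
                .flatMap(p -> p.getQuadrant().map(Stream::of).orElseGet(Stream::empty))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(
                new Point(1, 1), new Point(2, -3), new Point(-1, 1), new Point(3, 0));
        // Contains Quadrant.I and Quadrant.IV; (3, 0) has no quadrant and is dropped.
        System.out.println(positiveXQuadrants(points));
    }
}
```

On Java 9 or later, the flatMap line can be replaced by .flatMap(p -> p.getQuadrant().stream()) as described above.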

Related

Why does Java use -> instead of => for lambda functions?

I am a .NET and JavaScript developer. Now I am working in Java, too.
In .NET LINQ and JavaScript arrow functions we have =>.
I know Java lambdas are not the same, but they are very similar. Are there any reasons (technical or non-technical) that made Java choose -> instead of =>?
On September 8, 2011, Brian Goetz of Oracle announced to the OpenJDK mailing list that the syntax for lambdas in Java had been mostly decided, but some of the "fine points" like which type of arrow to use were still up in the air:
This just in: the EG has (mostly) made a decision on syntax.
After considering a number of alternatives, we decided to essentially
adopt the C# syntax. We may still deliberate further on the fine points
(e.g., thin arrow vs fat arrow, special nilary form, etc), and have not
yet come to a decision on method reference syntax.
On September 27, 2011, Brian posted another update, announcing that the -> arrow would be used, in preference to C#'s (and the Java prototype's) usage of =>:
Update on syntax: the EG has chosen to stick with the -> form of the
arrow that the prototype currently uses, rather than adopt the =>.
He goes on to provide some description of the rationale considered by the committee:
You could think of this in two ways (I'm sure I'll hear both):
This is much better, as it avoids some really bad interactions with existing operators, such as:
x => x.age <= 0; // duelling arrows
or
Predicate p = x => x.size == 0; // duelling equals
What a bunch of idiots we are, in that we claimed the goal of doing what other languages did, and then made gratuitous changes "just for the sake of doing something different".
Obviously we don't think we're idiots, but everyone can have an opinion :)
In the end, this was viewed as a small tweak to avoid some undesirable
interactions, while preserving the overall goal of "mostly looks like
what lambdas look like in other similar languages."
Howard Lovatt replied in approval of the decision to prefer ->, writing that he "ha[s] had trouble reading Scala code". Paul Benedict of Apache concurred:
I am glad too. Being consistent with other languages is a laudable goal, but
since programming languages aren't identical, the needs for Java can lead to
a different conclusion. The fat arrow syntax does look odd; I admit it. So
in terms of vanity, I am glad to see that punted. The equals character is
just too strongly associated with assignment and equality.
Paigan Jadoth chimed in, too:
I find the "->" much better than "=>". If arrowlings at all instead of the
more regular "#(){...}" pattern, then something definitely distinct from the
gte/lte tokens is clearly better. And "because the others do that" has never
been a good argument, anyway :D.
In summary, then, after considering arguments on both sides, the committee felt that consistency with other languages (=> is used in Scala and C#) was less compelling than clear differentiation from the equality operators, which made -> win out.
But Lieven Lemiengre was skeptical:
Other languages (such as Scala or Groovy) don't have this problem because
they support some placeholder syntax.
In reality you don't write "x => x.age <= 0;"
But this is very common "someList.partition(x => x.age <= 18)" and I agree
this looks bad. Other languages make this clearer using placeholder syntax
"someList.partition(_.age <= 18)" or "someList.partition(it.age <= 18)"
I hope you are considering something like this, these little closures will
be used a lot!
(And I don't think replacing '=>' with '->' will help a lot)
Other than Lieven, I didn't see anyone who criticized the choice of -> and defended => replying on that mailing list. Of course, as Brian predicted, there were almost certainly opinions on both sides, but ultimately, a choice just has to be made in these types of matters, and the committee made the one they did for the stated reasons.

Java lambda expression best practices

Java stream operators can sometimes become very cumbersome and hard to debug. Is there a best practice guideline as to how complex your lambda expression should be beyond which it is better to write an elaborate multi-statement piece of code?
For example, I came across the two-statement code below for computing a factorial, which was hard to understand:
Stream<Pair> allFactorials = Stream.iterate(
new Pair(BigInteger.ONE, BigInteger.ONE),
x -> new Pair(
x.num.add(BigInteger.ONE),
x.value.multiply(x.num.add(BigInteger.ONE))));
return allFactorials.filter(
(x) -> x.num.equals(num)).findAny().get().value;
Java streams are not a silver bullet and you don't have to use them everywhere. There are a lot of cases where you could solve the problem using a standard for loop.
As for your code: it is a good example of streams complicating the code and making it hard to understand and maintain. Still, there are a few solutions that are more readable.
Take a look:
calculating factorial using Java 8 IntStream?
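As a rough illustration of the point, here is a sketch of two more readable ways to compute a factorial: a plain loop and a short range-based stream. The class and method names are made up for this example, and n is assumed to be non-negative.

```java
import java.math.BigInteger;
import java.util.stream.LongStream;

class FactorialDemo {
    // Plain loop: multiply 1 * 2 * ... * n (assumes n >= 0).
    static BigInteger factorialLoop(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Stream version: a closed range reduced by multiplication.
    // An empty range (n == 0) yields the identity, BigInteger.ONE.
    static BigInteger factorialStream(int n) {
        return LongStream.rangeClosed(1, n)
                .mapToObj(BigInteger::valueOf)
                .reduce(BigInteger.ONE, BigInteger::multiply);
    }

    public static void main(String[] args) {
        System.out.println(factorialLoop(5));   // prints 120
        System.out.println(factorialStream(5)); // prints 120
    }
}
```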

Is use of AtomicInteger for indexing in Stream a legit way?

I would like an answer pointing out why the idea described below, shown on a very simple example, is commonly considered bad, and what its weaknesses are.
I have a sentence of words and my goal is to convert every second one to uppercase. My starting point for both of the cases is exactly the same:
String sentence = "Hi, this is just a simple short sentence";
String[] split = sentence.split(" ");
The traditional and procedural approach is:
StringBuilder stringBuilder = new StringBuilder();
for (int i=0; i<split.length; i++) {
if (i%2==0) {
stringBuilder.append(split[i]);
} else {
stringBuilder.append(split[i].toUpperCase());
}
if (i<split.length-1) { stringBuilder.append(" "); }
}
When I want to use java-stream, its use is limited by the constraint that variables used in a lambda expression must be final or effectively final. I have to use the workaround of a one-element array, which was suggested in the first comment on my question How to increment a value in Java Stream. Here is the example:
int index[] = {0};
String result = Arrays.stream(split)
.map(i -> index[0]++%2==0 ? i : i.toUpperCase())
.collect(Collectors.joining(" "));
Yeah, it's a bad solution, and I have heard a few good reasons hidden somewhere in the comments of a question I am unable to find (if you remind me of some of them, I'd upvote twice if possible). But what if I use AtomicInteger? Does it make any difference, and is it a good and safe way with no side effects compared to the previous one?
AtomicInteger atom = new AtomicInteger(0);
String result = Arrays.stream(split)
.map(i -> atom.getAndIncrement()%2==0 ? i : i.toUpperCase())
.collect(Collectors.joining(" "));
Regardless of how ugly it might look to anyone, I ask for a description of the possible weaknesses and their reasons. I don't care about the performance, only about the design and possible weaknesses of the second solution.
Please don't associate AtomicInteger with multi-threading issues here. I used this class since it reads, increments and stores the value in the way I need for this example.
As I often say in my answers, the Java Stream API is not a silver bullet for everything. My goal is to explore and find the edge where this statement applies, since I find the last snippet quite clear, readable and brief compared to the StringBuilder snippet.
Edit: Is there any alternative approach that works for the snippets above and avoids these issues when both the item and its index are needed while iterating with the Stream API?
The documentation of the java.util.stream package states that:
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
[...]
The ordering of side-effects may be surprising. Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example, IntStream.range(0,5).parallel().map(x -> x*2).toArray() must produce [0, 2, 4, 6, 8]), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element.
This means that the elements may be processed out of order, and thus the Stream-solutions may produce wrong results.
This is (at least for me) a killer argument against the two Stream-solutions.
By the process of elimination, we only have the "traditional solution" left. And honestly, I do not see anything wrong with this solution. If we wanted to get rid of the for-loop, we could re-write this code using a foreach-loop:
boolean toUpper = false; // 1st String is not capitalized
for (String word : split) {
stringBuilder.append(toUpper ? word.toUpperCase() : word);
toUpper = !toUpper;
}
For a streamified and (as far as I know) correct solution, take a look at Octavian R.'s answer.
Your question wrt. the "limits of streams" is opinion-based.
The answer to the question(s) ends here. The rest is my opinion and should be regarded as such.
In Octavian R.'s solution, an artificial index set is created through an IntStream, which is then used to access the String[]. For me, this has a higher cognitive complexity than a simple for or foreach loop, and I do not see any benefit in using streams instead of loops in this situation.
In Java, compared with Scala, you must be inventive. One solution without mutation is this one:
String sentence = "Hi, this is just a simple short sentence";
String[] split = sentence.split(" ");
String result = IntStream.range(0, split.length)
.mapToObj(i -> i%2==0 ? split[i] : split[i].toUpperCase())
.collect(Collectors.joining(" "));
System.out.println(result);
In Java streams you should avoid mutation. Your solution with AtomicInteger is ugly and it's bad practice.
Kind regards!
As explained in Turing85’s answer, your stream solutions are not correct, as they rely on the processing order, which is not guaranteed. This can lead to incorrect results with parallel execution today, but even if it happens to produce the desired result with a sequential stream, that’s only an implementation detail. It’s not guaranteed to work.
Besides that, there is no advantage in rewriting code to use the Stream API with a logic that basically still is a loop, but obfuscated with a different API. The best way to describe the idea of the new APIs, is to say that you should express what to do but not how.
Starting with Java 9, you could implement the same thing as
String result = Pattern.compile("( ?+[^ ]* )([^ ]*)").matcher(sentence)
.replaceAll(m -> m.group(1)+m.group(2).toUpperCase());
which expresses the wish to replace every second word with its upper case form, but doesn’t express how to do it. That’s up to the library, which likely uses a single StringBuilder instead of splitting into an array of strings, but that’s irrelevant to the application logic.
As long as you’re using Java 8, I’d stay with the loop and even when switching to a newer Java version, I would consider replacing the loop as not being an urgent change.
The pattern in the above example has been written in a way to do exactly the same as your original code splitting at single space characters. Usually, I’d encode “replace every second word” more like
String result = Pattern.compile("(\\w+\\W+)(\\w+)").matcher(sentence)
.replaceAll(m -> m.group(1)+m.group(2).toUpperCase());
which would behave differently when encountering multiple spaces or other separators, but usually is closer to the actual intention.
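To compare the side-effect-free approaches from these answers side by side, here is a small sketch that wraps each one in a method. The class and method names are made up for illustration, every second word (odd index) is uppercased as in the question, and Matcher.quoteReplacement is an addition of this sketch so that a literal $ or \ in the input is not interpreted as replacement syntax. The regex variant needs Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class EverySecondWordDemo {
    // Traditional loop, as in the question.
    static String withLoop(String sentence) {
        String[] split = sentence.split(" ");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < split.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(i % 2 == 0 ? split[i] : split[i].toUpperCase());
        }
        return sb.toString();
    }

    // Index stream without shared mutable state; the result does not depend
    // on processing order, so it also holds for a parallel stream.
    static String withIndexStream(String sentence, boolean parallel) {
        String[] split = sentence.split(" ");
        IntStream indices = IntStream.range(0, split.length);
        if (parallel) indices = indices.parallel();
        return indices
                .mapToObj(i -> i % 2 == 0 ? split[i] : split[i].toUpperCase())
                .collect(Collectors.joining(" "));
    }

    // Java 9+: replace each pair of words, uppercasing the second one.
    static String withRegex(String sentence) {
        return Pattern.compile("( ?+[^ ]* )([^ ]*)").matcher(sentence)
                .replaceAll(m -> Matcher.quoteReplacement(
                        m.group(1) + m.group(2).toUpperCase()));
    }

    public static void main(String[] args) {
        String sentence = "Hi, this is just a simple short sentence";
        // Each line prints: Hi, THIS is JUST a SIMPLE short SENTENCE
        System.out.println(withLoop(sentence));
        System.out.println(withIndexStream(sentence, true));
        System.out.println(withRegex(sentence));
    }
}
```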

In regards to for(), why use i++ rather than ++i?

Perhaps it doesn't matter to the compiler once it optimizes, but in C/C++, I see most people make a for loop in the form of:
for (i = 0; i < arr.length; i++)
where the incrementing is done with the post fix ++. I get the difference between the two forms. i++ returns the current value of i, but then adds 1 to i on the quiet. ++i first adds 1 to i, and returns the new value (being 1 more than i was).
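The difference in the value of the two expressions can be checked with a tiny sketch. It is written in Java only to match the other examples on this page; for a plain int the value semantics shown here are the same as in C and C++. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class IncrementDemo {
    // Returns { value of the expression i++, value of i afterwards }.
    static int[] postfix(int start) {
        int i = start;
        int seen = i++; // yields the old value, then i is incremented
        return new int[] { seen, i };
    }

    // Returns { value of the expression ++i, value of i afterwards }.
    static int[] prefix(int start) {
        int i = start;
        int seen = ++i; // i is incremented first, then the new value is yielded
        return new int[] { seen, i };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(postfix(5))); // prints [5, 6]
        System.out.println(Arrays.toString(prefix(5)));  // prints [6, 6]
    }
}
```

In a for loop header such as the one above, the value of the expression is discarded, so only the second element matters and both forms leave i in the same state.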
I would think that i++ takes a little more work, since a previous value needs to be stored in addition to a next value: Push *(&i) to stack (or load to register); increment *(&i). Versus ++i: Increment *(&i); then use *(&i) as needed.
(I get that the "Increment *(&i)" operation may involve a register load, depending on CPU design. In which case, i++ would need either another register or a stack push.)
Anyway, at what point, and why, did i++ become more fashionable?
I'm inclined to believe azheglov: It's a pedagogic thing, and since most of us do C/C++ on a Windows or *nix system where the compilers are of high quality, nobody gets hurt.
If you're using a low quality compiler or an interpreted environment, you may need to be sensitive to this. Certainly, if you're doing advanced C++ or device driver or embedded work, hopefully you're well seasoned enough for this to be not a big deal at all. (Do dogs have Buddha-nature? Who really needs to know?)
It doesn't matter which you use. On some extremely obsolete machines, and in certain instances with C++, ++i is more efficient, but modern compilers don't store the old value if it's not used. As to when it became popular to postincrement in for loops, my copy of K&R 2nd edition uses i++ on page 65 (the first for loop I found while flipping through).
For some reason, i++ became more idiomatic in C, even though it creates a needless copy. (I thought that was through K&R, but I see this debated in other answers.) But I don't think there's a performance difference in C, where it's only used on built-ins, for which the compiler can optimize away the copy operation.
It does make a difference in C++, however, where i might be a user-defined type for which operator++() is overloaded. The compiler might not be able to assert that the copy operation has no visible side-effects and might thus not be able to eliminate it.
As for the reason why, here is what K&R had to say on the subject:
Brian Kernighan
you'll have to ask dennis (and it might be in the HOPL paper). i have a
dim memory that it was related to the post-increment operation in the
pdp-11, though beyond that i don't know, so don't quote me.
in c++ the preferred style for iterators is actually ++i for some subtle
implementation reason.
Dennis Ritchie
No particular reason, it just became fashionable. The code produced
is identical on the PDP-11, just an inc instruction, no autoincrement.
HOPL Paper
Thompson went a step further by inventing the ++ and -- operators, which increment or decrement; their prefix or postfix position determines whether the alteration occurs before or after noting the value of the operand. They were not in the earliest versions of B, but appeared along the way. People often guess that they were created to use the auto-increment and auto-decrement address modes provided by the DEC PDP-11 on which C and Unix first became popular. This is historically impossible, since there was no PDP-11 when B was developed. The PDP-7, however, did have a few ‘auto-increment’ memory cells, with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own. Indeed, the auto-increment cells were not used directly in implementation of the operators, and a stronger motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1.
For integer types the two forms should be equivalent when you don't use the value of the expression. This is no longer true in the C++ world with more complicated types, but is preserved in the language name.
I suspect that "i++" became more popular in the early days because that's the style used in the original K&R "The C Programming Language" book. You'd have to ask them why they chose that variant.
Because as soon as you start using "++i", people will be confused and curious. They will halt their everyday work and start googling for explanations. 12 minutes later they will enter Stack Overflow and create a question like this. And voilà, your employer just spent yet another $10.
Going a little further back than K&R, I looked at its predecessor: Kernighan's C tutorial (~1975). Here the first few while examples use ++n. But each and every for loop uses i++. So to answer your question: Almost right from the beginning i++ became more fashionable.
My theory (why i++ is more fashionable) is that when people learn C (or C++) they eventually learn to code iterations like this:
while( *p++ ) {
...
}
Note that the postfix form is important here (using the prefix form would create an off-by-one type of bug).
When the time comes to write a for loop where ++i or i++ doesn't really matter, it may feel more natural to use the postfix form.
ADDED: What I wrote above applies to primitive types, really. When coding something with primitive types, you tend to do things quickly and do what comes naturally. That's the important caveat that I need to attach to my theory.
If ++ is an overloaded operator on a C++ class (the possibility Rich K. suggested in the comments) then of course you need to code loops involving such classes with extreme care as opposed to doing simple things that come naturally.
At some level it's idiomatic C code. It's just the way things are usually done. If that's your big performance bottleneck you're likely working on a unique problem.
However, looking at my K&R The C Programming Language, 1st edition, the first instance I find of i in a loop (p. 38) does use ++i rather than i++.
In my opinion it became more fashionable with the creation of C++, as C++ enables you to call ++ on non-trivial objects.
Ok, I elaborate: If you call i++ and i is a non-trivial object, then storing a copy containing the value of i before the increment will be more expensive than for say a pointer or an integer.
I think my predecessors are right regarding the side effects of choosing postincrement over preincrement.
As for its fashionability, it may be as simple as starting all three expressions within the for statement in the same repetitive way, something the human brain seems to lean towards.
I would add to what other people told you that the main rule is: be consistent. Pick one, and do not use the other one unless it is a specific case.
If the loop is too long, you need to reload the value in the cache to increment it before the jump to the beginning.
With ++i you don't need that: no cache move.
In C, all operators that result in a variable having a new value, besides prefix inc/dec, modify the left hand variable (i=2, i+=5, etc). So in situations where ++i and i++ can be interchanged, many people are more comfortable with i++ because the operator is on the right hand side, modifying the left hand variable.
Please tell me if that first sentence is incorrect, I'm not an expert with C.

Using a "pseudo operator" to distinguish simple repetition from general for loops

I would like to know other people's opinion on the following style of writing a for loop:
for (int rep = numberOfReps; rep --> 0 ;) {
// do something that you simply want to repeat numberOfReps times
}
The reason why I invented this style is to distinguish it from the more general case of for loops. I only use this when I need to simply repeat something numberOfReps times and the body of the loop does not use the values of rep and numberOfReps in any way.
As far as I know, standard Java for example doesn't have a simple way of saying "just repeat this N times", and that's why I came up with this. I'd even go as far as saying that the body of the loop must not continue or break, unless explicitly documented at the top of the for loop, because as I said the whole purpose is to make the code easier to understand by coming up with a distinct style to express simple repetitions.
The idea is that if what you're doing is not simple (dependency on the value of an increasing/decreasing index, breaks, continues, etc), then use the standard for loop. If what you are doing is simple repetition, on the other hand, then this distinct style communicates that "fact" (once you know the purpose of the style, of course).
I said "fact" because the style can be abused, of course. I'm operating under the assumption that you have competent programmers whose objective is to make their code easier to understand, not harder.
A comment was made that alludes to the principle that for should only be used for simple iteration, and while should be used otherwise (e.g. if the loop variables are modified in the body).
If that's the case, then I'm merely extending that principle to say that if it's even simpler than your simple for loops (i.e. you don't even care about the iteration index, or whether it's increasing or decreasing, etc, you just want to repeat doing something N times), then use the winking arrow for loop construct instead.
What a coincidence, Josh Bloch just tweeted the following:
Goes-to Considered Harmful:
public static void main(String[] a) {
int i = 10;
while (i --> 0) /* i goes-to 0 */ {
System.out.println(i);
}
}
Unfortunately no explanation was given, but it seems that at least this pseudo operator has a name. It has also been discussed before on SO: What is the name of this operator: “-->”?
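For anyone unsure what the loop actually does: "-->" is not an operator but the postfix decrement followed by a greater-than comparison. The sketch below (class and method names are made up for illustration) records the values the loop body sees.

```java
import java.util.ArrayList;
import java.util.List;

class GoesToDemo {
    // "rep --> 0" is tokenized as "rep-- > 0": the old value is compared
    // with 0, then rep is decremented before the body runs.
    static List<Integer> valuesSeenInBody(int numberOfReps) {
        List<Integer> seen = new ArrayList<>();
        for (int rep = numberOfReps; rep-- > 0; ) {
            seen.add(rep);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(valuesSeenInBody(3)); // prints [2, 1, 0]
    }
}
```

The body runs exactly numberOfReps times, but it sees the values numberOfReps - 1 down to 0 rather than numberOfReps down to 1, which may be one source of confusion with this style.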
You have the language-agnostic tag, but this question isn't really language agnostic. That pattern would be fine if there wasn't already a well established idiom for doing something n times in your language.
You go on to mention Java, which already has a well-established idiom for doing something n times:
for (int i = 0; i < numberOfReps; i++) {
// do something that you simply want to repeat numberOfReps times
}
While your pattern works just as well, it's confusing to others. When I first saw it my thoughts were:
What's that weird arrow?
Why is that line winking at me?
Unless you develop a pattern that has a significant advantage over the standard idiom, it's best to stick with the standard so your fellow coders don't end up scratching their heads.
Nearly every language these days has lambda, so you can write a function like
nTimes(n, body)
that takes an int and a lambda, and more directly communicate intent. In F#, for example
let nTimes(n,f) =
for i in 1..n do f()
nTimes(3, fun() -> printfn "Hello")
or if you prefer extension methods
type System.Int32 with
member this.Times(f) =
for i in 1..this do f()
(3).Times(fun() -> printfn "Hello")
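Since the question mentions Java, the same idea could be sketched there with a Runnable as the body. The nTimes helper below is hypothetical (it is not part of the standard library), and this is only one possible shape for it.

```java
class RepeatDemo {
    // Hypothetical helper: run the given body n times.
    // A non-positive n simply runs the body zero times.
    static void nTimes(int n, Runnable body) {
        for (int i = 0; i < n; i++) {
            body.run();
        }
    }

    public static void main(String[] args) {
        // Prints "Hello" three times, one per line.
        nTimes(3, () -> System.out.println("Hello"));
    }
}
```

The loop index stays hidden inside the helper, so the call site only states how often the body should run.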
