Confused about Java Stream results using peek and findAny

I'm new to Java's Stream API, and I'm confused about the result in this case:
Stream<String> stream = Stream.of("A","B","C","D");
System.out.println(stream.peek(System.out::println).findAny().get());
This prints:
A
A
Why does it not print:
A
A
B
B
C
C
D
D

The findAny method doesn't find all the elements; it finds just one element.
Returns an Optional describing some element of the stream, or an empty Optional if the stream is empty.
This is a short-circuiting terminal operation.
The stream is not processed until a terminal operation is called, in this case findAny. But the peek method doesn't execute its action on an element until that element is consumed by the terminal operation.
In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements.
The findAny method is short-circuiting, so peek's action will only be called for the element found by findAny.
That is why you only get two A values in the printout. One is printed by the peek method, and you print the second yourself: the value inside the Optional returned by findAny.
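To see the short-circuiting directly, here is a minimal sketch contrasting findAny with a terminal operation that consumes every element (with a sequential stream, findAny typically picks the first element, but that is not guaranteed):
Stream<String> s1 = Stream.of("A", "B", "C", "D");
s1.peek(System.out::println).findAny(); // typically prints just "A": short-circuits after one element

Stream<String> s2 = Stream.of("A", "B", "C", "D");
s2.peek(System.out::println).forEach(x -> { }); // prints A, B, C, D: forEach consumes everything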

Related

Why is IllegalStateException not thrown after calling two terminal operations on the same stream?

I was given to understand that collect() and forEach() were both stream terminal operations and that calling them both on the same stream would throw an IllegalStateException. However, the following code compiles successfully and prints the length of each String in ascending order. No exceptions are thrown. How can this be so?
List<String> list = Arrays.asList("ant", "bird", "chimpanzee", "dolphin");
list.stream().collect(Collectors.groupingBy(String::length))
        .forEach((a, b) -> System.out.println(a));
The forEach method you are calling is not Stream::forEach but Map::forEach, because you are calling it on the return value of collect(...), which is a Map. Map::forEach takes a BiConsumer instead of a Consumer; a stream's forEach never takes a lambda with two arguments!
So you are only calling one terminal operation on the stream, namely collect. After that point you never touch the stream again (you start working with the returned Map), which is why no IllegalStateException is thrown.
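For contrast, a minimal sketch making the two signatures explicit (the variable names here are illustrative):
Map<Integer, List<String>> byLength =
        list.stream().collect(Collectors.groupingBy(String::length));
byLength.forEach((length, words) -> System.out.println(length)); // Map::forEach takes a BiConsumer
list.stream().forEach(word -> System.out.println(word));         // Stream::forEach takes a Consumer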
To actually call two terminal operations on the same stream, you need to put the stream into a variable first:
List<String> list = Arrays.asList("ant", "bird", "chimpanzee", "dolphin");
Stream<String> stream = list.stream(); // you need this extra variable.
stream.collect(Collectors.groupingBy(String::length));
stream.forEach((a) -> System.out.println(a)); // this will throw an exception, as expected
The stream created by list.stream() is consumed by the collect operation, which, as a result of the grouping, produces a Map<Integer, List<String>> keyed by string length.
forEach is then called on the entries of the Map produced by collect, not on the stream, hence no IllegalStateException is thrown.
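If you genuinely need to run two terminal operations over the same data, each needs its own Stream instance; a minimal sketch using a Supplier (java.util.function.Supplier assumed imported):
Supplier<Stream<String>> streams = list::stream;
Map<Integer, List<String>> grouped = streams.get().collect(Collectors.groupingBy(String::length));
streams.get().forEach(System.out::println); // a fresh Stream, so no IllegalStateException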

Java stream operation invocations

Can anyone point to official Java documentation that describes how many times a Stream will invoke each "non-interfering and stateless" intermediate operation for each element?
For example:
Arrays.asList("1", "2", "3", "4").stream()
        .filter(s -> check(s))
        .forEach(s -> System.out.println(s));

public boolean check(Object o) {
    return true;
}
The above currently invokes the check method four times.
Is it possible that in current or future versions of the JDK the check method gets executed more or fewer times than the number of elements in the stream created from a List or any other standard Java API?
This depends not on the source of the stream, but on the terminal operation and the optimizations done by the stream implementation itself. For example:
Stream.of(1, 2, 3, 4)
        .map(x -> x + 1)
        .count();
Since Java 9, map will not be executed even once here: count() can compute the result directly from the source, and the mapping does not change the number of elements, so the mapping function is never invoked.
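A quick way to observe this (a sketch; whether the mapping runs depends on the JDK's optimizations, and on Java 9 or later the println typically never executes):
long n = Stream.of(1, 2, 3, 4)
        .map(x -> { System.out.println("mapping " + x); return x + 1; })
        .count();
System.out.println(n); // 4, typically with no "mapping" output on Java 9+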
Or:
someTreeSet.stream()
        .sorted()
        .findFirst();
sorted might not be executed at all: since the source is a TreeSet, the elements are already ordered and getting the first one is trivial. Whether the Stream API actually implements this optimization is a different question.
So the real answer here is: it depends. But I can't imagine an operation that would be executed more times than the number of elements in the source.
From the documentation:
Laziness-seeking. Many stream operations, such as filtering, mapping, or duplicate removal, can be implemented lazily, exposing opportunities for optimization. For example, "find the first String with three consecutive vowels" need not examine all the input strings. Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy.
By that virtue, because filter is a lazy intermediate operation that produces a new Stream, it will invoke the predicate at most once per element that reaches it.
The only way your method could see a different number of invocations is if the stream were somehow mutated between stages; given that nothing in a stream actually runs until the terminal operation, that would realistically only happen because of a bug upstream.
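To convince yourself, a minimal sketch that counts the predicate invocations (java.util.concurrent.atomic.AtomicInteger assumed imported):
AtomicInteger calls = new AtomicInteger();
Arrays.asList("1", "2", "3", "4").stream()
        .filter(s -> { calls.incrementAndGet(); return true; })
        .forEach(s -> { });
System.out.println(calls.get()); // 4: exactly once per element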

Stream on Map doesn't save .map changes

Can someone explain why the first code example doesn't save the changes I've made with .map on the Map but the second code example does?
First code example:
stringIntegerMap.entrySet().stream()
        .map(element -> element.setValue(100));
Second code example:
stringIntegerMap.entrySet().stream()
        .map(element -> element.setValue(100))
        .forEach(System.out::println);
Also, why does the second code example only print the values and not the whole element (key + value) ?
Your stream operations are lazily evaluated.
If you do not invoke a terminal operation such as forEach (or collect, etc.), the streaming never actually occurs, and hence your setValue is not executed.
Note that modifying the collection/map you are streaming is generally advised against.
Finally, see the API documentation for Map.Entry#setValue.
You'll notice the method returns:
old value corresponding to the entry
So, when you perform the map operation, the resulting stream contains the old values.
For further reading, see the java.util.stream package documentation (search for "Stream operations and pipelines"; the section on "Non-interference" may also help).
Streams are composed of a source, intermediate operations and terminal operations.
The terminal operation starts the pipeline processing: elements are lazily gathered from the source, intermediate operations are applied, and finally the terminal operation is executed.
Stream.map is an intermediate operation, whereas Stream.forEach is terminal. So in your first snippet the pipeline processing never starts (hence intermediate operations are never executed), because there's no terminal operation. When you use forEach in your 2nd snippet, then all the pipeline is processed.
Please take a look at the java.util.stream package docs, where there's extensive information about streams and how to use them properly (e.g. you shouldn't modify the source of the stream from within intermediate or terminal operations, as you're doing in Stream.map).
Edit:
As to your final question:
why does the second code example only print the values and not the whole element (key + value) ?
Mena's answer explains it well: Map.Entry.setValue not only sets the given value on the entry, but also returns the old value. Because you use Map.Entry.setValue in a lambda within the Stream.map intermediate operation, you are transforming each Map.Entry element of the stream into the value it held before the new value was set. So what arrives at Stream.forEach are the old values of the map, while the map itself now holds the new values, set as a side effect of Map.Entry.setValue.
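A minimal sketch of what ends up where (the map contents are illustrative):
Map<String, Integer> m = new HashMap<>();
m.put("a", 1);
m.put("b", 2);
m.entrySet().stream()
        .map(e -> e.setValue(100))     // setValue returns the OLD value
        .forEach(System.out::println); // prints 1 and 2 (in some order): the old values
System.out.println(m); // {a=100, b=100}: the side effect did change the map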

What does parallelStream().map().map() do?

I have a Collection of encoded objects (which are quite big once decoded) and I was wondering what actually happens if I do something like:
codes.parallelStream().map(code -> decode(code)).map(obj -> do1(obj) * do2(obj));
As I didn't find much info about this kind of construction, I suppose it first decodes all elements and only afterwards performs the real task. On the other hand, it would be more logical (and more memory-friendly for big objects) if a parallelStream executed both maps at once for every element, as if I had written:
codes.parallelStream().map(code -> { var obj = decode(code); return do1(obj) * do2(obj); });
Could anybody help me understand how this works?
The map operation is evaluated lazily, so the decode in the first map call is only performed when an element is consumed by the terminal operation of the Stream. Your assumption that it "first decodes all elements and only afterwards performs the real task" is therefore false: the terminal operation may require only a few elements of the source Collection, in which case neither of the two map operations is performed for most of the encoded elements.
An intermediate Stream operation is processed for all the elements of the Stream only if it requires all the elements (for example, sorted() must iterate over all of them), or if it precedes an intermediate operation that requires all the elements (for example, in ...map().sorted()..., executing sorted() requires first executing map() on every element of the Stream).
Your two code snippets should behave similarly, though the first one is more readable.
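To see the per-element flow, a minimal sequential sketch (sequential so the print order is deterministic; with parallelStream() the interleaving varies across threads):
Stream.of("x", "y")
        .map(s -> { System.out.println("decode " + s); return s; })
        .map(s -> { System.out.println("compute " + s); return s; })
        .forEach(s -> { });
// prints: decode x, compute x, decode y, compute y
// each element flows through the whole pipeline before the next one starts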

Where is the combination order of the combiner of collect(supplier, accumulator, combiner) defined?

The Java API documentation states that the combiner parameter of the collect method must be:
an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function
A combiner is a BiConsumer<R,R> that receives two parameters of type R and returns void. But the documentation does not state whether we should combine the elements into the first or the second parameter.
For instance, the following example may give different results depending on whether the combination order is m1.addAll(m2) or m2.addAll(m1):
List<String> res = LongStream
        .rangeClosed(1, 1_000_000)
        .parallel()
        .mapToObj(n -> "" + n)
        .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m1.addAll(m2));
I know that in this case we could simply use a method reference, such as ArrayList::addAll. Yet there are cases where a lambda is required and we must combine the items in the correct order; otherwise we could get an inconsistent result when processing in parallel.
Is this claimed in any part of the Java 8 API documentation? Or it really doesn't matter?
Of course, it matters, as when you use m2.addAll(m1) instead of m1.addAll(m2), it doesn’t just change the order of elements, but completely breaks the operation. Since a BiConsumer doesn’t return a result, you have no control over which object the caller will use as the result and since the caller will use the first one, modifying the second instead will cause data loss.
There is a hint if you look at the accumulator function, which has the type BiConsumer<R,? super T>; in other words, it can't do anything other than store the element of type T, provided as the second argument, into the container of type R, provided as the first argument.
If you look at the documentation of Collector, which uses a BinaryOperator as combiner function, hence allows the combiner to decide which argument to return (or even an entirely different result instance), you find:
The associativity constraint says that splitting the computation must produce an equivalent result. That is, for any input elements t1 and t2, the results r1 and r2 in the computation below must be equivalent:
A a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
R r1 = finisher.apply(a1); // result without splitting
A a2 = supplier.get();
accumulator.accept(a2, t1);
A a3 = supplier.get();
accumulator.accept(a3, t2);
R r2 = finisher.apply(combiner.apply(a2, a3)); // result with splitting
So if we assume that the accumulator is applied in encounter order, the combiner has to combine the first and second argument in left-to-right order to produce an equivalent result.
Now, the three-arg version of Stream.collect has a slightly different signature, using a BiConsumer as combiner exactly for supporting method references like ArrayList::addAll. Assuming consistency throughout all these operations and considering the purpose of this signature change, we can safely assume that it has to be the first argument which is the container to modify.
But it seems that this was a late change and the documentation wasn't adapted accordingly. If you look at the Mutable reduction section of the package documentation, you will find that it has been adapted to show the actual signature and usage examples of Stream.collect, but repeats exactly the same definition of the associativity constraint as shown above, despite the fact that finisher.apply(combiner.apply(a2, a3)) doesn't work if combiner is a BiConsumer…
The documentation issue has been reported as JDK-8164691 and addressed in Java 9. The new documentation says:
combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.
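A sketch of the failure mode (the data loss is probabilistic; a parallel run with the wrong order typically yields far fewer than 1,000,000 elements, because the caller keeps the first container and discards the second):
List<String> wrong = LongStream
        .rangeClosed(1, 1_000_000)
        .parallel()
        .mapToObj(n -> "" + n)
        .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m2.addAll(m1)); // folds into the discarded container
System.out.println(wrong.size()); // typically much less than 1_000_000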
It seems that this is not explicitly stated in the documentation. However, there is an ordering concept in the Stream API: a stream can be either ordered or unordered. It may be unordered from the very beginning if the source spliterator is unordered (for example, if the stream source is a HashSet), or it may become unordered if the user explicitly calls the unordered() operation. If the stream is ordered, the collection procedure should also be stable, so, I guess, for ordered streams the combiner is assumed to receive its arguments in strict order. However, that is not guaranteed for an unordered stream.
