I have a Collection of encoded objects (which are quite big once decoded) and I was wondering what actually happens if I do something like:
codes.parallelStream().map(code -> decode(code)).map(obj -> do1(obj) * do2(obj));
As I didn't find much info about this kind of construction, I suppose this first decodes all elements and only afterwards performs the real task. On the other hand, it would be more logical (and more memory-friendly with big objects) if a parallelStream executed both maps back to back for every element, as if it were written:
codes.parallelStream().map(code -> { obj = decode(code); return do1(obj) * do2(obj); });
Could anybody help me understand how this works?
The map operation is evaluated lazily, so the decode operation in the first map call is only performed when the encoded object is actually pulled by the terminal operation of the Stream. Therefore your assumption that "this first decodes all elements and only afterwards performs the real task" is false: the terminal operation may require only a few of the elements of the source Collection to be processed, in which case neither of the two map operations is performed for most of the encoded elements.
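A minimal sketch of that short-circuiting behaviour (the println calls here just stand in for the decode, do1 and do2 from the question, which aren't shown):

List<String> codes = Arrays.asList("aaa", "bb", "c");
int result = codes.stream()
        .map(code -> { System.out.println("decoding " + code); return code; })           // stands in for decode(code)
        .map(obj -> { System.out.println("computing on " + obj); return obj.length(); }) // stands in for do1(obj) * do2(obj)
        .findFirst()                                                                      // short-circuiting terminal operation
        .orElse(0);
// prints only "decoding aaa" and "computing on aaa"; the other codes are never decoded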
An intermediate Stream operation may be processed for all the elements of the Stream only if it requires all the elements (for example sorted() must iterate over all the elements), or if it precedes an intermediate operation that requires all the elements (for example in ...map().sorted()..., executing sorted() requires first executing map() on all the elements of the Stream).
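And for contrast, a sketch where a downstream operation needs all elements, so the preceding map runs for every one of them even though the terminal operation only asks for a single result:

Stream.of(3, 1, 2)
        .map(x -> { System.out.println("map " + x); return x; })
        .sorted()
        .findFirst();
// "map 3", "map 1" and "map 2" are all printed: sorted() must see every element
// before it can hand the smallest one to findFirst()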
Your two code snippets should behave similarly, though the first one is more readable.
The Javadoc states that
This is a short-circuiting stateful intermediate operation.
Definition of stateful from Javadoc:
Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.
Stateful operations may need to process the entire input before
producing a result. For example, one cannot produce any results from
sorting a stream until one has seen all elements of the stream. As a
result, under parallel computation, some pipelines containing stateful
intermediate operations may require multiple passes on the data or may
need to buffer significant data. Pipelines containing exclusively
stateless intermediate operations can be processed in a single pass,
whether sequential or parallel, with minimal data buffering.
How is default Stream<T> takeWhile(Predicate<? super T> predicate) stateful? It does not need to look at the entire input, etc...
It's almost like filter but short-circuiting.
Well, takeWhile should process the longest prefix of the Stream that satisfies the given Predicate. This means that in order to know if a given element of the Stream should be processed by takeWhile, you may have to process all the elements preceding it.
Hence, you need to know the state of the processing of the previous elements of the Stream in order to know how to process the current element.
In sequential Streams you don't have to keep state, since once you reach the first element that doesn't match the Predicate, you know you are done.
In parallel Streams, however, this becomes much trickier.
It is stateful in that it changes its behavior based on internal state (whether it has already seen an element that fails the predicate). It does not process elements independently from each other. This may disable certain optimizations and may reduce the usefulness of processing in parallel.
So it is stateful in the same way limit and skip are stateful - the outcome does not (only) depend on the current element, but also on elements preceding it.
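A small sketch of that prefix semantics (Java 9+; the values are only for illustration):

// takeWhile keeps the longest prefix of elements matching the predicate;
// once one element fails, everything after it is dropped, even if it would match again
Stream.of(1, 2, 3, 10, 4, 5)
        .takeWhile(n -> n < 5)
        .forEach(System.out::println);   // prints 1, 2, 3 (4 and 5 come after the failing 10)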
I'm new to Java's Stream API, and I'm confused about the results of this case:
Stream<String> stream = Stream.of("A","B","C","D");
System.out.println(stream.peek(System.out::println).findAny().get());
This prints:
A
A
Why does it not print:
A
A
B
B
C
C
D
D
The findAny method doesn't find all the elements; it finds just one element.
Returns an Optional describing some element of the stream, or an empty Optional if the stream is empty.
This is a short-circuiting terminal operation.
The stream is not processed until a terminal method is called, in this case, findAny. But the peek method doesn't execute its action on an element until the element is consumed by the terminal operation.
In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements.
The findAny method is short-circuiting, so peek's action will only be called for that element found by findAny.
That is why you only get two A values in the printout. One is printed by the peek method, and you print the second, which is the value inside the Optional returned by findAny.
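For contrast, a sketch with a terminal operation that consumes every element, so peek's action runs for each of them:

Stream.of("A", "B", "C", "D")
        .peek(System.out::println)       // now runs for every element
        .forEach(System.out::println);   // prints A, A, B, B, C, C, D, D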
Can anyone point to official Java documentation which describes how many times a Stream will invoke each "non-interfering and stateless" intermediate operation for each element?
For example:
Arrays.asList("1", "2", "3", "4").stream()
.filter(s -> check(s))
.forEach(s -> System.out.println(s));
public boolean check(Object o) {
return true;
}
The above currently invokes the check method 4 times.
Is it possible that in current or future versions of the JDK the check method gets executed more or fewer times than the number of elements in the stream created from a List or any other standard Java API?
This does not have to do with the source of the stream, but rather with the terminal operation and the optimizations done in the stream implementation itself. For example:
Stream.of(1, 2, 3, 4)
        .map(x -> x + 1)
        .count();
Since Java 9, map will not get executed a single time here, because count() can compute the result directly from the sized source without traversing the pipeline.
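One way to observe this (the println is only added to make the skipped work visible; the exact behaviour may differ between JDK versions):

long count = Stream.of(1, 2, 3, 4)
        .map(x -> { System.out.println("mapping " + x); return x + 1; })
        .count();
System.out.println(count);   // 4
// On Java 9+ this typically prints nothing from the map lambda: the count is
// known from the sized source, so the pipeline is never traversed.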
Or:
someTreeSet.stream()
        .sorted()
        .findFirst();
sorted might not get executed at all, since the source is a TreeSet and getting the first element is trivial; whether this optimization is actually implemented inside the Stream API is a different question.
So the real answer here is: it depends, but I can't imagine an operation that would get executed more than the number of elements in the source.
From the documentation:
Laziness-seeking. Many stream operations, such as filtering, mapping, or duplicate removal, can be implemented lazily, exposing opportunities for optimization. For example, "find the first String with three consecutive vowels" need not examine all the input strings. Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy.
By that virtue, because filter is a lazy intermediate operation that produces a new Stream, it will only ever invoke the filter predicate once per element while rebuilding the stream.
The only way your method could be invoked a different number of times is if the stream were somehow mutated between states, which, given that nothing in a stream actually runs until the terminal operation, would only realistically be possible due to a bug upstream.
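A small counting sketch (the java.util.concurrent.atomic.AtomicInteger is only added here to make the number of invocations visible):

AtomicInteger calls = new AtomicInteger();
Arrays.asList("1", "2", "3", "4").stream()
        .filter(s -> { calls.incrementAndGet(); return true; })
        .forEach(s -> { });
System.out.println(calls.get());   // 4: the predicate ran exactly once per element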
Can someone explain why the first code example doesn't save the changes I've made with .map to the Map, but the second code example does?
First code example:
stringIntegerMap.entrySet().stream()
        .map(element -> element.setValue(100));
Second code example:
stringIntegerMap.entrySet().stream()
        .map(element -> element.setValue(100))
        .forEach(System.out::println);
Also, why does the second code example only print the values and not the whole element (key + value)?
Your stream operations are lazily evaluated.
If you do not invoke a terminal operation such as forEach (or collect, etc.), the streaming never actually occurs, hence your setValue is not executed.
Note that modifying the collection/map you are streaming is generally advised against.
Finally, see the API documentation for Map.Entry#setValue.
You'll notice the method returns:
old value corresponding to the entry
So, when you perform the map operation, the generated stream contains the old values.
Some sources: the java.util.stream package documentation (search for "stream operations and pipelines"; the part about "non-interference" might also help).
Streams are composed of a source, intermediate operations and terminal operations.
The terminal operations start the pipeline processing by lazily gathering elements from the source, then applying intermediate operations and finally executing the terminal operation.
Stream.map is an intermediate operation, whereas Stream.forEach is terminal. So in your first snippet the pipeline processing never starts (hence intermediate operations are never executed), because there's no terminal operation. When you use forEach in your 2nd snippet, then all the pipeline is processed.
Please take a look at the java.util.stream package docs, where there's extensive information about streams and how to use them properly (e.g. you shouldn't modify the source of the stream from within intermediate or terminal operations, as you're doing in Stream.map).
Edit:
As to your final question:
why does the second code example only print the values and not the whole element (key + value) ?
Mena's answer explains it well: Map.Entry.setValue not only sets the given value on the entry, it also returns the old value. Since you're calling Map.Entry.setValue in a lambda expression inside the Stream.map intermediate operation, you're actually transforming each Map.Entry element of the stream into the value it had before the new value was set. So what arrives at Stream.forEach are the old values of the map, while the map itself holds the new values set by the side effect of Map.Entry.setValue.
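A runnable sketch of that behaviour (a small HashMap is assumed here since the question's map isn't shown; mutating the map from inside the pipeline is still the kind of thing the package docs warn against):

Map<String, Integer> stringIntegerMap = new HashMap<>();
stringIntegerMap.put("a", 1);
stringIntegerMap.put("b", 2);

stringIntegerMap.entrySet().stream()
        .map(element -> element.setValue(100))   // setValue returns the OLD value
        .forEach(System.out::println);           // prints the old values (1 and 2, in some order)

System.out.println(stringIntegerMap);            // {a=100, b=100}: the side effect did happen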
As long as the documentation defines the so-called encounter order, I think it's reasonable to ask whether we can somehow reverse that encounter order. Looking at the API streams provide, I didn't find anything related to ordering except sorted().
If I have a stream produced, say, from a List, can I swap two elements of that stream, thereby producing another stream with a modified encounter order?
Does it even make sense to talk about "swapping" elements in a stream, or does the specification say nothing about it?
The Java Stream API has no dedicated operations to reverse the encounter order or to swap elements in pairs or anything like that. Please note that the Stream source can be one-shot (like a network socket or a stream of generated random numbers), so in the general case you cannot run it backwards without storing everything in memory. That's actually how the sorting operation works: it dumps the whole stream content into an intermediate array, sorts it, then performs the downstream computation. So if a reverse operation were implemented, it would work the same way.
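A sketch of that general-case approach, buffering the whole stream content before reversing it (the concrete values are only for illustration):

List<Integer> buffer = Stream.of(1, 2, 3, 4)
        .collect(Collectors.toCollection(ArrayList::new));
Collections.reverse(buffer);
buffer.stream().forEach(System.out::println);   // 4, 3, 2, 1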
For particular sources like a random-access list you may create a reversed stream using, for example, this construct:
List<T> list = ...;
Stream<T> stream = IntStream.rangeClosed(1, list.size())
        .mapToObj(i -> list.get(list.size() - i));
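For example, with a concrete list (the variable names are just for illustration):

List<String> letters = Arrays.asList("a", "b", "c", "d");
IntStream.rangeClosed(1, letters.size())
        .mapToObj(i -> letters.get(letters.size() - i))
        .forEach(System.out::println);   // prints d, c, b, a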