Collect both min and max in one stream - java

I need to print both the min and the max of a stream of ints in one operation. I currently use two operations, but the second one is not allowed, and I could not get collectors to work:
Stream<Integer> stringInt = Stream.of(8,50,16,0,72);
System.out.println(stringInt.reduce(Math::min).get());
System.out.println(stringInt.reduce(Math::max).get());

The second call is not allowed because a stream cannot be reused. From the Stream Javadoc:
A stream should be operated on (invoking an intermediate or terminal stream operation) only once. This rules out, for example, "forked" streams, where the same source feeds two or more pipelines, or multiple traversals of the same stream. A stream implementation may throw IllegalStateException if it detects that the stream is being reused.
You could use collect with Collectors.summarizingInt:
IntSummaryStatistics collect = stringInt.collect(Collectors.summarizingInt(value -> value));
System.out.println(collect.getMax());
System.out.println(collect.getMin());
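As a self-contained illustration of the single-pass approach above, here is a minimal sketch. The class name MinMaxInOnePass and the helper summarize are made up for this example; only Collectors.summarizingInt and IntSummaryStatistics come from the JDK.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MinMaxInOnePass {
    // Hypothetical helper: traverses the stream exactly once and
    // gathers count, sum, min, and max in a single result object.
    static IntSummaryStatistics summarize(Stream<Integer> numbers) {
        return numbers.collect(Collectors.summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(Stream.of(8, 50, 16, 0, 72));
        System.out.println(stats.getMin()); // 0
        System.out.println(stats.getMax()); // 72
    }
}
```

Note that for an empty stream getMin() returns Integer.MAX_VALUE and getMax() returns Integer.MIN_VALUE, so it may be worth checking getCount() first.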

Related

Streams Optional<Question> find highest scoring Question with

Hi everyone! I am really struggling with this method. I have to find the question with the highest score among the questions that have at least a minimum number of views.
public Stream<Question> stream() {
Stream<Question> questionStream = Arrays.stream(items);
questionStream.forEach(System.out::println);
return questionStream;
}
public Optional<Question> findHighestScoringQuestionWith(int minimumViews){
return stream()
.sorted(Comparator.comparing(Question::getScore))
.filter(x -> x.getViewCount() >= minimumViews)
.findFirst();
}
I would be very grateful if someone could help me with this issue. Thank you all in advance.
My exception:
Exception in thread "main" java.lang.IllegalStateException: stream has already been operated upon or closed
at java.base/java.util.stream.AbstractPipeline.<init>(AbstractPipeline.java:203)
at java.base/java.util.stream.ReferencePipeline.<init>(ReferencePipeline.java:94)
at java.base/java.util.stream.ReferencePipeline$StatefulOp.<init>(ReferencePipeline.java:725)
at java.base/java.util.stream.SortedOps$OfRef.<init>(SortedOps.java:126)
at java.base/java.util.stream.SortedOps.makeRef(SortedOps.java:63)
at java.base/java.util.stream.ReferencePipeline.sorted(ReferencePipeline.java:463)
at stackoverflow.Data.sortedStream(Data.java:156)
at stackoverflow.Main.main(Main.java:14)
Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.
- Package Summary for java.util.stream
Stream.forEach is a terminal operation, meaning that it completes a stream pipeline. The whole pipeline is evaluated when a terminal operation is invoked; after that, the stream has been operated upon, as stated in the exception.
If you want to have multiple terminal operations, you need to set up multiple stream pipelines.
To perform some operation on the data mid stream, you can use Stream.peek:
public Stream<Question> stream() {
Stream<Question> questionStream = Arrays.stream(items);
return questionStream.peek(System.out::println); // <-
}
public Optional<Question> findHighestScoringQuestionWith(int minimumViews){
return stream()
.sorted(Comparator.comparing(Question::getScore))
.filter(x -> x.getViewCount() >= minimumViews)
.findFirst();
}
This will print out all items in the stream, but only once a terminal operation is called and the stream is evaluated. In your case, that terminal operation is Stream.findFirst in the findHighestScoringQuestionWith method.
Streams are one-shot objects; you can't use them more than once.
The problem is that you are calling questionStream.forEach in the stream() method, so the stream is already used up before you return it. If you really want to print out the contents, you could do Arrays.asList(items).forEach(System.out::println);
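Separately from the reuse problem, note that sorting in ascending score order and then calling findFirst yields the lowest-scoring match, not the highest. A minimal sketch of one way to get the highest-scoring question follows; the Question class here is a hypothetical stand-in with only the two getters used in the question, and the items array is passed in as a parameter for the sake of the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

public class HighestScoreSketch {
    // Hypothetical stand-in for the asker's Question class.
    static final class Question {
        private final int score;
        private final int viewCount;

        Question(int score, int viewCount) {
            this.score = score;
            this.viewCount = viewCount;
        }

        int getScore() { return score; }
        int getViewCount() { return viewCount; }
    }

    static Optional<Question> findHighestScoringQuestionWith(Question[] items, int minimumViews) {
        return Arrays.stream(items)                               // a fresh stream on every call
                .filter(q -> q.getViewCount() >= minimumViews)    // drop questions with too few views
                .max(Comparator.comparingInt(Question::getScore)); // pick the highest score
    }

    public static void main(String[] args) {
        Question[] items = {
            new Question(5, 100), new Question(9, 10), new Question(7, 250)
        };
        findHighestScoringQuestionWith(items, 50)
                .ifPresent(q -> System.out.println(q.getScore())); // 7
    }
}
```

Using max avoids sorting the whole stream, and creating the stream inside the method means nothing can consume it beforehand.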

Difference between forEachOrdered() and sequential() methods of Java 8?

I am working with Java 8 parallel streams and want to print the elements of a parallel stream in some order (say insertion order, reverse order, or sequential order).
For this I tried the following code:
System.out.println("With forEachOrdered:");
listOfIntegers
.parallelStream()
.forEachOrdered(e -> System.out.print(e + " "));
System.out.println("");
System.out.println("With Sequential:");
listOfIntegers.parallelStream()
.sequential()
.forEach(e -> System.out.print(e + " "));
For both of these, I got the same output:
With forEachOrdered:
1 2 3 4 5 6 7 8
With Sequential:
1 2 3 4 5 6 7 8
From the API documentation, I can see that:
forEachOrdered -> This is a terminal operation.
and
sequential -> This is an intermediate operation.
So my question is: which one is better to use,
and in which scenarios should one be preferred over the other?
listOfIntegers.parallelStream().sequential().forEach() creates a parallel Stream and then converts it to a sequential Stream, so you might as well use listOfIntegers.stream().forEach() instead, and get a sequential Stream in the first place.
listOfIntegers.parallelStream().forEachOrdered(e -> System.out.print(e + " ")) performs the operation on a parallel Stream, but guarantees the elements will be consumed in the encounter order of the Stream (if the Stream has a defined encounter order). However, it can be executed on multiple threads.
I don't see any reason to ever use listOfIntegers.parallelStream().sequential(). If you want a sequential Stream, why create a parallel Stream first?
Your question is somewhat misleading. First you ask about:
.parallelStream()
.forEachOrdered(...)
This will create a parallel Stream, but elements will be consumed in order. If you add a map operation like this:
.parallelStream()
.map(...)
.forEachOrdered(...)
The map operation will then gain very little from parallel processing, since threads have to wait until all earlier elements in encounter order have been processed (consumed by forEachOrdered). This applies to stateless operations.
On the other hand if you have a stateful operation like:
.parallelStream()
.map()
.sorted()
// other operations
Since sorted is stateful, the stateless operations before it benefit more from parallel processing. That is because sorted has to gather all elements of the Stream anyway, so threads don't have to "wait" (at forEachOrdered) for the elements in encounter order.
For the second example:
listOfIntegers.parallelStream()
.sequential()
.forEach(e -> System.out.print(e + " "))
you are basically saying turn parallel on and then turn it off. Streams are driven by the terminal operation, so even if you do:
.map...
.filter...
.parallel()
.map...
.sequential()
This means that the entire pipeline will be executed sequentially, not that one part will be parallel and the other sequential. You are also relying on forEach preserving order. At the moment it may do so, but since you said you don't care about order (by using forEach in the first place), a later release might shuffle the elements internally.
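The point that the last mode switch applies to the whole pipeline can be checked with BaseStream.isParallel(), which reports how the pipeline would run if a terminal operation were executed. This is only a small sketch; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

public class ExecutionModeSketch {
    // parallel() is called first and sequential() last:
    // the pipeline as a whole ends up sequential.
    static boolean parallelThenSequential(List<Integer> source) {
        return source.stream()
                .map(x -> x + 1)
                .parallel()
                .map(x -> x * 2)
                .sequential()
                .isParallel();
    }

    // sequential() is called first and parallel() last:
    // the pipeline as a whole ends up parallel.
    static boolean sequentialThenParallel(List<Integer> source) {
        return source.stream()
                .sequential()
                .map(x -> x + 1)
                .parallel()
                .isParallel();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(parallelThenSequential(numbers)); // false
        System.out.println(sequentialThenParallel(numbers)); // true
    }
}
```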
Stream pipelines may execute either sequentially or in parallel. This execution mode is a property of the stream. Streams are created with an initial choice of sequential or parallel execution. For example, Collection.stream() creates a sequential stream, and Collection.parallelStream() creates a parallel one. This choice of execution mode may be modified by the BaseStream.sequential() or BaseStream.parallel() methods.
So there is no need to use:
listOfIntegers.parallelStream().sequential()
You can simply use:
listOfIntegers.stream()
If you are creating a parallel stream, the elements of the stream may be processed by different threads. The difference between forEach and forEachOrdered is that forEach allows the elements of a parallel stream to be processed in any order, while forEachOrdered always processes them in the encounter order of the stream. Using parallelStream() together with forEachOrdered is a good example of how you can take advantage of multiple cores and still preserve the order of the output. Note that forEachOrdered forces the elements to be consumed in an ordered fashion. However, any operation chained before forEachOrdered can still run in parallel because the stream is a parallel stream.
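A minimal sketch of that behaviour, with invented class and method names: the map step may run on several threads, while forEachOrdered hands the results to the consumer one at a time in encounter order, so a plain ArrayList is enough to record them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ForEachOrderedSketch {
    static List<Integer> doubledInEncounterOrder(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        source.parallelStream()
              .map(x -> x * 2)             // may be evaluated on several threads
              .forEachOrdered(seen::add);  // consumed one by one, in encounter order
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 8).boxed().collect(Collectors.toList());
        System.out.println(doubledInEncounterOrder(numbers)); // [2, 4, 6, 8, 10, 12, 14, 16]
    }
}
```

With forEach in place of forEachOrdered, neither the order nor the thread-safety of the unsynchronized list would be guaranteed on a parallel stream.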
The java.util.stream package documentation states that when the terminal operation is initiated, the pipeline is executed sequentially or in parallel depending on the mode of the stream on which it is invoked. So if you change the execution mode multiple times in a pipeline, the last change applies to the whole pipeline; it is not the case that operations invoked after parallel() run in parallel while operations invoked after sequential() run sequentially.

Stream.reduce always preserving order on parallel, unordered stream

I've gone through several previous questions like Encounter order preservation in java stream, this answer by Brian Goetz, as well as the javadoc for Stream.reduce(), and the java.util.stream package javadoc, and yet I still can't grasp the following:
Take this piece of code:
public static void main(String... args) {
final String[] alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("");
System.out.println("Alphabet: ".concat(Arrays.toString(alphabet)));
System.out.println(new HashSet<>(Arrays.asList(alphabet))
.parallelStream()
.unordered()
.peek(System.out::println)
.reduce("", (a,b) -> a + b, (a,b) -> a + b));
}
Why is the reduction always* preserving the encounter order?
So far, after several dozen runs, output is the same
First of all, unordered does not imply actual shuffling; all it does is set a flag on the Stream pipeline that could later be leveraged.
A shuffle of the source elements could potentially be much more expensive than the operations on the stream pipeline themselves, so the implementation might choose not to do this (as in this case).
At the moment (I tested and looked at the sources of jdk-8 and jdk-9), reduce does not take that flag into account. Note that this could very well change in a future build or release.
Also, when you say unordered, you actually mean that you don't care about the order, so the stream returning the same result is not a violation of that rule.
For example, see this question/answer, which explains that findFirst (just another terminal operation) changed to take unordered into consideration in java-9, as opposed to java-8.
To help explain this, I am going to reduce the scope of this string to ABCD.
The parallel stream will divide the string into two pieces: AB and CD. When we go to combine these later, the result of the AB side will be the first argument passed into the function, while the result of the CD side will be the second argument passed into the function. This is regardless of which of the two actually finishes first.
The unordered operator will affect some operations on a stream, such as a limit operation, but it does not affect a simple reduce.
TLDR: .reduce() is not always preserving order, its result is based on the stream spliterator characteristics.
Spliterator
The encounter order of the stream depends on the stream's spliterator (none of the other answers mentioned that).
There are different spliterators depending on the stream source. You can find the spliterator types in the source code of those collections.
HashSet -> HashMap#KeySpliterator = Not ordered
ArrayDeque = Ordered
ArrayList = Ordered
TreeSet -> TreeMap#Spliterator = Ordered and sorted
logicbig.com - Ordering
logicbig.com - Stateful vs Stateless
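The characteristics listed above can also be queried at runtime rather than read from the source code. The following is a small sketch using Spliterator.hasCharacteristics; the class and helper names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

public class SpliteratorCharacteristicsSketch {
    // True if the collection's spliterator reports a defined encounter order.
    static boolean isOrdered(Collection<?> c) {
        return c.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    // True if the collection's spliterator reports a sort order.
    static boolean isSorted(Collection<?> c) {
        return c.spliterator().hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("A", "B", "C");
        System.out.println("HashSet ordered:    " + isOrdered(new HashSet<>(letters)));    // false
        System.out.println("ArrayDeque ordered: " + isOrdered(new ArrayDeque<>(letters))); // true
        System.out.println("ArrayList ordered:  " + isOrdered(new ArrayList<>(letters)));  // true
        System.out.println("TreeSet ordered:    " + isOrdered(new TreeSet<>(letters)));    // true
        System.out.println("TreeSet sorted:     " + isSorted(new TreeSet<>(letters)));     // true
    }
}
```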
Additionally, you can apply the .unordered() intermediate operation, which specifies that the following operations in the stream should not rely on ordering.
Stream operations (mostly stateful) that are affected by the spliterator and by the use of .unordered() are:
.findFirst()
.limit()
.skip()
.distinct()
Those operations will give different results depending on the order property of the stream and its spliterator.
The .peek() method does not take ordering into consideration; if the stream is executed in parallel, it may print/receive elements in any order.
.reduce()
Now for the terminal .reduce() method. The intermediate operation .unordered() doesn't have any effect on the type of the spliterator (as @Eugene mentioned). Importantly, the ordering stays as it is in the source spliterator: if the source spliterator is ordered, the result of .reduce() will be ordered; if the source is unordered, the result of .reduce() will be unordered.
You are using new HashSet<>(Arrays.asList(alphabet)) as the stream source. Its spliterator is unordered. It is just a coincidence that your result looks ordered: the elements are single letters of the alphabet, and for them the unordered result happens to be the same. If you mix in numbers, or mix lower case and upper case, this no longer holds. For example, take the following inputs; the first one is a subset of the example you posted:
HashSet .reduce() - Unordered
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "a1Ab2Bc3C"
"Apple","Orange","Banana","Mango" -> "AppleMangoOrangeBanana"
TreeSet .reduce() - Ordered, Sorted
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "123ABCabc"
"Apple","Orange","Banana","Mango" -> "AppleBananaMangoOrange"
ArrayList .reduce() - Ordered
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "abc123ABC"
"Apple","Orange","Banana","Mango" -> "AppleOrangeBananaMango"
You see that testing .reduce() operation only with an alphabet source stream can lead to false conclusions.
The answer is .reduce() is not always preserving order, its result is based on the stream spliterator characteristics.
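The comparison above can be reproduced with a sketch like the following (class and method names are invented here). For the ordered sources the result is determined by the encounter order; for the HashSet the printed order depends on the hash layout, so no particular output is claimed for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

public class ReduceOrderSketch {
    // Concatenates all elements with a parallel reduce.
    static String concat(Collection<String> source) {
        return source.parallelStream().reduce("", String::concat);
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("Apple", "Orange", "Banana", "Mango");

        System.out.println(concat(new ArrayList<>(fruits))); // AppleOrangeBananaMango
        System.out.println(concat(new TreeSet<>(fruits)));   // AppleBananaMangoOrange
        System.out.println(concat(new HashSet<>(fruits)));   // order follows the hash layout
    }
}
```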

Conditionally add an operation to a Java 8 stream

I'm wondering if I can add an operation to a stream, based off of some sort of condition set outside of the stream. For example, I want to add a limit operation to the stream if my limit variable is not equal to -1.
My code currently looks like this, but I have yet to see other examples of streams being used this way, where a Stream object is reassigned to the result of an intermediate operation applied on itself:
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
stream = stream.limit(limit);
}
// Collect stream to list
stream.collect(Collectors.toList());
As stated in this stackoverflow post, the filter isn't actually applied until a terminal operation is called. Since I'm reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
There is no semantic difference between a chained series of invocations and a series of invocations storing the intermediate return values. Thus, the following code fragments are equivalent:
a = object.foo();
b = a.bar();
c = b.baz();
and
c = object.foo().bar().baz();
In either case, each method is invoked on the result of the previous invocation. But in the latter case, the intermediate results are not stored but lost on the next invocation. In the case of the stream API, the intermediate results must not be used after you have called the next method on it, thus chaining is the natural way of using stream as it intrinsically ensures that you don’t invoke more than one method on a returned reference.
Still, it is not wrong to store the reference to a stream as long as you obey the contract of not using a returned reference more than once. By using it the way you do in your question, i.e. overwriting the variable with the result of the next invocation, you also ensure that you don’t invoke more than one method on a returned reference, thus, it’s a correct usage. Of course, this only works with intermediate results of the same type, so when you are using map or flatMap, getting a stream of a different reference type, you can’t overwrite the local variable. Then you have to be careful to not use the old local variable again, but, as said, as long as you are not using it after the next invocation, there is nothing wrong with the intermediate storage.
Sometimes, you have to store it, e.g.
try(Stream<String> stream = Files.lines(Paths.get("myFile.txt"))) {
stream.filter(s -> !s.isEmpty()).forEach(System.out::println);
}
Note that the code is equivalent to the following alternatives:
try(Stream<String> stream = Files.lines(Paths.get("myFile.txt")).filter(s->!s.isEmpty())) {
stream.forEach(System.out::println);
}
and
try(Stream<String> srcStream = Files.lines(Paths.get("myFile.txt"))) {
Stream<String> tmp = srcStream.filter(s -> !s.isEmpty());
// the variable srcStream must not be used from here on:
tmp.forEach(System.out::println);
}
They are equivalent because forEach is always invoked on the result of filter which is always invoked on the result of Files.lines and it doesn’t matter on which result the final close() operation is invoked as closing affects the entire stream pipeline.
To put it in one sentence, the way you use it, is correct.
I even prefer to do it that way, as not chaining a limit operation when you don’t want to apply a limit is the cleanest way of expressing your intent. It’s also worth noting that the suggested alternatives may work in a lot of cases, but they are not semantically equivalent:
.limit(condition? aLimit: Long.MAX_VALUE)
assumes that the maximum number of elements, you can ever encounter, is Long.MAX_VALUE but streams can have more elements than that, they even might be infinite.
.limit(condition? aLimit: list.size())
when the stream source is list, is breaking the lazy evaluation of a stream. In principle, a mutable stream source might legally get arbitrarily changed up to the point when the terminal action is commenced. The result will reflect all modifications made up to this point. When you add an intermediate operation incorporating list.size(), i.e. the actual size of the list at this point, subsequent modifications applied to the collection between this point and the terminal operation may turn this value to have a different meaning than the intended “actually no limit” semantic.
Compare with “Non Interference” section of the API documentation:
For well-behaved stream sources, the source can be modified before the terminal operation commences and those modifications will be reflected in the covered elements. For example, consider the following code:
List<String> l = new ArrayList(Arrays.asList("one", "two"));
Stream<String> sl = l.stream();
l.add("three");
String s = sl.collect(joining(" "));
First a list is created consisting of two strings: "one"; and "two". Then a stream is created from that list. Next the list is modified by adding a third string: "three". Finally the elements of the stream are collected and joined together. Since the list was modified before the terminal collect operation commenced the result will be a string of "one two three".
Of course, this is a rare corner case as normally, a programmer will formulate an entire stream pipeline without modifying the source collection in between. Still, the different semantic remains and it might turn into a very hard to find bug when you once enter such a corner case.
Further, since they are not equivalent, the stream API will never recognize these values as “actually no limit”. Even specifying Long.MAX_VALUE implies that the stream implementation has to track the number of processed elements to ensure that the limit has been obeyed. Thus, not adding a limit operation can have a significant performance advantage over adding a limit with a number that the programmer expects to never be exceeded.
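The reassignment pattern from the question can be sketched as a complete method. This is only an illustration under simplifying assumptions: the elements are plain Long timestamps instead of the asker's event type, and the class and method names are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConditionalLimitSketch {
    // limit == -1 means "no limit", as in the question.
    static List<Long> select(List<Long> timestamps, long max, long limit) {
        Stream<Long> stream = timestamps.stream().filter(t -> t < max);
        if (limit != -1) {
            // Overwrite the variable so the previous reference is never used again.
            stream = stream.limit(limit);
        }
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> timestamps = Arrays.asList(1L, 5L, 3L, 9L, 2L);
        System.out.println(select(timestamps, 6, 2));  // [1, 5]
        System.out.println(select(timestamps, 6, -1)); // [1, 5, 3, 2]
    }
}
```

When no limit is requested, no limit operation is added to the pipeline at all, which matches the "actually no limit" intent described above.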
There are two ways you can do this:
// Do some stream stuff
List<E> results = list.stream()
.filter(e -> e.getTimestamp() < max)
.limit(limit > 0 ? limit : list.size())
.collect(Collectors.toList());
OR
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
stream = stream.limit(limit);
}
// Collect stream to list
List<E> results = stream.collect(Collectors.toList());
As this is functional programming you should always work on the result of each function. You should specifically avoid modifying anything in this style of programming and treat everything as if it was immutable if possible.
Since I'm reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
It should work, however it reads as a mix of imperative and functional coding. I suggest writing it as a fixed stream as per my first answer.
I think your first line needs to be:
stream = stream.filter(e -> e.getTimestamp() < max);
so that you are using the stream returned by filter in subsequent operations rather than the original stream.
I know it is a bit late, but I had the same question myself and didn't find a satisfying answer. However, inspired by this question and its answers, I came to the following solution:
return Stream.of( ///< wrap target stream in other stream ;)
/*do regular stream stuff*/
stream.filter(e -> e.getTimestamp() < max)
).flatMap(s -> limit != -1 ? s.limit(limit) : s) ///< apply limit only if necessary and unwrap stream of stream to "normal" stream
.collect(Collectors.toList()); ///< do final stuff

Java 8 Streams peek api

I tried the following snippet of Java 8 code with peek.
List<String> list = Arrays.asList("Bender", "Fry", "Leela");
list.stream().peek(System.out::println);
However there is nothing printed out on the console. If I do this instead:
list.stream().peek(System.out::println).forEach(System.out::println);
I see the following, which shows the output of both the peek and the forEach invocations.
Bender
Bender
Fry
Fry
Leela
Leela
Both forEach and peek take a (Consumer<? super T> action).
So why is the output different?
The Javadoc mentions the following:
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
peek, being an intermediate operation, does nothing on its own. Once a terminal operation like forEach is applied, the results do get printed, as seen.
The documentation for peek says
Returns a stream consisting of the elements of this stream, additionally performing the provided action on each element as elements are consumed from the resulting stream.
This is an intermediate operation.
You therefore have to do something with the resulting stream for System.out.println to do anything.
From the docs on Stream for the peek method:
...additionally performing the provided action on each element as elements are consumed from the resulting stream.
Streams in Java 8 are lazy. In addition, if two operations are chained one after the other in a stream, the second operation starts on an element as soon as the first one has finished processing that element (provided the stream has a terminal operation).
This is why you see each name printed twice in a row.
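Both points, laziness and element-by-element processing, can be made visible by recording events instead of printing them. The sketch below uses invented names (PeekLazinessSketch, trace) and a sequential stream, so a plain ArrayList is sufficient.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class PeekLazinessSketch {
    // Records which action ran for which element, in the order it happened.
    static List<String> trace(boolean withTerminalOperation) {
        List<String> events = new ArrayList<>();
        Stream<String> names = Stream.of("Bender", "Fry", "Leela")
                .peek(name -> events.add("peek:" + name));
        if (withTerminalOperation) {
            names.forEach(name -> events.add("forEach:" + name));
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // []
        System.out.println(trace(true));
        // [peek:Bender, forEach:Bender, peek:Fry, forEach:Fry, peek:Leela, forEach:Leela]
    }
}
```

Without a terminal operation the peek action never runs; with forEach, each element passes through peek and then forEach before the next element is touched.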
