I am working on java 8 parallel stream and wanting to print the elements in parallel stream is some order (say insertion order, reverse order or sequential order).
For which i tried the following code:
System.out.println("With forEachOrdered:");
listOfIntegers
.parallelStream()
.forEachOrdered(e -> System.out.print(e + " "));
System.out.println("");
System.out.println("With Sequential:");
listOfIntegers.parallelStream()
.sequential()
.forEach(e -> System.out.print(e + " "));
And for both of these, i got the same output as follows:
With forEachOrdered:
1 2 3 4 5 6 7 8
With Sequential:
1 2 3 4 5 6 7 8
from the api documentation, i can see that:
forEachOrdered -> This is a terminal operation.
and
sequential -> This is an intermediate operation.
So my question is which one is more better to use?
and in which scenarios, one should be preferred over other?
listOfIntegers.parallelStream().sequential().forEach() creates a parallel Stream and then converts it to a sequential Stream, so you might as well use listOfIntegers.stream().forEach() instead, and get a sequential Stream in the first place.
listOfIntegers.parallelStream().forEachOrdered(e -> System.out.print(e + " ")) performs the operation on a parallel Stream, but guarantees the elements will be consumed in the encounter order of the Stream (if the Stream has a defined encounter order). However, it can be executed on multiple threads.
I don't see a reason of ever using listOfIntegers.parallelStream().sequential(). If you want a sequential Stream, why create a parallel Stream first?
You are asking somehow a misleading question, first you ask about:
.parallelStream()
.forEachOrdered(...)
This will create a parallel Stream, but elements will be consumed in order. If you add a map operation like this:
.map(...)
.parallelStream()
.forEachOrdered(...)
This will make the map very limited (from a parallel processing point of view) operations since threads have to wait for all other elements in encounter order to be processed (consumed by forEachOrdered). This regards stateless operations.
On the other hand if you have a stateful operation like:
.parallelStream()
.map()
.sorted()
.// other operations
Since sorted is stateful, the benefit of the stateless operations before it from a parallel processing will be bigger. And that happens because sorted has to gather all elements from the Stream, and Threads don't have to "wait" (at the forEachOrdered) for the elements in encounter order.
For the second example:
listOfIntegers.parallelStream()
.sequential()
.forEach(e -> System.out.print(e + " "))
you are basically saying turn parallel on and then turn it off. Streams are driven by the terminal operation, so even if you do:
.map...
.filter...
.parallel()
.map...
.sequential
This means that the entire pipeline will be executed sequentially, not that some part will be parallel and the other sequential. You are also relying on the fact that forEach preserves order and may be at the moment it does, but may be in a later release, sine you said you don't care about order (by using forEach in the first place), there will be an internal shuffling of the elements.
Stream pipelines may execute either sequentially or in parallel. This execution mode is a property of the stream. Streams are created with an initial choice of sequential or parallel execution. For example, Collection.stream() creates a sequential stream, and Collection.parallelStream() creates a parallel one. This choice of execution mode may be modified by the BaseStream.sequential() or BaseStream.parallel() methods.
So there is no need to use:
listOfIntegers.parallelStream().sequential()
You can only use:
listOfIntegers.stream()
If you are creating a parallel stream, it is possible for the elements of the stream to be processed by different threads. The difference between forEach and forEachOrdered is that forEach will allow any element of a parallel stream to be processed in any order, while forEachOrdered will always process the elements of a parallel stream in the order of their appearance in the original stream. When using parallelStream() and forEachOrdered is a very good example on how you can take advantage of multiple cores and still preserve the order of the output. Note that forEachOrdered forces the iteration of the elements of the stream in an ordered fashion. However, any operation that is chained before forEachOrdered will still happen in parallel because the stream is a parallel stream.
It is not documented by Oracle exactly what happens when you change the stream execution mode multiple times in a pipeline. It is not clear whether it is the last change that matters or whether operations invoked after calling parallel() can be executed in parallel and operations invoked after calling sequential() will be executed sequentially.
I am trying to solve the following exercise from "Core Java for the Impatient" by Cay Horstmann:
When an encoder of a Charset with partial Unicode coverage can’t encode a
character, it replaces it with a default—usually, but not always, the encoding of "?".
Find all replacements of all available character sets that support encoding. Use the
newEncoder method to get an encoder, and call its replacement method to get
the replacement. For each unique result, report the canonical names of the charsets
that use it.
For the sake of education, I have decided to tackle the exercise with gargantuan one-liner using the streaming API, even though - in my opinion - a cleaner solution would divide the calculations into a number of steps, with intermediate variables in-between (certainly it would ease the debugging). Without further ado, here is a monster of code I have created:
Charset.availableCharsets().values().stream().filter(charset -> charset.canEncode()).collect(
Collectors.groupingBy(
charset -> charset.newEncoder().replacement(),
() -> new TreeMap<>((arr1, arr2) -> Arrays.equals(arr1, arr2) == true ? 0 : Integer.compare(arr1.hashCode(), arr2.hashCode())),
Collectors.mapping( charset -> charset.name(), Collectors.toList()))).
values().stream().map(list -> list.stream().collect(Collectors.joining(", "))).forEach(System.out::println);
Basically, we take into account only the charsets that canEncode; create a Map with replacement as key and a list of canonical names as values; because grouping didn't work for arrays with default implementation of groupingBy, which uses HashMap, I have decided to use TreeMap. We then work with the Lists of canonical names, join them with comma and print.
Unfortunately, I have found it to give incoherent results. If I run the function twice in the same program, the first instance returns results consisting of 23 Strings, the second one - just 21 Strings. I suspect it has something to do with a poor implementation of Comparator for TreeMap, which was defined as follows:
((arr1, arr2) -> Arrays.equals(arr1, arr2) == true ? 0 : Integer.compare(arr1.hashCode(), arr2.hashCode()))
If that is the cause, what should be a proper Comparator in this case? Apart from that, can the one-liner be improved in any way?
I am also curious if such convoluted constructs as the code I have written are encountered in professional programs? Maybe it's only me who find it unreadable?
There is no guarantee that the hash code of two distinct instances will be different. That would be an ideal situation, but is never guaranteed. Only the opposite is true: if two objects are equal, they have the same hash code.
So if you create a comparator that considers the objects to be the same when they have the same hash code, arbitrary objects might be considered to be the same. Since the byte[] arrays returned by replacement() are defensive copies, read temporary objects, the result may vary in every run of this code.
Further, since the hash code of an array has nothing to do with its content, your comparator violates the transitivity rule: two arrays with equal content are supposed to be the same, but since they might/very likely have different hash codes, they have a different relation when being compared with a third array, not having the same content, a == b, but a < c and b > c. This is the reason why even equal arrays, which you compare by Arrays.equals can end up in different groups, as the TreeSet failed to find the existing key when comparing with other keys then.
If you want the arrays to be compared by value, you can use:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> charset.newEncoder().replacement(),
() -> new TreeMap<>(Comparator.comparing(ByteBuffer::wrap)),
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
ByteBuffers are Comparable and consistently evaluate the contents of the wrapped array.
I moved the Collectors.joining collector into the grouping collector to avoid the creation of the temporary List whose contents you are going to join afterwards anyway.
By the way, never use code like expression == true. There is no reason to append == true as expression is already sufficient.
Since you are only interested in the values, in other words, don’t need the keys to be of a certain type, you may wrap all arrays beforehand, simplifying the operation and even make it slightly more efficient:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> ByteBuffer.wrap(charset.newEncoder().replacement()),
TreeMap::new,
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
This change even allows resorting to hashing, if no consistent iteration order is required:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> ByteBuffer.wrap(charset.newEncoder().replacement()),
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
This works, because ByteBuffer also implements equals and hashCode.
stream.parallel().skip(1)
vs
stream.skip(1).parallel()
This is about Java 8 streams.
Are both of these skipping the 1st line/entry?
The example is something like this:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicLong;
public class Test010 {
public static void main(String[] args) {
String message =
"a,b,c\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n";
try(BufferedReader br = new BufferedReader(new StringReader(message))){
AtomicLong cnt = new AtomicLong(1);
br.lines().parallel().skip(1).forEach(
s -> {
System.out.println(cnt.getAndIncrement() + "->" + s);
}
);
}catch (IOException e) {
e.printStackTrace();
}
}
}
Earlier today, I was sometimes getting the header line "a,b,c" in the lambda expression. This was a surprise since I was expecting to have skipped it already. Now I cannot get that example to work i.e. I cannot get the header line in the lambda expression. So I am pretty confused now, maybe something else was influencing that behavior. Of course this is just an example. In the real world the message is being read from a CSV file. The message is the full content of that CSV file.
You actually have two questions in one, the first being whether it makes a difference in writing stream.parallel().skip(1) or stream.skip(1).parallel(), the second being whether either or both will always skip the first element. See also “loaded question”.
The first answer is that it makes no difference, because specifying a .sequential() or .parallel() execution policy affects the entire Stream pipeline, regardless of where you place it in the call chain—of course, unless you specify multiple contradicting policies, in which case the last one wins.
So in either case you are requesting a parallel execution which might affect the outcome of the skip operation, which is subject of the second question.
The answer is not that simple. If the Stream has no defined encounter order in the first place, an arbitrary element might get skipped, which is a consequence of the fact that there is no “first” element, even if there might be an element you encounter first when iterating over the source.
If you have an ordered Stream, skip(1) should skip the first element, but this has been laid down only recently. As discussed in “Stream.skip behavior with unordered terminal operation”, chaining an unordered terminal operation had an effect on the skip operation in earlier implementations and there was some uncertainty of whether this could even be intentional, as visible in “Is this a bug in Files.lines(), or am I misunderstanding something about parallel streams?”, which happens to be close to your code; apparently skipping the first line is a common case.
The final word is that the behavior of earlier JREs is a bug and skip(1) on an ordered stream should skip the first element, even when the stream pipeline is executed in parallel and the terminal operation is unordered. The associated bug report names jdk1.8.0_60 as first fixed version, which I could verify. So if you are using on older implementation, you might experience the Stream skipping different elements when using .parallel() and the unordered .forEach(…) terminal operation. It’s not contradicting if the implementation occasionally skips the expected element, that’s the unpredictability of multi-threading.
So the answer still is that stream.parallel().skip(1) and stream.skip(1).parallel() have the same behavior, even when being used in earlier versions, as both are equally unpredictable when being used with an unordered terminal operation like forEach. They should always skip the first element with ordered Streams and when being used with 1.8.0_60 or newer, they do.
Yes, but skip(n) is slower as n is larger with a parallel stream.
Here's the API note from skip():
While skip() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of n, since skip(n) is constrained to skip not just any n elements, but the first n elements in the encounter order. Using an unordered stream source (such as generate(Supplier)) or removing the ordering constraint with BaseStream.unordered() may result in significant speedups of skip() in parallel pipelines, if the semantics of your situation permit. If consistency with encounter order is required, and you are experiencing poor performance or memory utilization with skip() in parallel pipelines, switching to sequential execution with BaseStream.sequential() may improve performance.
So essentially, if you want better performance with skip(), don't use a parellel stream, or use an unordered stream.
As for it seeming to not work with parallel streams, perhaps you're actually seeing that the elements are no longer ordered? For example, an output of this code:
Stream.of("Hello", "How", "Are", "You?")
.parallel()
.skip(1)
.forEach(System.out::println);
Is
Are
You?
How
Ideone Demo
This is perfectly fine because forEach doesn't enforce the encounter order in a parallel stream. If you want it to enforce the encounter order, use a sequential stream (and perhaps use forEachOrdered so that your intent is obvious).
Stream.of("Hello", "How", "Are", "You?")
.skip(1)
.forEachOrdered(System.out::println);
How
Are
You?
I'm wondering if I can add an operation to a stream, based off of some sort of condition set outside of the stream. For example, I want to add a limit operation to the stream if my limit variable is not equal to -1.
My code currently looks like this, but I have yet to see other examples of streams being used this way, where a Stream object is reassigned to the result of an intermediate operation applied on itself:
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
stream = stream.limit(limit);
}
// Collect stream to list
stream.collect(Collectors.toList());
As stated in this stackoverflow post, the filter isn't actually applied until a terminal operation is called. Since I'm reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
There is no semantic difference between a chained series of invocations and a series of invocations storing the intermediate return values. Thus, the following code fragments are equivalent:
a = object.foo();
b = a.bar();
c = b.baz();
and
c = object.foo().bar().baz();
In either case, each method is invoked on the result of the previous invocation. But in the latter case, the intermediate results are not stored but lost on the next invocation. In the case of the stream API, the intermediate results must not be used after you have called the next method on it, thus chaining is the natural way of using stream as it intrinsically ensures that you don’t invoke more than one method on a returned reference.
Still, it is not wrong to store the reference to a stream as long as you obey the contract of not using a returned reference more than once. By using it they way as in your question, i.e. overwriting the variable with the result of the next invocation, you also ensure that you don’t invoke more than one method on a returned reference, thus, it’s a correct usage. Of course, this only works with intermediate results of the same type, so when you are using map or flatMap, getting a stream of a different reference type, you can’t overwrite the local variable. Then you have to be careful to not use the old local variable again, but, as said, as long as you are not using it after the next invocation, there is nothing wrong with the intermediate storage.
Sometimes, you have to store it, e.g.
try(Stream<String> stream = Files.lines(Paths.get("myFile.txt"))) {
stream.filter(s -> !s.isEmpty()).forEach(System.out::println);
}
Note that the code is equivalent to the following alternatives:
try(Stream<String> stream = Files.lines(Paths.get("myFile.txt")).filter(s->!s.isEmpty())) {
stream.forEach(System.out::println);
}
and
try(Stream<String> srcStream = Files.lines(Paths.get("myFile.txt"))) {
Stream<String> tmp = srcStream.filter(s -> !s.isEmpty());
// must not be use variable srcStream here:
tmp.forEach(System.out::println);
}
They are equivalent because forEach is always invoked on the result of filter which is always invoked on the result of Files.lines and it doesn’t matter on which result the final close() operation is invoked as closing affects the entire stream pipeline.
To put it in one sentence, the way you use it, is correct.
I even prefer to do it that way, as not chaining a limit operation when you don’t want to apply a limit is the cleanest way of expression your intent. It’s also worth noting that the suggested alternatives may work in a lot of cases, but they are not semantically equivalent:
.limit(condition? aLimit: Long.MAX_VALUE)
assumes that the maximum number of elements, you can ever encounter, is Long.MAX_VALUE but streams can have more elements than that, they even might be infinite.
.limit(condition? aLimit: list.size())
when the stream source is list, is breaking the lazy evaluation of a stream. In principle, a mutable stream source might legally get arbitrarily changed up to the point when the terminal action is commenced. The result will reflect all modifications made up to this point. When you add an intermediate operation incorporating list.size(), i.e. the actual size of the list at this point, subsequent modifications applied to the collection between this point and the terminal operation may turn this value to have a different meaning than the intended “actually no limit” semantic.
Compare with “Non Interference” section of the API documentation:
For well-behaved stream sources, the source can be modified before the terminal operation commences and those modifications will be reflected in the covered elements. For example, consider the following code:
List<String> l = new ArrayList(Arrays.asList("one", "two"));
Stream<String> sl = l.stream();
l.add("three");
String s = sl.collect(joining(" "));
First a list is created consisting of two strings: "one"; and "two". Then a stream is created from that list. Next the list is modified by adding a third string: "three". Finally the elements of the stream are collected and joined together. Since the list was modified before the terminal collect operation commenced the result will be a string of "one two three".
Of course, this is a rare corner case as normally, a programmer will formulate an entire stream pipeline without modifying the source collection in between. Still, the different semantic remains and it might turn into a very hard to find bug when you once enter such a corner case.
Further, since they are not equivalent, the stream API will never recognize these values as “actually no limit”. Even specifying Long.MAX_VALUE implies that the stream implementation has to track the number of processed elements to ensure that the limit has been obeyed. Thus, not adding a limit operation can have a significant performance advantage over adding a limit with a number that the programmer expects to never be exceeded.
There is two ways you can do this
// Do some stream stuff
List<E> results = list.stream()
.filter(e -> e.getTimestamp() < max);
.limit(limit > 0 ? limit : list.size())
.collect(Collectors.toList());
OR
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
stream = stream.limit(limit);
}
// Collect stream to list
List<E> results = stream.collect(Collectors.toList());
As this is functional programming you should always work on the result of each function. You should specifically avoid modifying anything in this style of programming and treat everything as if it was immutable if possible.
Since I'm reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
It should work, however it reads as a mix of imperative and functional coding. I suggest writing it as a fixed stream as per my first answer.
I think your first line needs to be:
stream = stream.filter(e -> e.getTimestamp() < max);
so that your using the stream returned by filter in subsequent operations rather than the original stream.
I known it is a bit too late, but I had the same question myself and didn't find the satisfying answer, however, inspired by this question and answers I came to the following solution:
return Stream.of( ///< wrap target stream in other stream ;)
/*do regular stream stuff*/
stream.filter(e -> e.getTimestamp() < max)
).flatMap(s -> limit != -1 ? s.limit(limit) : s) ///< apply limit only if necessary and unwrap stream of stream to "normal" stream
.collect(Collectors.toList()) ///< do final stuff