Doesn't Stream.parallel() update the characteristics of spliterator? - java

This question is based on the answers to this question What is the difference between Stream.of and IntStream.range?
Since IntStream.range produces an already sorted stream, the code below prints only 0:
IntStream.range(0, 4)
.peek(System.out::println)
.sorted()
.findFirst();
Also, the spliterator has the SORTED characteristic. The code below prints true:
System.out.println(
IntStream.range(0, 4)
.spliterator()
.hasCharacteristics(Spliterator.SORTED)
);
Now, if I introduce parallel() into the first snippet, then, as expected, the output contains all 4 numbers from 0 to 3 in arbitrary order, because the stream is not sorted anymore due to parallel().
IntStream.range(0, 4)
.parallel()
.peek(System.out::println)
.sorted()
.findFirst();
This produces something like the following (in arbitrary order):
2
0
1
3
So I expected the SORTED characteristic to have been removed by parallel(). But the code below prints true as well.
System.out.println(
IntStream.range(0, 4)
.parallel()
.spliterator()
.hasCharacteristics(Spliterator.SORTED)
);
Why doesn't parallel() change the SORTED characteristic? And since all four numbers are printed, how does Java realize that the stream is not sorted even though the SORTED characteristic is still present?

How exactly this is done is very much an implementation detail. You will have to dig deep inside the source code to really see why. Basically, parallel and sequential pipelines are just handled differently. Look at AbstractPipeline.evaluate, which checks isParallel() and then does different things depending on whether the pipeline is parallel.
return isParallel()
? terminalOp.evaluateParallel(this, sourceSpliterator(terminalOp.getOpFlags()))
: terminalOp.evaluateSequential(this, sourceSpliterator(terminalOp.getOpFlags()));
If you then look at SortedOps.OfInt, you'll see that it overrides two methods:
@Override
public Sink<Integer> opWrapSink(int flags, Sink sink) {
Objects.requireNonNull(sink);
if (StreamOpFlag.SORTED.isKnown(flags))
return sink;
else if (StreamOpFlag.SIZED.isKnown(flags))
return new SizedIntSortingSink(sink);
else
return new IntSortingSink(sink);
}
@Override
public <P_IN> Node<Integer> opEvaluateParallel(PipelineHelper<Integer> helper,
Spliterator<P_IN> spliterator,
IntFunction<Integer[]> generator) {
if (StreamOpFlag.SORTED.isKnown(helper.getStreamAndOpFlags())) {
return helper.evaluate(spliterator, false, generator);
}
else {
Node.OfInt n = (Node.OfInt) helper.evaluate(spliterator, true, generator);
int[] content = n.asPrimitiveArray();
Arrays.parallelSort(content);
return Nodes.node(content);
}
}
opWrapSink will be eventually called if it's a sequential pipeline, and opEvaluateParallel (as its name suggests) will be called when it's a parallel stream. Notice how opWrapSink doesn't do anything to the given sink if the pipeline is already sorted (just returns it unchanged), but opEvaluateParallel always evaluates the spliterator.
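The two properties can also be inspected separately from user code. The following is a minimal sketch (the class and method names are made up for illustration): it checks the SORTED characteristic of the source spliterator and the parallel flag of the stream independently of each other.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

final class SortedFlagDemo {
    // Reports whether the stream's spliterator advertises SORTED.
    // Note: spliterator() is a terminal operation, so the stream is consumed.
    static boolean isSorted(IntStream stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        IntStream parallel = IntStream.range(0, 4).parallel();
        // Query the execution mode first, before the stream is consumed.
        System.out.println("parallel?          " + parallel.isParallel());
        System.out.println("parallel sorted?   " + isSorted(parallel));
        System.out.println("sequential sorted? " + isSorted(IntStream.range(0, 4)));
    }
}
```

Calling parallel() only flips the execution mode of the pipeline; the source spliterator and its characteristics are reused as they are.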
Also note that parallel-ness and sorted-ness are not mutually exclusive. You can have a stream with any combination of those characteristics.
"Sorted" is a characteristic of a Spliterator. It's not technically a characteristic of a Stream (like "parallel" is). Sure, parallel could create a stream with an entirely new spliterator (that gets elements from the original spliterator) with entirely new characteristics, but why do that, when you can just reuse the same spliterator? I'd imagine you'd have to handle parallel and sequential streams differently in any case.

You need to take a step back and think of how you would solve such a problem in general, considering that ForkJoinPool is used for parallel streams and it works based on work stealing. It would be very helpful if you knew how a Spliterator works, too. Some details here.
You have a certain Stream, you "split it" (very simplified) into smaller pieces and hand all those pieces to a ForkJoinPool for execution. All of those pieces are worked on independently, by individual threads. Since we are talking about threads here, there is no guaranteed sequence of events across them; things happen in whatever order the threads get scheduled (that is why you see the output in a seemingly random order).
If your stream preserves the order, the terminal operation is supposed to preserve it too. So while intermediate operations may be executed in any order, your terminal operation (if the stream up to that point is ordered) will handle elements in an ordered fashion. To put it slightly simplified:
System.out.println(
IntStream.of(1,2,3)
.parallel()
.map(x -> {System.out.println(x * 2); return x * 2;})
.boxed()
.collect(Collectors.toList()));
map will process elements in an unknown order (ForkJoinPool and threads, remember that), but collect will receive elements in order, "left to right".
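A small check of that claim, as a sketch with hypothetical names: the mapping step may run on any worker thread, yet the collected list matches the sequential result element for element.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class OrderedCollectDemo {
    static List<Integer> doubled(int n, boolean parallel) {
        IntStream source = IntStream.range(0, n);
        if (parallel) {
            source = source.parallel();
        }
        return source
                .map(x -> x * 2)               // may run on any thread, in any order
                .boxed()
                .collect(Collectors.toList()); // assembled in encounter order
    }

    public static void main(String[] args) {
        System.out.println(doubled(10_000, true).equals(doubled(10_000, false)));
    }
}
```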
Now, if we extrapolate that to your example: when you invoke parallel, the stream is split in small pieces and worked on. For example look how this is split (a single time).
Spliterator<Integer> spliterator =
IntStream.of(5, 4, 3, 2, 1, 5, 6, 7, 8)
.parallel()
.boxed()
.sorted()
.spliterator()
.trySplit(); // trySplit is invoked internally on parallel
spliterator.forEachRemaining(System.out::println);
On my machine it prints 1, 2, 3, 4. This means that the internal implementation splits the stream into two Spliterators: left and right. left has [1, 2, 3, 4] and right has the remaining elements, [5, 5, 6, 7, 8]. But that is not all; these Spliterators can be split further. For example:
Spliterator<Integer> spliterator =
IntStream.of(5, 4, 3, 2, 1, 5, 6, 7, 8)
.parallel()
.boxed()
.sorted()
.spliterator()
.trySplit()
.trySplit()
.trySplit();
spliterator.forEachRemaining(System.out::println);
If you try to invoke trySplit again, you will get null, meaning: that's it, I can't split anymore.
So your Stream, IntStream.range(0, 4), is going to be split into 4 spliterators, each worked on individually by some thread. If the first thread knows that the Spliterator it currently works on is the "left-most" one, that's it! The rest of the threads do not even need to start their work; the result is known.
On the other hand, it could be that the Spliterator that has the "left-most" element is only started last. So the first three might already be done with their work (thus peek is invoked in your example), but they do not "produce" the needed result.
As a matter of fact, this is how it is done internally. You do not need to understand the code, but the flow and the method names should be obvious.
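The observable effect can be checked from user code. This is only a sketch with made-up names, not the JDK's internal code: however many elements peek happens to see on a given run, findFirst on the ordered parallel stream yields the left-most element.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

final class ParallelFindFirstDemo {
    // Counts how many elements peek has seen; the value varies between runs.
    static final AtomicInteger peeked = new AtomicInteger();

    static int firstOfSortedParallelRange(int n) {
        return IntStream.range(0, n)
                .parallel()
                .peek(i -> peeked.incrementAndGet())
                .sorted()
                .findFirst()
                .getAsInt();
    }

    public static void main(String[] args) {
        int first = firstOfSortedParallelRange(4);
        System.out.println("first = " + first + ", elements peeked = " + peeked.get());
    }
}
```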

Related

For Java streams, does generate + limit guarantee no additional calls to the generator function, or is there a preferred alternative?

I have a source of data that I know has n elements, which I can access by repeatedly calling a method on an object; for the sake of example, let's call it myReader.read(). I want to create a stream of data containing those n elements. Let's also say that I don't want to call the read() method more times than the amount of data I want to return, as it will throw an exception (e.g. NoSuchElementException) if the method is called after the end of the data is reached.
I know I can create this stream by using the IntStream.range method, and mapping each element using the read method. However, this feels a little weird since I'm completely ignoring the int values in the stream (I'm really just using it to produce a stream with exactly n elements).
return IntStream.range(0, n).mapToObj(i -> myReader.read());
An approach I've considered is using Stream.generate(supplier) followed by Stream.limit(maxSize). Based on my understanding of the limit function, this feels like it should work.
Stream.generate(myReader::read).limit(n)
However, nowhere in the API documentation do I see an indication that the Stream.limit() method will guarantee exactly maxSize elements are generated by the stream it's called on. It wouldn't be infeasible that a stream implementation could be allowed to call the generator function more than n times, so long as the end result was just the first n calls, and so long as it meets the API contract for being a short-circuiting intermediate operation.
Stream.limit JavaDocs
Returns a stream consisting of the elements of this stream, truncated to be no longer than maxSize in length.
This is a short-circuiting stateful intermediate operation.
Stream operations and pipelines documentation
An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. [...] Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
Is it safe to rely on Stream.generate(generator).limit(n) only making n calls to the underlying generator? If so, is there some documentation of this fact that I'm missing?
And to avoid the XY Problem: what is the idiomatic way of creating a stream by performing an operation exactly n times?
Stream.generate creates an unordered Stream. This implies that the subsequent limit operation is not required to use the first n elements, as there is no “first” when there’s no order, but may select n arbitrary elements. The implementation may exploit this permission, e.g. for higher parallel processing performance.
The following code
IntSummaryStatistics s =
Stream.generate(new AtomicInteger()::incrementAndGet)
.parallel()
.limit(100_000)
.collect(Collectors.summarizingInt(Integer::intValue));
System.out.println(s);
prints something like
IntSummaryStatistics{count=100000, sum=5000070273, min=1, average=50000,702730, max=100207}
on my machine, though the max number may vary. It demonstrates that the Stream has selected exactly 100000 elements, as required, but not the elements from 1 to 100000. Since the generator produces strictly ascending numbers, it’s clear that it has been called more than 100000 times to produce numbers higher than that.
Another example
System.out.println(
Stream.generate(new AtomicInteger()::incrementAndGet)
.parallel()
.map(String::valueOf)
.limit(10)
.collect(Collectors.toList())
);
prints something like this on my machine (JDK-14)
[4, 8, 5, 6, 10, 3, 7, 1, 9, 11]
With JDK-8, it even prints something like
[4, 14, 18, 24, 30, 37, 42, 52, 59, 66]
If a construct like
IntStream.range(0, n).mapToObj(i -> myReader.read())
feels weird due to the unused i parameter, you may use
Collections.nCopies(n, myReader).stream().map(TypeOfMyReader::read)
instead. This doesn’t show an unused int parameter and works equally well, as in fact, it’s internally implemented as IntStream.range(0, n).mapToObj(i -> element). There is no way around some counter, visible or hidden, to ensure that the method will be called n times. Note that, since read likely is a stateful operation, the resulting behavior will always be like an unordered stream when enabling parallel processing, but the IntStream and nCopies approaches create a finite stream that will never invoke the method more than the specified number of times.
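The bounded call count of both finite approaches can be checked with a counting stand-in for the reader. This is a sketch: CountingReader is a hypothetical class that throws once it has been read more than n times, mimicking the behaviour described in the question.

```java
import java.util.Collections;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class ExactCallCountDemo {
    /** Hypothetical stand-in for myReader: fails once its elements are used up. */
    static final class CountingReader {
        private final int size;
        private final AtomicInteger calls = new AtomicInteger();

        CountingReader(int size) { this.size = size; }

        String read() {
            int call = calls.incrementAndGet();
            if (call > size) {
                throw new NoSuchElementException("read past the end");
            }
            return "item-" + call;
        }

        int calls() { return calls.get(); }
    }

    static int callsViaRange(int n, boolean parallel) {
        CountingReader reader = new CountingReader(n);
        IntStream indices = IntStream.range(0, n);
        if (parallel) indices = indices.parallel();
        indices.mapToObj(i -> reader.read()).collect(Collectors.toList());
        return reader.calls();
    }

    static int callsViaNCopies(int n, boolean parallel) {
        CountingReader reader = new CountingReader(n);
        Stream<CountingReader> copies = Collections.nCopies(n, reader).stream();
        if (parallel) copies = copies.parallel();
        copies.map(CountingReader::read).collect(Collectors.toList());
        return reader.calls();
    }

    public static void main(String[] args) {
        System.out.println(callsViaRange(1000, true) + " " + callsViaNCopies(1000, true));
    }
}
```

In parallel mode the order in which the items are read is still arbitrary; only the number of calls is fixed.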
Only answering the XY-problem part of your question: simply create a spliterator for your reader.
class MyStreamSpliterator implements Spliterator<String> { // or whichever datatype
private final MyReaderClass reader;
public MyStreamSpliterator(MyReaderClass reader) {
this.reader = reader;
}
@Override
public boolean tryAdvance(Consumer<String> action) {
try {
String nextval = reader.read();
action.accept(nextval);
return true;
} catch(NoSuchElementException e) {
// cleanup if necessary
return false;
}
// Alternative: if you really really want to use n iterations,
// add a counter and use it.
}
@Override
public Spliterator<String> trySplit() {
return null; // we don't split
}
@Override
public long estimateSize() {
return Long.MAX_VALUE; // or the correct value, if you know it before
}
@Override
public int characteristics() {
// add SIZED if you know the size
return Spliterator.IMMUTABLE | Spliterator.ORDERED;
}
}
Then, create your stream as StreamSupport.stream(new MyStreamSpliterator(reader), false)
Disclaimer: I just threw this together in the SO editor, probably there are some errors.
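The counter-based alternative mentioned in the code comment could look roughly like the following. It is a sketch, not a tested drop-in: it builds on Spliterators.AbstractSpliterator, and it uses a plain Supplier as a stand-in for the reader (the class and method names are invented for this example).

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class CountedSupplierSpliterator<T> extends Spliterators.AbstractSpliterator<T> {
    private final Supplier<? extends T> source;
    private long remaining;

    CountedSupplierSpliterator(Supplier<? extends T> source, long count) {
        // count is passed as a size estimate only; SIZED is deliberately not reported here.
        super(count, Spliterator.ORDERED);
        this.source = source;
        this.remaining = count;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (remaining <= 0) {
            return false; // never touch the source after `count` calls
        }
        remaining--;
        action.accept(source.get());
        return true;
    }

    // Convenience factory for a sequential stream of exactly `count` elements.
    static <T> Stream<T> stream(Supplier<? extends T> source, long count) {
        return StreamSupport.stream(new CountedSupplierSpliterator<T>(source, count), false);
    }
}
```

Usage would be along the lines of CountedSupplierSpliterator.stream(myReader::read, n), assuming read() takes no arguments.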

Create a finite stream with a generator function

I have a program to read data from multiple sources, use a tournament tree to merge sort them, pack the data into blocks and output the blocks. I have implemented this as a function, which returns null when no more block is available.
DataBlock buildBlock()
Now I want to output a stream of blocks, but the only method I have found so far is Stream.generate which generates an infinite stream. My stream is of course not infinite. What is a proper way to generate a finite stream from this function?
If you use at least Java 9, you can apply takeWhile(Objects::nonNull) on your stream. If you use the older Java version, check out this question.
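A minimal sketch of the takeWhile approach, assuming Java 9 or later. BlockSource is a hypothetical stand-in for the real block builder and uses String blocks for brevity. The stream is kept sequential on purpose: Stream.generate is unordered, so a parallel takeWhile would not reliably stop at the first null.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TakeWhileBlocksDemo {
    /** Hypothetical stand-in for buildBlock(): returns null once the data is exhausted. */
    static final class BlockSource {
        private final Iterator<String> blocks;

        BlockSource(List<String> blocks) { this.blocks = blocks.iterator(); }

        String buildBlock() { return blocks.hasNext() ? blocks.next() : null; }
    }

    static List<String> collectBlocks(List<String> data) {
        BlockSource source = new BlockSource(data);
        return Stream.generate(source::buildBlock)
                .takeWhile(Objects::nonNull) // stop at the first null
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(collectBlocks(Arrays.asList("block-1", "block-2", "block-3")));
    }
}
```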
You can create a stream of Optionals with Stream.iterate(T seed, Predicate<? super T> hasNext, UnaryOperator<T> next), stopping on empty. You can then call .map(Optional::get) on the stream.
Here's an example of creating a stream that tracks the cumulative sum of a list.
public static Stream<Integer> cumulativeSum(List<Integer> nums) {
Iterator<Integer> numItr = nums.iterator();
if (!numItr.hasNext()) {
return Stream.of();
}
return Stream.iterate(
Optional.of(numItr.next()),
Optional::isPresent,
maybeSum -> maybeSum.flatMap(sum ->
numItr.hasNext() ? Optional.of(Integer.sum(sum, numItr.next())) : Optional.empty()))
.map(Optional::get);
}
If the input list is [2, 4, 3], then the output stream will be [2, 6, 9].
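(The three-argument Stream.iterate was added in Java 9. With the two-argument Stream.iterate(T seed, UnaryOperator<T> f) you would have to call .takeWhile(Optional::isPresent) to turn the infinite stream into a finite one, but takeWhile is itself a Java 9 addition, so neither variant is available on Java 8.)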
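Stream.takeWhile and the three-argument Stream.iterate were both added in Java 9, so on Java 8 one option is a hand-rolled Iterator wrapped with Spliterators.spliteratorUnknownSize. The following is a sketch; NullTerminatedStreams and untilNull are names invented for this example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class NullTerminatedStreams {
    /** Streams values from the supplier until it returns null for the first time. */
    static <T> Stream<T> untilNull(Supplier<? extends T> source) {
        Iterator<T> iterator = new Iterator<T>() {
            private T next;
            private boolean fetched; // true while `next` holds an unconsumed value
            private boolean done;    // true once the supplier has returned null

            @Override
            public boolean hasNext() {
                if (!fetched && !done) {
                    next = source.get();
                    fetched = true;
                    done = (next == null);
                }
                return !done;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                fetched = false;
                return next;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(
                        iterator, Spliterator.ORDERED | Spliterator.NONNULL),
                false); // sequential
    }
}
```

With the function from the question this would be used as NullTerminatedStreams.untilNull(this::buildBlock), assuming buildBlock is an instance method.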

Why do I have to chain Stream operations in Java? [duplicate]

I think all of the resources I have studied one way or another emphasize that a stream can be consumed only once, and the consumption is done by so-called terminal operations (which is very clear to me).
Just out of curiosity I tried this:
import java.util.stream.IntStream;
class App {
public static void main(String[] args) {
IntStream is = IntStream.of(1, 2, 3, 4);
is.map(i -> i + 1);
int sum = is.sum();
}
}
which ends up throwing a Runtime Exception:
Exception in thread "main" java.lang.IllegalStateException: stream has already been operated upon or closed
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:229)
at java.util.stream.IntPipeline.reduce(IntPipeline.java:456)
at java.util.stream.IntPipeline.sum(IntPipeline.java:414)
at App.main(scratch.java:10)
As usual, I am probably missing something, but I still want to ask: as far as I know, map is an intermediate (and lazy) operation and does nothing to the Stream by itself. Only when the terminal operation sum (which is an eager operation) is called does the Stream get consumed and operated on.
But why do I have to chain them?
What is the difference between
is.map(i -> i + 1);
is.sum();
and
is.map(i -> i + 1).sum();
?
When you do this:
int sum = IntStream.of(1, 2, 3, 4).map(i -> i + 1).sum();
Every chained method is being invoked on the return value of the previous method in the chain.
So map is invoked on what IntStream.of(1, 2, 3, 4) returns and sum on what map(i -> i + 1) returns.
You don't have to chain stream methods, but it's more readable and less error-prone than using this equivalent code:
IntStream is = IntStream.of(1, 2, 3, 4);
is = is.map(i -> i + 1);
int sum = is.sum();
Which is not the same as the code you've shown in your question:
IntStream is = IntStream.of(1, 2, 3, 4);
is.map(i -> i + 1);
int sum = is.sum();
As you see, you're disregarding the reference returned by map. This is the cause of the error.
EDIT (as per the comments, thanks to @IanKemp for pointing this out): Actually, this is the external cause of the error. If you stop to think about it, map must be doing something internally to the stream itself; otherwise, how would the terminal operation trigger the transformation passed to map on each element? I agree that intermediate operations are lazy, i.e. when invoked, they do nothing to the elements of the stream. But internally, they must configure some state in the stream pipeline itself, so that they can be applied later.
Although I'm not aware of the full details, what happens is that, conceptually, map is doing at least 2 things:
1. It creates and returns a new stream that holds the function passed as an argument somewhere, so that it can be applied to elements later, when the terminal operation is invoked.
2. It also sets a flag on the old stream instance, i.e. the one it has been called on, indicating that this stream instance no longer represents a valid state for the pipeline. This is because the new, updated state which holds the function passed to map is now encapsulated by the instance it has returned. (I believe that this decision might have been taken by the JDK team to make errors appear as early as possible, i.e. by throwing an early exception instead of letting the pipeline go on with an invalid/old state that doesn't hold the function to be applied, thus letting the terminal operation return unexpected results.)
Later on, when a terminal operation is invoked on this instance flagged as invalid, you get that IllegalStateException. The two items above constitute the deep, internal cause of the error.
Another way to see all this is to make sure that a Stream instance is operated only once, by means of either an intermediate or a terminal operation. Here you are violating this requirement, because you are calling map and sum on the same instance.
In fact, javadocs for Stream state it clearly:
A stream should be operated on (invoking an intermediate or terminal stream operation) only once. This rules out, for example, "forked" streams, where the same source feeds two or more pipelines, or multiple traversals of the same stream. A stream implementation may throw IllegalStateException if it detects that the stream is being reused. However, since some stream operations may return their receiver rather than a new stream object, it may not be possible to detect reuse in all cases.
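Both variants can be put side by side in a small sketch (the class and method names are made up). The javadoc only says that an implementation may throw, so the first method reports what actually happens; on OpenJDK the reuse is detected, as the stack trace in the question shows.

```java
import java.util.stream.IntStream;

final class StreamReuseDemo {
    // Calls map and sum on the SAME instance, as in the question.
    static boolean reusingReceiverThrows() {
        IntStream is = IntStream.of(1, 2, 3, 4);
        is.map(i -> i + 1); // result discarded; `is` is now linked to a new stage
        try {
            is.sum();       // second operation on the same instance
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Keeps the reference returned by map, which is equivalent to chaining.
    static int sumWithReassignment() {
        IntStream is = IntStream.of(1, 2, 3, 4);
        is = is.map(i -> i + 1);
        return is.sum();    // 2 + 3 + 4 + 5
    }

    public static void main(String[] args) {
        System.out.println(reusingReceiverThrows());
        System.out.println(sumWithReassignment());
    }
}
```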
Imagine the IntStream is a wrapper around your data stream with an immutable list of operations. These operations are not executed until you need the final result (sum in your case).
Since the list is immutable, you need a new instance of IntStream with a list that contains the previous items plus the new one, which is what .map returns.
This means that if you don't chain, you will operate on the old instance, which does not have that operation.
The stream library also keeps some internal tracking of what's going on, that's why it's able to throw the exception in the sum step.
If you don't want to chain, you can use a variable for each step:
IntStream is = IntStream.of(1, 2, 3, 4);
IntStream is2 = is.map(i -> i + 1);
int sum = is2.sum();
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate.
Taken from https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html under "Stream Operations and Pipelines"
At the lowest level, all streams are driven by a spliterator.
Taken from the same link under "Low-level stream construction"
Traversal and splitting exhaust elements; each Spliterator is useful for only a single bulk computation.
Taken from https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html
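The single-use nature of a Spliterator is easy to observe directly. In this sketch (hypothetical names), the same spliterator is traversed twice and the second pass finds nothing left.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

final class SpliteratorSingleUseDemo {
    // Returns the number of elements seen on the first and second traversal.
    static int[] traverseTwice() {
        Spliterator<Integer> spliterator = Arrays.asList(1, 2, 3).spliterator();

        List<Integer> first = new ArrayList<>();
        spliterator.forEachRemaining(first::add);  // consumes every element

        List<Integer> second = new ArrayList<>();
        spliterator.forEachRemaining(second::add); // nothing remains

        return new int[] { first.size(), second.size() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(traverseTwice()));
    }
}
```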

Stream.reduce always preserving order on parallel, unordered stream

I've gone through several previous questions like Encounter order preservation in java stream, this answer by Brian Goetz, as well as the javadoc for Stream.reduce(), and the java.util.stream package javadoc, and yet I still can't grasp the following:
Take this piece of code:
public static void main(String... args) {
final String[] alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("");
System.out.println("Alphabet: ".concat(Arrays.toString(alphabet)));
System.out.println(new HashSet<>(Arrays.asList(alphabet))
.parallelStream()
.unordered()
.peek(System.out::println)
.reduce("", (a,b) -> a + b, (a,b) -> a + b));
}
Why is the reduction always* preserving the encounter order?
So far, after several dozen runs, output is the same
First of all, unordered does not imply an actual shuffling; all it does is set a flag on the Stream pipeline that could later be leveraged.
A shuffle of the source elements could potentially be much more expensive than the operations on the stream pipeline themselves, so the implementation might choose not to do this (as in this case).
At the moment (I tested and looked at the sources of jdk-8 and jdk-9), reduce does not take that into account. Notice that this could very well change in a future build or release.
Also, when you say unordered, you actually mean that you don't care about the order, so the stream returning the same result is not a violation of that rule.
For example notice this question/answer that explains that findFirst for example (just another terminal operation) changed to take unordered into consideration in java-9 as opposed to java-8.
To help explain this, I am going to reduce the scope of this string to ABCD.
The parallel stream will divide the string into two pieces: AB and CD. When we go to combine these later, the result of the AB side will be the first argument passed into the function, while the result of the CD side will be the second argument passed into the function. This is regardless of which of the two actually finishes first.
The unordered operator will affect some operations on a stream, such as limit, but it does not affect a simple reduce.
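The combining behaviour can be sketched with an ordered source (names are illustrative). With an associative function and an identity value, the parallel reduction of an ordered list yields the same string as a sequential left-to-right concatenation, no matter which half finishes first.

```java
import java.util.Arrays;
import java.util.List;

final class ParallelReduceDemo {
    static String concatInParallel(List<String> letters) {
        return letters.parallelStream()
                .reduce("",
                        (a, b) -> a + b,  // accumulator, applied within one chunk
                        (a, b) -> a + b); // combiner: left chunk first, then right chunk
    }

    public static void main(String[] args) {
        System.out.println(concatInParallel(Arrays.asList("A", "B", "C", "D")));
    }
}
```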
TLDR: .reduce() is not always preserving order, its result is based on the stream spliterator characteristics.
Spliterator
The encounter order of the stream depends on stream spliterator (None of the answers mentioned that before).
There are different spliterators based on the source stream. You can get the types of spliterators from the source code of those collections.
HashSet -> HashMap#KeySpliterator = Not ordered
ArrayDeque = Ordered
ArrayList = Ordered
TreeSet -> TreeMap#Spliterator = Ordered and sorted
logicbig.com - Ordering
logicbig.com - Stateful vs Stateless
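The characteristics listed above can be queried directly rather than read from the source code. A small sketch, with an invented helper name:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

final class SourceCharacteristicsDemo {
    // Describes the ORDERED and SORTED flags of a collection's spliterator.
    static String describe(Collection<?> collection) {
        Spliterator<?> spliterator = collection.spliterator();
        return collection.getClass().getSimpleName()
                + " ordered=" + spliterator.hasCharacteristics(Spliterator.ORDERED)
                + " sorted=" + spliterator.hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("A", "B", "C");
        System.out.println(describe(new HashSet<>(data)));
        System.out.println(describe(new ArrayDeque<>(data)));
        System.out.println(describe(new ArrayList<>(data)));
        System.out.println(describe(new TreeSet<>(data)));
    }
}
```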
Additionally you can apply .unordered() intermediate stream operation that specifies following operations in the stream should not rely on ordering.
Stream operations (mostly stateful) that are affected by spliterator and usage of .unordered() method are:
.findFirst()
.limit()
.skip()
.distinct()
Those operations will give us different results based on the order property of the stream and its spliterator.
The .peek() method does not take ordering into consideration; if the stream is executed in parallel, it will print/receive elements in no particular order.
.reduce()
Now for the terminal .reduce() method. The intermediate operation .unordered() doesn't have any effect on the type of spliterator (as @Eugene mentioned). But, importantly, the order property still stays the same as it is in the source spliterator. If the source spliterator is ordered, the result of .reduce() will be ordered; if the source was unordered, the result of .reduce() will be unordered.
You are using new HashSet<>(Arrays.asList(alphabet)) to get the instance of the stream. Its spliterator is unordered. It was just a coincidence that you got an ordered result, because you are using single-letter Strings as elements of the stream and the unordered result happens to be the same. If you mix in numbers, or mix lower case and upper case, this doesn't hold true anymore. For example, take the following inputs, the first of which is a subset of the example you posted:
HashSet .reduce() - Unordered
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "a1Ab2Bc3C"
"Apple","Orange","Banana","Mango" -> "AppleMangoOrangeBanana"
TreeSet .reduce() - Ordered, Sorted
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "123ABCabc"
"Apple","Orange","Banana","Mango" -> "AppleBananaMangoOrange"
ArrayList .reduce() - Ordered
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "abc123ABC"
"Apple","Orange","Banana","Mango" -> "AppleOrangeBananaMango"
You see that testing .reduce() operation only with an alphabet source stream can lead to false conclusions.
The answer is .reduce() is not always preserving order, its result is based on the stream spliterator characteristics.

How collectors are used when turning the stream in parallel

I was actually trying to answer the question How to skip even lines of a Stream<String> obtained from the Files.lines. I thought this collector wouldn't work well in parallel:
private static Collector<String, ?, List<String>> oddLines() {
int[] counter = {1};
return Collector.of(ArrayList::new,
(l, line) -> {
if (counter[0] % 2 == 1) l.add(line);
counter[0]++;
},
(l1, l2) -> {
l1.addAll(l2);
return l1;
});
}
but it works.
EDIT: It didn't actually work; I got fooled by the fact that my input set was too small to trigger any parallelism; see discussion in comments.
I thought it wouldn't work because of the two following execution plans that came to mind.
1. The counter array is shared among all threads.
Thread t1 reads the first element of the Stream, so the if condition is satisfied. It adds the first element to its list. Then its execution is suspended before it has time to update the array value.
Thread t2, which, say, started at the 4th element of the stream, adds it to its list. So we end up with an unwanted element.
Of course, since this collector seems to work, I guess it doesn't work like that. And the updates are not atomic anyway.
2. Each Thread has its own copy of the array
In this case there is no more problem with the update, but nothing prevents thread t2 from starting at the 4th element of the stream. So it doesn't work like that either.
So it seems that it doesn't work like that at all, which brings me to the question: how is the collector used in parallel?
Can someone explain to me basically how it works, and why my collector works when run in parallel?
Thank you very much!
Passing a parallel() source stream into your collector is enough to break the logic because your shared state (counter) may be incremented from different tasks. You can verify that, because it is never returning the correct result for any finite stream input:
Stream<String> lines = IntStream.range(1, 20000).mapToObj(i -> i + "");
System.out.println(lines.isParallel());
lines = lines.parallel();
System.out.println(lines.isParallel());
List<String> collected = lines.collect(oddLines());
System.out.println(collected.size());
Note that for streams of unknown size (e.g. when reading from Files.lines()) you need a significant amount of data in the stream before it actually forks tasks to run some chunks concurrently.
Output for me is:
false
true
12386
Which is clearly wrong.
As @Holger correctly pointed out in the comments, there is a different race that can happen when your collector specifies CONCURRENT and UNORDERED, in which case the tasks operate on a single shared collection (ArrayList::new is called once per stream), whereas with only parallel() the accumulator runs on one collection per task and the results are later combined using your combiner.
If you'd add the characteristics to the collector, you might run into the following result due to the shared state in a single collection:
false
true
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 73
at java.util.ArrayList.add(ArrayList.java:459)
at de.jungblut.stuff.StreamPallel.lambda$0(StreamPallel.java:18)
at de.jungblut.stuff.StreamPallel$$Lambda$3/1044036744.accept(Unknown Source)
at java.util.stream.ReferencePipeline.lambda$collect$207(ReferencePipeline.java:496)
at java.util.stream.ReferencePipeline$$Lambda$6/2003749087.accept(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:496)
at de.jungblut.stuff.StreamPallel.main(StreamPallel.java:32)12386
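For contrast, here is a sketch of a collector that keeps all of its state inside the per-task container and declares no characteristics. It counts how often the supplier runs, to make the "one collection per task, then combine" behaviour visible. The names are made up, and the number of containers depends on the machine and the run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class PerTaskContainerDemo {
    // Only used for reporting; it does not influence the collected result.
    static final AtomicInteger containersCreated = new AtomicInteger();

    static Collector<String, List<String>, List<String>> countingToList() {
        return Collector.of(
                () -> { containersCreated.incrementAndGet(); return new ArrayList<String>(); },
                List::add,                                              // touches only its own container
                (left, right) -> { left.addAll(right); return left; }); // merges in encounter order
    }

    static List<String> collectParallel(int n) {
        return IntStream.range(0, n).parallel()
                .mapToObj(String::valueOf)
                .collect(countingToList());
    }

    static List<String> expected(int n) {
        return IntStream.range(0, n).mapToObj(String::valueOf).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = collectParallel(20_000);
        System.out.println("correct: " + result.equals(expected(20_000))
                + ", containers created: " + containersCreated.get());
    }
}
```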
Actually it's just a coincidence that this collector works. It doesn't work with a custom data source. Consider this example:
List<String> list = IntStream.range(0, 10).parallel().mapToObj(String::valueOf)
.collect(oddLines());
System.out.println(list);
This produces a different result on almost every run. The real reason it appeared to work is that a BufferedReader.lines() stream is split in batches of at least java.util.Spliterators.IteratorSpliterator.BATCH_UNIT lines, which is 1024. If you have a substantially bigger number of lines, it may fail even with BufferedReader:
String data = IntStream.range(0, 10000).mapToObj(String::valueOf)
.collect(Collectors.joining("\n"));
List<String> list = new BufferedReader(new StringReader(data)).lines().parallel()
.collect(oddLines());
list.stream().mapToInt(Integer::parseInt).filter(x -> x%2 != 0)
.forEach(System.out::println);
Were the collector working correctly, this should not print anything. But sometimes it does.
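One way to select every other line without any shared mutable state is to work from indices. The sketch below assumes the lines have already been read into a List (for example with Files.readAllLines), which is an assumption not made in the question; the class name is invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class OddLinesByIndex {
    // Keeps the 1st, 3rd, 5th, ... line; safe in parallel because each
    // decision depends only on the element's own index.
    static List<String> oddLines(List<String> lines) {
        return IntStream.range(0, lines.size())
                .parallel()
                .filter(i -> i % 2 == 0) // 0-based index 0 is the 1st line
                .mapToObj(lines::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(oddLines(Arrays.asList("0", "1", "2", "3", "4")));
    }
}
```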
