I want to write a test that executes many parallel calls to my API. I know I can do it using an executor thread pool, like this:
ExecutorService executor = Executors.newCachedThreadPool();
final AtomicInteger successCount = new AtomicInteger();
final int numOfUsers = 10;
for (int i = 0; i < numOfUsers; i++) {
    executor.execute(() -> {
        final Device device1 = getFirstDevice();
        final ResponseDto responseDto = devicesServiceLocal.acquireDevice(device1.uuid, 4738);
        if (responseDto.status == Status.SUCCESS) {
            successCount.incrementAndGet();
        }
    });
}
I could also have done it with a Java 8 parallel stream:
devicesList.parallelStream()
.map(device -> do something)
How can I do it on one device? Meaning, I want several calls trying to acquire the same device,
something like this:
{{device}}.parallelStream().execute(myAction).times(10)
Yes it can, but...
You would think
Stream.generate(() -> device)
.limit(10)
.parallel()
.forEach(device -> device.execute());
should do the job. But no, it does not, and at first I honestly had no idea why.
If I make device.execute() wait a second and then print something, the stream prints something every second, ten times in total. So it isn't parallel at all, which is not what you want.
Google is my friend, and I found a lot of articles that warn against parallelStream. But my eye fell on http://blog.jooq.org/2014/06/13/java-8-friday-10-subtle-mistakes-when-using-the-streams-api/, numbers 8 and 9. Number 8 says that if it is backed by a collection you'll have to sort it and it will magically work, so:
Stream.generate(() -> device)
.limit(10)
.sorted((a,b)->0) // Sort it (kind of), what??
.parallel()
.forEach(device -> device.execute());
And now it prints something 8 times after one second, and 2 more times after another second. I have 8 cores, so that is (kind of) what we expect.
I used .forEach() in my stream, but at first I was (like your example) using .map(). .map() didn't print a thing: the stream was never consumed (see number 9 in the linked article).
So, beware when working with streams, and especially parallel ones. You have to be sure your stream is consumed, that it is finite (.limit()), that it actually runs in parallel, and so on. Streams are weird; I suggest keeping your working solution.
Note: if device.execute() is a blocking operation (IO, networking, ...), you will never have more than your number of cores (in my case 8) tasks executing at the same time.
Update (thanks to Holger):
Holger gave an elegant alternative:
IntStream.range(0,10)
.parallel()
.mapToObj(i -> getDevice())
.forEach(device -> device.execute());
// Or shorter:
IntStream.range(0,10)
.parallel()
.forEach(i -> getDevice().execute());
which is just like a parallel for-loop (and it works).
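If you need all 10 calls to be in flight at the same time even when the call blocks on I/O (see the note above about the core count), an executor-based sketch like the following may be closer to what the test needs. It reuses the Device, ResponseDto, Status, devicesServiceLocal and getFirstDevice names from the question and is only a sketch, not a drop-in test:
ExecutorService executor = Executors.newFixedThreadPool(10);   // one thread per call, not per core
CountDownLatch startGate = new CountDownLatch(1);
CountDownLatch done = new CountDownLatch(10);
AtomicInteger successCount = new AtomicInteger();

for (int i = 0; i < 10; i++) {
    executor.execute(() -> {
        try {
            startGate.await();                                 // release all 10 calls at once
            Device device = getFirstDevice();
            ResponseDto response = devicesServiceLocal.acquireDevice(device.uuid, 4738);
            if (response.status == Status.SUCCESS) {
                successCount.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            done.countDown();
        }
    });
}
startGate.countDown();   // open the gate
done.await();            // wait for all 10 calls (the enclosing test method must declare throws InterruptedException)
executor.shutdown();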
This question is based on the answers to another question: What is the difference between Stream.of and IntStream.range?
Since IntStream.range produces an already-sorted stream, the code below only prints 0:
IntStream.range(0, 4)
.peek(System.out::println)
.sorted()
.findFirst();
Also, the spliterator has the SORTED characteristic. The code below prints true:
System.out.println(
IntStream.range(0, 4)
.spliterator()
.hasCharacteristics(Spliterator.SORTED)
);
Now, if I introduce parallel() into the first snippet, then, as expected, the output contains all four numbers from 0 to 3, but in a random order, because the stream is not sorted anymore due to parallel().
IntStream.range(0, 4)
.parallel()
.peek(System.out::println)
.sorted()
.findFirst();
This produces something like the following (in any random order):
2
0
1
3
So I expected that the SORTED property had been removed due to parallel(). But the code below returns true as well.
System.out.println(
IntStream.range(0, 4)
.parallel()
.spliterator()
.hasCharacteristics(Spliterator.SORTED)
);
Why doesn't parallel() change the SORTED property? And since all four numbers are printed, how does Java realize that the stream is not sorted, even though the SORTED property still exists?
How exactly this is done is very much an implementation detail; you would have to dig deep into the source code to really see why. Basically, parallel and sequential pipelines are just handled differently. Look at AbstractPipeline.evaluate, which checks isParallel() and then does different things depending on whether the pipeline is parallel:
return isParallel()
? terminalOp.evaluateParallel(this, sourceSpliterator(terminalOp.getOpFlags()))
: terminalOp.evaluateSequential(this, sourceSpliterator(terminalOp.getOpFlags()));
If you then look at SortedOps.OfInt, you'll see that it overrides two methods:
@Override
public Sink<Integer> opWrapSink(int flags, Sink sink) {
Objects.requireNonNull(sink);
if (StreamOpFlag.SORTED.isKnown(flags))
return sink;
else if (StreamOpFlag.SIZED.isKnown(flags))
return new SizedIntSortingSink(sink);
else
return new IntSortingSink(sink);
}
@Override
public <P_IN> Node<Integer> opEvaluateParallel(PipelineHelper<Integer> helper,
Spliterator<P_IN> spliterator,
IntFunction<Integer[]> generator) {
if (StreamOpFlag.SORTED.isKnown(helper.getStreamAndOpFlags())) {
return helper.evaluate(spliterator, false, generator);
}
else {
Node.OfInt n = (Node.OfInt) helper.evaluate(spliterator, true, generator);
int[] content = n.asPrimitiveArray();
Arrays.parallelSort(content);
return Nodes.node(content);
}
}
opWrapSink will eventually be called if it's a sequential pipeline, and opEvaluateParallel (as its name suggests) will be called when it's a parallel stream. Notice how opWrapSink doesn't do anything to the given sink if the pipeline is already sorted (it just returns it unchanged), but opEvaluateParallel always evaluates the spliterator.
Also note that parallel-ness and sorted-ness are not mutually exclusive. You can have a stream with any combination of those characteristics.
"Sorted" is a characteristic of a Spliterator. It's not technically a characteristic of a Stream (like "parallel" is). Sure, parallel could create a stream with an entirely new spliterator (that gets elements from the original spliterator) with entirely new characteristics, but why do that, when you can just reuse the same spliterator? Id imagine you'll have to handle parallel and sequential streams differently in any case.
You need to take a step back and think of how you would solve such a problem in general, considering that ForkJoinPool is used for parallel streams and that it works based on work stealing. It would also be very helpful to know how a Spliterator works. Some details here.
You have a certain Stream, you "split" it (very simplified) into smaller pieces and give all those pieces to a ForkJoinPool for execution. All of those pieces are worked on independently, by individual threads. Since we are talking about threads here, there is obviously no fixed sequence of events; things happen in whatever order the threads run (that is why you see output in a random order).
If your stream preserves order, the terminal operation is supposed to preserve it too. So while intermediate operations may be executed in any order, your terminal operation (if the stream up to that point is ordered) will handle elements in an ordered fashion. To put it slightly simplified:
System.out.println(
IntStream.of(1,2,3)
.parallel()
.map(x -> {System.out.println(x * 2); return x * 2;})
.boxed()
.collect(Collectors.toList()));
map will process elements in an unknown order (ForkJoinPool and threads, remember), but collect will receive elements in order, "left to right".
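For illustration, one run of that snippet might print something like the following (an assumed sample; the order of the doubled printouts varies between runs, but the collected list is always in encounter order):
6
2
4
[2, 4, 6]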
Now, if we extrapolate that to your example: when you invoke parallel, the stream is split into small pieces that are worked on separately. For example, look at how this is split (a single time):
Spliterator<Integer> spliterator =
IntStream.of(5, 4, 3, 2, 1, 5, 6, 7, 8)
.parallel()
.boxed()
.sorted()
.spliterator()
.trySplit(); // trySplit is invoked internally on parallel
spliterator.forEachRemaining(System.out::println);
On my machine it prints 1, 2, 3, 4. This means that the internal implementation splits the stream into two Spliterators: left and right. left has [1, 2, 3, 4] and right has [5, 6, 7, 8]. But that is not all; these Spliterators can be split further. For example:
Spliterator<Integer> spliterator =
IntStream.of(5, 4, 3, 2, 1, 5, 6, 7, 8)
.parallel()
.boxed()
.sorted()
.spliterator()
.trySplit()
.trySplit()
.trySplit();
spliterator.forEachRemaining(System.out::println);
If you try to invoke trySplit again, you will get null, meaning: that's it, I can't split any further.
So, your stream IntStream.range(0, 4) is going to be split into 4 spliterators, each worked on individually by a thread. If the first thread knows that the Spliterator it is currently working on is the "left-most" one, that's it! The rest of the threads do not even need to start their work - the result is already known.
On the other hand, it could be that the Spliterator holding the "left-most" element is started last. So the first three might already be done with their work (and thus peek is invoked in your example), but they do not "produce" the needed result.
As a matter of fact, this is how it is done internally. You do not need to understand the code, but the flow and the method names should be obvious.
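A small way to observe that behavior from the outside (a sketch, not the internal code: the peek order and thread names vary between runs, but the result is always the left-most element):
int first = IntStream.range(0, 4)
        .parallel()
        .peek(i -> System.out.println(Thread.currentThread().getName() + " saw " + i))
        .sorted()
        .findFirst()
        .getAsInt();
System.out.println("findFirst -> " + first);   // always 0, no matter which thread finished first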
I want to change my code to use a single subscriber. Now I have:
auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120))
    .subscribe(s -> s.groupBy(Auction::getItem)
        .subscribe(longAuctionGroupedFlux ->
            longAuctionGroupedFlux.reduce(new ItemDumpStats(), this::calculateStats)));
This code works correctly; the reduce method is very simple. I tried to change my code to use a single subscriber:
auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120))
.flatMap(window -> window.groupBy(Auction::getItem))
.flatMap(longAuctionGroupedFlux -> longAuctionGroupedFlux.reduce(new ItemDumpStats(), this::calculateStats))
.subscribe(itemDumpStatsMono -> log.info(itemDumpStatsMono.toString()));
This is my code, and it is not working: no errors and no results. After debugging, I found that it gets stuck on the second flatMap when reducing the stream. I think the problem is in the flatMap merging, which gets stuck resolving the Mono. Does anyone know how to fix this and use only a single subscriber?
To replicate it, you can use another class or create one. With a small input it works, but with a bigger one it dies:
List<Auction> auctionList = new ArrayList<>();
for (int i = 0; i < 100000; i++) {
    Auction a = new Auction((long) i, "test");
    a.setItem((long) (i % 50));
    auctionList.add(a);
}
Flux.fromIterable(auctionList)
    .groupBy(Auction::getId)
    .flatMap(longAuctionGroupedFlux ->
        longAuctionGroupedFlux.reduce(new ItemDumpStats(), (itemDumpStats, auction) -> itemDumpStats))
    .collectList()
    .subscribe(itemDumpStats -> System.out.println(itemDumpStats.toString()));
With the following approach the result is instant, but I am using three subscribers:
Flux.fromIterable(auctionList)
    .groupBy(Auction::getId)
    .subscribe(auctionIdAuctionGroupedFlux ->
        auctionIdAuctionGroupedFlux.reduce(new ItemDumpStats(), (itemDumpStats, auction) -> itemDumpStats)
            .subscribe(itemDumpStats -> System.out.println(itemDumpStats.toString())));
I think the behavior you described is related to the interaction between groupBy and flatMap when they are chained.
Check the groupBy documentation. It states that:
The groups need to be drained and consumed downstream for groupBy to work correctly. Notably when the criteria produces a large amount of groups, it can lead to hanging if the groups are not suitably consumed downstream (eg. due to a flatMap with a maxConcurrency parameter that is set too low).
By default, maxConcurrency (of flatMap) is set to 256 (I checked the source code of 3.2.2). So selecting more than 256 groups may cause the execution to hang (particularly when all execution happens on the same thread).
The following code helps in understanding what happens when you chain the operators groupBy and flatMap:
@Test
public void groupAndFlatmapTest() {
val groupCount = 257;
val groupSize = 513;
val list = rangeClosed(1, groupSize * groupCount).boxed().collect(Collectors.toList());
val source = Flux.fromIterable(list)
.groupBy(i -> i % groupCount)
.flatMap(Flux::collectList);
StepVerifier.create(source).expectNextCount(groupCount).expectComplete().verify();
}
The execution of this code hangs. Changing groupCount to 256 or less makes the test pass (for every value of groupSize).
So, regarding your original problem, it is very possible that you are creating a large number of groups with your key selector Auction::getItem.
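If that is the case, one way to avoid the hang (a sketch based on the pipeline from the question, assuming the group count really is the culprit) is to raise the flatMap concurrency above the default of 256 so every group gets subscribed to and drained:
auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120))
    .flatMap(window -> window.groupBy(Auction::getItem))
    // the flatMap(mapper, concurrency) overload lifts the default limit of 256
    .flatMap(group -> group.reduce(new ItemDumpStats(), this::calculateStats),
             Integer.MAX_VALUE)
    .subscribe(stats -> log.info(stats.toString()));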
Adding parallel fixed the problem, but I am still looking for an answer as to why reduce dramatically slows down flatMap.
I have read a lot about Java 8 streams lately, and several articles about lazy loading with Java 8 streams specifically: here and over here. I can't seem to shake the feeling that lazy loading is COMPLETELY useless (or at best, a minor syntactic convenience offering zero performance value).
Let's take this code as an example:
int[] myInts = new int[]{1,2,3,5,8,13,21};
IntStream myIntStream = IntStream.of(myInts);
int[] myChangedArray = myIntStream
.peek(n -> System.out.println("About to square: " + n))
.map(n -> (int)Math.pow(n, 2))
.peek(n -> System.out.println("Done squaring, result: " + n))
.toArray();
This will log to the console, because the terminal operation, in this case toArray(), is called, and our stream is lazy and executes only when the terminal operation is called. Of course I can also do this:
IntStream myChangedInts = IntStream.of(myInts) // a fresh stream; the one above was already consumed
.peek(n -> System.out.println("About to square: " + n))
.map(n -> (int)Math.pow(n, 2))
.peek(n -> System.out.println("Done squaring, result: " + n));
And nothing will be printed, because the map isn't happening, because I don't need the data. Until I call this:
int[] myChangedArray = myChangedInts.toArray();
And voila, I get my mapped data and my console logs. Except I see zero benefit to it whatsoever. I realize I can define the filter code long before I call toArray(), and I can pass this "not-really-filtered" stream around, but so what? Is this the only benefit?
The articles seem to imply there is a performance gain associated with laziness, for example:
In the Java 8 Streams API, the intermediate operations are lazy and their internal processing model is optimized to make it being capable of processing the large amount of data with high performance.
and
Java 8 Streams API optimizes stream processing with the help of short circuiting operations. Short Circuit methods ends the stream processing as soon as their conditions are satisfied. In normal words short circuit operations, once the condition is satisfied just breaks all of the intermediate operations, lying before in the pipeline. Some of the intermediate as well as terminal operations have this behavior.
It sounds literally like breaking out of a loop, and not associated with laziness at all.
Finally, there is this perplexing line in the second article:
Lazy operations achieve efficiency. It is a way not to work on stale data. Lazy operations might be useful in the situations where input data is consumed gradually rather than having whole complete set of elements beforehand. For example consider the situations where an infinite stream has been created using Stream#generate(Supplier<T>) and the provided Supplier function is gradually receiving data from a remote server. In those kind of the situations server call will only be made at a terminal operation when it's needed.
Not working on stale data? What? How does lazy loading keep someone from working on stale data?
TLDR: Is there any benefit to lazy loading besides being able to run the filter/map/reduce/whatever operation at a later time (which offers zero performance benefit)?
If so, what's a real-world use case?
Your terminal operation, toArray(), perhaps supports your argument given that it requires all elements of the stream.
Some terminal operations don't. And for these, it would be a waste if streams weren't lazily executed. Two examples:
//example 1: print first element of 1000 after transformations
IntStream.range(0, 1000)
.peek(System.out::println)
.mapToObj(String::valueOf)
.peek(System.out::println)
.findFirst()
.ifPresent(System.out::println);
//example 2: check if any value has an even key
boolean valid = records
        .map(this::heavyConversion)
        .filter(this::checkWithWebService)
        .mapToInt(Record::getKey)
        .anyMatch(i -> i % 2 == 0);
The first stream will print:
0
0
0
That is, intermediate operations will be run on just one element. This is an important optimization. If it weren't lazy, then all the peek() calls would have to run on all elements (absolutely unnecessary, as you're interested in just one element). Intermediate operations can be expensive (such as in the second example).
Short-circuiting terminal operations (of which toArray isn't one) make this optimization possible.
Laziness can be very useful for the users of your API, especially when the final result of the Stream pipeline evaluation might be very large!
The simple example is the Files.lines method in the Java API itself. If you don't want to read the whole file into the memory and you only need the first N lines, then just write:
Stream<String> stream = Files.lines(path); // lazy operation
List<String> result = stream.limit(N).collect(Collectors.toList()); // read and collect
You're right that there won't be a benefit from map().reduce() or map().collect(), but there's a pretty obvious benefit with findAny(), findFirst(), anyMatch(), allMatch(), etc. Basically, any operation that can be short-circuited.
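A minimal illustration of why short-circuiting needs laziness, using nothing beyond the JDK: with an infinite source, only lazy evaluation plus a short-circuiting terminal operation makes this terminate at all.
boolean found = Stream.iterate(1, i -> i + 1)    // infinite stream: 1, 2, 3, ...
        .map(i -> i * i)                         // applied lazily, one element at a time
        .anyMatch(square -> square > 1_000);     // stops as soon as 32 * 32 = 1024 is seen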
Good question.
Assuming you write textbook-perfect code, the difference in performance between a properly optimized for loop and a stream is not noticeable (streams tend to be slightly better class-loading wise, but the difference should not be noticeable in most cases).
Consider the following example.
// Some lengthy computation
private static int doStuff(int i) {
try { Thread.sleep(1000); } catch (InterruptedException e) { }
return i;
}
public static OptionalInt findFirstGreaterThanStream(int value) {
return IntStream
.of(MY_INTS)
.map(Main::doStuff)
.filter(x -> x > value)
.findFirst();
}
public static OptionalInt findFirstGreaterThanFor(int value) {
for (int i = 0; i < MY_INTS.length; i++) {
int mapped = Main.doStuff(MY_INTS[i]);
if(mapped > value){
return OptionalInt.of(mapped);
}
}
return OptionalInt.empty();
}
Given the above methods, the next test should show they execute in about the same time.
public static void main(String[] args) {
long begin;
long end;
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanStream(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanFor(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
}
OptionalInt[8]
5119
OptionalInt[8]
5001
Anyway, we spend most of the time in the doStuff method. Let's say we want to add more threads to the mix.
Adjusting the stream method is trivial (assuming your operations meet the preconditions of parallel streams):
public static OptionalInt findFirstGreaterThanParallelStream(int value) {
return IntStream
.of(MY_INTS)
.parallel()
.map(Main::doStuff)
.filter(x -> x > value)
.findFirst();
}
Achieving the same behavior without streams can be tricky.
public static OptionalInt findFirstGreaterThanParallelFor(int value, Executor executor) {
AtomicInteger counter = new AtomicInteger(0);
CompletableFuture<OptionalInt> cf = CompletableFuture.supplyAsync(() -> {
while(counter.get() != MY_INTS.length-1);
return OptionalInt.empty();
});
for (int i = 0; i < MY_INTS.length; i++) {
final int current = MY_INTS[i];
executor.execute(() -> {
int mapped = Main.doStuff(current);
if(mapped > value){
cf.complete(OptionalInt.of(mapped));
} else {
counter.incrementAndGet();
}
});
}
try {
return cf.get();
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
return OptionalInt.empty();
}
}
The tests execute in about the same time again.
public static void main(String[] args) throws InterruptedException {
long begin;
long end;
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanParallelStream(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
ExecutorService executor = Executors.newFixedThreadPool(10);
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanParallelFor(5678, executor));
end = System.currentTimeMillis();
System.out.println(end-begin);
executor.shutdown();
executor.awaitTermination(10, TimeUnit.SECONDS);
executor.shutdownNow();
}
OptionalInt[8]
1004
OptionalInt[8]
1004
In conclusion, although we don't squeeze a big performance benefit out of streams (considering you write excellent multi-threaded code in your for alternative), the code itself tends to be more maintainable.
A (slightly off-topic) final note:
As with programming languages, higher-level abstractions (streams relative to for loops) make things easier to develop, at the cost of some performance. We did not move from assembly to procedural languages to object-oriented languages because the latter offered greater performance. We moved because they made us more productive (developing the same thing at a lower cost). If you are able to get the same performance out of a stream as you would with a for loop and properly written multi-threaded code, I would say it's already a win.
I have a real example from our code base. I'm going to simplify it, so I'm not entirely sure you will like it or fully grasp it...
We have a service that needs a List<CustomService>, and I am supposed to call it. In order to call it, I go to a database (much simpler than in reality) and obtain a List<DBObject>; to obtain a List<CustomService> from that, there are some heavy transformations that need to be done.
And here are my choices. First option: transform in place and pass the list. Simple, yet probably not that optimal. Second option: refactor the service to accept a List<DBObject> and a Function<DBObject, CustomService>. This sounds trivial, but it enables laziness (among other things). That service might sometimes need only a few elements from that list, or sometimes a max by some property, etc., so there is no need for me to do the heavy transformation for all elements; this is where the Stream API's pull-based laziness is a winner. A sketch of that option follows below.
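A minimal sketch of that second option (DBObject, CustomService and the isUsable check are hypothetical stand-ins for the real types mentioned above):
// The service receives the raw rows plus the heavy transformation and applies it lazily,
// so only the elements it actually inspects get transformed.
Optional<CustomService> firstUsable(List<DBObject> rows,
                                    Function<DBObject, CustomService> toCustomService) {
    return rows.stream()
               .map(toCustomService)             // heavy work, pulled one element at a time
               .filter(CustomService::isUsable)  // hypothetical predicate
               .findFirst();                     // short-circuits the transformation
}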
Before Streams existed, we used to use Guava. It had Lists.transform(list, function), which was lazy too.
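For example (a sketch; it needs Guava's com.google.common.collect.Lists on the classpath):
List<String> words = Arrays.asList("stream", "laziness", "guava");
List<Integer> lengths = Lists.transform(words, String::length);  // a lazy view, no work done yet
System.out.println(lengths.get(1));  // the function runs only for the element you touch: prints 8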
It's not a fundamental feature of streams as such; it could have been done even without Guava, but it's a lot simpler that way. The example provided here with findFirst is great and the simplest to understand; this is the entire point of laziness: elements are pulled only when needed. They are not passed from one intermediate operation to another in chunks, but move from one stage to the next one at a time.
One interesting use case that hasn't been mentioned is arbitrary composition of operations on streams, coming from different parts of the code base, responding to different sorts of business or technical requisites.
For example, say you have an application where certain users can see all the data but certain other users can only see part of it. The part of the code that checks user permissions can simply impose a filter on whatever stream is being handed about.
Without lazy streams, that same part of the code could be filtering the already realized full collection, but that may have been expensive to obtain, for no real gain.
Alternatively, that same part of the code might want to append its filter to a data source, but now it has to know whether the data comes from a database, so it can impose an additional WHERE clause, or some other source.
With lazy streams, it's a filter that can be implemented every which way. Filters imposed on streams coming from the database can translate into the aforementioned WHERE clause, with obvious performance gains over filtering in-memory collections resulting from whole-table reads.
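As a sketch of that idea (User, isAdmin and the predicate are hypothetical names), the permission-checking code only composes a filter and does not care where the stream comes from:
static <T> Stream<T> visibleTo(User user, Stream<T> data, Predicate<T> canSee) {
    return user.isAdmin()
            ? data                   // admins see everything
            : data.filter(canSee);   // everyone else gets a lazily applied filter
}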
So: a better abstraction, better performance, and better code readability and maintainability. Sounds like a win to me. :)
A non-lazy implementation would process all input and collect the output into a new collection on each operation. Obviously, that is impossible for unlimited or large enough sources, memory-consuming otherwise, and unnecessarily memory-consuming in the case of reducing and short-circuiting operations, so there are great benefits.
Check the following example:
Stream.of("0","0","1","2","3","4")
.distinct()
.peek(a->System.out.println("after distinct: "+a))
.anyMatch("1"::equals);
If it were not lazy, you would expect all elements to pass through the distinct filtering first. But because of lazy execution it behaves differently: it streams the minimum number of elements needed to calculate the result.
The above example will print
after distinct: 0
after distinct: 1
How it works analytically:
First "0" goes until the terminal operation but does not satisfy it. Another element must be streamed.
Second "0" is filtered through .distinct() and never reaches terminal operation.
Since the terminal operation is not satisfied yet, next element is streamed.
"1" goes through terminal operation and satisfies it.
No more elements need to be streamed.
Suppose I have 3 downloads, framed as completable futures:
CompletableFuture<Doc> dl1 = CompletableFuture.supplyAsync(() -> download("file1"));
CompletableFuture<Doc> dl2 = CompletableFuture.supplyAsync(() -> download("file2"));
CompletableFuture<Doc> dl3 = CompletableFuture.supplyAsync(() -> download("file3"));
Then all of them need to be handled the same way:
CompletableFuture<String> s1 = dl1.thenApply(Doc::getFilename);
CompletableFuture<String> s2 = dl2.thenApply(Doc::getFilename);
CompletableFuture<String> s3 = dl3.thenApply(Doc::getFilename);
And you can imagine multiple functions to be applied, all in parallel.
According to the DRY principle, this example seems inappropriate. So I'm looking for a solution to define only 1 workflow that is executed 3 times, in parallel.
How can this be accomplished?
I tried allOf, but that has two problems: 1) it starts blocking, and 2) its return type (CompletableFuture<Void>) only lets you run something afterwards instead of handling the results.
Stream.of("file1", "file2", "file3") // or your input in any other format, that can easily be transformed to a stream...
// .parallel() // well... depends...
.map(s -> CompletableFuture.supplyAsync(() -> download(s)))
.map(dl -> dl.thenApply(Doc::getFilename))
.map(CompletableFuture::join) // if you want to have all the results collected
.collect(Collectors.toList());
Of course the two map calls can also be combined. But at least you do not write everything n times... If you do not like collecting to a List, you can also call something else on the stream, e.g. .forEach(System.out::println). The .forEach has the benefit that the consumer is called as soon as the corresponding response is available.
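For reference, the combined form might look like this (a sketch using the download/Doc names from the question; collecting the futures before joining ensures all three downloads are in flight before any join blocks):
List<String> filenames = Stream.of("file1", "file2", "file3")
        .map(s -> CompletableFuture.supplyAsync(() -> download(s))
                                   .thenApply(Doc::getFilename))
        .collect(Collectors.toList())       // forces the lazy stream: all futures are created here
        .stream()
        .map(CompletableFuture::join)       // now wait for each result
        .collect(Collectors.toList());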
Or the classic: just use a loop and a list/array for your input, but then you may need to take care of more things than you would have with streams.
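That classic variant might look roughly like this (again a sketch with the names from the question):
List<CompletableFuture<String>> futures = new ArrayList<>();
for (String name : Arrays.asList("file1", "file2", "file3")) {
    futures.add(CompletableFuture.supplyAsync(() -> download(name))
                                 .thenApply(Doc::getFilename));
}
List<String> filenames = new ArrayList<>();
for (CompletableFuture<String> future : futures) {
    filenames.add(future.join());   // all downloads were started above, so the joins overlap
}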
I actually tried to answer this question: How to skip even lines of a Stream<String> obtained from Files.lines. So I thought this collector wouldn't work well in parallel:
private static Collector<String, ?, List<String>> oddLines() {
    int[] counter = {1};
    return Collector.of(ArrayList::new,
            (l, line) -> {
                if (counter[0] % 2 == 1) l.add(line);
                counter[0]++;
            },
            (l1, l2) -> {
                l1.addAll(l2);
                return l1;
            });
}
but it works.
EDIT: It didn't actually work; I got fooled by the fact that my input set was too small to trigger any parallelism; see discussion in comments.
I thought it wouldn't work because the two following execution plans came to my mind.
1. The counter array is shared among all threads.
Thread t1 reads the first element of the Stream, so the if condition is satisfied. It adds the first element to its list. Then the execution stops before it has time to update the array value.
Thread t2, which, say, started at the 4th element of the stream, adds it to its list. So we end up with an unwanted element.
Of course, since this collector seems to work, I guess it doesn't work like that. And the updates are not atomic anyway.
2. Each Thread has its own copy of the array
In this case there are no more problems with the update, but nothing guarantees that thread t2 will not start at the 4th element of the stream. So it doesn't work like that either.
So it seems that it doesn't work like either of those, which brings me to the question: how is the collector used in parallel?
Can someone explain to me basically how it works, and why my collector works when run in parallel?
Thank you very much!
Passing a parallel() source stream into your collector is enough to break the logic, because your shared state (counter) may be incremented from different tasks. You can verify that, because it never returns the correct result for any finite stream input:
Stream<String> lines = IntStream.range(1, 20000).mapToObj(i -> i + "");
System.out.println(lines.isParallel());
lines = lines.parallel();
System.out.println(lines.isParallel());
List<String> collected = lines.collect(oddLines());
System.out.println(collected.size());
Note that for infinite streams (e.g. when reading from Files.lines()) you need to generate some significant amount of data in the stream, so it actually forks a task to run some chunks concurrently.
Output for me is:
false
true
12386
Which is clearly wrong.
As @Holger in the comments correctly pointed out, there is a different race that can happen when your collector specifies CONCURRENT and UNORDERED, in which case the tasks operate on a single shared collection (ArrayList::new called once per stream), whereas with only parallel() it will run the accumulator on one collection per task and then later combine the results using your defined combiner.
If you added those characteristics to the collector, you might run into the following result due to the shared state in a single collection:
false
true
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 73
at java.util.ArrayList.add(ArrayList.java:459)
at de.jungblut.stuff.StreamPallel.lambda$0(StreamPallel.java:18)
at de.jungblut.stuff.StreamPallel$$Lambda$3/1044036744.accept(Unknown Source)
at java.util.stream.ReferencePipeline.lambda$collect$207(ReferencePipeline.java:496)
at java.util.stream.ReferencePipeline$$Lambda$6/2003749087.accept(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:496)
at de.jungblut.stuff.StreamPallel.main(StreamPallel.java:32)
12386
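For reference, the characteristics variant might be declared like this (a sketch that reuses the oddLines() body from the question):
return Collector.of(ArrayList::new,
        (l, line) -> {
            if (counter[0] % 2 == 1) l.add(line);
            counter[0]++;
        },
        (l1, l2) -> { l1.addAll(l2); return l1; },
        Collector.Characteristics.CONCURRENT,   // accumulate into a single shared container
        Collector.Characteristics.UNORDERED);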
Actually, it's just a coincidence that this collector works. It doesn't work with a custom data source. Consider this example:
List<String> list = IntStream.range(0, 10).parallel().mapToObj(String::valueOf)
.collect(oddLines());
System.out.println(list);
This always produces a different result. The real cause is that the BufferedReader.lines() stream is split only in batches of at least java.util.Spliterators.IteratorSpliterator.BATCH_UNIT lines, which is 1024. If you have a substantially bigger number of lines, it may fail even with BufferedReader:
String data = IntStream.range(0, 10000).mapToObj(String::valueOf)
.collect(Collectors.joining("\n"));
List<String> list = new BufferedReader(new StringReader(data)).lines().parallel()
.collect(oddLines());
list.stream().mapToInt(Integer::parseInt).filter(x -> x%2 != 0)
.forEach(System.out::println);
Were the collector working correctly, this would not print anything. But sometimes it does print.
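For completeness, one order-safe way to take every other line (a sketch that avoids the shared counter entirely) is to filter by index instead:
List<String> lines = new BufferedReader(new StringReader(data)).lines()
        .collect(Collectors.toList());
List<String> oddLines = IntStream.range(0, lines.size())
        .filter(i -> i % 2 == 0)         // 0-based indexes 0, 2, 4, ... = the 1st, 3rd, 5th line
        .mapToObj(lines::get)
        .collect(Collectors.toList());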