Stream.reduce has 3 method overloads:
reduce(BinaryOperator<T> accumulator)
reduce(T identity, BinaryOperator<T> accumulator)
reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)
The 1st overload can be used to calculate the sum of an integer list, for example; it returns an Optional, which is empty if the stream is empty.
The 2nd overload is similar, but if the stream is empty it just returns the identity value.
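To illustrate my understanding of the first two (a minimal sketch):
List<Integer> nums = Arrays.asList(1, 2, 3);
// 1st overload: no identity, so the result is an Optional (empty for an empty stream)
Optional<Integer> maybeSum = nums.stream().reduce(Integer::sum);
// 2nd overload: the identity is returned for an empty stream
int sum = nums.stream().reduce(0, Integer::sum);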
I'm having a hard time understanding how the third overload (Stream.reduce(identity, accumulator, combiner)) works and what its use case is. So, how does it work, and why does it exist?
If I understand correctly, your question is about the third argument, the combiner.
Firstly, one of the goals of Java was to have similar APIs for sequential and parallel streams.
The 3-argument version of reduce is useful for parallel streams.
Suppose you are reducing from a Collection<T> to a value of type U, and you are using the parallel stream version. The parallel stream splits the collection into smaller segments and produces a partial U value for each by applying the second function (the accumulator). But now these different partial U values have to be combined. How do they get combined? The third function is the one that provides that logic.
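For example, summing the lengths of strings (T is String, U is Integer), a minimal sketch of that flow:
int totalLength = Stream.of("a", "bb", "ccc")
        .parallel()
        .reduce(0,
                (partial, str) -> partial + str.length(), // accumulator: folds one String into a partial result
                Integer::sum);                            // combiner: merges the partial results of two segments
// 6, whether run sequentially or in parallel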
Basically it combines a mapping function with a reduction. Most of the examples I've seen for this don't really demonstrate why it's preferable to calling map() and a normal reduce() in separate steps. The API Note comes in handy here:
Many reductions using this form can be represented more simply by an explicit combination of map and reduce operations. The accumulator function acts as a fused mapper and accumulator, which can sometimes be more efficient than separate mapping and reduction, such as when knowing the previously reduced value allows you to avoid some computation.
So let's say we have a Stream<String> numbers, and we want to parse them to BigDecimal and calculate their product. We could do something like this:
BigDecimal product = numbers.map(BigDecimal::new)
.reduce(BigDecimal.ONE, BigDecimal::multiply);
But this has an inefficiency. If one of the numbers is "0", we're wasting cycles converting the remainder to BigDecimal. We can use the 3-arg reduce() here to bypass the mapping logic:
BigDecimal product = numbers.reduce(BigDecimal.ONE,
(d, n) -> d.equals(BigDecimal.ZERO) ? BigDecimal.ZERO : new BigDecimal(n).multiply(d),
BigDecimal::multiply);
Of course it would be even more efficient to short-circuit the stream entirely, but that's tricky to do in a stream, especially in parallel. And this is just an example to get the concept across.
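A sketch of one way to short-circuit, assuming the source is a re-iterable List<String> (call it numberList here) rather than a one-shot Stream: test for a literal "0" up front. Like the accumulator above, this only catches the exact string "0", not variants such as "0.0".
BigDecimal product = numberList.contains("0")
        ? BigDecimal.ZERO
        : numberList.stream()
                    .map(BigDecimal::new)
                    .reduce(BigDecimal.ONE, BigDecimal::multiply);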
Note: Some of the examples are contrived for demonstration. In some instances a simple .sum() could have been used.
The big difference, imo, is that the third form has a BiFunction as a second argument instead of a BinaryOperator. So you can use the third form to change the result type. It also has a BinaryOperator as a combiner to combine the different results from parallel operations.
Generate some data
record Data(String name, int value) {}
Random r = new Random();
List<Data> dataList = r.ints(1000, 1, 20).mapToObj(i->new Data("Item"+i, i)).toList();
No parallel operation here, but different types, so the three-argument form is needed. The combiner is still required by the signature, but for a sequential stream it is never invoked, so it can simply return its first argument.
int sum = dataList.stream().reduce(0, (sum1, data) -> sum1 + data.value(),
        (finalSum, partialSum) -> finalSum);
System.out.println(sum);
prints
10162
The second form. Use map to get the value to be summed; since the input and result types now match, a BinaryOperator serves as the accumulator.
sum = dataList.stream().map(Data::value).reduce(0, (sum1, val) -> sum1 + val);
System.out.println(sum); // prints the same as above
This shows the same as above but in parallel. The third argument combines the partial sums; they are combined as threads finish, so there may not be a sensible order to the output.
sum = dataList.parallelStream().reduce(0, (sum1, data) -> sum1 + data.value(),
        (finalSum, partialSum) -> {
            System.out.println("Adding " + partialSum + " to " + finalSum);
            return finalSum + partialSum;
        });
System.out.println(sum);
prints something like the following
Adding 586 to 670
Adding 567 to 553
Adding 1256 to 1120
Adding 715 to 620
Adding 624 to 601
Adding 1335 to 1225
Adding 2560 to 2376
Adding 662 to 579
Adding 706 to 715
Adding 1421 to 1241
Adding 713 to 689
Adding 576 to 586
Adding 1402 to 1162
Adding 2662 to 2564
Adding 4936 to 5226
10162
One final note. None of the Collectors.reducing methods takes a BiFunction to handle different types. Instead, the second argument is a Function that acts as a mapper, so that the third argument, a BinaryOperator, can combine the mapped values.
sum = dataList.parallelStream().collect(
Collectors.reducing(0, Data::value, (finalSum, partialSum) -> {
System.out.println(
"Adding " + partialSum + " to " + finalSum);
return finalSum + partialSum;
}));
System.out.println(sum);
I'm having trouble fully understanding the role that the combiner fulfills in Stream's reduce method.
For example, the following code doesn't compile:
int length = asList("str1", "str2").stream()
.reduce(0, (accumulatedInt, str) -> accumulatedInt + str.length());
Compile error says:
(argument mismatch; int cannot be converted to java.lang.String)
but this code does compile:
int length = asList("str1", "str2").stream()
.reduce(0, (accumulatedInt, str ) -> accumulatedInt + str.length(),
(accumulatedInt, accumulatedInt2) -> accumulatedInt + accumulatedInt2);
I understand that the combiner method is used in parallel streams - so in my example it is adding together two intermediate accumulated ints.
But I don't understand why the first example doesn't compile without the combiner or how the combiner is solving the conversion of string to int since it is just adding together two ints.
Can anyone shed light on this?
Eran's answer described the differences between the two-arg and three-arg versions of reduce in that the former reduces Stream<T> to T whereas the latter reduces Stream<T> to U. However, it didn't actually explain the need for the additional combiner function when reducing Stream<T> to U.
One of the design principles of the Streams API is that the API shouldn't differ between sequential and parallel streams, or put another way, a particular API shouldn't prevent a stream from running correctly either sequentially or in parallel. If your lambdas have the right properties (associative, non-interfering, etc.) a stream run sequentially or in parallel should give the same results.
Let's first consider the two-arg version of reduction:
T reduce(I, (T, T) -> T)
The sequential implementation is straightforward. The identity value I is "accumulated" with the zeroth stream element to give a result. This result is accumulated with the first stream element to give another result, which in turn is accumulated with the second stream element, and so forth. After the last element is accumulated, the final result is returned.
The parallel implementation starts off by splitting the stream into segments. Each segment is processed by its own thread in the sequential fashion I described above. Now, if we have N threads, we have N intermediate results. These need to be reduced down to one result. Since each intermediate result is of type T, and we have several, we can use the same accumulator function to reduce those N intermediate results down to a single result.
Now let's consider a hypothetical two-arg reduction operation that reduces Stream<T> to U. In other languages, this is called a "fold" or "fold-left" operation so that's what I'll call it here. Note this doesn't exist in Java.
U foldLeft(I, (U, T) -> U)
(Note that the identity value I is of type U.)
The sequential version of foldLeft is just like the sequential version of reduce except that the intermediate values are of type U instead of type T. But it's otherwise the same. (A hypothetical foldRight operation would be similar except that the operations would be performed right-to-left instead of left-to-right.)
Now consider the parallel version of foldLeft. Let's start off by splitting the stream into segments. We can then have each of the N threads reduce the T values in its segment into N intermediate values of type U. Now what? How do we get from N values of type U down to a single result of type U?
What's missing is another function that combines the multiple intermediate results of type U into a single result of type U. If we have a function that combines two U values into one, that's sufficient to reduce any number of values down to one -- just like the original reduction above. Thus, the reduction operation that gives a result of a different type needs two functions:
U reduce(I, (U, T) -> U, (U, U) -> U)
Or, using Java syntax:
<U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)
In summary, to do parallel reduction to a different result type, we need two functions: one that accumulates T elements to intermediate U values, and a second that combines the intermediate U values into a single U result. If we aren't switching types, it turns out that the accumulator function is the same as the combiner function. That's why reduction to the same type has only the accumulator function and reduction to a different type requires separate accumulator and combiner functions.
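A small sketch of that last point: when T and U are both Integer, the same function can serve as both accumulator and combiner, so these two calls are equivalent:
int a = Stream.of(1, 2, 3).reduce(0, Integer::sum);               // two-arg form
int b = Stream.of(1, 2, 3).reduce(0, Integer::sum, Integer::sum); // three-arg form, same result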
Finally, Java doesn't provide foldLeft and foldRight operations because they imply a particular ordering of operations that is inherently sequential. This clashes with the design principle stated above of providing APIs that support sequential and parallel operation equally.
Since I like doodles and arrows to clarify concepts... let's start!
From String to String (sequential stream)
Suppose you have 4 strings: your goal is to concatenate them into one. You basically start with a type and finish with the same type.
You can achieve this with
String res = Arrays.asList("one", "two","three","four")
.stream()
.reduce("",
(accumulatedStr, str) -> accumulatedStr + str); //accumulator
and this helps you to visualize what's happening:
The accumulator function converts, step by step, the elements in your (red) stream to the final reduced (green) value. The accumulator function simply transforms a String object into another String.
From String to int (parallel stream)
Suppose you have the same 4 strings: your new goal is to sum their lengths, and you want to parallelize your stream.
What you need is something like this:
int length = Arrays.asList("one", "two","three","four")
.parallelStream()
.reduce(0,
(accumulatedInt, str) -> accumulatedInt + str.length(), //accumulator
(accumulatedInt, accumulatedInt2) -> accumulatedInt + accumulatedInt2); //combiner
and this is a scheme of what's happening
Here the accumulator function (a BiFunction) allows you to transform your String data into int data. Since the stream is parallel, it is split into two (red) parts, each of which is processed independently of the other and produces just as many partial (orange) results. Defining a combiner is needed to provide a rule for merging partial int results into the final (green) one.
From String to int (sequential stream)
What if you don't want to parallelize your stream? Well, a combiner needs to be provided anyway, but it will never be invoked, given that no partial results will be produced.
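You can observe this with a combiner that throws; on a sequential stream the pipeline completes normally (a sketch relying on the reference implementation's behaviour of never calling the combiner sequentially, rather than on a documented guarantee):
int length = Arrays.asList("one", "two", "three", "four")
        .stream()
        .reduce(0,
                (acc, str) -> acc + str.length(),
                (a, b) -> { throw new AssertionError("combiner was called"); });
// length is 15 and no exception is thrown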
The two and three argument versions of reduce which you tried to use don't accept the same type for the accumulator.
The two argument reduce is defined as:
T reduce(T identity,
BinaryOperator<T> accumulator)
In your case, T is String, so BinaryOperator<T> should accept two String arguments and return a String. But you pass to it an int and a String, which results in the compilation error you got - argument mismatch; int cannot be converted to java.lang.String. Actually, I think passing 0 as the identity value is also wrong here, since a String is expected (T).
Also note that this version of reduce processes a stream of Ts and returns a T, so you can't use it to reduce a stream of String to an int.
The three argument reduce is defined as:
<U> U reduce(U identity,
BiFunction<U,? super T,U> accumulator,
BinaryOperator<U> combiner)
In your case U is Integer and T is String, so this method will reduce a stream of String to an Integer.
For the BiFunction<U,? super T,U> accumulator you can pass parameters of two different types (U and ? super T), which in your case are Integer and String. In addition, the identity value U accepts an Integer in your case, so passing it 0 is fine.
Another way to achieve what you want :
int length = asList("str1", "str2").stream().mapToInt(s -> s.length())
                                   .reduce(0, (accumulatedInt, len) -> accumulatedInt + len);
Here the type of the stream matches the return type of reduce, so you can use the two parameter version of reduce.
Of course you don't have to use reduce at all :
int length = asList("str1", "str2").stream().mapToInt(s -> s.length())
                                   .sum();
There is no reduce version that takes two different types without a combiner since it can't be executed in parallel (not sure why this is a requirement). The fact that accumulator must be associative makes this interface pretty much useless since:
list.stream().reduce(identity,
accumulator,
combiner);
produces the same results as:
list.stream().map(i -> accumulator(identity, i))
.reduce(identity,
combiner);
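A concrete instance of that equivalence, reducing strings to a length sum (this relies on accumulator(identity, t) acting as a plain mapping, as it does here):
List<String> list = Arrays.asList("str1", "str2");
int viaThreeArg = list.stream()
        .reduce(0, (acc, s) -> acc + s.length(), Integer::sum);
int viaMapReduce = list.stream()
        .map(s -> 0 + s.length())  // accumulator(identity, s)
        .reduce(0, Integer::sum);  // combiner
// both are 8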
I have been skimming through the news and the source code of the newest Java 17 LTS version and I have encountered a new Stream method called mapMulti. The early-access JavaDoc says it is similar to flatMap.
<R> Stream<R> mapMulti(BiConsumer<? super T,? super Consumer<R>> mapper)
How do I perform one-to-0..n mapping using this method?
How does the new method work, and how does it differ from flatMap? When is each one preferable?
How many times can the mapper be called?
Stream::mapMulti is a new method that is classified as an intermediate operation.
It requires a BiConsumer<T, Consumer<R>> mapper, which receives the element about to be processed together with a Consumer. The latter makes the method look strange at first glance because it differs from what we are used to with the other intermediate methods such as map, filter, or peek, none of which uses any variation of *Consumer.
The purpose of the Consumer, provided right within the lambda expression by the API itself, is to accept any number of elements to be made available in the subsequent pipeline. Therefore, all the elements, regardless of how many, will be propagated.
Explanation using simple snippets
One to some (0..1) mapping (similar to filter)
Using consumer.accept(R r) for only a few selected items achieves a filter-like pipeline. This can be useful when checking an element against a predicate and mapping it to a different value, which would otherwise be done with a combination of filter and map. The following
Stream.of("Java", "Python", "JavaScript", "C#", "Ruby")
.mapMulti((str, consumer) -> {
if (str.length() > 4) {
consumer.accept(str.length()); // lengths larger than 4
}
})
.forEach(i -> System.out.print(i + " "));
// 6 10
One to one mapping (similar to map)
Working with the previous example, when the condition is omitted and every element is mapped into a new one and accepted using the consumer, the method effectively behaves like map:
Stream.of("Java", "Python", "JavaScript", "C#", "Ruby")
.mapMulti((str, consumer) -> consumer.accept(str.length()))
.forEach(i -> System.out.print(i + " "));
// 4 6 10 2 4
One to many mapping (similar to flatMap)
Here things get interesting, because one can call consumer.accept(R r) any number of times. Let's say we want to repeat the number representing the String length that many times, i.e. 2 becomes 2, 2; 4 becomes 4, 4, 4, 4; and 0 becomes nothing.
Stream.of("Java", "Python", "JavaScript", "C#", "Ruby", "")
.mapMulti((str, consumer) -> {
for (int i = 0; i < str.length(); i++) {
consumer.accept(str.length());
}
})
.forEach(i -> System.out.print(i + " "));
// 4 4 4 4 6 6 6 6 6 6 10 10 10 10 10 10 10 10 10 10 2 2 4 4 4 4
Comparison with flatMap
The very idea of this mechanism is that it can be called multiple times (including zero), and its internal use of SpinedBuffer allows the elements to be pushed into a single flattened Stream instance without creating a new one for every group of output elements, unlike flatMap. The JavaDoc states two use-cases where using this method is preferable over flatMap:
When replacing each stream element with a small (possibly zero) number of elements. Using this method avoids the overhead of creating a new Stream instance for every group of result elements, as required by flatMap.
When it is easier to use an imperative approach for generating result elements than it is to return them in the form of a Stream.
Performance-wise, the new method mapMulti is a winner in such cases. Check out the benchmark at the bottom of this answer.
Filter-map scenario
Using this method as a replacement for a lone filter or map doesn't make sense due to its verbosity, and because one intermediate stream is created anyway. The exception might be replacing a .filter(..).map(..) chain called together, which comes in handy in cases such as checking an element's type and casting it.
int sum = Stream.of(1, 2.0, 3.0, 4F, 5, 6L)
.mapMultiToInt((number, consumer) -> {
if (number instanceof Integer) {
consumer.accept((Integer) number);
}
})
.sum();
// 6
int sum = Stream.of(1, 2.0, 3.0, 4F, 5, 6L)
.filter(number -> number instanceof Integer)
.mapToInt(number -> (Integer) number)
.sum();
As seen above, variations like mapMultiToDouble, mapMultiToInt and mapMultiToLong were introduced. These come along with the mapMulti methods on the primitive streams, such as IntStream mapMulti(IntStream.IntMapMultiConsumer mapper). Also, three new functional interfaces were introduced. Basically, they are the primitive variations of BiConsumer<T, Consumer<R>>, for example:
@FunctionalInterface
interface IntMapMultiConsumer {
void accept(int value, IntConsumer ic);
}
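A small sketch of the primitive variant in action (the duplication logic is purely illustrative):
// each value is pushed twice: once as-is and once doubled
IntStream.of(1, 2, 3)
        .mapMulti((value, ic) -> {
            ic.accept(value);
            ic.accept(value * 2);
        })
        .forEach(i -> System.out.print(i + " "));
// 1 2 2 4 3 6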
Combined real use-case scenario
The real power of this method is in its flexibility of usage and in creating only one Stream at a time, which is the major advantage over flatMap. The two snippets below represent a flat-mapping of Product and its List<Variation> into 0..n offers, represented by the Offer class and based on certain conditions (the product category and the variation availability).
Product with String name, int basePrice, String category and List<Variation> variations.
Variation with String name, int price and boolean availability.
List<Product> products = ...
List<Offer> offers = products.stream()
.<Offer>mapMulti((product, consumer) -> { // type witness needed: R cannot be inferred from the lambda alone
if ("PRODUCT_CATEGORY".equals(product.getCategory())) {
for (Variation v : product.getVariations()) {
if (v.isAvailable()) {
Offer offer = new Offer(
product.getName() + "_" + v.getName(),
product.getBasePrice() + v.getPrice());
consumer.accept(offer);
}
}
}
})
.collect(Collectors.toList());
List<Product> products = ...
List<Offer> offers = products.stream()
.filter(product -> "PRODUCT_CATEGORY".equals(product.getCategory()))
.flatMap(product -> product.getVariations().stream()
.filter(Variation::isAvailable)
.map(v -> new Offer(
product.getName() + "_" + v.getName(),
product.getBasePrice() + v.getPrice()
))
)
.collect(Collectors.toList());
The use of mapMulti is more imperatively inclined compared to the declarative approach of the pre-existing Stream methods seen in the latter snippet using flatMap, map, and filter. From this perspective, it depends on the use-case whether it is easier to use an imperative approach. Recursion is a good example, described in the JavaDoc.
Benchmark
As promised, I have written a bunch of micro-benchmarks based on ideas collected from the comments. Since there is quite a lot of code to publish, I have created a GitHub repository with the implementation details, and I will share only the results here.
Stream::flatMap(Function) vs Stream::mapMulti(BiConsumer) Source
Here we can see the huge difference, and a proof that the newer method actually works as described: its usage avoids the overhead of creating a new Stream instance for each processed element.
Benchmark Mode Cnt Score Error Units
MapMulti_FlatMap.flatMap avgt 25 73.852 ± 3.433 ns/op
MapMulti_FlatMap.mapMulti avgt 25 17.495 ± 0.476 ns/op
Stream::filter(Predicate).map(Function) vs Stream::mapMulti(BiConsumer) Source
Using chained pipelines (not nested, though) is fine.
Benchmark Mode Cnt Score Error Units
MapMulti_FilterMap.filterMap avgt 25 7.973 ± 0.378 ns/op
MapMulti_FilterMap.mapMulti avgt 25 7.765 ± 0.633 ns/op
Stream::flatMap(Function) with Optional::stream() vs Stream::mapMulti(BiConsumer) Source
This one is very interesting, especially in terms of usage (see the source code): we are now able to flatten using mapMulti(Optional::ifPresent), and as expected, the new method is a bit faster in this case.
Benchmark Mode Cnt Score Error Units
MapMulti_FlatMap_Optional.flatMap avgt 25 20.186 ± 1.305 ns/op
MapMulti_FlatMap_Optional.mapMulti avgt 25 10.498 ± 0.403 ns/op
To address the scenario
When it is easier to use an imperative approach for generating result elements than it is to return them in the form of a Stream.
We can see it as now having a limited variant of C#'s yield statement. The limitations are that we always need an initial input from a stream, as this is an intermediate operation, and further, there's no short-circuiting for the elements we're pushing in one function evaluation.
Still, it opens interesting opportunities.
E.g., implementing a stream of Fibonacci numbers formerly required a solution using temporary objects capable of holding two values.
Now, we can use something like:
IntStream.of(0)
    .mapMulti((a, c) -> {
        // b = a + (a = b) advances the pair: the new b is the old a plus the old b,
        // while a becomes the old b; the loop ends once a overflows to a negative value
        for (int b = 1; a >= 0; b = a + (a = b))
            c.accept(a);
    })
    /* additional stream operations here */
    .forEach(System.out::println);
It stops when the int values overflow. As said, it won't short-circuit when we use a terminal operation that does not consume all values; however, this loop producing then-ignored values might still be faster than the other approaches.
Another example inspired by this answer, to iterate over a class hierarchy from root to most specific:
Stream.of(LinkedHashMap.class).mapMulti(MapMultiExamples::hierarchy)
    /* additional stream operations here */
    .forEach(System.out::println);
static void hierarchy(Class<?> cl, Consumer<? super Class<?>> co) {
if(cl != null) {
hierarchy(cl.getSuperclass(), co);
co.accept(cl);
}
}
which unlike the old approaches does not require additional heap storage and will likely run faster (assuming reasonable class depths that do not make recursion backfire).
Also monsters like this
List<A> list = IntStream.range(0, r_i).boxed()
.flatMap(i -> IntStream.range(0, r_j).boxed()
.flatMap(j -> IntStream.range(0, r_k)
.mapToObj(k -> new A(i, j, k))))
.collect(Collectors.toList());
can now be written like
List<A> list = IntStream.range(0, r_i).boxed()
.<A>mapMulti((i,c) -> {
for(int j = 0; j < r_j; j++) {
for(int k = 0; k < r_k; k++) {
c.accept(new A(i, j, k));
}
}
})
.collect(Collectors.toList());
Compared to the nested flatMap steps, it loses some parallelism opportunity, which the reference implementation didn’t exploit anyway. For a non-short-circuiting operation like above, the new method likely will benefit from the reduced boxing and less instantiation of capturing lambda expressions. But of course, it should be used judiciously, not to rewrite every construct to an imperative version (after so many people tried to rewrite every imperative code into a functional version)…
I want to sum a list of Integers. It works as follows, but the syntax does not feel right. Could the code be optimized?
Map<String, Integer> integers;
integers.values().stream().mapToInt(i -> i).sum();
This will work, but the i -> i is doing some automatic unboxing which is why it "feels" strange. mapToInt converts the stream to an IntStream "of primitive int-valued elements". Either of the following will work and better explain what the compiler is doing under the hood with your original syntax:
integers.values().stream().mapToInt(i -> i.intValue()).sum();
integers.values().stream().mapToInt(Integer::intValue).sum();
I suggest 2 more options:
integers.values().stream().mapToInt(Integer::intValue).sum();
integers.values().stream().collect(Collectors.summingInt(Integer::intValue));
The second one uses the Collectors.summingInt() collector; there is also a summingLong() collector for long totals, the collector counterpart of mapToLong(...).sum().
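For example, a sketch against the Map<String, Integer> from the question:
long viaCollector = integers.values().stream()
        .collect(Collectors.summingLong(Integer::longValue));
long viaPrimitive = integers.values().stream()
        .mapToLong(Integer::longValue)
        .sum();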
And a third option: Java 8 introduces a very effective LongAdder, an accumulator designed to speed up summation in parallel streams and multi-threaded environments. Here's an example use:
LongAdder a = new LongAdder();
map.values().parallelStream().forEach(a::add);
sum = a.intValue();
From the docs
Reduction operations
A reduction operation (also called a fold) takes a sequence of input elements and combines them into a single summary result by repeated application of a combining operation, such as finding the sum or maximum of a set of numbers, or accumulating elements into a list. The streams classes have multiple forms of general reduction operations, called reduce() and collect(), as well as multiple specialized reduction forms such as sum(), max(), or count().
Of course, such operations can be readily implemented as simple sequential loops, as in:
int sum = 0;
for (int x : numbers) {
sum += x;
}
However, there are good reasons to prefer a reduce operation over a mutative accumulation such as the above. Not only is a reduction "more abstract" -- it operates on the stream as a whole rather than individual elements -- but a properly constructed reduce operation is inherently parallelizable, so long as the function(s) used to process the elements are associative and stateless. For example, given a stream of numbers for which we want to find the sum, we can write:
int sum = numbers.stream().reduce(0, (x,y) -> x+y);
or:
int sum = numbers.stream().reduce(0, Integer::sum);
These reduction operations can run safely in parallel with almost no modification:
int sum = numbers.parallelStream().reduce(0, Integer::sum);
So, for a map you would use:
integers.values().stream().mapToInt(i -> i).reduce(0, (x,y) -> x+y);
Or:
integers.values().stream().reduce(0, Integer::sum);
You can use the reduce method:
long sum = result.stream().map(e -> e.getCreditAmount()).reduce(0L, (x, y) -> x + y);
or
long sum = result.stream().map(e -> e.getCreditAmount()).reduce(0L, Long::sum);
You can use reduce() to sum a list of integers.
int sum = integers.values().stream().reduce(0, Integer::sum);
You can use the collect method to sum a list of integers.
List<Integer> list = Arrays.asList(2, 4, 5, 6);
int sum = list.stream().collect(Collectors.summingInt(Integer::intValue));
I have declared a list of Integers.
ArrayList<Integer> numberList = new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4, 5));
You can try using these different ways below.
Using mapToInt
int sum = numberList.stream().mapToInt(Integer::intValue).sum();
Using summarizingInt
long sum = numberList.stream().collect(Collectors.summarizingInt(Integer::intValue)).getSum();
Using reduce
int sum = numberList.stream().reduce(Integer::sum).get().intValue();
This may help those who have objects in the list.
If you have a list of objects and want to sum specific fields of these objects, use the below.
List<ResultSom> somList = MyUtil.getResultSom();
BigDecimal result= somList.stream().map(ResultSom::getNetto).reduce(
BigDecimal.ZERO, BigDecimal::add);
This would be the shortest way to sum up an int array (for a long array LongStream, for a double array DoubleStream, and so forth). Not all the primitive integer or floating-point types have a Stream implementation, though.
IntStream.of(integers).sum();
Unfortunately it looks like the Stream API only returns normal streams from, say, List<Integer>#stream(). Guess they're pretty much forced to because of how generics work.
These normal Streams are of generic objects, so they don't have specialized methods like sum(), and you have to use the weird "looks like a no-op" re-stream conversion by default to get to those methods: .mapToInt(i -> i).
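In code, the "looks like a no-op" conversion is what switches you from the generic stream to the primitive one:
List<Integer> xs = Arrays.asList(1, 2, 3);
int sum = xs.stream()          // Stream<Integer>: no sum() here
            .mapToInt(i -> i)  // IntStream: unboxes each element
            .sum();            // the specialized method exists here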
Another option is using "Eclipse Collections", which is like an expanded Java Stream API:
IntLists.immutable.ofAll(integers.values()).sum();
There is one more option no one has considered here, and it reflects the usage of multi-core environments. If you want to take advantage of them, then the following code should be used instead of the other mentioned solutions:
int sum = integers.values().parallelStream()
        .reduce(0, Integer::sum, Integer::sum);
This solution is similar to the other ones, but please notice the third argument to reduce (note that IntStream has no three-argument reduce, which is why mapToInt is not used here). It tells the compiler what to do with the partial sums calculated in different chunks of the stream by different threads. Also, instead of stream(), parallelStream() is used. In this case the combiner just sums the partial results. The other option to pass as the third argument is (i, j) -> i + j, which adds the value of a stream chunk (j) to the current value (i) and uses it as the current value for the next stream chunk until all partial results are processed.
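Spelled out with explicit lambdas instead of method references (same semantics, just a sketch):
int sum = integers.values().parallelStream()
        .reduce(0,
                (i, j) -> i + j,  // accumulator: current value plus next element
                (i, j) -> i + j); // combiner: merge two chunks' partial sums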
Even when using a plain stream() it is useful to tell reduce what to do with the chunks' partial sums, just in case someone, or you, would like to parallelize it in the future. The initial development is the best time for that, since later on you would need to remember what this is supposed to do and spend time understanding the purpose of that code again.
And of course, instead of method references you can use a different dialect of lambda. I prefer it this way as more compact and still easily readable.
Also remember this can be used for more complex calculations too, but always be aware that there are no guarantees about the order and assignment of stream elements to threads.
IntStream.of(1, 2, 23).sum();
IntStream.of(1, 2, 23,1, 2, 23,1, 2, 23).max().getAsInt();
Similar questions have been asked, here and here, but given the advent of Java 8 and the generally outdated nature of those questions, I'm wondering if there'd now be something at least kindred to it?
This is what I'm referring to.
You can use a lambda and Stream.reduce, there is a page in the docs dedicated to reductions:
Integer totalAgeReduce = roster
.stream()
.map(Person::getAge)
.reduce(
0,
(a, b) -> a + b);
This is the example used in the Python docs implemented with Java 8 streams:
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
Optional<Integer> sum = numbers.stream().reduce((a, b) -> a + b);
System.out.println(sum.get());
The Stream.reduce Method
The Stream.reduce method is a general-purpose reduction operation. Consider the following pipeline, which calculates the sum of the members' ages in the collection roster. It uses the Stream.sum reduction operation:
Integer totalAge = roster
.stream()
.mapToInt(Person::getAge)
.sum();
Compare this with the following pipeline, which uses the Stream.reduce operation to calculate the same value:
Integer totalAgeReduce = roster
.stream()
.map(Person::getAge)
.reduce(
0,
(a, b) -> a + b);
The reduce operation in this example takes two arguments:
identity: The identity element is both the initial value of the reduction and the default result if there are no elements in the stream. In this example, the identity element is 0; this is the initial value of the sum of ages and the default value if no members exist in the collection roster.
accumulator: The accumulator function takes two parameters: a partial result of the reduction (in this example, the sum of all processed integers so far) and the next element of the stream (in this example, an integer). It returns a new partial result. In this example, the accumulator function is a lambda expression that adds two Integer values and returns an Integer value:
(a, b) -> a + b
The reduce operation always returns a new value. However, the accumulator function also returns a new value every time it processes an element of a stream. Suppose that you want to reduce the elements of a stream to a more complex object, such as a collection. This might hinder the performance of your application. If your reduce operation involves adding elements to a collection, then every time your accumulator function processes an element, it creates a new collection that includes the element, which is inefficient. It would be more efficient for you to update an existing collection instead. You can do this with the Stream.collect method, which the next section describes.
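A sketch of the collect-based alternative the quote points to, reusing the roster and Person names from the tutorial:
List<Integer> ages = roster.stream()
        .map(Person::getAge)
        .collect(ArrayList::new,     // one container per thread
                 ArrayList::add,     // accumulate an element into a container
                 ArrayList::addAll); // merge containers, no per-element copying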
The official Oracle tutorial describes how Stream.reduce works. Please have a look; I believe it will answer your query.