How to sum a list of integers with Java streams?

I want to sum a list of Integers. It works as follows, but the syntax does not feel right. Could the code be optimized?
Map<String, Integer> integers;
integers.values().stream().mapToInt(i -> i).sum();

This will work, but the i -> i is doing some automatic unboxing which is why it "feels" strange. mapToInt converts the stream to an IntStream "of primitive int-valued elements". Either of the following will work and better explain what the compiler is doing under the hood with your original syntax:
integers.values().stream().mapToInt(i -> i.intValue()).sum();
integers.values().stream().mapToInt(Integer::intValue).sum();

I suggest 2 more options:
integers.values().stream().mapToInt(Integer::intValue).sum();
integers.values().stream().collect(Collectors.summingInt(Integer::intValue));
The second one uses the Collectors.summingInt() collector; there is also a summingLong() collector, the analogue of mapToLong().sum() for long totals.
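For completeness, a sketch of the summingLong() variant, reusing the integers map from the question (the Map.of sample data is made up for the example; the values are widened to long, so the total cannot overflow an int):

```java
import java.util.Map;
import java.util.stream.Collectors;

Map<String, Integer> integers = Map.of("a", 1, "b", 2, "c", 3);
long sum = integers.values().stream()
        .collect(Collectors.summingLong(Integer::longValue)); // 6
```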
And a third option: Java 8 introduces a very effective LongAdder accumulator, designed to speed up summation in parallel streams and multi-threaded environments. Here's an example use:
LongAdder a = new LongAdder();
integers.values().parallelStream().forEach(a::add);
int sum = a.intValue();

From the docs
Reduction operations
A reduction operation (also called a fold) takes a sequence of input elements and combines them into a single summary result by repeated application of a combining operation, such as finding the sum or maximum of a set of numbers, or accumulating elements into a list. The streams classes have multiple forms of general reduction operations, called reduce() and collect(), as well as multiple specialized reduction forms such as sum(), max(), or count().
Of course, such operations can be readily implemented as simple sequential loops, as in:
int sum = 0;
for (int x : numbers) {
    sum += x;
}
However, there are good reasons to prefer a reduce operation over a mutative accumulation such as the above. Not only is a reduction "more abstract" -- it operates on the stream as a whole rather than individual elements -- but a properly constructed reduce operation is inherently parallelizable, so long as the function(s) used to process the elements are associative and stateless. For example, given a stream of numbers for which we want to find the sum, we can write:
int sum = numbers.stream().reduce(0, (x,y) -> x+y);
or:
int sum = numbers.stream().reduce(0, Integer::sum);
These reduction operations can run safely in parallel with almost no modification:
int sum = numbers.parallelStream().reduce(0, Integer::sum);
So, for a map you would use:
integers.values().stream().mapToInt(i -> i).reduce(0, (x,y) -> x+y);
Or:
integers.values().stream().reduce(0, Integer::sum);

You can use reduce method:
long sum = result.stream().map(e -> e.getCreditAmount()).reduce(0L, (x, y) -> x + y);
or
long sum = result.stream().map(e -> e.getCreditAmount()).reduce(0L, Long::sum);

You can use reduce() to sum a list of integers.
int sum = integers.values().stream().reduce(0, Integer::sum);

You can use the collect method to sum a list of integers.
List<Integer> list = Arrays.asList(2, 4, 5, 6);
int sum = list.stream().collect(Collectors.summingInt(Integer::intValue));

I have declared a list of Integers.
ArrayList<Integer> numberList = new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4, 5));
You can try using these different ways below.
Using mapToInt
int sum = numberList.stream().mapToInt(Integer::intValue).sum();
Using summarizingInt
int sum = numberList.stream().collect(Collectors.summarizingInt(Integer::intValue)).getSum();
Using reduce
int sum = numberList.stream().reduce(Integer::sum).orElse(0);

This may help those who have objects in a list.
If you have a list of objects and want to sum a specific field of those objects, use the below:
List<ResultSom> somList = MyUtil.getResultSom();
BigDecimal result = somList.stream().map(ResultSom::getNetto)
        .reduce(BigDecimal.ZERO, BigDecimal::add);

This is the shortest way to sum an int array (for a long array use LongStream, for a double array DoubleStream, and so forth). Not all primitive types have a stream specialization, though: only IntStream, LongStream and DoubleStream exist.
IntStream.of(integers).sum();

Unfortunately it looks like the Stream API only returns normal streams from, say, List<Integer>#stream(). They're presumably forced to because of how generics work.
These normal streams are of generic objects, so they don't have specialized methods like sum(), and you have to use the weird re-stream, "looks-like-a-no-op" conversion to get to those methods: .mapToInt(i -> i).
Another option is Eclipse Collections, which is like an expanded Java Stream API:
IntLists.immutable.ofAll(integers.values()).sum();

There is one more option not considered here, and it takes advantage of multi-core environments. If you want to use them, the following should be used instead of the other solutions mentioned:
int sum = integers.values().parallelStream()
        .reduce(0, Integer::sum, Integer::sum);
This solution is similar to the others, but notice the third argument to reduce. It tells the stream what to do with the partial sums calculated for different chunks of the stream by different threads. Also, parallelStream() is used instead of stream(). In this case the combiner just adds the partial results together. An equivalent third argument is (i, j) -> i + j, which adds the sum of one stream chunk (j) to the running value (i) and carries it forward until all partial results are processed.
Even when using a plain stream() it is useful to tell reduce what to do with the chunk sums, in case someone, or you, wants to parallelize it in the future. Initial development is the best time for that; later on you would have to work out again what the code is supposed to do.
And of course, instead of method references you can use the lambda form. I prefer method references as more compact and still easily readable.
Also remember this can be used for more complex calculations too, but always be aware that there are no guarantees about the order in which stream elements are handed to threads.
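To make the role of the third argument concrete, here is a sketch where the accumulator and combiner are genuinely different functions, because the stream elements (String) differ from the result type (Integer); the sample strings are made up for the example:

```java
import java.util.List;

int totalLength = List.of("alpha", "beta", "gamma").parallelStream()
        .reduce(0,
                (partial, s) -> partial + s.length(), // accumulator: fold the next element into a partial sum
                Integer::sum);                        // combiner: merge partial sums from different threads
// totalLength == 14
```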

IntStream.of(1, 2, 23).sum();
IntStream.of(1, 2, 23,1, 2, 23,1, 2, 23).max().getAsInt();

Related

Create collection of N identical elements from one element using Java 8 streams [duplicate]

In many other languages, e.g. Haskell, it is easy to repeat a value or function multiple times, e.g. to get a list of 8 copies of the value 1:
take 8 (repeat 1)
but I haven't found this yet in Java 8. Is there such a function in Java 8's JDK?
Or alternatively something equivalent to a range like
[1..8]
It would seem an obvious replacement for a verbose statement in Java like
for (int i = 1; i <= 8; i++) {
    System.out.println(i);
}
to have something like
Range.from(1, 8).forEach(i -> System.out.println(i))
though this particular example doesn't look much more concise actually... but hopefully it's more readable.
For this specific example, you could do:
IntStream.rangeClosed(1, 8)
.forEach(System.out::println);
If you need a step different from 1, you can use a mapping function, for example, for a step of 2:
IntStream.rangeClosed(1, 8)
.map(i -> 2 * i - 1)
.forEach(System.out::println);
Or build a custom iteration and limit the size of the iteration:
IntStream.iterate(1, i -> i + 2)
.limit(8)
.forEach(System.out::println);
Here's another technique I ran across the other day:
Collections.nCopies(8, 1)
.stream()
.forEach(i -> System.out.println(i));
The Collections.nCopies call creates a List containing n copies of whatever value you provide. In this case it's the boxed Integer value 1. Of course it doesn't actually create a list with n elements; it creates a "virtualized" list that contains only the value and the length, and any call to get within range just returns the value. The nCopies method has been around since the Collections Framework was introduced way back in JDK 1.2. Of course, the ability to create a stream from its result was added in Java SE 8.
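That virtualized behaviour can be seen directly (a small sketch):

```java
import java.util.Collections;
import java.util.List;

List<Integer> ones = Collections.nCopies(8, 1);     // O(1): stores only the value and the count
System.out.println(ones.size());                    // 8
System.out.println(ones.get(5));                    // 1
// every index returns the very same boxed Integer
System.out.println(ones.get(0) == ones.get(7));     // true
```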
Big deal, another way to do the same thing in about the same number of lines.
However, this technique is faster than the IntStream.generate and IntStream.iterate approaches, and surprisingly, it's also faster than the IntStream.range approach.
For iterate and generate the result is perhaps not too surprising. The streams framework (really, the Spliterators for these streams) is built on the assumption that the lambdas will potentially generate different values each time, and that they will generate an unbounded number of results. This makes parallel splitting particularly difficult. The iterate method is also problematic for this case because each call requires the result of the previous one. So the streams using generate and iterate don't do very well for generating repeated constants.
The relatively poor performance of range is surprising. This too is virtualized, so the elements don't actually all exist in memory, and the size is known up front. This should make for a fast and easily parallelizable spliterator. But it surprisingly didn't do very well. Perhaps the reason is that range has to compute a value for each element of the range and then call a function on it. But this function just ignores its input and returns a constant, so I'm surprised this isn't inlined and killed.
The Collections.nCopies technique has to do boxing/unboxing in order to handle the values, since there are no primitive specializations of List. Since the value is the same every time, it's basically boxed once and that box is shared by all n copies. I suspect boxing/unboxing is highly optimized, even intrinsified, and it can be inlined well.
Here's the code:
public static final int LIMIT = 500_000_000;
public static final long VALUE = 3L;

public long range() {
    return LongStream.range(0, LIMIT)
                     .parallel()
                     .map(i -> VALUE)
                     .map(i -> i % 73 % 13)
                     .sum();
}

public long ncopies() {
    return Collections.nCopies(LIMIT, VALUE)
                      .parallelStream()
                      .mapToLong(i -> i)
                      .map(i -> i % 73 % 13)
                      .sum();
}
And here are the JMH results: (2.8GHz Core2Duo)
Benchmark                  Mode   Samples   Mean    Mean error   Units
c.s.q.SO18532488.ncopies   thrpt        5   7.547        2.904    ops/s
c.s.q.SO18532488.range     thrpt        5   0.317        0.064    ops/s
There is a fair amount of variance in the ncopies version, but overall it seems comfortably 20x faster than the range version. (I'd be quite willing to believe that I've done something wrong, though.)
I'm surprised at how well the nCopies technique works. Internally it doesn't do very much special, with the stream of the virtualized list simply being implemented using IntStream.range! I had expected that it would be necessary to create a specialized spliterator to get this to go fast, but it already seems to be pretty good.
For completeness, and also because I couldn't help myself :)
Generating a limited sequence of constants is fairly close to what you would see in Haskell, only with Java level verboseness.
IntStream.generate(() -> 1)
.limit(8)
.forEach(System.out::println);
Once a repeat function is somewhere defined as
public static BiConsumer<Integer, Runnable> repeat = (n, f) -> {
    for (int i = 1; i <= n; i++)
        f.run();
};
you can then use it this way, e.g.:
repeat.accept(8, () -> System.out.println("Yes"));
To get an equivalent to Haskell's
take 8 (repeat 1)
You could write
StringBuilder s = new StringBuilder();
repeat.accept(8, () -> s.append("1"));
Another alternative is to use the Stream.generate() method. For example the snippet below will create a list with 5 instances of MyClass:
List<MyClass> timezones = Stream
.generate(MyClass::createInstance)
.limit(5)
.collect(Collectors.toList());
From the Javadoc:
generate(Supplier s)
Returns an infinite sequential unordered stream where each element is generated by the provided Supplier.
This is my solution for implementing a times function. I'm a junior, so I admit it could be less than ideal; I'd be glad to hear if this is a bad idea for any reason.
public static <T> void times(int count, Function<T, Void> f, T t) {
    while (count > 0) {
        f.apply(t);
        count--;
    }
}
Here's some example usage:
Function<String, Void> greet = greeting -> {
    System.out.println(greeting);
    return null;
};
times(3, greet, "Hello World!");

Find first index of matching character from two strings using parallel streams

Trying to figure out whether it is possible to find the first index, in one string, of a character that also occurs in another string. So for example:
String first = "test";
String second = "123er";
int value = get(first, second);
// method would return 1, as the first matching character in
// 123er, e is at index 1 of test
So I'm trying to accomplish this using parallel streams. I know I can find whether there is a matching character fairly simply like such:
test.chars().parallel().anyMatch(c -> other.indexOf(c) >= 0);
How would I use this to find the exact index?
If you really care for performance, you should try to avoid the O(n × m) time complexity of iterating over one string for every character of the other. So, first iterate over one string to get a data structure supporting efficient (O(1)) lookup, then iterate over the other utilizing this.
BitSet encountered = new BitSet();
test.chars().forEach(encountered::set);
int index = IntStream.range(0, other.length())
.filter(ix->encountered.get(other.charAt(ix)))
.findFirst().orElse(-1);
If the strings are sufficiently large, the O(n + m) time complexity of this solution will turn to much shorter execution times. For smaller strings, it’s irrelevant anyway.
If you really think the strings are large enough to benefit from parallel processing (which is very unlikely), you can perform both operations in parallel, with small adaptations:
BitSet encountered = CharBuffer.wrap(test).chars().parallel()
.collect(BitSet::new, BitSet::set, BitSet::or);
int index = IntStream.range(0, other.length()).parallel()
.filter(ix -> encountered.get(other.charAt(ix)))
.findFirst().orElse(-1);
The first operation uses the slightly more complicated, parallel compatible collect now and it contains a not-so-obvious change for the Stream creation.
The problem is described in bug report JDK-8071477. Simply said, the stream returned by String.chars() has a poor splitting capability, hence a poor parallel performance. The code above wraps the string in a CharBuffer, whose chars() method returns a different implementation, having the same semantics, but a good parallel performance. This work-around should become obsolete with Java 9.
Alternatively, you could use IntStream.range(0, test.length()).map(test::charAt) to create a stream with a good parallel performance. The second operation already works that way.
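Spelled out, that index-based alternative might look like this (a sketch reusing the test and other strings from this question):

```java
import java.util.BitSet;
import java.util.stream.IntStream;

String test = "test";
String other = "123er";
// an index-based stream splits well, so it parallelizes better than String.chars()
BitSet encountered = IntStream.range(0, test.length()).parallel()
        .map(test::charAt)
        .collect(BitSet::new, BitSet::set, BitSet::or);
int index = IntStream.range(0, other.length()).parallel()
        .filter(ix -> encountered.get(other.charAt(ix)))
        .findFirst().orElse(-1); // 3: 'e' is the first char of "123er" also present in "test"
```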
But, as said, for this specific task it’s rather unlikely that you ever encounter strings large enough to make parallel processing beneficial.
You can do it by relying on String#indexOf(int ch), keeping only values >= 0 to drop characters that do not occur in the other string, then taking the first match.
// Get the index of each characters of test in other
// Keep only the positive values
// Then return the first match
// Or -1 if we have no match
int result = test.chars()
.parallel()
.map(other::indexOf)
.filter(i -> i >= 0)
.findFirst()
.orElse(-1);
System.out.println(result);
Output:
1
NB 1: The result is 1 not 2 because indexes start from 0 not 1.
NB 2: Unless you have a very, very long String, using a parallel stream in this case should not help much in terms of performance, because the tasks are not complex and creating, starting and synchronizing threads has a very high cost; you will probably get your result more slowly than with a normal stream.
Upgrading Nicolas' answer here: the min() method forces consumption of the whole stream. In such cases it is better to use findFirst(), which stops execution after finding the first matching element instead of computing the minimum of all:
test.chars().parallel()
.map(other::indexOf)
.filter(i -> i >= 0)
.findFirst()
.ifPresent(System.out::println);

Given a million numbers in a list. How to multiply each element by a constant with minimum time complexity

I have written a Java program which multiplies each number in a list by a constant. Below is the program. I have created a myNewnumbers list to store the multiplied numbers. Below are my doubts:
Is there a better way to write this with minimum time complexity?
Currently there are 10 elements in my for loop. How do I handle it if the user wants to do it for 1 million elements?
I am a beginner at multithreading. How do I make sure that it works with multithreading?
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiplyHugeNumber {
    static List<Integer> mynumbers;
    static List<Integer> myNewnumbers;
    static Integer MUTIPLY_ELEMENT = 2;

    public static void main(String[] args) {
        mynumbers = new ArrayList<Integer>();
        for (int i = 0; i < 10; i++) {
            mynumbers.add(i);
        }
        System.out.println(Arrays.toString(mynumbers.toArray()));
        myNewnumbers = new ArrayList<>();
        for (Integer mynumber : mynumbers) {
            myNewnumbers.add(mynumber * MUTIPLY_ELEMENT);
        }
        System.out.println(Arrays.toString(myNewnumbers.toArray()));
    }
}
Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
(credits to #STaefi) Multiplying a list of input values by a constant value has a time complexity of O(n). This complexity is not related to a multi-threaded implementation.
Your program (multiply a list of numbers by a constant) falls into the category of so-called "embarrassingly parallel" problems:
"[...] one where little or no effort is needed to separate the problem into a number of parallel tasks [...]"
Each item in your input list can be multiplied with the constant without regarding any other input item or global state.
There are various ways to parallelize the given task. Be aware that the overhead for setting the threads up might not be worth it for a small number of input values or at all in this particular example.
Example using Java 8 streams:
List<Integer> myNewnumbers = mynumbers.parallelStream()
        .map(in -> in * MUTIPLY_ELEMENT)
        .collect(Collectors.toList());
If you have a million elements, and you need to multiply each by a constant, then you're going to need to perform a million operations. There's no way around it -- it's O(n).
That said, creating the list can be O(1) (constant time), if you're okay with a lazily-evaluated list. Guava's Lists.transform does just that:
List<Integer> myNumbers = ...
List<Integer> myNewNumbers = Lists.transform(myNumbers, i -> i * MULTIPLY_ELEMENT);
This doesn't reduce the overall time it takes to do all the multiplications; in fact, it'll probably take a bit more time overall, since it's harder for the JVM to optimize. It'll also be slower if you access the same element multiple times, since the transformation will be applied each time you do. That said, it's unlikely that it'll be slower by an amount you'll notice.
This approach also carries the limitation that you can't add new elements to the transformed list (as explained in the JavaDocs). But, depending on your scenario, there may be other benefits to getting that initial list created quickly.
This is 100% a premature optimization; there are very likely no optimizations worth making in this case.
However, purely "academically speaking", you would need to ask yourself a question here, namely: does it matter what order the results are in?
If not, then with streams this is simple; here's an example with 100 numbers:
Integer MUTIPLY_ELEMENT = 2;
List<Integer> resultNumbers = IntStream.range(0, 100)
        .parallel()
        .map(i -> i * MUTIPLY_ELEMENT)
        .boxed()
        .collect(Collectors.toList());
if you do care about ordering, but still want to gain the benefit of parallel processing, you can take advantage of the fact that your operation (multiplying by 2) is simple enough that the resulting numbers will still be in the same relative "natural" order and just call sorted() on the stream after the map() call. However, the sorting operation could very well take just as long as if you just did it single threaded.
Also, understand that this is by NO MEANS a "real world" scenario, you will almost never come across an actual problem like this. Hopefully you're just trying to get your head around parallelism in general, because you'd never actually want to do this type of optimization until you have tried a single-threaded model and it proves insufficient.

Collectors.summingInt() vs mapToInt().sum()

When you want to sum an integer value from a stream, there are two main ways of doing it:
ToIntFunction<...> mapFunc = ...
int sum = stream().collect(Collectors.summingInt(mapFunc))
int sum = stream().mapToInt(mapFunc).sum()
The first involves boxing the returned integer & unboxing it, but there's an extra step involved in the second.
Which is more efficient/clearer?
You are looking at the intersection of two otherwise distinct use cases. Using mapToInt(…) allows you to chain other IntStream operations before the terminal operation. In contrast, Collectors.summingInt(…) can be combined with other collectors, e.g. used as downstream collector in a groupingBy collector. For these use cases, there is no question about which to use.
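For illustration, a sketch of summingInt as a downstream collector (the Item record and its fields are made up for this example):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

record Item(String category, int amount) {}

List<Item> items = List.of(new Item("food", 3), new Item("food", 4), new Item("tools", 10));
// per-group sums in a single pass; mapToInt().sum() cannot be nested inside groupingBy like this
Map<String, Integer> totalByCategory = items.stream()
        .collect(Collectors.groupingBy(Item::category,
                 Collectors.summingInt(Item::amount)));
// {food=7, tools=10}
```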
In your special case, when you are not chaining more operations nor dealing with collectors in the first place, there is no fundamental difference between these two approaches. Still, using the one which is more readable has a point. Usually, you don’t use a collector, when there is a predefined operation on the stream doing the same. You wouldn’t use collect(Collectors.reducing(…)) when you can just use .reduce(…), would you?
Not only is mapToInt(mapFunc).sum() shorter, it also follows the usual left-to-right order of what happens conceptually: first convert to an int, then sum those ints up. I think this justifies preferring it over .collect(Collectors.summingInt(mapFunc)).

What is the Java equivalent to Python's reduce function?

Similar questions have been asked, here and here, but given the advent of Java 8 and the generally outdated nature of those questions, I'm wondering if there is now something at least kindred to it?
This is what I'm referring to.
You can use a lambda and Stream.reduce, there is a page in the docs dedicated to reductions:
Integer totalAgeReduce = roster
.stream()
.map(Person::getAge)
.reduce(
0,
(a, b) -> a + b);
This is the example used in the Python docs implemented with Java 8 streams:
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
Optional<Integer> sum = numbers.stream().reduce((a, b) -> a + b);
System.out.println(sum.get());
The Stream.reduce Method
The Stream.reduce method is a general-purpose reduction operation. Consider the following pipeline, which calculates the sum of the male members' ages in the collection roster. It uses the Stream.sum reduction operation:
Integer totalAge = roster
.stream()
.mapToInt(Person::getAge)
.sum();
Compare this with the following pipeline, which uses the Stream.reduce operation to calculate the same value:
Integer totalAgeReduce = roster
.stream()
.map(Person::getAge)
.reduce(
0,
(a, b) -> a + b);
The reduce operation in this example takes two arguments:
identity: The identity element is both the initial value of the reduction and the default result if there are no elements in the stream. In this example, the identity element is 0; this is the initial value of the sum of ages and the default value if no members exist in the collection roster.
accumulator: The accumulator function takes two parameters: a partial result of the reduction (in this example, the sum of all processed integers so far) and the next element of the stream (in this example, an integer). It returns a new partial result. In this example, the accumulator function is a lambda expression that adds two Integer values and returns an Integer value:
(a, b) -> a + b
The reduce operation always returns a new value. However, the accumulator function also returns a new value every time it processes an element of a stream. Suppose that you want to reduce the elements of a stream to a more complex object, such as a collection. This might hinder the performance of your application. If your reduce operation involves adding elements to a collection, then every time your accumulator function processes an element, it creates a new collection that includes the element, which is inefficient. It would be more efficient for you to update an existing collection instead. You can do this with the Stream.collect method, which the next section describes.
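A sketch of the mutable reduction Stream.collect performs instead: one container is mutated in place per thread and the containers are then merged, rather than a new collection being created per element (Collectors.toList() packages the same idea):

```java
import java.util.ArrayList;
import java.util.List;

List<Integer> numbers = List.of(1, 2, 3, 4, 5);
// supplier, accumulator, combiner: grow a single ArrayList, merge lists across threads
List<Integer> evens = numbers.stream()
        .filter(n -> n % 2 == 0)
        .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
// evens: [2, 4]
```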
The official Oracle tutorial describes how Stream.reduce works. Please have a look; I believe it will answer your query.
