Collectors.summingInt() vs mapToInt().sum() - java

When you want to sum an integer value from a stream, there are two main ways of doing it:
ToIntFunction<...> mapFunc = ...
int sum = stream().collect(Collectors.summingInt(mapFunc))
int sum = stream().mapToInt(mapFunc).sum()
The first involves boxing the mapped integer and unboxing it again, while the second adds an extra intermediate step.
Which is more efficient/clearer?

You are looking at the intersection of two otherwise distinct use cases. Using mapToInt(…) allows you to chain other IntStream operations before the terminal operation. In contrast, Collectors.summingInt(…) can be combined with other collectors, e.g. used as downstream collector in a groupingBy collector. For these use cases, there is no question about which to use.
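For illustration, a minimal sketch of both use cases (the Person class, the people list, and the getCity/getAge accessors are made up for the example):

// summingInt as a downstream collector: total age per city
Map<String, Integer> agePerCity = people.stream()
    .collect(Collectors.groupingBy(Person::getCity,
                                   Collectors.summingInt(Person::getAge)));

// mapToInt allows chaining further IntStream operations before the terminal sum
int adultYears = people.stream()
    .mapToInt(Person::getAge)
    .filter(age -> age >= 18)
    .sum();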
In your special case, when you are not chaining more operations nor dealing with collectors in the first place, there is no fundamental difference between these two approaches. Still, using the one which is more readable has a point. Usually, you don’t use a collector, when there is a predefined operation on the stream doing the same. You wouldn’t use collect(Collectors.reducing(…)) when you can just use .reduce(…), would you?
Not only is mapToInt(mapFunc).sum() shorter, it also follows the usual left-to-right order for what happens conceptually: first convert to an int, then sum these ints up. I think this justifies preferring this alternative over .collect(Collectors.summingInt(mapFunc)).

Related

Arrays.sort() vs sorting using map

I have a requirement where I have to loop through an array of strings:
String[] arr = {"abc", "cda", "cka", "snd"};
and match the string "bca", ignoring the order of the characters, which will return true as it’s present in the array ("abc").
To solve this I have two approaches:
Use Arrays.sort() to sort both the strings and then use Arrays.equals to compare them.
Create two HashMaps holding the frequency of each letter in the two strings, and then compare the maps using the equals method.
I read that the time complexity of Arrays.sort() is higher, so I thought of working on the second approach, but when I run both, the first approach takes much less time to execute.
Any suggestions why this is happening?
Time complexity only tells you how an approach will scale with (significantly) larger input. It doesn't tell you which approach is faster.
It's perfectly possible that one solution is faster for small input sizes (string lengths and/or array length) but scales badly for larger sizes due to its time complexity. It's even possible that you never encounter the point where an algorithm with better time complexity becomes faster, because natural limits on the input sizes prevent it.
You didn’t show the code of your approaches, but it’s likely that your first approach calls a method like toCharArray() on the strings, followed by Arrays.sort(char[]). This implies that sort operates on primitive data.
In contrast, when your second approach uses a HashMap<Character,Integer> to record frequencies, it will be subject to boxing overhead, for the characters and the counts, and also use a significantly larger data structure that needs to be processed.
So it’s not surprising that the hash approach is slower for small strings and arrays, as it has a significantly larger fixed overhead and also a size dependent (O(n)) overhead.
So the first approach would have to suffer significantly from its O(n log n) time complexity to turn this result around. But that won't happen: this time complexity is a worst case of sorting in general. As explained in this answer, the algorithms specified in the documentation of Arrays.sort should not be taken for granted. When you call Arrays.sort(char[]) and the array size crosses a certain threshold, the implementation switches to Counting Sort with an O(n) time complexity (but temporarily uses more memory).
So even with large strings, you won't suffer from a worse time complexity. In fact, Counting Sort shares similarities with the frequency map, but is usually more efficient, as it avoids the boxing overhead by using an int[] array instead of a HashMap<Character,Integer>.
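For illustration, a minimal sketch of the two approaches under discussion; here the frequency check uses a plain int[] (in the spirit of Counting Sort) rather than a HashMap, and the method names are made up:

import java.util.Arrays;

// Approach 1: sort both char arrays and compare
static boolean anagramBySort(String a, String b) {
    if (a.length() != b.length()) return false;
    char[] ca = a.toCharArray(), cb = b.toCharArray();
    Arrays.sort(ca); // switches to Counting Sort for large char arrays
    Arrays.sort(cb);
    return Arrays.equals(ca, cb);
}

// Approach 2: compare character frequencies without boxing
static boolean anagramByCount(String a, String b) {
    if (a.length() != b.length()) return false;
    int[] counts = new int[Character.MAX_VALUE + 1];
    for (int i = 0; i < a.length(); i++) counts[a.charAt(i)]++;
    for (int i = 0; i < b.length(); i++) {
        if (--counts[b.charAt(i)] < 0) return false;
    }
    return true;
}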
Approach 1: will be O(N log N)
Approach 2: will be O(N*M), where M is the length of each string in your array.
You should search linearly in O(N):
for (String str : arr) {
    if (str.equals(target)) return true;
}
return false;
Let's decompose the problem:
You need a function to sort a string by its chars (bccabc -> abbccc) to be able to compare a given string with the existing ones.
Function<String, String> sortChars = s -> s.chars()
.sorted()
.mapToObj(i -> (char) i)
.map(String::valueOf)
.collect(Collectors.joining());
Instead of sorting the chars of the given strings anytime you compare them, you can precompute the set of unique tokens (values from your array, sorted chars):
Set<String> tokens = Arrays.stream(arr)
.map(sortChars)
.collect(Collectors.toSet());
This will result in the values "abc","acd","ack","dns".
Afterwards you can create a function which checks if a given string, when sorted by chars, matches any of the given tokens:
Predicate<String> match = s -> tokens.contains(sortChars.apply(s));
Now you can easily check any given string as follows:
boolean matches = match.test("bca");
Matching will only need to sort the given input and do a hash set lookup to check if it matches, so it's very efficient.
You can of course write the Function and Predicate as methods instead (String sortChars(String s) and boolean matches(String s)) if you're unfamiliar with functional programming.
More of an addendum to the other answers. Of course, your two options have different performance characteristics. But: understand that performance is not necessarily the only factor to make a decision!
Meaning: if you are talking about a search that runs hundreds or thousands of times per minute, on large data sets, then for sure you should invest a lot of time to come up with the solution that gives you the best performance. Most likely, that includes doing various experiments with actual measurements on real data. Time complexity is a theoretical construct; in the real world, there are also elements such as CPU cache sizes, threading issues, IO bottlenecks, and whatnot that can have a significant impact on real numbers.
But: when your code will be doing its work just once a minute, even on a few dozen or hundred MB of data ... then it might not be worth focusing on performance.
In other words: the "sort" solution sounds straightforward. It is easy to understand, easy to implement, and hard to get wrong (with some decent test cases). If that solution gets the job done "good enough", then consider using it: the simple solution.
Performance is a luxury problem. You only address it if there is a reason to.

Is it possible to compare two DFEVar values on Kernel side

I am using Maxeler, MaxIDE.
I would like to use my input stream as output stream on the next cycle. I was hoping to decide this under an if condition. But the if condition won't allow me to compare two DFEVar(s). I was wondering is it possible?
Type mismatch: cannot convert from DFEVar to boolean
You cannot use a regular if statement to compare two DFEVars.
You should use the ternary operator instead. See point 2 below for more details.
You can find the detailed explanation in the Maxeler tutorials.
From the MaxCompiler Dataflow Programming Tutorial:
Conditionals in dataflow computing
There are three main methods of controlling conditionals that affect dataflow computation:
Global conditionals: These are typically large-scale modes of operation depending on input parameters with a relatively small number of options. If we need to select different computations based on input parameters, and these conditionals affect the dataflow portion of the design, we simply create multiple .max files for each case. Some applications may require certain transformations to get them into the optimal structure for supporting multiple .max files.
if (mode==1) p1(x); else p2(x);
where p1 and p2 are programs that use different .max files.
Local Conditionals: Conditionals depending on local state of a computation.
if (a > b) x = x + 1; else x = x - 1;
These can be transformed into dataflow computation as
x = (a > b) ? (x + 1) : (x - 1);
Conditional Loops: If we do not know how long we need to iterate around a loop, we need to know a bit about the loop's behavior and typical values for the number of loop iterations. Once we know the distribution of values we can expect, a dataflow implementation pipelines the optimal number of iterations and treats each block of iterations as an action for the SLiC interface, controlled by the CPU (or some other kernel).
The ternary-if operator ( ?: ) selects between two input streams. To select between more than two streams, the control.mux method is easier to use and read than nested ternary-if statements.
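As a sketch, a MaxJ-style kernel fragment (the stream names and types are made up, and the exact API may differ between MaxCompiler versions):

// inside a Kernel: comparisons on DFEVars build hardware, not booleans
DFEVar a = io.input("a", dfeUInt(32));
DFEVar b = io.input("b", dfeUInt(32));
DFEVar x = io.input("x", dfeUInt(32));
DFEVar y = io.input("y", dfeUInt(32));

// the ternary-if on DFEVars creates a multiplexer selecting between the streams
DFEVar out = (a > b) ? x : y;
io.output("out", out, dfeUInt(32));

// for more than two streams, control.mux with a select stream is clearer:
// DFEVar sel = io.input("sel", dfeUInt(2));
// DFEVar out4 = control.mux(sel, s0, s1, s2, s3);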

How to sum a list of integers with java streams?

I want to sum a list of Integers. It works as follows, but the syntax does not feel right. Could the code be optimized?
Map<String, Integer> integers;
integers.values().stream().mapToInt(i -> i).sum();
This will work, but the i -> i is doing some automatic unboxing which is why it "feels" strange. mapToInt converts the stream to an IntStream "of primitive int-valued elements". Either of the following will work and better explain what the compiler is doing under the hood with your original syntax:
integers.values().stream().mapToInt(i -> i.intValue()).sum();
integers.values().stream().mapToInt(Integer::intValue).sum();
I suggest 2 more options:
integers.values().stream().mapToInt(Integer::intValue).sum();
integers.values().stream().collect(Collectors.summingInt(Integer::intValue));
The second one uses the Collectors.summingInt() collector; there is also a summingLong() collector, which you would use with mapToLong.
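For example, a minimal sketch of the long-valued variants, using the same map as above:

long total = integers.values().stream().collect(Collectors.summingLong(Integer::longValue));
long total2 = integers.values().stream().mapToLong(Integer::longValue).sum();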
And a third option: Java 8 introduces the very efficient LongAdder accumulator, designed to speed up summing in parallel streams and multi-threaded environments. Here's an example use:
LongAdder a = new LongAdder();
map.values().parallelStream().forEach(a::add);
int sum = a.intValue();
From the docs
Reduction operations
A reduction operation (also called a fold) takes a sequence of input elements and combines them into a single summary result by repeated application of a combining operation, such as finding the sum or maximum of a set of numbers, or accumulating elements into a list. The streams classes have multiple forms of general reduction operations, called reduce() and collect(), as well as multiple specialized reduction forms such as sum(), max(), or count().
Of course, such operations can be readily implemented as simple sequential loops, as in:
int sum = 0;
for (int x : numbers) {
sum += x;
}
However, there are good reasons to prefer a reduce operation over a mutative accumulation such as the above. Not only is a reduction "more abstract" -- it operates on the stream as a whole rather than individual elements -- but a properly constructed reduce operation is inherently parallelizable, so long as the function(s) used to process the elements are associative and stateless. For example, given a stream of numbers for which we want to find the sum, we can write:
int sum = numbers.stream().reduce(0, (x,y) -> x+y);
or:
int sum = numbers.stream().reduce(0, Integer::sum);
These reduction operations can run safely in parallel with almost no modification:
int sum = numbers.parallelStream().reduce(0, Integer::sum);
So, for a map you would use:
integers.values().stream().mapToInt(i -> i).reduce(0, (x,y) -> x+y);
Or:
integers.values().stream().reduce(0, Integer::sum);
You can use reduce method:
long sum = result.stream().map(e -> e.getCreditAmount()).reduce(0, (x, y) -> x + y);
or
long sum = result.stream().map(e -> e.getCreditAmount()).reduce(0, Integer::sum);
(Note that the identity must be an Integer here, not 0L; the accumulation happens in int, so for very large totals prefer mapToLong(...).sum().)
You can use reduce() to sum a list of integers.
int sum = integers.values().stream().reduce(0, Integer::sum);
You can use collect method to add list of integers.
List<Integer> list = Arrays.asList(2, 4, 5, 6);
int sum = list.stream().collect(Collectors.summingInt(Integer::intValue));
I have declared a list of Integers.
ArrayList<Integer> numberList = new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4, 5));
You can try using these different ways below.
Using mapToInt
int sum = numberList.stream().mapToInt(Integer::intValue).sum();
Using summarizingInt
long sum = numberList.stream().collect(Collectors.summarizingInt(Integer::intValue)).getSum(); // getSum() returns long
Using reduce
int sum = numberList.stream().reduce(Integer::sum).get().intValue();
This may help those who have objects in the list.
If you have a list of objects and want to sum specific fields of those objects, use the below.
List<ResultSom> somList = MyUtil.getResultSom();
BigDecimal result = somList.stream().map(ResultSom::getNetto)
        .reduce(BigDecimal.ZERO, BigDecimal::add);
This would be the shortest way to sum up an int array (for a long array LongStream, for a double array DoubleStream, and so forth). Not all primitive integer or floating-point types have a Stream implementation, though.
IntStream.of(integers).sum();
Unfortunately it looks like the Stream API only returns normal streams from, say, List<Integer>#stream(). Guess they're pretty much forced to because of how generics work.
These normal Streams are of generic objects, so they don't have specialized methods like sum(), and you have to use the weird "looks like a no-op" re-stream conversion to get to those methods: .mapToInt(i -> i).
Another option is using "Eclipse Collections" which are like an expanded java Stream API
IntLists.immutable.ofAll(integers.values()).sum();
There is one more option that no one has considered here, and it takes advantage of multi-core environments. If you want to use them, the following code should be used instead of the other mentioned solutions (note that the three-argument reduce exists on Stream, not on IntStream, so no mapToInt is needed here):
int sum = integers.values().parallelStream()
        .reduce(0, Integer::sum, Integer::sum);
This solution is similar to the other ones, but please notice the third argument to reduce. It tells the stream what to do with the partial sums calculated by different threads on different chunks of the stream; in this case it simply adds them up. Also, parallelStream() is used instead of stream(). The other option for the third argument is (i, j) -> i + j, which adds the value of a stream chunk (j) to the current value (i) and uses the result as the current value for the next stream chunk, until all partial results are processed.
Even when using a plain stream(), it is useful to tell reduce what to do with the chunk summaries, in case someone, or you, would like to parallelize it in the future. Initial development is the best time for that; later on, you would first need to work out what the code is supposed to do and spend time understanding its purpose again.
And of course, instead of method references you can use the equivalent lambda expressions; I prefer the method references as more compact and still easily readable.
Also remember that this can be used for more complex calculations too, but always be aware that there are no guarantees about the order in which stream elements are assigned to threads.
IntStream.of(1, 2, 23).sum();
IntStream.of(1, 2, 23,1, 2, 23,1, 2, 23).max().getAsInt();

What data structure to use

I'm looking for a data structure that has the following properties.
Stores a list of tuple<Double,Integer,Integer>. Order is only on the double. Two tuples with the same double value are considered the same.
Supports duplicates.
Needs to be able to traverse in ascending order. If there are duplicates, the one added later should have higher order.
Find/Insert fast
Remove fast. Note that remove always follows this pattern; the method containing remove looks like:
for (int i = list.size() - 1; i >= 0; i--) { // assume list is in ascending order
    if (list[j:i] can be merged) {
        remove list[j:i-1];
        update list[i]'s two integers;
        i = j - 1;
    }
}
I currently use an ArrayList and keep it sorted. Finding is fast with binary search. However, insertion and deletion involve a lot of copying in memory; e.g., insertion at the front of the list shifts all the elements.
One solution would be to have a sorted map to lists of tuples:
SortedMap<Double,List<Tuple<Integer,Integer>>>
The declaration line is a bit ugly, but it will work. I've used maps to lists many times before. The nice thing about it is that you can then delete items from the lists, and as long as each list is short, you have a smaller number of moves. To iterate over the entire structure, you'd need to create your own iterator, or adapt your original code.
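A minimal sketch of that idea, using int[]{a, b} as a stand-in for the two-integer tuple (TreeMap keeps its keys in ascending order):

SortedMap<Double, List<int[]>> map = new TreeMap<>();

// insert: tuples sharing a double value go into one list, in insertion order
map.computeIfAbsent(1.5, k -> new ArrayList<>()).add(new int[]{3, 4});

// traverse in ascending key order; within a key, later additions come later
for (Map.Entry<Double, List<int[]>> e : map.entrySet()) {
    for (int[] pair : e.getValue()) {
        // process pair[0] and pair[1]
    }
}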
If you decide to write your own Double comparator there are a few things to be aware of.
The first is that floating-point equality is a very tricky area. By default, Java does not ensure consistent floating-point math when your code runs on different virtual machines, although this can be enforced with the strictfp keyword. This inconsistency in floating-point arithmetic can cause issues in applications that are unaware of it and run on multiple virtual machines communicating with one another, such as servers and clients.
The second tricky bit is that Comparators operate on Objects, which means you will be working with Doubles, not doubles. The following four operations cause a Double to be unboxed into a double: <, <=, >, and >=. The following two do not cause unboxing: == and !=; these two operators compare object references. Bottom line: manually unbox the Doubles into doubles before performing comparisons; it will greatly reduce bugs.
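For example, a minimal sketch of such a comparator:

Comparator<Double> byValue = (x, y) -> {
    double a = x.doubleValue(); // manual unboxing, as advised above
    double b = y.doubleValue();
    return Double.compare(a, b); // also handles NaN and -0.0 consistently
};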

What data-structure should I use to create my own "BigInteger" class?

As an optional assignment, I'm thinking about writing my own implementation of the BigInteger class, where I will provide my own methods for addition, subtraction, multiplication, etc.
This will be for arbitrarily long integer numbers, even hundreds of digits long.
While doing the math on these numbers digit by digit isn't hard, what do you think would be the best data structure to represent my "BigInteger"?
At first I was considering using an Array but then I was thinking I could still potentially overflow (run out of array slots) after a large add or multiplication. Would this be a good case to use a linked list, since I can tack on digits with O(1) time complexity?
Is there some other data-structure that would be even better suited than a linked list? Should the type that my data-structure holds be the smallest possible integer type I have available to me?
Also, should I be careful about how I store my "carry" variable? Should it, itself, be of my "BigInteger" type?
Check out the book C Interfaces and Implementations by David R. Hanson. It has 2 chapters on the subject, covering the vector structure, word size and many other issues you are likely to encounter.
It's written for C, but most of it is applicable to C++ and/or Java. And if you use C++ it will be a bit simpler because you can use something like std::vector to manage the array allocation for you.
Always use the smallest int type that will do the job you need (bytes). A linked list should work well, since you won't have to worry about overflowing.
If you use binary trees (whose leaves are ints), you get all the advantages of the linked list (unbounded number of digits, etc.) with simpler divide-and-conquer algorithms. In this case, you do not have a single base but many, depending on the level at which you're working.
If you do this, you need to use a BigInteger for the carry. You may consider it an advantage of the "linked list of ints" approach that the carry can always be represented as an int (and this is true for any base, not just base 10, which most answers seem to assume you should use... in any base, the carry is always a single digit).
I might as well say it: it would be a terrible waste to use base 10 when you can use 2^30 or 2^31.
Accessing elements of linked lists is slow. I think arrays are the way to go, with lots of bound checking and run time array resizing as needed.
Clarification: Traversing a linked list and traversing an array are both O(n) operations, but traversing a linked list requires dereferencing a pointer at each step. Just because two algorithms have the same complexity, it doesn't mean they take the same time to run. The overhead of allocating and deallocating n nodes in a linked list will also be much heavier than the memory management of a single array of size n, even if the array has to be resized a few times.
Wow, there are some… interesting answers here. I'd recommend reading a book rather than trying to sort through all this contradictory advice.
That said, C/C++ is also ill-suited to this task. Big-integer is a kind of extended-precision math. Most CPUs provide instructions to handle extended-precision math at a comparable or the same speed (bits per instruction) as normal math. When you add 2^31 + 2^31 in a 32-bit register, the answer is 0, but there is also a special carry output from the processor's ALU which a program can read and use.
C++ cannot access that flag, and there's no way in C either. You have to use assembler.
Just to satisfy curiosity, you can use the standard Boolean arithmetic to recover carry bits etc. But you will be much better off downloading an existing library.
I would say an array of ints.
An Array is indeed a natural fit. I think it is acceptable to throw an OverflowException when you run out of space in memory. The teacher will see attention to detail.
A multiplication roughly doubles the number of digits; addition increases it by at most 1. It is easy to create a sufficiently big Array to store the result of your operation.
Carry is at most a one-digit number in multiplication (9*9 = 81, i.e. digit 1 with carry 8). A single int will do.
std::vector<bool> or std::vector<unsigned int> is probably what you want. You will have to push_back() or resize() on them as you need more space for multiplies, etc. Also, remember to push_back the correct sign bits if you're using two's complement.
I would say a std::vector of char (since it only has to hold 0-9), if you plan to work in BCD.
If not BCD, then use a vector of int (you didn't make that clear).
Much less space overhead than a linked list.
And all advice says "use vector unless you have a good reason not to".
As a rule of thumb, use std::vector instead of std::list, unless you need to insert elements in the middle of the sequence very often. Vectors tend to be faster, since they are stored contiguously and thus benefit from better spatial locality (a major performance factor on modern platforms).
Make sure you use elements that are natural for the platform. If you want to be platform independent, use long. Remember that unless you have some special compiler intrinsics available, you'll need a type at least twice as large to perform multiplication.
I don't understand why you'd want carry to be a big integer. Carry is a single bit for addition and element-sized for multiplication.
Make sure you read Knuth's Art of Computer Programming; algorithms pertaining to arbitrary-precision arithmetic are described there to a great extent.
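To make the array-plus-carry discussion concrete, here is a minimal Java sketch of schoolbook addition over int[] limbs in base 10^9, least significant limb first (the representation and method name are made up for the example):

// each element is one base-1_000_000_000 "digit", least significant first
static int[] add(int[] a, int[] b) {
    final int BASE = 1_000_000_000;
    int n = Math.max(a.length, b.length);
    int[] result = new int[n + 1]; // one extra limb for a final carry
    int carry = 0;                 // the carry is a single digit: an int suffices
    for (int i = 0; i < n; i++) {
        int sum = carry
                + (i < a.length ? a[i] : 0)
                + (i < b.length ? b[i] : 0);
        result[i] = sum % BASE;
        carry = sum / BASE;
    }
    result[n] = carry; // may be 0; a real implementation would trim leading zeros
    return result;
}

The same loop works with a base of 2^30 (the intermediate sum still fits in an int), but the digit products in multiplication then need a type twice as large, echoing the advice above.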
