Java Lambda re-use Stream - java

I wanted to try out some of the functionality of lambdas: filter an ArrayList and use the methods of IntStream to calculate the average and maximum of an ArrayList of numbers.
My first thought was to just filter the ArrayList, save the stream, and then use its methods to calculate both values:
ArrayList<Integer> arr = new ArrayList<>();
arr.add(5);
arr.add(7);
arr.add(11);
IntStream s = arr.stream().filter(i -> i < 10).mapToInt(i -> (int)i);
int maxBelowTen = s.max().getAsInt();
double avgBelowTen = s.average().getAsDouble();
System.out.println("Maximum below ten: " + maxBelowTen);
System.out.println("Average below ten: " + avgBelowTen);
However, this throws a java.lang.IllegalStateException: stream has already been operated upon or closed
With this information I got it to work, of course, by opening two streams and filtering twice:
int maxBelowTen = arr.stream().filter(i -> i < 10).mapToInt(i -> (int) i).max().getAsInt();
double avgBelowTen = arr.stream().filter(i -> i < 10).mapToInt(i -> (int) i).average().getAsDouble();
But my question now is about performance. Isn't it pretty slow if I have to filter and map the stream twice? Why can't I operate more than once on a stream? I fail to understand why it was implemented this way.
Wouldn't it be possible to leave a stream open after an operation, given that every operator method returns a new stream or a single value?
What is the reason for this, or am I just using it wrong?

If you did it the good old way, with a loop, you would compute the max element and the average in a single loop. So you just need to do the same thing here. Fortunately, the Stream API can do it for you:
IntStream s = arr.stream().mapToInt(Integer::intValue).filter(i -> i < 10);
IntSummaryStatistics stats = s.summaryStatistics();
double average = stats.getAverage();
int max = stats.getMax();
Reading the javadoc of IntSummaryStatistics should help you understand how you could implement such an operation by yourself.
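As a rough illustration of implementing such an operation yourself, here is a minimal sketch of a hand-rolled accumulator used with the three-argument IntStream.collect. The class name MaxAndAverage and its methods are hypothetical and only mirror the idea behind IntSummaryStatistics; they are not part of the JDK.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical accumulator: tracks count, sum and max in one pass.
class MaxAndAverage {
    private long count;
    private long sum;
    private int max = Integer.MIN_VALUE;

    // Called once per stream element.
    void accept(int value) {
        count++;
        sum += value;
        max = Math.max(max, value);
    }

    // Merges two partial results (only needed for parallel streams).
    void combine(MaxAndAverage other) {
        count += other.count;
        sum += other.sum;
        max = Math.max(max, other.max);
    }

    int getMax() {
        return max;
    }

    double getAverage() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    // Filters and unboxes once, then feeds every element to the accumulator.
    static MaxAndAverage below(List<Integer> values, int limit) {
        return values.stream()
                .mapToInt(Integer::intValue)
                .filter(i -> i < limit)
                .collect(MaxAndAverage::new, MaxAndAverage::accept, MaxAndAverage::combine);
    }

    public static void main(String[] args) {
        MaxAndAverage stats = below(Arrays.asList(5, 7, 11), 10);
        System.out.println("Maximum below ten: " + stats.getMax());     // 7
        System.out.println("Average below ten: " + stats.getAverage()); // 6.0
    }
}
```

In everyday code, summaryStatistics() is likely the simpler choice; the sketch only shows that nothing magical is going on.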

If you want to "save" the result of intermediate stream operations, you can do that; you just have to do it explicitly: you'd have to do arr.stream().filter(i -> i < 10).mapToInt(i -> (int) i).toArray(), store that int[], and then do the operations on that.
Streams are not a data structure, they're a pending computation that hasn't been run yet, and might merge future operations together in nonstandard ways. If you want to store intermediate results in a proper data structure, you have to do it yourself.
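A minimal sketch of that approach, using the numbers from the question: the filter and the unboxing run once, the result is stored in an int[], and each later Arrays.stream call opens a fresh stream over the stored array. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class StoredIntermediateSketch {
    // Runs the filter and the unboxing once and keeps the result as a plain array.
    static int[] belowLimit(List<Integer> values, int limit) {
        return values.stream()
                .filter(i -> i < limit)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        int[] belowTen = belowLimit(Arrays.asList(5, 7, 11), 10);

        // Each call to Arrays.stream creates a new stream, so nothing is reused illegally.
        int max = Arrays.stream(belowTen).max().orElseThrow(IllegalStateException::new);
        double avg = Arrays.stream(belowTen).average().orElse(Double.NaN);

        System.out.println("Maximum below ten: " + max); // 7
        System.out.println("Average below ten: " + avg); // 6.0
    }
}
```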

Related

Iterate list of objects and get average

Is there any way to get an average here via one iteration? I can do it with regular "For loop" but want to use stream instead.
final Double ratingSum = ratingCount.stream().mapToDouble(RecommendRatingCount::getRatingSum).sum();
final Double countSum = ratingCount.stream().mapToDouble(RecommendRatingCount::getCount).sum();
return ratingSum /countSum;
Assuming Java 12 or higher, you can use a teeing collector:
return
ratingCount.stream()
.collect(Collectors.teeing(
Collectors.summingDouble(RecommendRatingCount::getRatingSum),
Collectors.summingDouble(RecommendRatingCount::getCount),
(sum, count) -> sum / count));
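For Java versions before 12, a similar single pass can be sketched with the three-argument Stream.collect and a two-slot accumulator. The RecommendRatingCount class below is a hypothetical stand-in for the one in the question, assumed here to expose double getters.

```java
import java.util.Arrays;
import java.util.List;

class WeightedAverageSketch {
    // Hypothetical stand-in for the class from the question.
    static class RecommendRatingCount {
        private final double ratingSum;
        private final double count;

        RecommendRatingCount(double ratingSum, double count) {
            this.ratingSum = ratingSum;
            this.count = count;
        }

        double getRatingSum() {
            return ratingSum;
        }

        double getCount() {
            return count;
        }
    }

    // One iteration: slot 0 accumulates the rating sums, slot 1 the counts.
    static double average(List<RecommendRatingCount> ratingCount) {
        double[] totals = ratingCount.stream().collect(
                () -> new double[2],
                (acc, r) -> {
                    acc[0] += r.getRatingSum();
                    acc[1] += r.getCount();
                },
                (left, right) -> {
                    left[0] += right[0];
                    left[1] += right[1];
                });
        return totals[0] / totals[1];
    }

    public static void main(String[] args) {
        List<RecommendRatingCount> data = Arrays.asList(
                new RecommendRatingCount(8, 2),
                new RecommendRatingCount(9, 3));
        System.out.println(average(data)); // 17 / 5
    }
}
```

Note that an empty list leads to 0.0 / 0.0, which is NaN; whether that needs a guard depends on the caller.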
Decompose each object into separate ratings, each value being rating/count, by first expanding out each object count times, then converting each to its discounted value, then summarise all such values:
double average = ratingCount.stream()
.flatMap(rrc -> Stream.generate(() -> rrc).limit(rrc.getCount()))
.mapToDouble(rrc -> rrc.getRatingSum() / rrc.getCount())
.summaryStatistics().getAverage();
Assuming getCount() always returns a natural number (a positive whole number):
return ratingCount.stream()
.flatMapToDouble(a -> DoubleStream.concat(DoubleStream.of(a.getRatingSum()),
DoubleStream.generate(() -> 0).limit((long) a.getCount() - 1)))
.average().orElse(0);

Common element in Java infinite streams

I have three infinite Java IntStream objects. I want to find the smallest element that is present in all three of them.
IntStream a = IntStream.iterate(286, i->i+1).map(i -> (Integer)i*(i+1)/2);
IntStream b = IntStream.iterate(166, i->i+1).map(i -> (Integer)i*(3*i-1)/2);
IntStream c = IntStream.iterate(144, i->i+1).map(i -> i*(2*i-1));
I can always employ a brute force solution (without streams) which involves iterating in nested loops, but I was wondering if we can do it more efficiently with streams?
You need to iterate all 3 in parallel, advancing the one with the lowest value, checking if all 3 are equal.
Your code will not find an answer for the next value after 40755, because the next value is 1_533_776_805, whose intermediate value (before division by 2) is higher than Integer.MAX_VALUE (2_147_483_647).
So, here is one way to use your streams, after changing them to long and guarding against overflow:
LongStream a = LongStream.iterate(286, i->i+1).map(i -> Math.multiplyExact(i, i+1)/2);
LongStream b = LongStream.iterate(166, i->i+1).map(i -> Math.multiplyExact(i, 3*i-1)/2);
LongStream c = LongStream.iterate(144, i->i+1).map(i -> Math.multiplyExact(i, 2*i-1));
PrimitiveIterator.OfLong aIter = a.iterator();
PrimitiveIterator.OfLong bIter = b.iterator();
PrimitiveIterator.OfLong cIter = c.iterator();
long aVal = aIter.nextLong();
long bVal = bIter.nextLong();
long cVal = cIter.nextLong();
while (aVal != bVal || bVal != cVal) {
long min = Math.min(Math.min(aVal, bVal), cVal);
if (aVal == min)
aVal = aIter.nextLong();
if (bVal == min)
bVal = bIter.nextLong();
if (cVal == min)
cVal = cIter.nextLong();
}
System.out.println(aVal);
These functions are always increasing. So the code should stop when the magic equal triplet is found.
The thing to code is:
a) when a stream's current value is below any of the others, it advances to its own next value.
b) when it meets the same candidate value, it waits for the third stream to decide.
c) when it has a higher value than all the others, its value becomes the new candidate and it waits for both others.
Reference juggling.
There may also be no solution (at least not within a short time).
Notice that stream c produces an even number exactly when i is even, because 2*i-1 is always odd. There might be some optimization there to skip values of a and b faster.
I don't think there is anything smart possible with stream API. The main reason is that you can't really go over one stream until some condition is met - instead, you look at current 3 elements and pick the next element from one of the streams before comparing the elements again.
The most efficient (and might be also the cleanest) solution is to use iterators and keep calling next() method on the right streams until the answer is found.
To start with, you can focus on two streams only and find their first common value:
while (elementA != elementB) {
if (elementA < elementB) {
elementA = iteratorA.next();
} else {
elementB = iteratorB.next();
}
}
Then you need to make the third stream catch up with these two:
while (elementC < elementA) {
elementC = iteratorC.next();
}
At this point there are two options:
either elementC == elementA in which case you have the answer
or elementC > elementA, in which case you advance the first two streams (elementC may still match a later common value of A and B) and start over
One thing to remember is the max value of integer. Because you have i^2, this means that it will overflow for i about 46k, so you need to change streams of ints to streams of longs (the answer is about 1.5 billion - and that's after division by 2 in these functions).
Since you are doing exercises for practice, I don't think it's right to give you the full working code, but let me know if you still struggle with it ;)
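For readers who want to check their own attempt, the steps above can be put together roughly as follows. This is only a sketch under the assumptions already stated: long values, Math.multiplyExact as an overflow guard, and strictly increasing streams. When the third stream overshoots, it advances only the first two streams, since the third value could still be the next match. The class and method names are made up.

```java
import java.util.PrimitiveIterator;
import java.util.stream.LongStream;

class CommonElementSketch {
    // Finds the first value produced by all three strictly increasing iterators.
    static long firstCommon(PrimitiveIterator.OfLong itA,
                            PrimitiveIterator.OfLong itB,
                            PrimitiveIterator.OfLong itC) {
        long a = itA.nextLong();
        long b = itB.nextLong();
        long c = itC.nextLong();
        while (true) {
            // Step 1: find the next common value of the first two streams.
            while (a != b) {
                if (a < b) {
                    a = itA.nextLong();
                } else {
                    b = itB.nextLong();
                }
            }
            // Step 2: let the third stream catch up.
            while (c < a) {
                c = itC.nextLong();
            }
            if (c == a) {
                return a;
            }
            // Step 3: c overshot, so move a and b past the failed candidate.
            a = itA.nextLong();
            b = itB.nextLong();
        }
    }

    static long solve() {
        LongStream a = LongStream.iterate(286, i -> i + 1).map(i -> Math.multiplyExact(i, i + 1) / 2);
        LongStream b = LongStream.iterate(166, i -> i + 1).map(i -> Math.multiplyExact(i, 3 * i - 1) / 2);
        LongStream c = LongStream.iterate(144, i -> i + 1).map(i -> Math.multiplyExact(i, 2 * i - 1));
        return firstCommon(a.iterator(), b.iterator(), c.iterator());
    }

    public static void main(String[] args) {
        System.out.println(solve()); // 1533776805
    }
}
```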

Using the stream api of java for a feed forward computation

For a college project of mine I needed to implement a deep learning neural network in plain Java. After profiling the application I wanted to see if the automatic parallelization offered by Java's Stream API would lead to a significant improvement in performance, but I am struggling to transform my old code to a stream-based approach.
The method takes a vector (double array), performs a matrix multiplication, then adds a value to each element and finally applies a lambda function (DoubleFunction) to every element.
Here is the old code that i want to replace:
/* e.g.
double[] x = double[100]
int inputNeurons = 100
int outputNeurons = 200
double[][] weights = double[200][100]
double[] biases = double[200]
*/
private double[] output(double[] x) {
double[] y = new double[outputNeurons];
for (int i = 0; i < outputNeurons; i++) {
double preActivation = 0.;
for (int j = 0; j < inputNeurons; j++) {
preActivation += weights[i][j] * x[j];
}
preActivation += biases[i];
y[i] = activation.apply(preActivation);
}
return y;
}
This is what i came up with so far (it does not work):
private double[] output(double[] x) {
return Arrays.stream(weights).parallel()
.map(outputNeuron -> IntStream.range(0, outputNeurons)
.mapToDouble(i -> IntStream.range(0, inputNeurons)
.mapToDouble(j -> x[i] * outputNeuron[i]).sum()
).map(activation::apply)
).toArray();
Since I don't know streams well enough, I would really appreciate any help!
Good attempt, but your stream approach is quite far from the imperative one. The exact equivalent of your imperative approach is:
return IntStream.range(0, outputNeurons)
//.parallel() uncomment to see difference in performance
.mapToDouble(i -> IntStream.range(0, inputNeurons)
.mapToDouble(j -> weights[i][j] * x[j]).sum() + biases[i])
.map(activation::apply)
.toArray();
Note, there are many factors that influence whether parallel streams will make your code faster or slower than your imperative approach or sequential streams. Thus, you'll need to consider some factors before going parallel.
Data size
Number of cores
Cost per element (meaning time spent executing in parallel and overhead of decomposition and merging)
Source data structure
Packing (meaning primitive types are faster to operate on than boxed values).
You should also consider reading Should I always use a parallel stream when possible?
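To check that the stream version really matches the loop, a small self-contained comparison might look like this. The weights, biases and the ReLU activation are made-up illustration values, and the fields from the question are passed as parameters here so the sketch stands alone.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;
import java.util.stream.IntStream;

class FeedForwardSketch {
    // Imperative version, following the loop from the question.
    static double[] outputLoop(double[][] weights, double[] biases, double[] x,
                               DoubleUnaryOperator activation) {
        double[] y = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            double preActivation = 0.0;
            for (int j = 0; j < x.length; j++) {
                preActivation += weights[i][j] * x[j];
            }
            y[i] = activation.applyAsDouble(preActivation + biases[i]);
        }
        return y;
    }

    // Stream version: one outer index per output neuron, one inner sum per row.
    static double[] outputStream(double[][] weights, double[] biases, double[] x,
                                 DoubleUnaryOperator activation) {
        return IntStream.range(0, weights.length)
                .mapToDouble(i -> IntStream.range(0, x.length)
                        .mapToDouble(j -> weights[i][j] * x[j]).sum() + biases[i])
                .map(activation)
                .toArray();
    }

    public static void main(String[] args) {
        double[][] weights = {{1, 2}, {3, 4}, {5, 6}};
        double[] biases = {0.5, -10, 0};
        double[] x = {1, 0.5};
        DoubleUnaryOperator relu = v -> Math.max(0.0, v);

        System.out.println(Arrays.toString(outputLoop(weights, biases, x, relu)));   // [2.5, 0.0, 8.0]
        System.out.println(Arrays.toString(outputStream(weights, biases, x, relu))); // [2.5, 0.0, 8.0]
    }
}
```

For larger inputs the two versions may differ in the last bits, because DoubleStream.sum() is allowed to sum in a different order or with error compensation; comparing with a small tolerance is safer than exact equality.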

Determinism of Java 8 streams

Motivation
I've just rewritten some 30 mostly trivial parsers and I need the new versions to behave exactly like the old ones. Therefore, I stored their example input files and some signature of the outputs produced by the old parsers for comparison with the new ones. This signature contains the counts of successfully parsed items, sums of some hash codes and up to 10 pseudo-randomly chosen items.
I thought this was a good idea, as the equality of the hash code sums sort of guarantees that the outputs are exactly the same, and the samples allow me to see what's wrong. I'm only using samples as otherwise it'd get really big.
The problem
Basically, given an unordered collection of strings, I want to get a list of up to 10 of them, so that when the collection changes a bit, I still get mostly the same samples in the same positions (the input is unordered, but the output is a list). This should work also when something is missing, so ideas like taking the 100th smallest element don't work.
ImmutableList<String> selectSome(Collection<String> list) {
if (list.isEmpty()) return ImmutableList.of();
return IntStream.range(1, 20)
.mapToObj(seed -> selectOne(list, seed))
.distinct()
.limit(10)
.collect(ImmutableList.toImmutableList());
}
So I start with the numbers from 1 to 19 (range(1, 20) excludes the upper bound; the surplus is there so that after distinct I most probably still have my 10 samples), call a stateless deterministic function selectOne (defined below) that returns one string which is maximal according to some funny criteria, remove duplicates, limit the result and collect it using Guava. All steps should be IMHO deterministic and "ordered", but I may be overlooking something. The other possibility would be that all my 30 new parsers are wrong, but this is improbable given that the hashes are correct. Moreover, the results of the parsing look correct.
String selectOne(Collection<String> list, int seed) {
// some boring mixing, definitely deterministic
for (int i=0; i<10; ++i) {
seed *= 123456789;
seed = Integer.rotateLeft(seed, 16);
}
// ensure seed is odd
seed = 2*seed + 1;
// first element is the candidate result
String result = list.iterator().next();
// the value is the hash code multiplied by the seed
// overflow is fine
int value = seed * result.hashCode();
// looking for s maximizing seed * s.hashCode()
for (final String s : list) {
final int v = seed * s.hashCode();
if (v < value) continue;
// tiebreaking by taking the bigger or smaller s
// this is needed for determinism
if (s.compareTo(result) * seed < 0) continue;
result = s;
value = v;
}
return result;
}
This sampling doesn't seem to work. I get a sequence like
"9224000", "9225000", "4165000", "9200000", "7923000", "8806000", ...
with one old parser and
"9224000", "9225000", "4165000", "3030000", "1731000", "8806000", ...
with a new one. Both results are perfectly repeatable. For other parsers, it looks very similar.
Is my usage of streams wrong? Do I have to add .sequential() or alike?
Update
Sorting the input collection has solved the problem:
ImmutableList<String> selectSome(Collection<String> collection) {
final List<String> list = Lists.newArrayList(collection);
Collections.sort(list);
.... as before
}
What's still missing is an explanation why.
The explanation
As stated in the answers, my tiebreaker was an all-breaker because I failed to check for a tie. Something like
if (v==value && s.compareTo(result) < 0) continue;
works fine.
I hope that my confused question may be at least useful for someone looking for "consistent sampling". It wasn't really Java 8 related.
I should've used Guava ComparisonChain or better Java 8 arg max to avoid my stupid mistake:
String selectOne(Collection<String> list, int seed) {
.... as before
final int multiplier = 2*seed + 1;
return list.stream()
.max(Comparator.comparingInt((String s) -> multiplier * s.hashCode())
.thenComparing(s -> s)) // <--- FOOL-PROOF TIEBREAKER
.get();
}
The mistake is that your tiebreaker is not in fact breaking a tie. We should be selecting s when v > value, but instead we're falling back to compareTo(). This breaks comparison symmetry, making your algorithm dependent on encounter order.
As a bonus, here's a simple test case to reproduce the bug:
System.out.println(selectOne(Arrays.asList("1", "2"), 4)); // 1
System.out.println(selectOne(Arrays.asList("2", "1"), 4)); // 2
In selectOne you just want to select String s with max rank of value = seed * s.hashCode(); for that given seed.
The problem is with the "tiebreaking" line:
if (s.compareTo(result) * seed < 0) continue;
It makes the result order-dependent: for a different order of elements it skips different elements, so changing the order of the elements changes the result.
Remove the tiebreaking if and the result will be insensitive to the order of elements in input list.
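As a sketch of the tie check in context, here is the loop from the question with the tiebreaker applied only when the two values are actually equal, plus a quick check that the encounter order no longer matters. The class name is made up, and the rule "prefer the larger string on a tie" is just one possible choice.

```java
import java.util.Arrays;
import java.util.Collection;

class ConsistentSamplingSketch {
    static String selectOne(Collection<String> list, int seed) {
        // Same mixing as in the question.
        for (int i = 0; i < 10; ++i) {
            seed *= 123456789;
            seed = Integer.rotateLeft(seed, 16);
        }
        seed = 2 * seed + 1; // ensure the multiplier is odd

        String result = list.iterator().next();
        int value = seed * result.hashCode();
        for (String s : list) {
            int v = seed * s.hashCode();
            if (v < value) continue;
            // Only a real tie falls back to comparing the strings.
            if (v == value && s.compareTo(result) < 0) continue;
            result = s;
            value = v;
        }
        return result;
    }

    public static void main(String[] args) {
        // Both lines should now print the same element.
        System.out.println(selectOne(Arrays.asList("1", "2"), 4));
        System.out.println(selectOne(Arrays.asList("2", "1"), 4));
    }
}
```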

Java 8 Stream and operation on arrays

I have just discovered the new Java 8 stream capabilities. Coming from Python, I was wondering if there was now a neat way to do operations on arrays like summing, multiplying two arrays in a "one line pythonic" way ?
Thanks
There are new methods added to java.util.Arrays to convert an array into a Java 8 stream which can then be used for summing etc.
int sum = Arrays.stream(myIntArray).sum();
Multiplying two arrays is a little more difficult because I can't think of a way to get the value AND the index at the same time as a Stream operation. This means you probably have to stream over the indexes of the array.
//in this example a[] and b[] are same length
int[] a = ...
int[] b = ...
int[] result = new int[a.length];
IntStream.range(0, a.length).forEach(i -> result[i] = a[i] * b[i]);
Commenter @Holger points out that you can use the map method instead of forEach, like this:
int[] result = IntStream.range(0, a.length).map(i -> a[i] * b[i]).toArray();
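Put together as a runnable sketch, assuming both arrays have the same length (the helper names multiply and dot are made up for illustration):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class ArrayOpsSketch {
    // Element-wise product, streaming over the indexes.
    static int[] multiply(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        return IntStream.range(0, a.length).map(i -> a[i] * b[i]).toArray();
    }

    // Dot product: sum of the element-wise products.
    static int dot(int[] a, int[] b) {
        return Arrays.stream(multiply(a, b)).sum();
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {4, 5, 6};
        System.out.println(Arrays.toString(multiply(a, b))); // [4, 10, 18]
        System.out.println(dot(a, b));                       // 32
    }
}
```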
You can turn an array into a stream by using Arrays.stream():
int[] ns = new int[] {1,2,3,4,5};
Arrays.stream(ns);
Once you've got your stream, you can use any of the methods described in the documentation, like sum() or whatever. You can map or filter like in Python by calling the relevant stream methods with a Lambda function:
Arrays.stream(ns).map(n -> n * 2);
Arrays.stream(ns).filter(n -> n % 4 == 0);
Once you're done modifying your stream, you then call toArray() to convert it back into an array to use elsewhere:
int[] ns = new int[] {1,2,3,4,5};
int[] ms = Arrays.stream(ns).map(n -> n * 2).filter(n -> n % 4 == 0).toArray();
Be careful if you have to deal with large numbers.
int[] arr = new int[]{Integer.MIN_VALUE, Integer.MIN_VALUE};
long sum = Arrays.stream(arr).sum(); // Wrong: sum == 0
The sum above is not 2 * Integer.MIN_VALUE.
You need to do this in this case.
long sum = Arrays.stream(arr).mapToLong(Long::valueOf).sum(); // Correct
Please note that Arrays.stream(arr) creates an IntStream (or a LongStream, ...) rather than a Stream, so the map function cannot be used to change the element type. This is why the mapToLong, mapToObj, ... functions are provided.
Take a look at why-cant-i-map-integers-to-strings-when-streaming-from-an-array
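A small sketch of both points, using asLongStream() as an alternative way to widen before summing and mapToObj to leave the primitive stream. The class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PrimitiveStreamSketch {
    // Widens every int to long before summing, so the sum cannot wrap around in int range.
    static long wideSum(int[] values) {
        return Arrays.stream(values).asLongStream().sum();
    }

    // IntStream.map must return int; mapToObj is the way to get a Stream<String>.
    static List<String> asStrings(int[] values) {
        return Arrays.stream(values)
                .mapToObj(n -> Integer.toString(n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] arr = {Integer.MIN_VALUE, Integer.MIN_VALUE};
        System.out.println(Arrays.stream(arr).sum()); // 0, the int sum overflowed
        System.out.println(wideSum(arr));             // -4294967296
        System.out.println(asStrings(new int[] {1, 2, 3})); // [1, 2, 3]
    }
}
```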
