Implementing a reduction operation for finding the maximum difference of numbers? - java

I am searching for a suitable stream-based reduction operation to find the maximum difference of a double-list. (Please no solutions with old-style nested for-loops...)
Lets say my double list is
List<Double> list = List.of(1.1, 0.3, 7.8, 1.0, 9.1, 2.3);
then the maximum difference would result to the value
9.1 - 0.3 = 8.8
My first working approach with the use of java-streams is:
List<Double> list = List.of(1.1, 0.3, 7.8, 1.0, 9.1, 2.3);
double maxDiff = list.stream().max(Double::compareTo).get()
- list.stream().min(Double::compareTo).get();
How to implement a custom accumulator and combiner for the reduction operation Stream.reduce() to achieve the same?

You can avoid streaming over the list twice by using the teeing collector:
double diff = list.stream()
.collect(Collectors.teeing(
Collectors.reducing(Double::max),
Collectors.reducing(Double::min),
(max,min) -> max.get() - min.get()));
Or using summaryStatistics
DoubleSummaryStatistics statistics = list.stream()
.mapToDouble(Double::doubleValue)
.summaryStatistics();
double diff = statistics.getMax() - statistics.getMin();

For small list you could possibly make a Cartesian product of the list with itself and calculate the difference of every combination before reducing:
double maxDiff = list.stream()
.flatMap(a -> list.stream().map(b -> a - b))
.reduce(Double::max)
.get();
System.out.println(maxDiff);

Related

Iterate list of objects and get average

Is there any way to get an average here via one iteration? I can do it with regular "For loop" but want to use stream instead.
final Double ratingSum = ratingCount.stream().mapToDouble(RecommendRatingCount::getRatingSum).sum();
final Double countSum = ratingCount.stream().mapToDouble(RecommendRatingCount::getCount).sum();
return ratingSum /countSum;
Assuming Java 12 or higher is used a teeing collector
return
ratingCount.stream()
.collect(Collectors.teeing(
Collectors.summingDouble(RecommendRatingCount::getRatingSum),
Collectors.summingDouble(RecommendRatingCount::getCount),
(sum, count) -> sum / count));
Decompose each object into separate ratings, each value being rating/count, by first expanding out each object count times, then converting each to its discounted value, then summarise all such values:
double average = ratingCount.stream()
.flatMap(rrc -> generate(() -> rrc).limit(rrc.getCount()))
.mapToDouble(rcc -> rcc.getRatingSum() / rcc.getCount())
.summaryStatistics().getAverage();
Assuming your RatingCount is a natural number.
return ratingCount.stream()
.flatMapToDouble(a -> DoubleStream.concat(DoubleStream.of(a.getRatingSum()),
DoubleStream.generate(() -> 0).limit((long) a.getCount() - 1)))
.average().orElse(0);

Best way to generate a List<Double> sequence of values given start, end, and step?

I'm actually very surprised I was unable to find the answer to this here, though maybe I'm just using the wrong search terms or something. Closest I could find is this, but they ask about generating a specific range of doubles with a specific step size, and the answers treat it as such. I need something that will generate the numbers with arbitrary start, end and step size.
I figure there has to be some method like this in a library somewhere already, but if so I wasn't able to find it easily (again, maybe I'm just using the wrong search terms or something). So here's what I've cooked up on my own in the last few minutes to do this:
import java.lang.Math;
import java.util.List;
import java.util.ArrayList;
public class DoubleSequenceGenerator {
/**
* Generates a List of Double values beginning with `start` and ending with
* the last step from `start` which includes the provided `end` value.
**/
public static List<Double> generateSequence(double start, double end, double step) {
Double numValues = (end-start)/step + 1.0;
List<Double> sequence = new ArrayList<Double>(numValues.intValue());
sequence.add(start);
for (int i=1; i < numValues; i++) {
sequence.add(start + step*i);
}
return sequence;
}
/**
* Generates a List of Double values beginning with `start` and ending with
* the last step from `start` which includes the provided `end` value.
*
* Each number in the sequence is rounded to the precision of the `step`
* value. For instance, if step=0.025, values will round to the nearest
* thousandth value (0.001).
**/
public static List<Double> generateSequenceRounded(double start, double end, double step) {
if (step != Math.floor(step)) {
Double numValues = (end-start)/step + 1.0;
List<Double> sequence = new ArrayList<Double>(numValues.intValue());
double fraction = step - Math.floor(step);
double mult = 10;
while (mult*fraction < 1.0) {
mult *= 10;
}
sequence.add(start);
for (int i=1; i < numValues; i++) {
sequence.add(Math.round(mult*(start + step*i))/mult);
}
return sequence;
}
return generateSequence(start, end, step);
}
}
These methods run a simple loop multiplying the step by the sequence index and adding to the start offset. This mitigates compounding floating-point errors which would occur with continuous incrementation (such as adding the step to a variable on each iteration).
I added the generateSequenceRounded method for those cases where a fractional step size can cause noticeable floating-point errors. It does require a bit more arithmetic, so in extremely performance sensitive situations such as ours, it's nice to have the option of using the simpler method when the rounding is unnecessary. I suspect that in most general use cases the rounding overhead would be negligible.
Note that I intentionally excluded logic for handling "abnormal" arguments such as Infinity, NaN, start > end, or a negative step size for simplicity and desire to focus on the question at hand.
Here's some example usage and corresponding output:
System.out.println(DoubleSequenceGenerator.generateSequence(0.0, 2.0, 0.2))
System.out.println(DoubleSequenceGenerator.generateSequenceRounded(0.0, 2.0, 0.2));
System.out.println(DoubleSequenceGenerator.generateSequence(0.0, 102.0, 10.2));
System.out.println(DoubleSequenceGenerator.generateSequenceRounded(0.0, 102.0, 10.2));
[0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0, 1.2000000000000002, 1.4000000000000001, 1.6, 1.8, 2.0]
[0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
[0.0, 10.2, 20.4, 30.599999999999998, 40.8, 51.0, 61.199999999999996, 71.39999999999999, 81.6, 91.8, 102.0]
[0.0, 10.2, 20.4, 30.6, 40.8, 51.0, 61.2, 71.4, 81.6, 91.8, 102.0]
Is there an existing library that provides this kind of functionality already?
If not, are there any issues with my approach?
Does anyone have a better approach to this?
Sequences can be easily generated using Java 11 Stream API.
The straightforward approach is to use DoubleStream:
public static List<Double> generateSequenceDoubleStream(double start, double end, double step) {
return DoubleStream.iterate(start, d -> d <= end, d -> d + step)
.boxed()
.collect(toList());
}
On ranges with a large number of iterations, double precision error could accumulate resulting in bigger error closer to the end of the range.
The error can be minimised by switching to IntStream and using integers and single double multiplier:
public static List<Double> generateSequenceIntStream(int start, int end, int step, double multiplier) {
return IntStream.iterate(start, i -> i <= end, i -> i + step)
.mapToDouble(i -> i * multiplier)
.boxed()
.collect(toList());
}
To get rid of a double precision error at all, BigDecimal can be used:
public static List<Double> generateSequenceBigDecimal(BigDecimal start, BigDecimal end, BigDecimal step) {
return Stream.iterate(start, d -> d.compareTo(end) <= 0, d -> d.add(step))
.mapToDouble(BigDecimal::doubleValue)
.boxed()
.collect(toList());
}
Examples:
public static void main(String[] args) {
System.out.println(generateSequenceDoubleStream(0.0, 2.0, 0.2));
//[0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998]
System.out.println(generateSequenceIntStream(0, 20, 2, 0.1));
//[0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0, 1.2000000000000002, 1.4000000000000001, 1.6, 1.8, 2.0]
System.out.println(generateSequenceBigDecimal(new BigDecimal("0"), new BigDecimal("2"), new BigDecimal("0.2")));
//[0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
}
Method iterate with this signature (3 parameters) was added in Java 9. So, for Java 8 the code looks like
DoubleStream.iterate(start, d -> d + step)
.limit((int) (1 + (end - start) / step))
Me personally, I would shorten the DoubleSequenceGenerator class up a bit for other goodies and use only one sequence generator method that contains the option to utilize whatever desired precision wanted or utilize no precision at all:
In the generator method below, if nothing (or any value less than 0) is supplied to the optional setPrecision parameter then no decimal precision rounding is carried out. If 0 is supplied for a precision value then the numbers are rounded to their nearest whole number (ie: 89.674 is rounded to 90.0). If a specific precision value greater than 0 is supplied then values are converted to that decimal precision.
BigDecimal is used here for...well....precision:
import java.util.List;
import java.util.ArrayList;
import java.math.BigDecimal;
import java.math.RoundingMode;
public class DoubleSequenceGenerator {
public static List<Double> generateSequence(double start, double end,
double step, int... setPrecision) {
int precision = -1;
if (setPrecision.length > 0) {
precision = setPrecision[0];
}
List<Double> sequence = new ArrayList<>();
for (double val = start; val < end; val+= step) {
if (precision > -1) {
sequence.add(BigDecimal.valueOf(val).setScale(precision, RoundingMode.HALF_UP).doubleValue());
}
else {
sequence.add(BigDecimal.valueOf(val).doubleValue());
}
}
if (sequence.get(sequence.size() - 1) < end) {
sequence.add(end);
}
return sequence;
}
// Other class goodies here ....
}
And in main():
System.out.println(generateSequence(0.0, 2.0, 0.2));
System.out.println(generateSequence(0.0, 2.0, 0.2, 0));
System.out.println(generateSequence(0.0, 2.0, 0.2, 1));
System.out.println();
System.out.println(generateSequence(0.0, 102.0, 10.2, 0));
System.out.println(generateSequence(0.0, 102.0, 10.2, 0));
System.out.println(generateSequence(0.0, 102.0, 10.2, 1));
And the console displays:
[0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998, 2.0]
[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
[0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
[0.0, 10.2, 20.4, 30.599999999999998, 40.8, 51.0, 61.2, 71.4, 81.60000000000001, 91.80000000000001, 102.0]
[0.0, 10.0, 20.0, 31.0, 41.0, 51.0, 61.0, 71.0, 82.0, 92.0, 102.0]
[0.0, 10.2, 20.4, 30.6, 40.8, 51.0, 61.2, 71.4, 81.6, 91.8, 102.0]
Is there an existing library that provides this kind of functionality already?
Sorry, I don't know, but judging by other answers, and their relative simplicity - no, there isn't. No need. Well, almost...
If not, are there any issues with my approach?
Yes and no. You have at least one bug, and some room for performance boost, but the approach itself is correct.
Your bug: rounding error (just change while (mult*fraction < 1.0) to while (mult*fraction < 10.0) and that should fix it)
All the others do not reach the end... well, maybe they just weren't observant enough to read comments in your code
All the others are slower.
Just changing condition in the main loop from int < Double to int < int will noticeably increase the speed of your code
Does anyone have a better approach to this?
Hmm... In what way?
Simplicity? generateSequenceDoubleStream of #Evgeniy Khyst looks quite simple. And should be used... but maybe no, because of next two points
Precise? generateSequenceDoubleStream is not! But still can be saved with the pattern start + step*i.
And start + step*i pattern is precise. Only BigDouble and fixed-point arithmetic can beat it. But BigDoubles are slow, and manual fixed-point arithmetic is tedious and may be inappropriate for your data.
By the way, on the matters of precision, you can entertain yourself with this: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Speed... well now we are on shaky grounds.
Check out this repl https://repl.it/repls/RespectfulSufficientWorker
I do not have a decent test stand right now, so I used repl.it... which is totally inadequate for performance testing, but it's not the main point. The point is - there is no definite answer. Except that maybe in your case, which is not totally clear from you question, you definitely should not use BigDecimal (read further).
I've tried to play and optimize for big inputs. And your original code, with some minor changes - the fastest. But maybe you need enormous amounts of small Lists? Then that can be a totally different story.
This code is quite simple to my taste, and fast enough:
public static List<Double> genNoRoundDirectToDouble(double start, double end, double step) {
int len = (int)Math.ceil((end-start)/step) + 1;
var sequence = new ArrayList<Double>(len);
sequence.add(start);
for (int i=1 ; i < len ; ++i) sequence.add(start + step*i);
return sequence;
}
If you prefer a more elegant way (or we should call it idiomatic), I, personally, would suggest:
public static List<Double> gen_DoubleStream_presice(double start, double end, double step) {
return IntStream.range(0, (int)Math.ceil((end-start)/step) + 1)
.mapToDouble(i -> start + i * step)
.boxed()
.collect(Collectors.toList());
}
Anyway, possible performance boosts are:
Try switching from Double to double, and if you really need them, you can switch back again, judging by the tests, it still may be faster. (But don't trust my, try it yourself with your data in your environment. As I said - repl.it sucks for benchmarks)
A little magic: separate loop for Math.round()... maybe it has something to do with data locality. I do not recommend this - result is very unstable. But it's fun.
double[] sequence = new double[len];
for (int i=1; i < len; ++i) sequence[i] = start + step*i;
List<Double> list = new ArrayList<Double>(len);
list.add(start);
for (int i=1; i < len; ++i) list.add(Math.round(sequence[i])/mult);
return list;
You should definitely consider to be more lazy and generate numbers on demand without storing then in Lists
I suspect that in most general use cases the rounding overhead would be negligible.
If you suspect something - test it :-) My answer is "Yes", but again... don't believe me. Test it.
So, back to the main question: Is there an better way?
Yes, of course!
But it depends.
Choose BigDecimal if you need very big numbers and very small numbers. But if you cast them back to Double, and more than that, use it with numbers of "close" magnitude - no need for them! Checkout the same repl: https://repl.it/repls/RespectfulSufficientWorker - last test shows that there will be no difference in results, but a dig loss in speed.
Make some micro-optimizations based on your data properties, your task, and your environment.
Prefer short and simple code if there is not to much to gain from performance boost of 5-10%. Don't waist your time
Maybe use fixed-point arithmetic if you can and if it's worth it.
Other than that, you are fine.
PS. There's also a Kahan Summation Formula implementation in the repl... just for fun. https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1346 and it works - you can mitigate summation errors
Try this.
public static List<Double> generateSequenceRounded(double start, double end, double step) {
long mult = (long) Math.pow(10, BigDecimal.valueOf(step).scale());
return DoubleStream.iterate(start, d -> (double) Math.round(mult * (d + step)) / mult)
.limit((long) (1 + (end - start) / step)).boxed().collect(Collectors.toList());
}
Here,
int java.math.BigDecimal.scale()
Returns the scale of this BigDecimal. If zero or positive, the scale is the number of digits to the right ofthe decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. For example, a scale of -3 means the unscaled value is multiplied by 1000.
In main()
System.out.println(generateSequenceRounded(0.0, 102.0, 10.2));
System.out.println(generateSequenceRounded(0.0, 102.0, 10.24367));
And Output:
[0.0, 10.2, 20.4, 30.6, 40.8, 51.0, 61.2, 71.4, 81.6, 91.8, 102.0]
[0.0, 10.24367, 20.48734, 30.73101, 40.97468, 51.21835, 61.46202, 71.70569, 81.94936, 92.19303]

Find maximum and minimum double value in a list

I have the following list of double values:
items {9.0, 4.0, 16.0, -6.0, 5.0}
I want to find the maximum and minimum values and for that I did:
double max = items.stream().max(Comparator.comparing(String::valueOf)).get();
double min = items.stream().min(Comparator.comparing(String::valueOf)).get();
The result that I got is max=9.0 and min=-6.0. I was expecting the maximum to be 16.0. Later, I changed 16.0 to 92.0 and it worked; it gave me max=92.0
Do you know how to solve that?
You don't want to compare using strings but by the natural order of your double elements, i.e. Comparator.naturalOrder() instead of Comparator.comparing(String::valueOf).
Comparing via strings will result in the characters being compared and since the character value of 9 (of "9.0") is greater than 1 (of "16.0") you get the result you see. Changing "16.0" to "92.0" will result in . to be compared with 2 (since the first character is equal) and thus "92xx" is greater than "9.xx".
What about :
double max = items.stream().mapToDouble(Double::doubleValue).max().getAsDouble();//16.0
double min = items.stream().mapToDouble(Double::doubleValue).min().getAsDouble();//-6.0
It appears you want to compare the number numerically instead of their String representation e.g. "16.0" < "9.0" as '1' < '9'
List<Double> items = Arrays.asList(9.0, 4.0, 16.0, -6.0, 5.0);
double max = items.stream().max(Comparator.naturalOrder()).get();
double min = items.stream().min(Comparator.naturalOrder()).get();
System.out.println(min + " " + max);
prints
-6.0 16.0
There's a more appropriate stream type for doubles. Using it, you can get min and max in one terminal operation (eliminating the need to supply a comparator in the process):
DoubleSummaryStatistics stats = items.stream().mapToDouble(d -> d)
.summaryStatistics();
//even better: DoubleStream.of(9.0, 4.0, 16.0, -6.0, 5.0).summaryStatistics()
And stats will have:
count=5, sum=28.000000, min=-6.000000, average=5.600000, max=16.000000
Your are comparing them as String so by alphabetical order : 1xx is before 9xx
You need to compare then as Double or Integer, so use Comparator.comparing(Double::valueOf) (or Integer::valueOf)
items
.stream()
.sorted()
.limit(1)
.collect(Colletors.toList();

how to iteratively subtract and compare values in java

there are n metric values like below
metric of feature clear 0.4054651081081644
metric of board various 0.6931471805599453
metric of design few 0.025975486403260736
metric of call end 0.13353139262452257
metric of bag other 0.1823215567939546
Now, the highest value has to be taken(board various). From highest value how subtract all the other values one by one iteratively and retrieve values that are above 0.02
List<Double> vals = <list of your values>
double max = Collections.max(vals);
vals.remove(max);
double result = vals.stream()
.filter(v -> v > .02)
.reduce(max, (v1, v2) -> v1 - v2);
Pseudo code (Python, actually):
metrics = [0.4054651081081644, 0.6931471805599453, ... 0.1823215567939546]
mx = max(metrics)
for m in metrics:
if m != mx and m > 0.02:
mx = mx - m
The translation to Java is left as an exercise to the reader.
Caution: Comparing floating point numbers can be tricky.

is there a faster way to search through cumulative distribution?

I have a List<Double> that holds probabilities (weights) for sampling an item. For example, the List holds 5 values as follows.
0.1, 0.4, 0.2, 0.1, 0.2
Each i-th Double value is the probability of sampling the i-th item of another List<Object>.
How can I construct an algorithm to perform the sampling according to these probabilities?
I tried something like this, where I first made the list of probabilities into a cumulative form.
0.1, 0.5, 0.7, 0.8, 1.0
Then my approach is as follows. I generate a random double, and iterate over the list to find the first item that is larger than the random double, and then return its index.
Random r = new Random();
double p = r.nextDouble();
int total = list.size();
for(int i=0; i < total; i++) {
double d = list.get(i);
if(d > p) {
return i;
}
}
return total-1;
This approach is slow as I am crawling through the list sequentially. In reality, my list is of 800,000 items associated with weights (probabilities) that I need to sample from. So, needless to say, this sequential approach is slow.
I'm not sure how binary search can help. Let's say I generated p = 0.01. Then, a binary search can use recursion as follows with the list.
compare 0.01 to 0.7, repeat with L = 0.1, 0.5
compare 0.01 to 0.1, stop
compare 0.01 to 0.5, stop
0.01 is smaller than 0.7, 0.5, and 0.1, but I obviously only want 0.1. So the stopping criteria is still not clear to me when using binary search.
If there's a library to help with this type of thing I'd also be interested.
Here is how you could do it using binary search, starting with the cumulative probabilities:
public static void main (String[] args) {
double[] cdf = {0.1, 0.5, 0.7, 0.8, 1.0};
double random = 0.75; // generate randomly between zero and one
int el = Arrays.binarySearch(cdf, random);
if (el < 0) {
el = -(el + 1);
}
System.out.println(el);
}
P.S. When the list of probabilities is short, a simple linear scan might turn out to be as efficient as binary search.
This isn't the most memory-efficient approach, but use a NavigableMap where your cumulative list's values are the keys. Then you can just use floorEntry(randon.nextDouble()). Like the binary search, it's log(n) space and n memory.
So...
NavigableMap<Double, Object> pdf = new TreeMap<>();
pdf.put(0.0, "foo");
pdf.put(0.1, "bar");
pdf.put(0.5, "baz");
pdf.put(0.7, "quz");
pdf.put(0.8, "quuz");
Random random = new Random();
pdf.floorEntry(random.nextDouble()).getValue();

Categories