I'm actually very surprised I was unable to find the answer to this here, though maybe I'm just using the wrong search terms or something. Closest I could find is this, but they ask about generating a specific range of doubles with a specific step size, and the answers treat it as such. I need something that will generate the numbers with arbitrary start, end and step size.
I figure there has to be some method like this in a library somewhere already, but if so I wasn't able to find it easily (again, maybe I'm just using the wrong search terms or something). So here's what I've cooked up on my own in the last few minutes to do this:
import java.lang.Math;
import java.util.List;
import java.util.ArrayList;
public class DoubleSequenceGenerator {
/**
* Generates a List of Double values beginning with `start` and ending with
* the last step from `start` which includes the provided `end` value.
**/
public static List<Double> generateSequence(double start, double end, double step) {
Double numValues = (end-start)/step + 1.0;
List<Double> sequence = new ArrayList<Double>(numValues.intValue());
sequence.add(start);
for (int i=1; i < numValues; i++) {
sequence.add(start + step*i);
}
return sequence;
}
/**
* Generates a List of Double values beginning with `start` and ending with
* the last step from `start` which includes the provided `end` value.
*
* Each number in the sequence is rounded to the precision of the `step`
* value. For instance, if step=0.025, values will round to the nearest
* thousandth value (0.001).
**/
public static List<Double> generateSequenceRounded(double start, double end, double step) {
if (step != Math.floor(step)) {
Double numValues = (end-start)/step + 1.0;
List<Double> sequence = new ArrayList<Double>(numValues.intValue());
double fraction = step - Math.floor(step);
double mult = 10;
while (mult*fraction < 1.0) {
mult *= 10;
}
sequence.add(start);
for (int i=1; i < numValues; i++) {
sequence.add(Math.round(mult*(start + step*i))/mult);
}
return sequence;
}
return generateSequence(start, end, step);
}
}
These methods run a simple loop multiplying the step by the sequence index and adding to the start offset. This mitigates compounding floating-point errors which would occur with continuous incrementation (such as adding the step to a variable on each iteration).
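To make the difference concrete, here is a minimal sketch that computes the same position both ways. The class and method names are made up for this illustration and are not part of the code above.

```java
final class StepErrorDemo {
    /** Value reached by adding step repeatedly, count times. */
    static double byAccumulation(double start, double step, int count) {
        double value = start;
        for (int i = 0; i < count; i++) {
            value += step; // every addition may contribute its own rounding error
        }
        return value;
    }

    /** Same position computed from the index: one multiplication, one addition. */
    static double byIndex(double start, double step, int count) {
        return start + step * count;
    }

    public static void main(String[] args) {
        System.out.println(byAccumulation(0.0, 0.2, 10));
        System.out.println(byIndex(0.0, 0.2, 10));
    }
}
```

With a step of 0.2 the two results differ in the last digits, while a step such as 0.5 (exactly representable in binary) gives identical results either way.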
I added the generateSequenceRounded method for those cases where a fractional step size can cause noticeable floating-point errors. It does require a bit more arithmetic, so in extremely performance sensitive situations such as ours, it's nice to have the option of using the simpler method when the rounding is unnecessary. I suspect that in most general use cases the rounding overhead would be negligible.
Note that I intentionally excluded logic for handling "abnormal" arguments such as Infinity, NaN, start > end, or a negative step size for simplicity and desire to focus on the question at hand.
Here's some example usage and corresponding output:
System.out.println(DoubleSequenceGenerator.generateSequence(0.0, 2.0, 0.2));
System.out.println(DoubleSequenceGenerator.generateSequenceRounded(0.0, 2.0, 0.2));
System.out.println(DoubleSequenceGenerator.generateSequence(0.0, 102.0, 10.2));
System.out.println(DoubleSequenceGenerator.generateSequenceRounded(0.0, 102.0, 10.2));
[0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0, 1.2000000000000002, 1.4000000000000001, 1.6, 1.8, 2.0]
[0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
[0.0, 10.2, 20.4, 30.599999999999998, 40.8, 51.0, 61.199999999999996, 71.39999999999999, 81.6, 91.8, 102.0]
[0.0, 10.2, 20.4, 30.6, 40.8, 51.0, 61.2, 71.4, 81.6, 91.8, 102.0]
Is there an existing library that provides this kind of functionality already?
If not, are there any issues with my approach?
Does anyone have a better approach to this?
Sequences can be easily generated using Java 11 Stream API.
The straightforward approach is to use DoubleStream:
public static List<Double> generateSequenceDoubleStream(double start, double end, double step) {
return DoubleStream.iterate(start, d -> d <= end, d -> d + step)
.boxed()
.collect(toList());
}
On ranges with a large number of iterations, double precision error can accumulate, so values closer to the end of the range carry a larger error.
The error can be minimised by switching to IntStream and using integers and a single double multiplier:
public static List<Double> generateSequenceIntStream(int start, int end, int step, double multiplier) {
return IntStream.iterate(start, i -> i <= end, i -> i + step)
.mapToDouble(i -> i * multiplier)
.boxed()
.collect(toList());
}
To eliminate double precision error altogether, BigDecimal can be used:
public static List<Double> generateSequenceBigDecimal(BigDecimal start, BigDecimal end, BigDecimal step) {
return Stream.iterate(start, d -> d.compareTo(end) <= 0, d -> d.add(step))
.mapToDouble(BigDecimal::doubleValue)
.boxed()
.collect(toList());
}
Examples:
public static void main(String[] args) {
System.out.println(generateSequenceDoubleStream(0.0, 2.0, 0.2));
//[0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998]
System.out.println(generateSequenceIntStream(0, 20, 2, 0.1));
//[0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0, 1.2000000000000002, 1.4000000000000001, 1.6, 1.8, 2.0]
System.out.println(generateSequenceBigDecimal(new BigDecimal("0"), new BigDecimal("2"), new BigDecimal("0.2")));
//[0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
}
The three-parameter iterate method was added in Java 9, so for Java 8 the code looks like this:
DoubleStream.iterate(start, d -> d + step)
.limit((int) (1 + (end - start) / step))
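Put together as a complete method, the Java 8 variant might look like the sketch below. The class and method names are assumptions for this example; the element count is computed up front because Java 8 has no iterate overload with a stop predicate.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.DoubleStream;

final class Java8Sequence {
    static List<Double> generateSequenceJava8(double start, double end, double step) {
        // Number of elements to keep; assumes start <= end and step > 0.
        long count = (long) (1 + (end - start) / step);
        return DoubleStream.iterate(start, d -> d + step)
                .limit(count)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(generateSequenceJava8(0.0, 2.0, 0.5));
    }
}
```

Like the DoubleStream version above, this still adds the step repeatedly, so the same accumulation caveat applies.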
Personally, I would shorten the DoubleSequenceGenerator class a bit to leave room for other goodies, and use a single sequence generator method with an option for whatever precision is desired, or no rounding at all.
In the generator method below, if nothing (or any value less than 0) is supplied to the optional setPrecision parameter, no decimal rounding is carried out. If 0 is supplied, the numbers are rounded to the nearest whole number (i.e. 89.674 is rounded to 90.0). If a value greater than 0 is supplied, values are rounded to that many decimal places.
BigDecimal is used here for...well....precision:
import java.util.List;
import java.util.ArrayList;
import java.math.BigDecimal;
import java.math.RoundingMode;
public class DoubleSequenceGenerator {
public static List<Double> generateSequence(double start, double end,
double step, int... setPrecision) {
int precision = -1;
if (setPrecision.length > 0) {
precision = setPrecision[0];
}
List<Double> sequence = new ArrayList<>();
for (double val = start; val < end; val+= step) {
if (precision > -1) {
sequence.add(BigDecimal.valueOf(val).setScale(precision, RoundingMode.HALF_UP).doubleValue());
}
else {
sequence.add(BigDecimal.valueOf(val).doubleValue());
}
}
if (sequence.get(sequence.size() - 1) < end) {
sequence.add(end);
}
return sequence;
}
// Other class goodies here ....
}
And in main():
System.out.println(generateSequence(0.0, 2.0, 0.2));
System.out.println(generateSequence(0.0, 2.0, 0.2, 0));
System.out.println(generateSequence(0.0, 2.0, 0.2, 1));
System.out.println();
System.out.println(generateSequence(0.0, 102.0, 10.2));
System.out.println(generateSequence(0.0, 102.0, 10.2, 0));
System.out.println(generateSequence(0.0, 102.0, 10.2, 1));
And the console displays:
[0.0, 0.2, 0.4, 0.6000000000000001, 0.8, 1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998, 2.0]
[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
[0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
[0.0, 10.2, 20.4, 30.599999999999998, 40.8, 51.0, 61.2, 71.4, 81.60000000000001, 91.80000000000001, 102.0]
[0.0, 10.0, 20.0, 31.0, 41.0, 51.0, 61.0, 71.0, 82.0, 92.0, 102.0]
[0.0, 10.2, 20.4, 30.6, 40.8, 51.0, 61.2, 71.4, 81.6, 91.8, 102.0]
Is there an existing library that provides this kind of functionality already?
Sorry, I don't know, but judging by other answers, and their relative simplicity - no, there isn't. No need. Well, almost...
If not, are there any issues with my approach?
Yes and no. You have at least one bug, and some room for performance boost, but the approach itself is correct.
Your bug: rounding error (just change while (mult*fraction < 1.0) to while (mult*fraction < 10.0) and that should fix it)
All the others do not reach the end... well, maybe they just weren't observant enough to read comments in your code
All the others are slower.
Just changing condition in the main loop from int < Double to int < int will noticeably increase the speed of your code
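To see why the loop condition matters, the sketch below reproduces the multiplier loop from the question with the stop threshold turned into a parameter. The threshold parameter and the class name exist only for this illustration, and the method assumes a step with a non-zero fractional part.

```java
final class RoundingMultiplierDemo {
    /** Mirrors the question's loop; threshold is 1.0 in the original code. */
    static double multiplierFor(double step, double threshold) {
        double fraction = step - Math.floor(step);
        double mult = 10;
        while (mult * fraction < threshold) {
            mult *= 10;
        }
        return mult;
    }

    public static void main(String[] args) {
        System.out.println(multiplierFor(0.025, 1.0));  // 100.0: rounds to hundredths
        System.out.println(multiplierFor(0.025, 10.0)); // 1000.0: rounds to thousandths
    }
}
```

For step=0.025 the original condition stops at 100, so values are rounded to hundredths even though the Javadoc promises thousandths. The changed condition gains one more digit, which covers steps with up to two significant fractional digits.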
Does anyone have a better approach to this?
Hmm... In what way?
Simplicity? generateSequenceDoubleStream by @Evgeniy Khyst looks quite simple and would be the natural choice, but maybe not, because of the next two points.
Precision? generateSequenceDoubleStream is not precise! It can still be saved with the start + step*i pattern.
The start + step*i pattern is precise: each value carries only the rounding of one multiplication and one addition. Only BigDecimal and fixed-point arithmetic can beat it. But BigDecimal is slow, and manual fixed-point arithmetic is tedious and may be inappropriate for your data.
By the way, on the matters of precision, you can entertain yourself with this: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Speed... well now we are on shaky grounds.
Check out this repl https://repl.it/repls/RespectfulSufficientWorker
I do not have a decent test stand right now, so I used repl.it... which is totally inadequate for performance testing, but that's not the main point. The point is that there is no definite answer, except that in your case, which is not totally clear from your question, you definitely should not use BigDecimal (read further).
I've tried to play and optimize for big inputs, and your original code, with some minor changes, is the fastest. But maybe you need enormous numbers of small Lists? Then that can be a totally different story.
This code is quite simple to my taste, and fast enough:
public static List<Double> genNoRoundDirectToDouble(double start, double end, double step) {
int len = (int)Math.ceil((end-start)/step) + 1;
var sequence = new ArrayList<Double>(len);
sequence.add(start);
for (int i=1 ; i < len ; ++i) sequence.add(start + step*i);
return sequence;
}
If you prefer a more elegant way (or we should call it idiomatic), I, personally, would suggest:
public static List<Double> gen_DoubleStream_presice(double start, double end, double step) {
return IntStream.range(0, (int)Math.ceil((end-start)/step) + 1)
.mapToDouble(i -> start + i * step)
.boxed()
.collect(Collectors.toList());
}
Anyway, possible performance boosts are:
Try switching from Double to double, and if you really need Doubles, you can switch back again; judging by the tests, it still may be faster. (But don't trust me, try it yourself with your data in your environment. As I said, repl.it sucks for benchmarks.)
A little magic: separate loop for Math.round()... maybe it has something to do with data locality. I do not recommend this - result is very unstable. But it's fun.
double[] sequence = new double[len];
for (int i=1; i < len; ++i) sequence[i] = start + step*i;
List<Double> list = new ArrayList<Double>(len);
list.add(start);
for (int i=1; i < len; ++i) list.add(Math.round(mult*sequence[i])/mult);
return list;
You should definitely consider being lazier and generating numbers on demand, without storing them in Lists.
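As a rough sketch of the on-demand idea, the method below returns a primitive DoubleStream built on the start + step*i pattern, so nothing is computed or boxed until the caller consumes it. The names are assumptions for this example, and the count formula is the one used in the methods above.

```java
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

final class LazySequence {
    /** Lazy, unboxed sequence; assumes start <= end and step > 0. */
    static DoubleStream lazySequence(double start, double end, double step) {
        int count = (int) Math.ceil((end - start) / step) + 1;
        return IntStream.range(0, count).mapToDouble(i -> start + i * step);
    }

    public static void main(String[] args) {
        // Values are produced one at a time; no List<Double> is ever built.
        lazySequence(0.0, 2.0, 0.5).forEach(System.out::println);
    }
}
```

A caller that really needs a list can still append .boxed().collect(Collectors.toList()) at the point of use.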
I suspect that in most general use cases the rounding overhead would be negligible.
If you suspect something - test it :-) My answer is "Yes", but again... don't believe me. Test it.
So, back to the main question: Is there a better way?
Yes, of course!
But it depends.
Choose BigDecimal if you need very big numbers and very small numbers. But if you cast them back to Double, and moreover use them with numbers of "close" magnitude, there is no need for it! Check out the same repl: https://repl.it/repls/RespectfulSufficientWorker - the last test shows that there will be no difference in results, but a big loss in speed.
Make some micro-optimizations based on your data properties, your task, and your environment.
Prefer short and simple code if there is not too much to gain from a performance boost of 5-10%. Don't waste your time.
Maybe use fixed-point arithmetic if you can and if it's worth it.
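For the fixed-point option, one possible shape is sketched below: start, end and step are passed as whole numbers of the smallest unit (tenths, thousandths, ...), the loop runs on exact long arithmetic, and each value is converted with a single division. All names here are hypothetical, and the sketch assumes the caller can express the inputs as integers.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedPointSequence {
    /** unitsPerOne is how many units make up 1.0, e.g. 10 when working in tenths. */
    static List<Double> generate(long startUnits, long endUnits, long stepUnits, long unitsPerOne) {
        if (stepUnits <= 0 || unitsPerOne <= 0) {
            throw new IllegalArgumentException("stepUnits and unitsPerOne must be positive");
        }
        List<Double> sequence = new ArrayList<>();
        for (long units = startUnits; units <= endUnits; units += stepUnits) {
            // One correctly rounded division per element, no accumulated error.
            sequence.add((double) units / unitsPerOne);
        }
        return sequence;
    }

    public static void main(String[] args) {
        // 0.0 to 2.0 in steps of 0.2, expressed in tenths.
        System.out.println(generate(0, 20, 2, 10));
        // prints [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
    }
}
```

Dividing by the unit count instead of multiplying by 0.1 matters here: the quotient of two exactly represented integers is rounded once, so it equals the double you would get from writing the decimal literal.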
Other than that, you are fine.
PS. There's also a Kahan Summation Formula implementation in the repl... just for fun. https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1346 and it works - you can mitigate summation errors
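For readers who do not want to open the repl, here is a minimal sketch of Kahan (compensated) summation applied to sequence generation. It is not the repl's code; the class, method and variable names are assumptions, and the element count is passed in directly to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

final class KahanSequence {
    /** Generates count values by repeated addition with Kahan compensation. */
    static List<Double> generate(double start, double step, int count) {
        List<Double> sequence = new ArrayList<>(Math.max(count, 0));
        double sum = start;
        double compensation = 0.0; // running estimate of the low-order bits lost so far
        for (int i = 0; i < count; i++) {
            sequence.add(sum);
            double y = step - compensation; // re-inject the bits lost earlier
            double t = sum + y;             // low-order bits of y may be lost here
            compensation = (t - sum) - y;   // recover what was just lost
            sum = t;
        }
        return sequence;
    }

    public static void main(String[] args) {
        System.out.println(generate(0.0, 0.2, 11));
    }
}
```

The compensation term keeps the running error from growing with the number of steps, at the cost of a few extra floating-point operations per element.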
Try this.
public static List<Double> generateSequenceRounded(double start, double end, double step) {
long mult = (long) Math.pow(10, BigDecimal.valueOf(step).scale());
return DoubleStream.iterate(start, d -> (double) Math.round(mult * (d + step)) / mult)
.limit((long) (1 + (end - start) / step)).boxed().collect(Collectors.toList());
}
Here,
int java.math.BigDecimal.scale()
Returns the scale of this BigDecimal. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. For example, a scale of -3 means the unscaled value is multiplied by 1000.
In main()
System.out.println(generateSequenceRounded(0.0, 102.0, 10.2));
System.out.println(generateSequenceRounded(0.0, 102.0, 10.24367));
And Output:
[0.0, 10.2, 20.4, 30.6, 40.8, 51.0, 61.2, 71.4, 81.6, 91.8, 102.0]
[0.0, 10.24367, 20.48734, 30.73101, 40.97468, 51.21835, 61.46202, 71.70569, 81.94936, 92.19303]
We are two students who want to use a one-class SVM for detection of summary-worthy sentences in text documents. We have already implemented sentence similarity functions, which we have used for another algorithm. We would now like to use the same functions as kernels for a one-class SVM in libsvm for Java.
We are using the PRECOMPUTED enum for the kernel_type field in our svm_parameter (param). In the x field of our svm_problem (prob) we have the kernel matrix on the form:
0:i 1:K(xi,x1) ... L:K(xi,xL)
where K(x,y) is the kernel value for the similarity of x and y, L is the number of sentences to compare and i is the current row index (0 to L).
The training (svm.svm_train(prob, param)) sometimes seems to get "stuck" in what looks like an infinite loop.
Have we misunderstood how to use the PRECOMPUTED enum, or does the problem lie elsewhere?
We solved this problem
It turns out that the "series numbers" in the first column need to go from 1 to L, not 0 to L-1, which was our initial numbering. We found this out by inspecting the source in svm.java:
double kernel_function(int i, int j)
{
switch(kernel_type)
{
/* ... snip ...*/
case svm_parameter.PRECOMPUTED:
return x[i][(int)(x[j][0].value)].value;
/* ... snip ...*/
}
}
The reason for starting the numbering at 1 instead of 0, is that the first column of a row is used as column index when returning the value K(i,j).
Example
Consider this Java matrix:
double[][] K = new double[][] {
    // column 0 holds the serial number (1..L), columns 1..L hold the kernel values
    { 1, 1.0, 0.1, 0.0, 0.2 },
    { 2, 0.5, 1.0, 0.1, 0.4 },
    { 3, 0.2, 0.3, 1.0, 0.7 },
    { 4, 0.6, 0.5, 0.5, 1.0 }
};
Now, libsvm needs the kernel value K(i,j) for say i=1 and j=3. The expression x[i][(int)(x[j][0].value)].value will break down to:
x[i] -> x[1] -> second row in K -> [2, 0.5, 1.0, 0.1, 0.4]
x[j][0] -> x[3][0] -> fourth row, first column -> 4
x[i][(int)(x[j][0].value)].value -> x[1][4] -> 0.4
This was a bit messy to realize at first, but changing the indexing solved our problem. Hopefully this might help someone else with similar problems.
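The lookup can be imitated with plain arrays to see what goes wrong with 0-based serial numbers. The sketch below does not use libsvm at all: the class, the lookup method and the two constant matrices are stand-ins written for this illustration, with the same kernel values as in the example above.

```java
final class PrecomputedLookupDemo {
    static final double[][] ONE_BASED = {
        { 1, 1.0, 0.1, 0.0, 0.2 },
        { 2, 0.5, 1.0, 0.1, 0.4 },
        { 3, 0.2, 0.3, 1.0, 0.7 },
        { 4, 0.6, 0.5, 0.5, 1.0 }
    };

    // Same kernel values, but serial numbers start at 0 (our initial mistake).
    static final double[][] ZERO_BASED = {
        { 0, 1.0, 0.1, 0.0, 0.2 },
        { 1, 0.5, 1.0, 0.1, 0.4 },
        { 2, 0.2, 0.3, 1.0, 0.7 },
        { 3, 0.6, 0.5, 0.5, 1.0 }
    };

    /** Column 0 of row j selects which column of row i is returned. */
    static double lookup(double[][] x, int i, int j) {
        return x[i][(int) x[j][0]];
    }

    public static void main(String[] args) {
        System.out.println(lookup(ONE_BASED, 1, 3));  // 0.4, the intended kernel value
        System.out.println(lookup(ZERO_BASED, 1, 3)); // 0.1, one column too far left
        System.out.println(lookup(ZERO_BASED, 1, 0)); // 1.0, the serial number itself
    }
}
```

With 0-based numbering every lookup is shifted by one column, and lookups against the first sample return a serial number instead of a kernel value, so the solver is working on a matrix that is not the intended kernel.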
I feel like there should be an available library to more simply do two things, A) Find the mode to an array, in the case of doubles and B) gracefully degrade the precision until you reach a particular frequency.
So imagine an array like this:
double[] a = {1.12, 1.15, 1.13, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4};
If I was looking for a frequency of 3 then it would go from 2 decimal positions to 1 decimal, and finally return 1.1 as my mode. If I had a frequency requirement of 4 it would return 4 as my mode.
I do have a set of code that is working the way I want, and returning what I am expecting, but I feel like there should be a more efficient way to accomplish this, or an existing library that would help me do the same. Attached is my code, I'd be interested in thoughts / comments on different approaches I should have taken....I have the iterations listed to limit how far the precision can degrade.
public static double findMode(double[] r, int frequencyReq)
{
double mode = 0d;
int frequency = 0;
int iterations = 4;
HashMap<Double, BigDecimal> counter = new HashMap<Double, BigDecimal>();
while(frequency < frequencyReq && iterations > 0){
String roundFormatString = "#.";
for(int j=0; j<iterations; j++){
roundFormatString += "#";
}
DecimalFormat roundFormat = new DecimalFormat(roundFormatString);
for(int i=0; i<r.length; i++){
double element = Double.valueOf(roundFormat.format(r[i]));
if(!counter.containsKey(element))
counter.put(element, new BigDecimal(0));
counter.put(element,counter.get(element).add(new BigDecimal(1)));
}
for(Double key : counter.keySet()){
if(counter.get(key).compareTo(new BigDecimal(frequency))>0){
mode = key;
frequency = counter.get(key).intValue();
log.debug("key: " + key + " Count: " + counter.get(key));
}
}
iterations--;
}
return mode;
}
Edit
Another way to rephrase the question, per Paulo's comment: the goal is to locate a number where in the neighborhood are at least frequency array elements, with the radius of the neighborhood being as small as possible.
Here is a solution to the reformulated question:
The goal is to locate a number where in the neighborhood are at least frequency array elements, with the radius of the neighborhood being as small as possible.
(I took the liberty of switching the order of 1.15 and 1.13 in the input array.)
The basic idea is: the input is already sorted (so elements that are close in value are adjacent in the array), and we know how many elements we want in our neighborhood. So we loop once over the array, measuring the distance between each element and the element frequency - 1 positions to its right. That window contains exactly frequency elements, so it forms a neighbourhood. Then we simply take the window with the minimum distance. (My method has a complicated way to return the results; you may want to do it better.)
This is not completely equivalent to your original question (does not work by fixed steps of digits), but maybe this is more what you really want :-)
You'll have to find a better way of formatting the results, though.
package de.fencing_game.paul.examples;
import java.util.Arrays;
/**
* searching of dense points in a distribution.
*
* Inspired by http://stackoverflow.com/questions/5329628/finding-a-mode-with-decreasing-precision.
*/
public class InpreciseMode {
/** our input data, should be sorted ascending. */
private double[] data;
public InpreciseMode(double ... data) {
this.data = data;
}
/**
* Searches for the smallest neighbourhood (by diameter) which
* contains at least minSize elements.
*
* @return an array of two arrays:
* { { the middle point of the neighbourhood,
* the diameter of the neighbourhood },
* all the elements of the neighbourhood }
*
* TODO: better return an object of a class encapsulating these.
*/
public double[][] findSmallNeighbourhood(int minSize) {
int currentLeft = -1;
int currentRight = -1;
double currentMinDiameter = Double.POSITIVE_INFINITY;
for(int i = 0; i + minSize-1 < data.length; i++) {
double diameter = data[i+minSize-1] - data[i];
if(diameter < currentMinDiameter) {
currentMinDiameter = diameter;
currentLeft = i;
currentRight = i + minSize-1;
}
}
return
new double[][] {
{
(data[currentRight] + data[currentLeft])/2.0,
currentMinDiameter
},
Arrays.copyOfRange(data, currentLeft, currentRight+1)
};
}
public void printSmallNeighbourhoods() {
for(int frequency = 2; frequency <= data.length; frequency++) {
double[][] found = findSmallNeighbourhood(frequency);
System.out.printf("There are %d elements in %f radius "+
"around %f:%n %s.%n",
frequency, found[0][1]/2, found[0][0],
Arrays.toString(found[1]));
}
}
public static void main(String[] params) {
InpreciseMode m =
new InpreciseMode(1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1,
4.2, 4.3, 4.4);
m.printSmallNeighbourhoods();
}
}
The output is
There are 2 elements in 0,005000 radius around 1,125000:
[1.12, 1.13].
There are 3 elements in 0,015000 radius around 1,135000:
[1.12, 1.13, 1.15].
There are 4 elements in 0,150000 radius around 4,250000:
[4.1, 4.2, 4.3, 4.4].
There are 5 elements in 0,450000 radius around 3,850000:
[3.4, 3.44, 4.1, 4.2, 4.3].
There are 6 elements in 0,500000 radius around 3,900000:
[3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
There are 7 elements in 1,200000 radius around 3,200000:
[2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
There are 8 elements in 1,540000 radius around 2,660000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2].
There are 9 elements in 1,590000 radius around 2,710000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3].
There are 10 elements in 1,640000 radius around 2,760000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
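One way to tidy up the return value, as the TODO in the code suggests, is a small result class. The sketch below is only one possible shape: the class name, field names and the choice to sort a copy of the input are assumptions, not part of the code above.

```java
import java.util.Arrays;

final class Neighbourhood {
    final double centre;
    final double diameter;
    final double[] elements;

    Neighbourhood(double centre, double diameter, double[] elements) {
        this.centre = centre;
        this.diameter = diameter;
        this.elements = elements;
    }

    /** Sorts a copy of the input, then slides a window of minSize elements over it. */
    static Neighbourhood findSmallest(double[] input, int minSize) {
        if (minSize < 1 || minSize > input.length) {
            throw new IllegalArgumentException("minSize must be between 1 and input.length");
        }
        double[] data = input.clone();
        Arrays.sort(data);
        int bestLeft = 0;
        double bestDiameter = Double.POSITIVE_INFINITY;
        for (int left = 0; left + minSize - 1 < data.length; left++) {
            double diameter = data[left + minSize - 1] - data[left];
            if (diameter < bestDiameter) {
                bestDiameter = diameter;
                bestLeft = left;
            }
        }
        int right = bestLeft + minSize - 1;
        return new Neighbourhood((data[bestLeft] + data[right]) / 2.0, bestDiameter,
                Arrays.copyOfRange(data, bestLeft, right + 1));
    }

    public static void main(String[] args) {
        double[] a = {1.12, 1.15, 1.13, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4};
        Neighbourhood n = findSmallest(a, 3);
        System.out.println(n.centre + " " + Arrays.toString(n.elements));
    }
}
```

Sorting inside the method removes the requirement that the caller pass sorted data, at the cost of an O(n log n) step before the linear scan.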
I think there's nothing wrong with your code, and I doubt that you will find a library that does something so specific. But if you still want an idea for a more OOP approach that reuses Java collections, here is another one:
Create a class to represent numbers with different numbers of decimals. It would have something like VariableDecimal(double d, int ndecimals) as its constructor.
In that class, override the Object methods equals and hashCode. Your implementation of equals will test whether two instances of VariableDecimal are the same, taking into account the value d and the number of decimals. hashCode can simply return d * 10^ndecimals cast to an int.
In your logic use HashMaps so that they reuse your object:
HashMap<VariableDecimal, AtomicInteger> counters = new HashMap<VariableDecimal, AtomicInteger>();
for (double d : a) {
    VariableDecimal vd = new VariableDecimal(d, ndecimals);
    // create the counter the first time this key is seen
    if (counters.get(vd) == null)
        counters.put(vd, new AtomicInteger(0));
    counters.get(vd).incrementAndGet();
}
/* at the end of this loop counters should hold a map with frequencies of
each double for the selected precision so that you can simply traverse and
get the max */
This piece of code doesn't show the iteration to decrement the number of decimals, which is trivial.
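A possible VariableDecimal could look like the sketch below. It is an assumption-laden variant of the idea rather than the exact class described above: it truncates with BigDecimal instead of multiplying by a power of ten (so that 1.15 at one decimal lands on 1.1, as in the question's example), and it counts with Map.merge instead of AtomicInteger. The countAt helper is hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

final class VariableDecimal {
    private final BigDecimal truncated;

    VariableDecimal(double d, int ndecimals) {
        // BigDecimal.valueOf goes through Double.toString, so 1.15 is treated as exactly 1.15.
        this.truncated = BigDecimal.valueOf(d).setScale(ndecimals, RoundingMode.DOWN);
    }

    double value() {
        return truncated.doubleValue();
    }

    @Override
    public boolean equals(Object o) {
        // BigDecimal.equals compares value and scale, i.e. d and the number of decimals.
        return o instanceof VariableDecimal && truncated.equals(((VariableDecimal) o).truncated);
    }

    @Override
    public int hashCode() {
        return truncated.hashCode();
    }

    /** Counts how many inputs share the same truncated value at the given precision. */
    static Map<VariableDecimal, Integer> countAt(double[] values, int ndecimals) {
        Map<VariableDecimal, Integer> counters = new HashMap<>();
        for (double d : values) {
            counters.merge(new VariableDecimal(d, ndecimals), 1, Integer::sum);
        }
        return counters;
    }
}
```

The outer loop would call countAt with a decreasing ndecimals and stop as soon as some counter reaches the required frequency.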