Ordering operations to maximize double precision - Java

I'm working on a tool, in Java, that computes numbers that can get close to 1e-25 in the worst cases, and compares them to each other. I'm obviously using double precision.
I have read in another answer that I shouldn't expect more than about 1e-15 to 1e-17 precision, and another question deals with getting better precision by ordering operations in a "better" way.
Which double precision operations are more likely to lose precision along the way? Should I try to work with numbers as big as possible, or as small as possible? Should I do divisions before multiplications?
I'd rather not use the BigDecimal classes or equivalent, as the code is already slow enough ;) (unless they don't impact speed too much, of course).
Any information will be greatly appreciated!
EDIT: The fact that the numbers are "small" in absolute value (1e-25) does not matter, since a double can go down to about 4.9e-324. What matters is that when they are very similar (both around 1e-25), I have to compare, let's say, 4.64563824048517606458e-21 to 4.64563824048517606472e-21 (the difference is in the 19th and 20th significant digits). When computing these numbers, the difference is so small that I might hit the rounding error, where the remaining digits are effectively noise.
The question is: "how to order computation so that this loss of precision is minimized?". It might be doing divisions before multiplications, or additions first.
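For illustration, the two values above are already past that limit: both round to the very same double when parsed, so nothing computed afterwards can tell them apart (the class name below is just for the demo):

public class UlpDemo {
    public static void main(String[] args) {
        double a = Double.parseDouble("4.64563824048517606458e-21");
        double b = Double.parseDouble("4.64563824048517606472e-21");
        // Both literals round to the same 64-bit double; the difference lies
        // beyond the ~15-17 significant decimal digits a double can hold.
        System.out.println(a == b);      // true
        System.out.println(Math.ulp(a)); // spacing to the next representable double
    }
}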

If it is important to get the correct answer, you should use BigDecimal. It is slower than double, but for most cases it is fast enough. I can't think of many cases where you do heavy calculation with such small numbers and it doesn't matter whether the answer is correct - at least in Java.
If this is a super performance sensitive application, I would consider using a different language.

Thanks to @John for pointing out a very complete article about floating point arithmetic.
It turns out that, when precision is needed, operations should be re-ordered and formulas adapted to avoid loss of precision, as explained in the Cancellation chapter: when comparing numbers that are very close to each other (which is my case), "catastrophic cancellation" may occur, inducing a huge loss of precision. Often, re-writing the formula or re-ordering operations according to your a priori knowledge of the operands' values can achieve greater accuracy in the calculation.
What I'll remember from this article is:
be careful when subtracting two nearly identical quantities
try to re-arrange operations to avoid catastrophic cancellation
For the latter case, remember that computing (x - y) * (x + y) gives more accurate results than x * x - y * y.
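A small illustration of why the factored form wins when x and y are close (the values here are chosen only for the demo): the naive form rounds x * x and y * y separately, and the subtraction then cancels their leading digits, leaving mostly rounding error.

public class CancellationDemo {
    public static void main(String[] args) {
        double x = 1.0 + 1e-8;
        double y = 1.0;
        // Naive: both squares are rounded before the subtraction, so the
        // surviving low-order digits are dominated by rounding error.
        double naive = x * x - y * y;
        // Factored: x - y is computed exactly here (the operands are within
        // a factor of two of each other), so far fewer digits are lost.
        double factored = (x - y) * (x + y);
        System.out.println(naive);
        System.out.println(factored); // close to the true value, about 2e-8
    }
}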

Related

Is there a fused instruction that can perform a * 10^n with a single rounding?

Calling a * pow(10.0, n), in any C-derived language, results in two rounding errors, but I require only one rounding error, as could be provided by a single fused instruction.
For example a=39762108874335653 n=-297 should give 3.9762108874335653E-281 (as can be validated by parsing the string "3.9762108874335653E-281" or using arbitrary precision arithmetic, and outputting a floating point number) but the quoted code returns 3.976210887433566E-281.
That leads back to the question: is there a fused instruction (or C function) that can perform this calculation to the highest possible precision for the machine?
I am only concerned with the case where a and n are integers, and the result is a double precision floating point number. Note that a may not be exactly representable as a floating point number.
Converting to a string and parsing is not an acceptable solution (it's far too slow), nor is using an arbitrary precision library (such as Java's BigDecimal, also because it's too slow).
Note that there is a similar C stdlib function, ldexp (related to scalbn), which uses base 2 and therefore calculates a * 2^n with a single rounding.
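For comparison, Java exposes the same base-2 operation as Math.scalb, which is specified to round as if the scaling were performed by a single correctly rounded operation:

double x = Math.scalb(3.0, 4);     // 3.0 * 2^4 == 48.0
double y = Math.scalb(1.0, -1074); // Double.MIN_VALUE, the smallest subnormal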
I am aware of the instructions listed in https://en.wikipedia.org/wiki/C_mathematical_functions
Not a dupe of "Is floating point math broken?": this is not about trying to understand how rounding works. It is just asking: does this function exist anywhere, ideally as a fused CPU instruction for maximum performance? Obviously the workaround in the absence of such a thing is to use arbitrary precision, but that is quite heavyweight.
The answer is "no": there is not a single instruction that can perform this operation, as answered by @Eric Postpischil in chat, and this is backed up by the fact that standard libraries would use it if it were available, e.g. glibc's strtod.
However, although there is no single CPU instruction, it is possible to extract the strtod implementation to operate on the integer input as requested in this question, as was done by Alex Huszagh in Rust. The algorithms discussed here effectively use a * pow(10.0, n) when the significand is estimated to be small enough (preferring division for negative n), falling back to convergence algorithms to eliminate the rounding errors.
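To make the double-rounding concrete, here is the question's own example in Java (the reference value comes from parsing the decimal string, which the question treats as correctly rounded):

public class PowRoundingDemo {
    public static void main(String[] args) {
        long a = 39762108874335653L;
        int n = -297;
        // Several rounding points: the long-to-double conversion of a, the
        // result of Math.pow, and the final multiply.
        double viaPow = a * Math.pow(10.0, n);
        // One rounding: the decimal string converts directly to the nearest
        // double (correct, but far too slow for the asker's use case).
        double reference = Double.parseDouble("3.9762108874335653E-281");
        System.out.println(viaPow);    // 3.976210887433566E-281 per the question
        System.out.println(reference); // 3.9762108874335653E-281
    }
}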

Java - Which data type for physical calculations?

I'm trying to create a physical calculation program in Java. For that I used some formulas, but they always returned a wrong value. I split them up and found: (I used long so far.)
8 * 830584000 = -1945262592
which is obviously wrong. There are fractions and very high numbers in the formulas, such as 6.095E23 and 4.218E-10 for example.
So what datatype would fit best to get a correct result?
Unless you have a very good reason not to, double is the best type for physical calculations. It was good enough for the wormhole modelling in the film Interstellar so, dare I say it, is probably good enough for you. Note well though that, as a rough guide, it only gives you about 15 significant decimal figures of precision.
But you need to help the Java compiler:
Write 8.0 * 830584000 for that expression to be evaluated in double precision. 8.0 is a double literal and causes the other operand to be promoted to double.
Currently you are using integer arithmetic, and are observing wrap-around effects.
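For example, contrast the three versions (values from the question):

public class OverflowDemo {
    public static void main(String[] args) {
        // int arithmetic: 6644672000 does not fit in 32 bits, so it wraps.
        System.out.println(8 * 830584000);   // -1945262592
        // Promote to double (or long) and the result is correct.
        System.out.println(8.0 * 830584000); // 6.644672E9
        System.out.println(8L * 830584000);  // 6644672000
    }
}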
Reference: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
If you need perfect accuracy for large decimal numbers, BigDecimal is the way to go, though it will be at the cost of performance. If you know your numbers are not that large, you can use long instead, which should be faster, but it has a much more limited range and will require you to convert to and from decimal numbers.
As physics calculations involve a lot of floating point operations, the float data type can be an option in such calculations. I hope it will help. :)

Algorithm speed with double and int?

How do algorithms using double compare to ones using int values? Is there much of a difference, or is it negligible?
In my case, I have a canvas that uses Integers so far. But now as I'm implementing a scaling, I'm probably going to switch everything to Double. Would this have a big impact on calculations?
If so, would maybe rounding doubles to only a few fractions optimize performance?
Or am I totally on the path of over-optimization and should just use doubles without any headache?
You're in GWT, so ultimately your code will be JavaScript, and JavaScript has a single type for numeric data: Number, which corresponds to Java's Double.
Using integers in GWT can either mean (I have no idea what the GWT compiler exactly does, it might also be dependent on context, such as crossing JSNI boundaries) that the generated code is doing more work than with doubles (doing a narrowing conversion of numbers to integer values), or that the code won't change at all.
All in all, expect the same or slightly better performance using doubles (unless you have to later do conversions to integers, of course); but generally speaking you're over-optimizing (also: optimization needs measurement/metrics; if you don't have them, you're on the “premature optimization” path)
There is a sizable difference between integers and doubles, however generally doubles are also very fast.
The difference is that integers are still faster than doubles, because it takes very few clock cycles to do arithmetic operations on integers.
Doubles are also fast because they are generally supported natively by a floating-point unit, which means the arithmetic is done by dedicated hardware. Unfortunately, they are still typically 2x to as much as 40x slower than integers.
Having said this, the CPU will usually spend quite a bit of time on housekeeping like loops and function calls, so if it is fast enough with integers, most of the time (perhaps even 99% of the time), it will be fast enough with doubles.
The only time floating point numbers are orders of magnitude slower is when they must be emulated, because there is no hardware support. This generally only occurs on embedded platforms, or where uncommon floating point types are used (eg. 128-bit floats, or decimal floats).
The result of some benchmarks can be found at:
http://pastebin.com/Kx8WGUfg
https://stackoverflow.com/a/2550851/1578925
but generally,
32-bit platforms have a greater disparity between doubles and integers
integers are always at least twice as fast on adding and subtracting
If you are going to change integers to doubles in your program, you will also have to rewrite the lines of code that compare two integers. If a and b are two integers and you compare them with if (a == b), then after changing their type to double you have to change that line as well, and compare with a tolerance (or Double.compare) instead of ==, as sketched below.
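A minimal sketch of such a comparison (the EPSILON value is only illustrative; pick one that fits the scale of your data):

// Compare two doubles with an absolute tolerance instead of ==.
static final double EPSILON = 1e-9;

static boolean nearlyEqual(double a, double b) {
    return Math.abs(a - b) < EPSILON;
}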
Not knowing the exact needs of your program, my instinct is that you're over-optimizing. When choosing between using ints or doubles, you usually base the decision on what type of value you need over which will run faster. If you need floating point values that allow for (not necessarily precise) decimal values, go for doubles. If you need precise integer values, go for ints.
A couple more points:
Rounding your doubles to certain fractions should have no impact on performance. In fact, the overhead required to round them in the first place would probably have a negative impact.
While I would argue not to worry about the performance differences between int and double, there is a significant difference between int and Integer. While an int is a primitive data type that can be used efficiently, an Integer is an object that essentially just holds an int. This incurs a significant overhead. Integers are useful in that they can be stored in collections like Vectors while ints cannot, but in all other cases it's best to use ints (see the snippet below).
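For example:

import java.util.ArrayList;
import java.util.List;

List<Integer> boxed = new ArrayList<>();
boxed.add(42);           // autoboxed to Integer.valueOf(42): a heap object
int[] primitives = {42}; // plain ints: no per-element object overhead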
In general, maths that naturally fits as an integer will be faster than maths that naturally fits as a double, BUT trying to force double maths to work as an integer is almost always slower; moving back and forth between the two costs more than the speed boost you get.
If you're considering something like:
I only want 1 decimal place within my "quasi integer float", so I'll just multiply everything by 10:
5.5*6.5
so 5.5 --> 55 and
so 6.5 --> 65
with a special multiplying function
public int specialIntegerMultiply(int a, int b) {
    // both inputs are scaled by 10, so the product is scaled by 100;
    // dividing by 10 restores a single scale factor (and truncates!)
    return a * b / 10;
}
Then for the love of god don't, it'll probably be slower with all the extra overhead and it'll be really confusing to write.
P.S. rounding the doubles will make no difference at all, as the remaining decimal places will still exist; they'll just all be 0 (in decimal, that is; in binary that won't even be true).

In Java currency representation, what are the pros and cons of using a long/short mantissa/exponent pattern over a double/epsilon pattern

Let me begin this question by stating that for the type of high performance application we are developing, BigDecimal is unacceptably slow. This cannot be compromised on.
In our domain, we will be representing values up to around 100,000,000 with varying levels of precision (in the most esoteric cases we have found so far, this might be six decimal places).
Given that, I see two ways of representing currency information at an arbitrary precision. The first is to follow a pattern similar to that described in JSR-354 where a long represents the mantissa of a value, and a short (or an int) represents the exponent. In this case, a value of 12345.6789 would be internally represented as
long mantissa = 123456789L;
short exponent = -4;
With this, we can represent 18 figures at any precision we choose (9223372036854775807 being 19 figures)
The second is to use a double to represent the value, and use an epsilon to round away any error introduced by performing calculations on floating point numbers. Based on my understanding of What Every Computer Scientist Should Know About Floating-Point Arithmetic and some experimentation, I believe that we can represent 17 figures at any precision chosen. If we use a fixed epsilon, we can represent values up to 99999999999.999999 at our expected requirement of six maximum decimal places, with our epsilon able to round away any error introduced.
I'm not sure that either of these patterns can be considered "best" for the domain we are working in.
A long/short pattern requires us to implement some position shifting logic if we need to perform operations on two values with different precision (this will be required). I think, but haven't confirmed, that this will make it slower than using double/epsilon for certain operations. On the other hand, using a double/epsilon introduces a small overhead on every calculation to perform the rounding.
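As an illustration of that shifting logic, a hypothetical sketch (a real implementation must check that the rescaled mantissa still fits in a long):

// Add two mantissa/exponent values by rescaling to the smaller exponent.
static long addAligned(long m1, short e1, long m2, short e2) {
    while (e1 > e2) { m1 *= 10; e1--; } // bring m1 down to m2's scale
    while (e2 > e1) { m2 *= 10; e2--; }
    return m1 + m2; // the result carries exponent min(e1, e2)
}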
Both can be extended to give a larger number of digits if required - JSR-354 mentions a long/long/int implementation which gives up to 37 digits at arbitrary precision. This paper describes a C++ implementation of double-double and quad-double types.
I've been unable to find any discussion of the advantages/disadvantages of one over the other which hasn't immediately descended into "Never Use Floating Point For Currency" without any particular justification - a mantra I agree with if performance is not a primary concern, but in this case, I'm less sure.
I'm not sure that either of these patterns can be considered "best" for the domain we are working in.
Well you haven't mentioned what the domain is so it is hard to comment on that.
However a lot of finance-related systems are governed by the general rules of accounting. These are clear on how financial calculations should be performed, and floating point is not acceptable.
If your domain is covered by the rules of accounting, then using floating point is really not an option.
If not, then you need to do a mathematical analysis to determine if the imprecision that is inherent in floating point is going to make a difference to the results of your computations. Simply using a wider floating point type is not a good answer ... unless you've "done the math" to place error bounds on the result of the computation.
Or avoid the analysis and just use a scaled long or BigDecimal. (But note that with scaled long you need to consider the issue of overflow / underflow.)
If you have "done the math" and you are not restricted by the rules of accounting (or some such) then:
Floating point types will be easier to use, because you don't have to mess around with a scale factor when you do arithmetic. (Even if the scale factor is implicit, it needs to be taken into account ...)
Floating point types will "just work" with standard library methods; e.g. the Math static methods.
Floating point code will be more readable, and less prone to programming logic errors.
Floating point types are inexact ... but if you've done the math correctly, that doesn't matter. (A complete mathematical analysis will tell you if the errors build up during a calculation to the point where they cause the result to be inaccurate.)
Performance ... hard to say. It probably depends on the actual calculations. My advice would be to code the critical calculations both ways and carefully benchmark them.
I would suggest using a long and an int, with the long representing the number of whole units, and the int representing the number of billionths. Such a representation should be fairly easy to work with (especially since the result of adding the billionths parts of two numbers can't overflow) and vastly outperform any kind of floating-point representation fancier than double.
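A sketch of that layout (the names are illustrative only); note that two billionths parts sum to at most 1,999,999,998, which fits comfortably in an int:

static final int SCALE = 1_000_000_000; // billionths per whole unit

// Returns {wholeUnits, billionths} of the sum.
static long[] add(long units1, int nanos1, long units2, int nanos2) {
    long units = units1 + units2;
    int nanos = nanos1 + nanos2; // cannot overflow an int
    if (nanos >= SCALE) {        // carry into the whole units
        nanos -= SCALE;
        units++;
    }
    return new long[] { units, nanos };
}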

What is the right data type for calculations in Java

Should we use double or BigDecimal for calculations in Java?
How much is the overhead in terms of performance for BigDecimal as compared to double?
For a serious financial application BigDecimal is a must.
Depending on how many digits you need, you can go with a long and a decimal factor for display.
For general floating point calculations, you should use double. If you are absolutely sure that you really do need arbitrary precision arithmetic (most applications don't), then you can consider BigDecimal.
You will find that double will significantly outperform BigDecimal (not to mention being easier to work with) for any application where double is sufficient precision.
Update: You commented on another answer that you want to use this for a finance related application. This is one of the areas where you actually should consider using BigDecimal, otherwise you may get unexpected rounding effects from double calculations. Also, double values have limited precision, and you won't be able to accurately keep track of pennies at the same time as millions of dollars.
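The classic demonstration of such a rounding effect:

System.out.println(0.1 + 0.2); // 0.30000000000000004
System.out.println(new java.math.BigDecimal("0.1")
        .add(new java.math.BigDecimal("0.2"))); // exactly 0.3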
How much is the overhead in terms of performance for BigDecimal as compared to double?
A lot. For example, a multiplication of two doubles is a single machine instruction. Multiplying two BigDecimals is probably a minimum of 50 machine instructions, and has complexity of O(N * M) where M and N are the number of bytes used to represent the two numbers.
However, if your application requires the calculation to be "decimally correct", then you need to accept the overhead.
However (#2) ... even BigDecimal can't do this calculation with real number accuracy:
1/3 + 1/3 + 1/3 -> ?
To do that computation precisely you would need to implement a Rational type, i.e. a pair of BigInteger values, plus something to reduce the common factors.
However (#3) ... even a hypothetical Rational type won't give you a precise numeric representation for (say) Pi.
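A minimal sketch of such a Rational type (hypothetical and incomplete; a real version would normalize signs, reject zero denominators, implement equals/hashCode, and so on):

import java.math.BigInteger;

final class Rational {
    final BigInteger num, den;

    Rational(BigInteger num, BigInteger den) {
        BigInteger g = num.gcd(den); // keep the fraction in lowest terms
        this.num = num.divide(g);
        this.den = den.divide(g);
    }

    Rational add(Rational o) {
        // a/b + c/d = (a*d + c*b) / (b*d), reduced by the constructor
        return new Rational(num.multiply(o.den).add(o.num.multiply(den)),
                            den.multiply(o.den));
    }
}

With this, 1/3 + 1/3 + 1/3 reduces back to exactly 1/1, with no rounding at any step.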
As always: it depends.
If you need the precision (even for "small" numbers, when representing amounts for example) go with BigDecimal.
In some scientific applications, double may be a better choice.
Even in finance we can't answer without knowing what area. For instance if you were doing currency conversions of $billions, where the conversion rate could be to 5 d.p. you might have problems with double. Whereas for simply adding and subtracting balances you'd be fine.
If you don't need to work in fractions of a cent/penny, maybe an integral type might be more appropriate, again it depends on the size of numbers involved.
