Just noticed that Python and JavaScript have exact comparison between arbitrary-precision integers and floats.
Example in Python:
>>> 2**1023+1 > 8.98846567431158E307
True
>>> 2**1023-1 < 8.98846567431158E307
True
And JavaScript:
> 2n**1023n+1n > 8.98846567431158E307
true
> 2n**1023n-1n < 8.98846567431158E307
true
Anything similar available for Java, except converting both arguments to
BigDecimal?
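For context, the straightforward BigDecimal baseline I would like to avoid looks something like this (new BigDecimal(double) converts the binary value exactly, so the result is correct; it is the cost I am worried about):

import java.math.BigDecimal;
import java.math.BigInteger;

BigInteger b = BigInteger.valueOf(2).pow(1023).add(BigInteger.ONE);
double d = 8.98846567431158E307;
// exact but allocation-heavy comparison
System.out.println(new BigDecimal(b).compareTo(new BigDecimal(d)) > 0); // true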
Preliminary answer, i.e. a verbal solution sketch:
I am skeptical about a solution that converts to BigDecimal, since this conversion shifts the base from base=2 to base=10. As soon as the exponent of the Java double floating-point value differs from its binary precision, this leads to many additional digits and lengthy pow() operations, which one can verify by inspecting an open-source BigDecimal(double) constructor implementation.
One can get the mantissa via Double.doubleToRawLongBits(d). If the Java double value is not subnormal, all that needs to be done is (raw & DOUBLE_SNIF_MASK) + (DOUBLE_SNIF_MASK + 1), where DOUBLE_SNIF_MASK = 0x000fffffffffffffL. This means the primitive type long is enough to carry the mantissa. The challenge is then to perform a comparison that also takes the exponent of the double into account.
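In code, the mantissa extraction just described would look like this (assuming d is finite and normal):

long raw = Double.doubleToRawLongBits(d);
final long DOUBLE_SNIF_MASK = 0x000fffffffffffffL;                // low 52 significand bits
long mantissa = (raw & DOUBLE_SNIF_MASK) + (DOUBLE_SNIF_MASK + 1); // restore the implicit leading bit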
But I must admit I didn't have time yet to work out the Java code in full. I also have in mind some optimizations using bitLength() of the other argument, which in this setting is a BigInteger; the use of bitLength() would speed up the comparison. A simple heuristic can implement a fast path where the mantissa can be ignored entirely: the exponent of the double and the bitLength() of the BigInteger alone already give enough information for most comparison results.
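Here is a minimal, untested sketch of that fast path, assuming b > 0 and d is finite, positive, and normal (zero, negative, subnormal, NaN and infinity would need extra branches):

import java.math.BigInteger;

static int compare(BigInteger b, double d) {
    int e = Math.getExponent(d);  // unbiased exponent: d lies in [2^e, 2^(e+1))
    int n = b.bitLength();        // b lies in [2^(n-1), 2^n)
    if (n - 1 > e) return 1;      // fast path: b is strictly above d's binade
    if (n - 1 < e) return -1;     // fast path: b is strictly below d's binade
    // Same binade: compare exactly. d == mantissa * 2^(e - 52).
    final long DOUBLE_SNIF_MASK = 0x000fffffffffffffL;
    long raw = Double.doubleToRawLongBits(d);
    long mantissa = (raw & DOUBLE_SNIF_MASK) + (DOUBLE_SNIF_MASK + 1);
    BigInteger lhs = b.shiftLeft(52);                           // b * 2^52
    BigInteger rhs = BigInteger.valueOf(mantissa).shiftLeft(e); // mantissa * 2^e (e >= 0 here)
    return lhs.compareTo(rhs);
}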
Once I have a tested prototype running, I might publish a more polished Java fragment here. But maybe somebody has faced the problem already. My general hypothesis is that a fast, or even ultra-fast, routine is possible. I didn't have much time to search the internet for an existing implementation, which is why I deferred the problem to Stack Overflow: maybe somebody else has had the same problem and/or can point to a complete solution?
Related
Why the inconsistency?
There is no inconsistency: the methods are simply designed to follow different specifications.
long round(double a)
Returns the closest long to the argument.
double floor(double a)
Returns the largest (closest to positive infinity) double value that is less than or equal to the argument and is equal to a mathematical integer.
Compare with double ceil(double a)
double rint(double a)
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
So by design round rounds to a long and rint rounds to a double. This has always been the case since JDK 1.0.
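For example, both the return types and the tie-breaking rules differ:

System.out.println(Math.round(2.5));  // 3    (long; computes floor(a + 0.5))
System.out.println(Math.rint(2.5));   // 2.0  (double; ties go to the even integer)
System.out.println(Math.round(-2.5)); // -2   (floor(-2.5 + 0.5) == floor(-2.0))
System.out.println(Math.rint(3.5));   // 4.0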
Other methods were added in JDK 1.2 (e.g. toRadians, toDegrees); others were added in 1.5 (e.g. log10, ulp, signum), and yet more were added in 1.6 (e.g. copySign, getExponent, nextUp); look for the Since: metadata in the documentation. But round and rint have been the way they are now since JDK 1.0.
Arguably, perhaps instead of long round and double rint, it would be more "consistent" to name them double round and long rlong, but this is debatable. That said, if you insist on categorically calling this an "inconsistency", then the reason may be as unsatisfying as "because it's inevitable".
Here's a quote from Effective Java 2nd Edition, Item 40: Design method signatures carefully:
When in doubt, look to the Java library APIs for guidance. While there are plenty of inconsistencies -- inevitable, given the size and scope of these libraries -- there is also a fair amount of consensus.
Distantly related questions
Why does int num = Integer.getInteger("123") throw NullPointerException?
Most awkward/misleading method in Java Base API ?
Most Astonishing Violation of the Principle of Least Astonishment
floor would have been chosen to match the standard C routine in math.h (rint, mentioned in another answer, is also present in that library, and returns a double, as in Java).
But round was not a standard function in C at that time (it's not mentioned in C89; C99 does define round, and it returns a double, as you would expect). It's normal for language designers to "borrow" ideas, so maybe it comes from some other language? Fortran 77 doesn't have a function of that name, and I am not sure what else would have been used back then as a reference. Perhaps VB - that does have Round but, unfortunately for this theory, it returns a double (PHP too). Interestingly, Perl deliberately avoids defining round.
[Update: hmmm, it looks like Smalltalk returns integers. I don't know enough about Smalltalk to say whether that is correct and/or general, and the method is called rounded, but it might be the source. Smalltalk did influence Java in some ways (although more conceptually than in details).]
If it's not Smalltalk, then we're left with the hypothesis that someone simply chose poorly (given the implicit conversions possible in Java, it seems to me that returning a double would have been more useful, since the result could then be used both when converting types and when doing floating-point calculations).
In other words: functions common to Java and C tend to be consistent with the C library standard of the time; the rest seem to be arbitrary, but this particular wrinkle may have come from Smalltalk.
I agree that it is odd that Math.round(double) returns long. If large double values are cast to long (which is what Math.round effectively does), Long.MAX_VALUE is returned. An alternative is using Math.rint() to avoid that. However, Math.rint() has a somewhat surprising rounding behavior: ties are settled by rounding to the even integer, i.e. 4.5 is rounded down to 4.0 but 5.5 is rounded up to 6.0. Another alternative is to use Math.floor(x + 0.5). But be aware that 1.5 is rounded to 2 while -1.5 is rounded to -1, not -2. Yet another alternative is to use Math.round, but only if the number is in the range between Long.MIN_VALUE and Long.MAX_VALUE; double-precision floating-point values outside this range are integers anyhow.
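To see all of these behaviors in one place:

System.out.println(Math.rint(4.5));                // 4.0 (tie: round to even)
System.out.println(Math.rint(5.5));                // 6.0 (tie: round to even)
System.out.println(Math.floor(1.5 + 0.5));         // 2.0
System.out.println(Math.floor(-1.5 + 0.5));        // -1.0, not -2.0
System.out.println(Math.round(Double.MAX_VALUE));  // 9223372036854775807 (Long.MAX_VALUE)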
Unfortunately, why Math.round() returns long is unknown. Somebody made that decision, and he probably never gave an interview to tell us why. My guess is, that Math.round was designed to provide a better way (i.e., with rounding) for converting doubles to longs.
Like everyone else here I also don't know the answer, but thought someone might find this useful. I noticed that if you want to round a double to an int without casting, you can use the two round implementations long round(double) and int round(float) together:
double d = something;
// inner call: long round(double); the long result widens to float,
// so the outer call resolves to int round(float)
int i = Math.round(Math.round(d));
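Be aware, though, that the implicit long-to-float widening in the middle can lose precision for large magnitudes, and int round(float) clamps its result to Integer.MIN_VALUE/Integer.MAX_VALUE, so this trick is only reliable when d is within int range anyway.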
Calling a * pow(10.0, n) in any C-derived language results in two rounding errors, but I require only one rounding error, as could be provided by a single fused instruction.
For example, a = 39762108874335653, n = -297 should give 3.9762108874335653E-281 (as can be validated by parsing the string "3.9762108874335653E-281", or by using arbitrary-precision arithmetic and outputting a floating-point number), but the quoted code returns 3.976210887433566E-281.
That leads back to the question: is there a fused instruction (or C function) that can perform this calculation to the highest possible precision for the machine?
I am only concerned with the case where a and n are integers, and the result is a double precision floating point number. Note that a may not be exactly representable as a floating point number.
Converting to a string and parsing is not an acceptable solution (it's far too slow), nor is using an arbitrary precision library (such as Java's BigDecimal, also because it's too slow).
Note that there is a similar C stdlib function, ldexp (related to scalbn), which uses base 2 and therefore calculates a * 2^n with a single rounding.
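Incidentally, Java exposes the same base-2 operation: Math.scalb(d, n) is specified to return d × 2^n rounded as if by a single correctly rounded floating-point multiply. For example:

double x = Math.scalb(3.0, 4);     // 48.0, i.e. 3 * 2^4, with at most one rounding
double y = Math.scalb(1.0, -1074); // 4.9E-324, i.e. Double.MIN_VALUE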
I am aware of the instructions listed in https://en.wikipedia.org/wiki/C_mathematical_functions
Not a dupe of Is floating point math broken? - this is not about trying to understand how rounding works. It is just asking: does this function exist anywhere, ideally as a fused CPU instruction, for maximum performance? Obviously the workaround in the absence of such a thing is to use arbitrary precision, but that is quite heavyweight.
The answer is "no": there is not a single instruction that can perform this operation, as answered by @Eric Postpischil in chat, and the evidence for this is backed up by the fact that standard libraries would use it if it were available, e.g. glibc's strtod.
However, although there is no single CPU instruction, it is possible to extract the strtod implementation to operate on the integer input as requested in this question, as was done by Alex Huszagh in Rust. The algorithms discussed there effectively use a * pow(10.0, n) when the significand is estimated to be small enough (preferring division for negative n), falling back to slower, exact algorithms to eliminate the rounding errors.
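To illustrate the cheap case, here is a hedged Java sketch of the classic fast path (the helper name is mine): if a fits in 53 bits and 10^|n| is exactly representable as a double (|n| <= 22), a single multiply or divide is correctly rounded, i.e. only one rounding error occurs; everything else needs the slow fallback:

static double fastPathOrNaN(long a, int n) {
    if (Math.abs(a) <= (1L << 53) && Math.abs(n) <= 22) {
        double p = 1.0;
        for (int i = 0; i < Math.abs(n); i++) p *= 10.0; // exact: 10^i is representable for i <= 22
        return n >= 0 ? a * p : a / p; // one correctly rounded operation
    }
    return Double.NaN; // outside the fast path: fall back to a slow, exact algorithm
}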
I'm trying to create a physical calculation program in Java. For that I used some formulas, but they always returned a wrong value. I split them up and found: (I used long so far.)
8 * 830584000 = -1945262592
which is obviously wrong. There are fractions and very large numbers in the formulas, such as 6.095E23 and 4.218E-10.
So what datatype would fit best to get a correct result?
Unless you have a very good reason not to, double is the best type for physical calculations. It was good enough for the wormhole modelling in the film Interstellar so, dare I say it, is probably good enough for you. Note well, though, that as a rough guide it gives you only about 15 significant decimal digits of precision.
But you need to help the Java compiler:
Write 8.0 * 830584000 for that expression to be evaluated in double precision. 8.0 is a double literal, and it causes the other operand to be promoted to double.
Currently you are using integer arithmetic, and are observing wrap-around effects.
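A minimal demonstration of the wrap-around and two ways to avoid it:

long wrong  = 8 * 830584000;   // -1945262592: multiplied in 32-bit int arithmetic, wraps around
double ok   = 8.0 * 830584000; // 6.644672E9: the int operand is promoted to double
long alsoOk = 8L * 830584000;  // 6644672000: multiplied in 64-bit long arithmetic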
Reference: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
If you need perfect accuracy for large decimal numbers, BigDecimal is the way to go, though it comes at the cost of performance. If you know your numbers are not that large, you can use long instead, which is faster but has a much more limited range and will require you to convert to and from decimal numbers.
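A quick sketch of the BigDecimal route, using the example values from the question:

import java.math.BigDecimal;

BigDecimal a = new BigDecimal("6.095E23");
BigDecimal b = new BigDecimal("4.218E-10");
System.out.println(a.multiply(b)); // exact product, no floating-point rounding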
As physics calculations involve a lot of floating-point operations, the float data type can be a good option in such calculations. I hope it will help. :)
So, long story short, I've spent the last 2-3 days troubleshooting a bug in my graphing calculator that arose when I implemented a new window-resize listener. The bug was that when I resized my window slowly, the functions wouldn't be transformed properly, while if I resized it fast they would transform just fine. I looked at all of my formulas and algorithms and everything was spot on (the same as with my previous window-resizing method). Whenever there was a change in width, I'd take the difference, divide it by 2, and move the graphs by that amount. Really simple.
double changeX = (newCanvasWidth - canvasWidth)/2;
double changeY = (newCanvasHeight - canvasHeight)/2;
This had worked fine and made all of the logical sense I needed to ignore it as the culprit for nearly 3 days. It looked so innocent that I almost rewrote my entire program to try to fix the issue, ranging from compensation algorithms to all-new methods to predict these errors and correct them. It was becoming a nightmare and was incredibly annoying.
Before giving up all hope, I decided to investigate the problem once more with a thorough trace of every single calculation involved, outputting the results along the way, and I found something odd: whenever the difference (newCanvasWidth - canvasWidth) was odd, I was not getting the half at the end of the number.
So if the difference between them was, say, 15, changeX would reflect 7. Most troubling, when the difference was 1, changeX would be 0.
Upon discovering this I of course tried the obvious thing and type casted the subtraction.
double changeX = (double)(newCanvasWidth - canvasWidth)/2;
double changeY = (double)(newCanvasHeight - canvasHeight)/2;
And lo and behold my issue was solved!
What I don't understand, though, is why this didn't happen automatically. Also, if this is just something I'm going to have to make accommodations for all of the time, where is the limit? Is there any way to know when you're going to need to cast simple calculations like this?
Java doesn't automatically expand integral expressions to floating-point because it's very expensive computationally to do so, and because you can lose precision. Yes, if you have an integral value that you want divided into a non-integral quotient, you'll always need to tell Java (and C/C++) that. The Java Language Specification has comprehensive rules about what type of value a math expression is.
A shortcut when using a numeric literal like this is to make the literal a floating-point type:
double changeX = (newCanvasWidth - canvasWidth) / 2.0;
It wasn't happening automatically because the calculation on the right-hand side (RHS) of the assignment, i.e. (newCanvasHeight - canvasHeight)/2, takes place as a separate operation before the assignment to changeY. Since all terms on the RHS are integers, the result is an integer with the decimal part truncated (not rounded), which is then stored as a double (so instead of 7.5 you get 7, which is stored as 7.0). Since you were using a constant term on the RHS, you could make it a double (as @Clown suggested), thereby making the result of the calculation a double before it is stored. If all terms on the RHS were variables, however, then you would cast.
So, yes, there is a way to know when you need to cast (or otherwise convert) in situations like these: when the most precise term on the RHS of the assignment is less precise than the LHS.
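To make the rule concrete (variable names are mine):

int diff = 15;
double a = diff / 2;          // 7.0: int/int division truncates first, then widens
double b = diff / 2.0;        // 7.5: the 2.0 literal promotes the division to double
double c = (double) diff / 2; // 7.5: the cast promotes before the division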
Because newCanvasWidth and canvasWidth are declared as ints, you don't automatically get a decimal result when dividing by another whole number. If you don't want to cast, you should have been using 2.0. Integer division in Java always discards decimals unless you tell it otherwise.
If the result has a chance of being a decimal number, for example when using division, you should always make sure the result is a double, casting where necessary. But generally speaking, you use casting a lot more often in other contexts, such as going from Object to something else or, as an even better example, going from a View to a TextView in Android.
I have converted a relatively simple algorithm that performs a large number of calculations on numbers of the double type from C++ to Java. However, running the algorithm on the two platforms, but on the same machine, produces slightly different results. The algorithm multiplies and sums lots of doubles and ints. I am casting ints to double in the Java algorithm; the C++ algorithm does not cast.
For example, on one run I get the results:
(Java) 64684970
(C++) 65296408
(Printed to ignore decimal places)
Naturally, there may be an error in my algorithm; however, before I start spending time debugging, is it possible that the difference could be explained by different floating-point handling in C++ and Java? If so, can I prove that this is the problem?
Update - the place where the types differ is a multiplication between two ints that is then added to a running-total double.
Having modified the C++ code, both versions now compute:
mydouble += (double)int1 * (double)int2;
You could add rounding to each algorithm using the same precision. This would let both versions handle the data the same way, and it would rule out the algorithm as the problem, since the data would then have the same precision at each step of the computation in both the C++ and Java versions.
AFAIK there are times when the value of a double literal could change between two C++ compiler versions (when the algorithm used to convert the source to the next-best double value changed).
Also, on some CPUs floating-point registers are larger than 64/32 bits (greater range and precision), and how that influences the result depends on how the compiler and the JIT move values in and out of these registers; this is likely to differ between Java and C++.
Java has the strictfp keyword to ensure that only 64/32-bit precision is used, but that comes with a run-time cost. There are also a large number of options that influence how C++ compilers treat and optimize floating-point computations, by relaxing guarantees/rules made by the IEEE standard.
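A minimal sketch of the strictfp option (note that since Java 17 / JEP 306 all floating-point arithmetic is strict anyway and the keyword is obsolete); the class and method names here are mine:

// All floating-point math inside this class uses strict IEEE 754
// 64-bit semantics, with no extended-precision intermediates.
strictfp class Accumulator {
    double total;
    void addProduct(int int1, int int2) {
        total += (double) int1 * (double) int2;
    }
}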
If the algorithm is mostly the same, you could check where the first difference for the same input appears.
In Java, double is a 64-bit floating-point number.
In C++, double is a floating-point number guaranteed to have at least 32-bit precision.
To find out the actual size, in bytes, of a C++ double on your system, use sizeof(double).
If it turns out that sizeof(double) == 8, it is almost certain that the difference is due to an error in the translation from one language to another, rather than differences in the handling of floating-point numbers.
(Technically, the size of a byte is platform-dependent, but most modern architectures use 8-bit bytes.)