I need theoretical answer using mathematical approach. Is there any mathematical difference between them and how do they make impact in programming languages.
According to Java terminology, the question of "Truncated Division" vs "Floored division" is best answered by the javadoc of RoundingMode:
DOWN
Rounding mode to round towards zero. Never increments the digit prior to a discarded fraction (i.e., truncates). Note that this rounding mode never increases the magnitude of the calculated value.
FLOOR
Rounding mode to round towards negative infinity. If the result is positive, behave as for RoundingMode.DOWN; if negative, behave as for RoundingMode.UP. Note that this rounding mode never increases the calculated value.
The Java division operator is defined by JLS §15.17.2. Division Operator /:
Integer division rounds toward 0.
That is why 5 / -3 result in -1.
You can also look at the definition of "Truncate" vs "Floor" on Wikipedia:
There are many ways of rounding a number y to an integer q The most common ones are:
round down (or take the floor, or round towards minus infinity): q is the largest integer that does not exceed y.
round up (or take the ceiling, or round towards plus infinity): q is the smallest integer that is not less than y.
round towards zero (or truncate, or round away from infinity): q is the integer part of y, without its fraction digits.
round away from zero (or round towards infinity): if y is an integer, q is y; else q is the integer that is closest to 0 and is such that y is between 0 and q.
round to nearest: q is the integer that is closest to y (see below for tie-breaking rules).
As you can see, Java and Wikipedia agrees on this definition:
Truncate: Round towards zero
Floor: Round towards minus/negative infinity
Note that Java and Wikipedia disagrees on what Round Down means.
You're dividing integers, so the result is going to be rounded to the nearest integer. Use floating point variables instead.
For example, the second part of your question, 1.0/2.0 = 0.5.
Related
For example,
int result;
result = 125/100;
or
result = 43/100;
Will result always be the floor of the division? What is the defined behavior?
Will result always be the floor of the division? What is the defined behavior?
Not quite. It rounds toward 0, rather than flooring.
6.5.5 Multiplicative operators
6 When integers are divided, the result of the / operator is the algebraic quotient with any
fractional part discarded.88) If the quotient a/b is representable, the expression
(a/b)*b + a%b shall equal a.
and the corresponding footnote:
This is often called ‘‘truncation toward zero’’.
Of course two points to note are:
3 The usual arithmetic conversions are performed on the operands.
and:
5 The result of the / operator is the
quotient from the division of the
first operand by the second; the
result of the % operator is the
remainder. In both operations, if the
value of the second operand is zero,
the behavior is undefined.
[Note: Emphasis mine]
Dirkgently gives an excellent description of integer division in C99, but you should also know that in C89 integer division with a negative operand has an implementation-defined direction.
From the ANSI C draft (3.3.5):
If either operand is negative, whether the result of the / operator is the largest integer less than the algebraic quotient or the smallest integer greater than the algebraic quotient is implementation-defined, as is the sign of the result of the % operator. If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a.
So watch out with negative numbers when you are stuck with a C89 compiler.
It's a fun fact that C99 chose truncation towards zero because that was how FORTRAN did it. See this message on comp.std.c.
Yes, the result is always truncated towards zero. It will round towards the smallest absolute value.
-5 / 2 = -2
5 / 2 = 2
For unsigned and non-negative signed values, this is the same as floor (rounding towards -Infinity).
Where the result is negative, C truncates towards 0 rather than flooring - I learnt this reading about why Python integer division always floors here: Why Python's Integer Division Floors
Will result always be the floor of the division?
No. The result varies, but variation happens only for negative values.
What is the defined behavior?
To make it clear floor rounds towards negative infinity,while integer division rounds towards zero (truncates)
For positive values they are the same
int integerDivisionResultPositive= 125/100;//= 1
double flooringResultPositive= floor(125.0/100.0);//=1.0
For negative value this is different
int integerDivisionResultNegative= -125/100;//=-1
double flooringResultNegative= floor(-125.0/100.0);//=-2.0
I know people have answered your question but in layman terms:
5 / 2 = 2 //since both 5 and 2 are integers and integers division always truncates decimals
5.0 / 2 or 5 / 2.0 or 5.0 /2.0 = 2.5 //here either 5 or 2 or both has decimal hence the quotient you will get will be in decimal.
I've written an arbitrary precision rational number class that needs to provide a way to convert to floating-point. This can be done straightforwardly via BigDecimal:
return new BigDecimal(num).divide(new BigDecimal(den), 17, RoundingMode.HALF_EVEN).doubleValue();
but this requires a value for the scale parameter when dividing the decimal numbers. I picked 17 as the initial guess because that is approximately the precision of a double precision floating point number, but I don't know whether that's actually correct.
What would be the correct number to use, defined as, the smallest number such that making it any larger would not make the answer any more accurate?
Introduction
No finite precision suffices.
The problem posed in the question is equivalent to:
What precision p guarantees that converting any rational number x to p decimal digits and then to floating-point yields the floating-point number nearest x (or, in case of a tie, either of the two nearest x)?
To see this is equivalent, observe that the BigDecimal divide shown in the question returns num/div to a selected number of decimal places. The question then asks whether increasing that number of decimal places could increase the accuracy of the result. Clearly, if there is a floating-point number nearer x than the result, then the accuracy could be improved. Thus, we are asking how many decimal places are needed to guarantee the closest floating-point number (or one of the tied two) is obtained.
Since BigDecimal offers a choice of rounding methods, I will consider whether any of them suffices. For the conversion to floating-point, I presume round-to-nearest-ties-to-even is used (which BigDecimal appears to use when converting to Double or Float). I give a proof using the IEEE-754 binary64 format, which Java uses for Double, but the proof applies to any binary floating-point format by changing the 252 used below to 2w-1, where w is the number of bits in the significand.
Proof
One of the parameters to a BigDecimal division is the rounding method. Java’s BigDecimal has several rounding methods. We only need to consider three, ROUND_UP, ROUND_HALF_UP, and ROUND_HALF_EVEN. Arguments for the others are analogous to those below, by using various symmetries.
In the following, suppose we convert to decimal using any large precision p. That is, p is the number of decimal digits in the result of the conversion.
Let m be the rational number 252+1+½−10−p. The two binary64 numbers neighboring m are 252+1 and 252+2. m is closer to the first one, so that is the result we require from converting m first to decimal and then to floating-point.
In decimal, m is 4503599627370497.4999…, where there are p−1 trailing 9s. When rounded to p significant digits with ROUND_UP, ROUND_HALF_UP, or ROUND_HALF_EVEN, the result is 4503599627370497.5 = 252+1+½. (Recognize that, at the position where rounding occurs, there are 16 trailing 9s being discarded, effectively a fraction of .9999999999999999 relative to the rounding position. In ROUND_UP, any non-zero discarded amount causes rounding up. In ROUND_HALF_UP and ROUND_HALF_EVEN, a discarded amount greater than ½ at that position causes rounding up.)
252+1+½ is equally close to the neighboring binary64 numbers 252+1 and 252+2, so the round-to-nearest-ties-to-even method produces 252+2.
Thus, the result is 252+2, which is not the binary64 value closest to m.
Therefore, no finite precision p suffices to round all rational numbers correctly.
I am writing tests for code performing calculations on floating point numbers. Quite expectedly, the results are rarely exact and I would like to set a tolerance between the calculated and expected result. I have verified that in practice, with double precision, the results are always correct after rounding of last two significant decimals, but usually after rounding the last decimal. I am aware of the format in which doubles and floats are stored, as well as the two main methods of rounding (precise via BigDecimal and faster via multiplication, math.round and division). As the mantissa is stored in binary however, is there a way to perform rounding using base 2 rather than 10?
Just clearing the last 3 bits almost always yields equal results, but if I could push it and instead 'add 2' to the mantissa if its second least significast bit is set, I could probably reach the limit of accuracy. This would be easy enough, expect I have no idea how to handle overflow (when all bits 52-1 are set).
A Java solution would be preferred, but I could probably port one for another language if I understood it.
EDIT:
As part of the problem was that my code was generic with regards to arithmetic (relying on scala.Numeric type class), what I did was an incorporation of rounding suggested in the answer into a new numeric type, which carried the calculated number (floating point in this case) and rounding error, essentially representing a range instead of a point. I then overrode equals so that two numbers are equal if their error ranges overlap (and they share arithmetic, i.e. the number type).
Yes, rounding off binary digits makes more sense than going through BigDecimal and can be implemented very efficiently if you are not worried about being within a small factor of Double.MAX_VALUE.
You can round a floating-point double value x with the following sequence in Java (untested):
double t = 9 * x; // beware: this overflows if x is too close to Double.MAX_VALUE
double y = x - t + t;
After this sequence, y should contain the rounded value. Adjust the distance between the two set bits in the constant 9 in order to adjust the number of bits that are rounded off. The value 3 rounds off one bit. The value 5 rounds off two bits. The value 17 rounds off four bits, and so on.
This sequence of instruction is attributed to Veltkamp and is typically used in “Dekker multiplication”. This page has some references.
When I perform simple math in java with doubles and other number data types, the double values seem to randomly vary a bit from the supposed result, which might be 5,59999999997 or 6,0000000002 or something. When I cast to int, the double value is obviously rounded down to the next whole number. Does this mean the double could be both 5 or 6? Or does that "5,999999999997" still count as 6 though which would be depending on the binary float value? If not, is there a way to let the negative vary be rounded up, but not lower values from 5,5 to 5,999999999996?
I mean, I dont really want to round the value as described in my last sentence. I'd like to always round down to the next whole number, but I don't want to cause an extra decrement due to wrong double math results.
Converting a double to an int always rounds down. You can round to the nearest whole integer via Math.round(double). The double is varying from what you expect because of floating point error.
If you want to round, you can use the round() method.
double d = 6 +/- some small error
long l = Math.round(d);
Or you can add 0.5 for positive numbers
long l = (long) (d + 0.5);
or
long l = (long) (d + (d < 0 ? -0.5 : 0.5));
I'm not sure I understand the question. Usually when you cast a double to int you add 0.5 to have a nice round.
From the Java Language Specification:
The Java programming language uses round toward zero when converting a floating value to an
integer (§5.1.3), which acts, in this case, as though the number were truncated, discarding
the mantissa bits. Rounding toward zero chooses at its result the format's value closest to
and no greater in magnitude than the infinitely precise result.
So 5,999999999997 when casted to an int will 5 and 6,0000000002 will be 6. If I understand what you are asking with having negative versions of the values (e.g. -5.97), I fail to see how
Math.round() does not suffice you. -6,0000000002 will be rounded to -6 as will -5,999999999997 and every other value above (but not including) -5.5.
Can anyone shed some light on why Double.MIN_VALUE is not actually the minimum value that Doubles can take? It is a positive value, and a Double can of course be negative.
I understand why it's a useful number, but it seems a very unintuitive name, especially when compared to Integer.MIN_VALUE. Calling it Double.SMALLEST_POSITIVE or MIN_INCREMENT or similar would have clearer semantics.
Also, what is the minimum value that Doubles can take? Is it -Double.MAX_VALUE? The docs don't seem to say.
The IEEE 754 format has one bit reserved for the sign and the remaining bits representing the magnitude. This means that it is "symmetrical" around origo (as opposed to the Integer values, which have one more negative value). Thus the minimum value is simply the same as the maximum value, with the sign-bit flipped, so yes, -Double.MAX_VALUE is the lowest actual number you can represent with a double.
I suppose the Double.MAX_VALUE should be seen as maximum magnitude, in which case it actually makes sense to simply write -Double.MAX_VALUE. It also explains why Double.MIN_VALUE is the least positive value (since that represents the least possible magnitude).
But sure, I agree that the naming is a bit misleading. Being used to the meaning Integer.MIN_VALUE, I too was a bit surprised when I read that Double.MIN_VALUE was the smallest absolute value that could be represented. Perhaps they thought it was superfluous to have a constant representing the least possible value as it is simply a - away from MAX_VALUE :-)
(Note, there is also Double.NEGATIVE_INFINITY but I'm disregarding from this, as it is to be seen as a "special case" and does not in fact represent any actual number.)
Here is a good text on the subject.
These constants have nothing to do with sign. This makes more sense if you consider a double as a composite of three parts: Sign, Exponent and Mantissa.
Double.MIN_VALUE is actually the smallest value Mantissa can assume when the Exponent is at minimun value before a flush to zero occurs. Likewise MAX_VALUE can be understood as the largest value Mantissa can assume when the Exponent is at maximum value before a flush to infinity occurs.
A more descriptive name for these two could be Largest Absolute (add non-zero for verbositiy) and Smallest Absolute value (add non-infinity for verbositiy).
Check out the IEEE 754 (1985) standard for details. There is a revised (2008) version, but that only introduces more formats which aren't even supported by java (strictly speaking java even lacks support for some mandatory features of IEEE 754 1985, like many other high level languages).
I assume the confusing names can be traced back to C, which defined FLT_MIN as the smallest positive number.
Like in Java, where you have to use -Double.MAX_VALUE, you have to use -FLT_MAX to get the smallest float in C.
The minimum value for a double is Double.NEGATIVE_INFINITY that's why Double.MIN_VALUE isn't really the minimum for a Double.
As the double are floating point numbers, you can only have the biggest number (with a lower precision) or the closest number to 0 (with a great precision).
If you really want a minimal value for a double that isn't infinity then you can use -Double.MAX_VALUE.
Because with floating point numbers, the precision is what is important as there's no exact range.
/**
* A constant holding the smallest positive nonzero value of type
* <code>double</code>, 2<sup>-1074</sup>. It is equal to the
* hexadecimal floating-point literal
* <code>0x0.0000000000001P-1022</code> and also equal to
* <code>Double.longBitsToDouble(0x1L)</code>.
*/
But i agree that it should probably have been named something better :)
As it says in the documents,
Double.MIN_VALUE is a constant holding the smallest POSITIVE nonzero value of type double, 2^(-1074).
The trick here is we are talking about a floating point number representation. The double data type is a double-precision 64-bit IEEE 754 floating point. Floating points represent numbers from 1,000,000,000,000 to 0.0000000000000001 with ease, and while maximizing precision (the number of digits) at both ends of the scale. (For more refer this)
The mantissa, always a positive number, holds the significant digits of the floating-point number. The exponent indicates the positive or negative power of the radix that the mantissa and sign should be multiplied by. The four components are combined as follows to get the floating-point value.
Think that the MIN_VALUE is the minimum value that the mantissa can represent. As the minimum values of a floating point representation is the minimum magnitude that can be represented using that. (Could have used a better name to avoid this confusion though)
123 > 10 > 1 > 0.12 > 0.012 > 0.0000123 > 0.000000001 > 0.0000000000000001
Below is just FYI.
Double-precision floating-point can represent 2,098 powers of two, from 2^-1074 through 2^1023. Denormalized powers of two are those from 2^-1074 through 2^-1023; normalized powers of two are those from 2^-1022 through 2^1023. Refer this and this.