type calculation in math expressions - java

I am preparing for the OCPJP exam.
I know that:
byte * byte = int
long * int = long
But I was surprised to find that float * float = float.
The product of two floats can be a very large number, so it would seem logical to convert it to double.
Anyway, to pass the exam I need to know all the rules about this.
Please explain these rules to me, or just point me to the relevant part of the JLS.

I suppose the relevant part of the JLS is 4.2.4 Floating-Point Operations :
If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).
Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
So if you multiply two float values, a 32-bit operation is performed. An overflow to infinity will occur if the values are too large.
If you multiply a float by a double, the result will be a double and the float will be expanded to a 64-bit value during the operation.
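To make the overflow concrete, here is a small sketch. The class name FloatOverflowDemo and its helper methods are made up for illustration; only the promotion behavior comes from the JLS text quoted above.

```java
class FloatOverflowDemo {
    // Both operands are float, so the multiplication is done in 32-bit arithmetic.
    static float timesAsFloat(float a, float b) {
        return a * b;
    }

    // Casting one operand to double makes the whole multiplication 64-bit;
    // the other float operand is widened to double automatically.
    static double timesAsDouble(float a, float b) {
        return (double) a * b;
    }

    public static void main(String[] args) {
        float big = 3.0e38f; // close to Float.MAX_VALUE (about 3.4e38)

        System.out.println(timesAsFloat(big, big));  // Infinity
        System.out.println(timesAsDouble(big, big)); // finite, roughly 9e76
    }
}
```

So if you expect products beyond the float range, widen one operand to double before multiplying, not after.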

As a result of my investigation, I came up with the following rule, applied to the two operands of each operator:
if (either operand is double)
    result - double
else if (either operand is float)
    result - float
else if (either operand is long)
    result - long
else
    result - int
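One way to check this rule yourself is to let the compiler tell you the static type of an expression through overload resolution. This is only a sketch; the class PromotionTypes and the typeOf overloads are hypothetical helpers, not anything from the JLS.

```java
class PromotionTypes {
    // One overload per possible result type. The compiler picks the overload
    // that matches the static type of the argument expression.
    static String typeOf(int x)    { return "int"; }
    static String typeOf(long x)   { return "long"; }
    static String typeOf(float x)  { return "float"; }
    static String typeOf(double x) { return "double"; }

    public static void main(String[] args) {
        byte b = 2;
        short s = 3;
        int i = 4;
        long l = 5L;
        float f = 6.0f;
        double d = 7.0;

        System.out.println(typeOf(b * b)); // int
        System.out.println(typeOf(s * b)); // int
        System.out.println(typeOf(l * i)); // long
        System.out.println(typeOf(f * f)); // float
        System.out.println(typeOf(l * f)); // float
        System.out.println(typeOf(f * d)); // double
    }
}
```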

Related

How to convert from double for values larger than long in java

I am facing issues while converting from double to long:
double power = Math.pow(2, 63);
long powerInLong = (long) power;
The above code returns:
9223372036854775807
while it should return:
-9223372036854775808
I am confused about why this is happening.
Thanks in advance!
You are doing a conversion from double to long, which is a narrowing primitive conversion.
This conversion is clearly specified in the JLS §5.1.3 (emphasis mine):
A narrowing conversion of a floating-point number to an integral type
T takes two steps:
In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int,
as follows:
If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.
Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward
zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are
two cases:
a. If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.
b. Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.
Otherwise, one of the following two cases must be true:
a. The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is
the smallest representable value of type int or long.
b. The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is
the largest representable value of type int or long.
Because 2^63 exceeds the range of long (but not the range of double), converting it to a long will result in the largest long. This can be demonstrated also by converting Math.pow(2, 64) to long, which results in the same long.
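The cases from the quoted section can be tried out with a short program. This is a sketch with a made-up class name (DoubleToLongDemo) and helper; the BigDecimal line at the end is one possible way to keep the exact value when it does not fit in a long, not something the JLS prescribes.

```java
import java.math.BigDecimal;

class DoubleToLongDemo {
    // A plain narrowing cast: NaN becomes 0, out-of-range values saturate
    // at Long.MIN_VALUE or Long.MAX_VALUE, everything else rounds toward zero.
    static long narrow(double value) {
        return (long) value;
    }

    public static void main(String[] args) {
        System.out.println(narrow(Math.pow(2, 63)));  // 9223372036854775807
        System.out.println(narrow(Math.pow(2, 64)));  // 9223372036854775807
        System.out.println(narrow(-Math.pow(2, 63))); // -9223372036854775808
        System.out.println(narrow(Double.NaN));       // 0
        System.out.println(narrow(-3.9));             // -3

        // 2^63 is exactly representable as a double, so converting through
        // BigDecimal yields the exact integer without saturation.
        System.out.println(new BigDecimal(Math.pow(2, 63)).toBigInteger()); // 9223372036854775808
    }
}
```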

Java implicit conversion

With the following code:
Float a = 1.2;
there is an error because the decimal literal is treated as a double, and double is a bigger data type than float.
Likewise, an integer literal is treated as an int by default. So why does the following code not give an error?
Byte b = 20;
The compiler is smart enough to figure out that the bit representation of 20 (an int value) can fit into a byte with no loss of data. From the Java Language Specification §5.1.3:
A narrowing primitive conversion from double to float is governed by the IEEE 754 rounding rules (§4.2.4). This conversion can lose precision, but also lose range, resulting in a float zero from a nonzero double and a float infinity from a finite double. A double NaN is converted to a float NaN and a double infinity is converted to the same-signed float infinity.
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
See also this thread.
There are no implicit narrowing conversions in general - constant expressions are the only exception, and they are explicitly allowed by JLS 5.2:
In addition, if the expression is a constant expression (§15.28) of type byte, short, char, or int:
* A narrowing primitive conversion may be used if the type of the variable is byte, short, or char, and the value of the constant expression is representable in the type of the variable.
There is no mention of implicit narrowing conversions being allowed for floating point numbers, so they are forbidden as per the general rule.
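A short sketch of what does and does not compile under these rules. The class name ConstantNarrowingDemo, the constant SMALL and the truncate helper are invented for the example; the lines that would not compile are left as comments so the program still runs.

```java
class ConstantNarrowingDemo {
    static final int SMALL = 20; // a compile-time constant of type int

    // Explicit narrowing keeps only the lowest 8 bits of the int.
    static byte truncate(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        byte fromLiteral = 20;     // OK: constant int expression that fits in a byte
        byte fromConstant = SMALL; // OK: still a constant expression
        // byte tooBig = 200;      // does not compile: 200 is not representable as a byte

        int runtimeValue = 20;
        // byte fromVariable = runtimeValue; // does not compile: not a constant expression
        byte viaCast = (byte) runtimeValue;  // OK: explicit narrowing

        float viaSuffix = 1.2f;        // OK: float literal
        float viaCastF = (float) 1.2;  // OK: explicit narrowing from double
        // float implicit = 1.2;       // does not compile: no implicit double-to-float narrowing

        System.out.println(fromLiteral + " " + fromConstant + " " + viaCast); // 20 20 20
        System.out.println(viaSuffix == viaCastF); // true
        System.out.println(truncate(200));         // -56, the sign flips as the JLS warns
    }
}
```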

Converting an int to a double by multiplying by 1.0 or adding 1d?

As far as I know, to convert an integer to a double one can multiply the former by "1.0". It's apparently also possible to add "1d" (the double literal) to it. What, then, is the difference?
Thanks!
So, if you mean "add the d to the end of the numeral"...then there's no difference. In Java, by default, all floating-point literals are double.
So these two literals are the same thing:
41.32
41.32d
If you were to add f instead of d, then one would be a float instead of a double.
You can change an int to a double in this way, too:
113
113d
If you're multiplying an int with a double to get a double, then the int is being promoted to a double so that the floating-point arithmetic can take place.
From the JLS:
Widening primitive conversion (§5.1.2) is applied to convert either or both operands as specified by the following rules:
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted to float.
Otherwise, if either operand is of type long, the other is converted to long.
Otherwise, both operands are converted to type int.
Adding 'd' to the literal is like an explicit cast to double. Multiplying also produces a double, because
If either operand is of type double, the other is converted to double
before the operation is carried out.
1.0 is a double, so multiplying an int by 1.0 converts the int operand to double and gives a result of type double.
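Here is a small comparison of the different ways to get a double out of an int. It is only a sketch and the names (IntToDoubleDemo, viaMultiply and so on) are hypothetical. Note that the d suffix only works on a literal; arithmetically adding 1d to a variable changes its value.

```java
class IntToDoubleDemo {
    // The int operand is promoted to double because 1.0 is a double.
    static double viaMultiply(int i) {
        return i * 1.0;
    }

    // Explicit widening cast.
    static double viaCast(int i) {
        return (double) i;
    }

    // Implicit widening on return; no cast or arithmetic needed.
    static double viaWidening(int i) {
        return i;
    }

    public static void main(String[] args) {
        int i = 113;

        System.out.println(viaMultiply(i)); // 113.0
        System.out.println(viaCast(i));     // 113.0
        System.out.println(viaWidening(i)); // 113.0
        System.out.println(113d);           // 113.0, the suffix makes the literal a double
        System.out.println(i + 1d);         // 114.0, this is an addition, not a conversion
    }
}
```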

Math in Java - different results with different objects

I'm getting some strange results doing a calculation for the application I'm working on and I thought someone on here might be able to help figure out what's going on.
The requirements for this particular calculation state that the calculation should look like this:
A and B are known
A * B = C
For this particular calculation
A = 0.0410
B = 123456789010
Here are the results I'm seeing:
Calculator:
0.0410 * 123456789010 = 5061728349.41
Java:
B is a double:
0.0410f * 123456789010d = 5.061728489223363E9 = 5061728489.223363
B is a long:
0.0410f * 123456789010l = 5.0617288E9
The loss of precision is of less importance to me (I only need 9 digits of precision anyway) than the difference in the 10s and 1s spot. Why does doing the calculation using the double give me the "wrong" result?
Incidentally, I tried doing the calculation using BigDecimal and got the same result as I did using a double.
The various type conversions that happen are specified by the JLS #5.6.2. In your case (extract):
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted to float.
In 0.0410f * 123456789010d = 5061728489.223363, 0.0410f is first converted to a double, which is not necessarily equal to 0.0410d. Actually, you can try it and see that it is not:
double d1 = 0.041d;
double d2 = 0.041f;
System.out.println(new BigDecimal(d1));
System.out.println(new BigDecimal(d2));
outputs:
0.041000000000000001720845688168992637656629085540771484375
0.041000001132488250732421875
In your next example:
0.0410f * 123456789010L = 5061728768
the long is converted to a float, which you can verify with this example:
float f1 = 0.0410f;
float f2 = 123456789010L;
System.out.println(new BigDecimal(f1)); // 0.041000001132488250732421875
System.out.println(new BigDecimal(f2)); // 123456790528
System.out.println(new BigDecimal(0.0410f * 123456789010L)); // 5061728768
System.out.println(new BigDecimal(f1 * f2)); // 5061728768
As for the precision of float / double operations in general, check this question.
Finally, if you use a BigDecimal, you get the correct answer:
BigDecimal a = new BigDecimal("0.041");
BigDecimal b = new BigDecimal("123456789010");
System.out.println(a.multiply(b)); // outputs 5061728349.410
TL;DR: A float cannot represent the 'correct' answer any more exactly. Use a double instead. Without an explicit cast, the multiplication itself is also carried out in float precision.
Answers I get using http://www.ideone.com
A        B      C        Value of C
float    long   float    5061728768.000000
double   long   double   5061728489.223363
The problem is that the precision of a float is much less than a double, so when multiplied up by a large number (e.g. your 10^10 value) you lose this precision in the multiplication. If we explicitly cast A to a double for the multiplication:
double C = ((double)A)*B; //=5061728489.223363
Then we get back the additional precision. If we cast the double answer back to a float:
float C = (float)((double)((double)A)*B); //=5061728256.000000
You see that the answer is different again. The result type of the multiplication is used, so in this instance double, but the cast back to float drops precision. Without an explicit cast to double (double C=A*B), the float type is used. With both casts, the multiplication is done as a double, and the precision is lost after the multiplication.
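The three variants can be put side by side in one runnable program. This is a sketch using the A and B values from the question; the class and method names are invented, and BigDecimal is used only to print the float results without exponent notation.

```java
import java.math.BigDecimal;

class FloatTimesLongDemo {
    static final float A = 0.0410f;
    static final long B = 123456789010L;

    // No cast: B is converted to float and the product is a float.
    static float allFloat() {
        return A * B;
    }

    // A is widened to double first, so the product is computed as a double.
    static double inDouble() {
        return (double) A * B;
    }

    // Computed as a double, then rounded to the nearest float.
    static float roundedAfterwards() {
        return (float) ((double) A * B);
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal(allFloat()));          // 5061728768
        System.out.println(inDouble());                          // 5.061728489223363E9
        System.out.println(new BigDecimal(roundedAfterwards())); // 5061728256
    }
}
```

None of the three matches the calculator result 5061728349.41, because A already carries the float representation error of 0.041 before any multiplication happens.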
The first calculation is using double (64 bits), the second float (32 bits). What you are seeing is "rounding errors".
In both cases it is a floating-point calculation, but in the second case, no "double" arguments are involved, so it just uses 32 bit arithmetic.
Quoting the Java language spec:
If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.
If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).
Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
The answer to your question is probably in the Floating-Point Operations section of the Java Language Specification and in this older post. You are probably experiencing rounding errors due to the implicit conversion that is occurring.
The quote that applies to your situation is
Third operation:
If at least one of the operands to a binary operator is of
floating-point type, then the operation is a floating-point operation,
even if the other is integral.
Second operation:
If at least one of the operands to a numerical operator is of type
double, then the operation is carried out using 64-bit floating-point
arithmetic, and the result of the numerical operator is a value of
type double. If the other operand is not a double, it is first widened
(§5.1.5) to type double by numeric promotion (§5.6).
First operation
Otherwise, the operation is carried out using 32-bit floating-point
arithmetic, and the result of the numerical operator is a value of
type float. (If the other operand is not a float, it is first widened
to type float by numeric promotion.)
Hence, you should not be worried, but decide what is the precision you desire and use the appropriate casting, if necessary.
32-bit IEEE floating-point numbers have about seven significant decimal digits of precision; 64-bit numbers have about 15 to 16. That's all you get. If neither of those is sufficient, you have to use BigDecimal.
This is true in every language that implements the IEEE 754 standard, not just Java.

casting between short,int,long,double,float in Java

As I understand it, when you combine two of these types in an arithmetic operation in Java, for example double + int, the result has the bigger type (in this example, the result is a double). What happens when you perform an arithmetic operation on two types of the same size? What will int + float and long + double give, since int and float are 4 bytes each, and long and double are 8 bytes each?
This is all specified by the binary numeric promotion rules in the JLS. From http://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.6.2:
When an operator applies binary numeric promotion to a pair of
operands, each of which must denote a value that is convertible to a
numeric type, the following rules apply, in order:
If any operand is of a reference type, it is subjected to unboxing
conversion (§5.1.8).
Widening primitive conversion (§5.1.2) is applied to convert either or
both operands as specified by the following rules:
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted
to float.
Otherwise, if either operand is of type long, the other is converted
to long.
Otherwise, both operands are converted to type int.
After the type conversion, if any, value set conversion (§5.1.13) is
applied to each operand.
Binary numeric promotion is performed on the operands of certain operators:
The multiplicative operators *, / and % (§15.17)
The addition and subtraction operators for numeric types + and -
(§15.18.2)
The numerical comparison operators <, <=, >, and >= (§15.20.1)
The numerical equality operators == and != (§15.21.1)
The integer bitwise operators &, ^, and | (§15.22.1)
In certain cases, the conditional operator ? : (§15.25)
("value set conversion" is about mapping between floating-point representations.)
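Promotion applies to more than arithmetic, as the operator list above shows. The following sketch (class and method names are made up) touches three of the listed cases: unboxing before promotion, the == operator, and the conditional operator.

```java
class PromotionOperatorsDemo {
    // Both operands are unboxed first, then the int is widened to double.
    static double unboxedSum(Integer i, Double d) {
        return i + d;
    }

    // The int is converted to float before the comparison takes place.
    static boolean intEqualsFloat(int i, float f) {
        return i == f;
    }

    public static void main(String[] args) {
        System.out.println(unboxedSum(1, 2.5)); // 3.5

        // 16777217 (2^24 + 1) has no exact float representation and rounds
        // to 16777216, so the comparison reports equality.
        System.out.println(intEqualsFloat(16777217, 16777216f)); // true

        // int and double branches: the conditional expression has type double.
        boolean flag = true;
        System.out.println(flag ? 1 : 2.0); // 1.0
    }
}
```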
int + float will give you a float (note that a floating-point literal without a suffix is a double by default, so write 2.5f rather than 2.5, or cast, if you want a float). long + double will give you a double.
The result is still the "bigger" type in the promotion order. For your specific question:
int + float gives a float,
and long + double gives a double.
The behavior of the additive operators + and - is defined by 15.18.2. Additive Operators (+ and -) for Numeric Types of the JLS. It states that binary numeric promotion is performed first:
Binary numeric promotion is performed on the operands.
This in turn is defined by 5.6.2. Binary Numeric Promotion. In substance, for primitives:
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted to float.
Otherwise, if either operand is of type long, the other is converted to long.
Otherwise, both operands are converted to type int.
Two interesting FAQs on type conversion can be found at: http://www.programmersheaven.com/2/FAQ-JAVA-Type-Conversion-Casting
http://myhowto.org/java/60-understanding-the-primitive-numeric-type-conversions-in-java/
To answer your question about two types of the same size: the result is the floating-point type of the pair, which has the larger range (though not necessarily more precision for large values).
Try the following code:
public static void main(String[] args) {
int i=1;
float f=2.5f;
long l=10;
double d=3.74;
System.out.println(i+f);
System.out.println(f+i);
System.out.println(l+d);
System.out.println(d+l);
}
You will see that the results are 3.5 and 13.74, which are a float and a double, respectively (tested in Netbeans 6.9 and java 1.6).
A gotcha of this "promotion" is that a long + float will "widen" to using a float.
e.g.
System.out.println(1111111111111111111L + 0.0f);
System.out.println(1111111111111111111L + 0.0);
prints
1.11111113E18
1.11111111111111117E18
When dealing with long and float, you may not get a wider type and can lose more precision than you might expect.
