I'm studying the floating-point types, and one of the examples is a declaration of a float variable written as a hexadecimal literal:
float f_in_hex = Ox1.59a8f6p8f
This is the computation to find the float value:
(1 * 16^0 + 5 * 16^-1 + 9 * 16^-2 + 10 * 16^-3 + 8 * 16^-4 + 15 * 16^-5 + 6 * 16^-6) * 2^8
So, I know that the prefix Ox means base 16, but I still don't understand why the exponents start at 0 and then go negative.
The exponents are negative because those digits come after the (hexadecimal) point.
16^(-1) is the same as 1/16 = 0.0625.
With positive exponents, those digits would contribute large values instead of fractions.
I hope that makes sense.
As written, that is not a valid hexadecimal literal: Ox starts with the letter O, not the digit zero. As for the letter p, it is indeed not a hexadecimal digit (those only run from a to f); in a hexadecimal floating-point literal it introduces the binary exponent, so p8 means "times 2^8".
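To check the digit-by-digit computation from the question, here is a minimal sketch. It assumes the prefix is meant to be 0x (digit zero), and the class and method names (HexFloatDemo, fromDigits) are made up for illustration. It rebuilds the value from the hex digits and compares it with the literal.

```java
class HexFloatDemo {
    // Hex digits of the significand 1.59a8f6, most significant first.
    static final int[] DIGITS = {0x1, 0x5, 0x9, 0xa, 0x8, 0xf, 0x6};

    // Evaluates the sum from the question in double arithmetic.
    static double fromDigits(int binaryExponent) {
        double sum = 0.0;
        for (int i = 0; i < DIGITS.length; i++) {
            // Digit i sits i places after the hexadecimal point, so its weight is 16^-i.
            sum += DIGITS[i] * Math.pow(16, -i);
        }
        // The "p8" part of the literal scales the significand by 2^8.
        return sum * Math.pow(2, binaryExponent);
    }

    public static void main(String[] args) {
        float literal = 0x1.59a8f6p8f;
        System.out.println(fromDigits(8));            // value rebuilt from the digits
        System.out.println(literal);                  // shortest decimal form of the float
        System.out.println(fromDigits(8) == literal); // true
    }
}
```

The first digit gets weight 16^0 because it stands immediately before the point; every later digit is one more place to the right, hence one more negative power of 16.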
Why does double in Java have the specific range of values from ±5.0 * 10^(-324) to ±1.7 * 10^308? I mean, why is it not ±5.0 * 10^(-324) to ±5.0 * 10^308, or ±1.7 * 10^(-324) to ±1.7 * 10^308?
The answer to your question is subnormal numbers; see the following link:
https://en.wikipedia.org/wiki/Denormal_number
Double floating point numbers in Java are based on the format defined in IEEE 754.
See this link for the explanation.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
The following is a simple set of rules.
A double is represented in 64 bits.
The 64 bits are divided as follows:
Sign bit: 1 bit (sign of the number)
Exponent: 11 bits (stored with a bias of 1023, not as a signed value)
Significand precision (Fraction): 52 bits
The number range that we get from this setup is
-1022 <= Exponent <= 1023 (2046 values in total; the stored exponent fields 0 and 2047 are excluded because they have special meanings)
0x000 (0) is used to represent a signed zero (if F=0) and subnormals (if F≠0); and
0x7ff (2047 in decimal) is used to represent ∞ (if F=0) and NaNs (if F≠0),
https://en.wikipedia.org/wiki/Exponent_bias
and
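0 <= Fraction < 2^52 (the stored fraction F is an unsigned 52-bit field; with the implicit leading 1, the significand 1.F lies in [1, 2))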
So the minimum and maximum normal numbers that can be represented are
Min positive normal double = +1 * 2^(-1022) ≈ 2.225 * 10^(−308)
Note: 1022 * Math.log(2) / Math.log(10) = 307.652
and Math.pow(10, 1 - .652) = 2.228 (.652 is an approximation)
Max positive double = +(2 - 2^(-52)) * 2^1023 ≈ 1.797 * 10^308
So the magnitudes of normal numbers range over [2.225 * 10^(−308), 1.797 * 10^308]
This range is extended downwards by subnormal numbers.
A subnormal number is a nonzero number whose magnitude is smaller than the minimum normal number defined by the specification.
If I have a number 0.00123, it would be represented as 1.23 * 10^(-3). Normal floating-point numbers never have leading zeros in the significand; any leading zeros are absorbed into the exponent. Once the exponent has already reached its minimum, that is no longer possible, so subnormal numbers keep the minimum exponent and allow leading zeros in the significand instead, which effectively lowers the exponent further.
There are 52 bits for the significand (fraction), so a nonzero fraction can have at most 51 leading zeros, which produces the following number.
Min positive subnormal = 1 * 2^(-52) * 2^(-1022) = 2^(-1074) ≈ 4.9 * 10^(−324)
Note: 1074 * Math.log(2) / Math.log(10) = 323.306
Math.pow(10, 1 - 0.306) = 4.943
So there you have it: the range of magnitudes is now
[Min subnormal number, Max normal number]
or
[4.9 * 10^(−324), 1.79769 * 10^308]
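The three limits derived above can be cross-checked against the constants in java.lang.Double. This is only a sketch; the class and method names are made up for illustration. It relies on Math.scalb (multiplication by a power of two) and Math.ulp(1.0), which is 2^-52.

```java
class DoubleRangeDemo {
    // Smallest positive normal value: 1.0 * 2^-1022.
    static double minNormal() {
        return Math.scalb(1.0, -1022);
    }

    // Smallest positive subnormal value: 2^-52 * 2^-1022 = 2^-1074.
    static double minSubnormal() {
        return Math.scalb(1.0, -1022 - 52);
    }

    // Largest finite value: (2 - 2^-52) * 2^1023, i.e. all 52 fraction bits set.
    static double maxNormal() {
        return Math.scalb(2.0 - Math.ulp(1.0), 1023);
    }

    public static void main(String[] args) {
        System.out.println(minNormal() == Double.MIN_NORMAL);   // true
        System.out.println(minSubnormal() == Double.MIN_VALUE); // true
        System.out.println(maxNormal() == Double.MAX_VALUE);    // true
        System.out.println(Double.MIN_VALUE);                   // 4.9E-324
        System.out.println(Double.MAX_VALUE);                   // 1.7976931348623157E308
    }
}
```

So the small end of the documented range comes from the subnormals and the large end from the normal numbers, which is why the two leading factors (4.9 and 1.79) differ.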
I've been trying to find out the reason, but I couldn't.
Can anybody help me?
Look at the following example.
float f = 125.32f;
System.out.println("value of f = " + f);
double d = (double) 125.32f;
System.out.println("value of d = " + d);
This is the output:
value of f = 125.32
value of d = 125.31999969482422
The value of a float does not change when converted to a double. There is a difference in the displayed numerals because more digits are required to distinguish a double value from its neighbors, which is required by the Java documentation. That is the documentation for toString, which is referred (through several links) from the documentation for println.
The exact value for 125.32f is 125.31999969482421875. The two neighboring float values are 125.3199920654296875 and 125.32000732421875. Observe that 125.32 is closer to 125.31999969482421875 than to either of the neighbors. Therefore, by displaying “125.32”, Java has displayed enough digits so that conversion back from the decimal numeral to float reproduces the value of the float passed to println.
The two neighboring double values of 125.31999969482421875 are 125.3199996948242045391452847979962825775146484375 and 125.3199996948242329608547152020037174224853515625.
Observe that 125.32 is closer to the latter neighbor than to the original value (125.31999969482421875). Therefore, printing “125.32” does not contain enough digits to distinguish the original value. Java must print more digits in order to ensure that a conversion from the displayed numeral back to double reproduces the value of the double passed to println.
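The round-trip argument can be tried directly. The following is a small sketch (class and method names are made up for illustration) that parses the printed numerals back and compares them with the original values.

```java
class RoundTripDemo {
    // True if parsing the numeral as a float gives back exactly f.
    static boolean floatRoundTrips(String numeral, float f) {
        return Float.parseFloat(numeral) == f;
    }

    // True if parsing the numeral as a double gives back exactly d.
    static boolean doubleRoundTrips(String numeral, double d) {
        return Double.parseDouble(numeral) == d;
    }

    public static void main(String[] args) {
        float f = 125.32f;
        double d = f; // widening conversion, the value is unchanged

        System.out.println((float) d == f);                           // true
        System.out.println(floatRoundTrips("125.32", f));             // true
        System.out.println(doubleRoundTrips("125.32", d));            // false
        System.out.println(doubleRoundTrips("125.31999969482422", d)); // true
    }
}
```

"125.32" is enough to identify the float, but as a double numeral it names a different (closer-to-125.32) double, so more digits have to be printed.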
When you convert a float into a double, there is no loss of information. Every float can be represented exactly as a double.
On the other hand, neither decimal representation printed by System.out.println is the exact value of the number. An exact decimal representation could require up to about 760 decimal digits. Instead, System.out.println prints just enough decimal digits that parsing the decimal representation gives back the original float or double. There are more doubles than floats, so when printing a double, System.out.println needs to print more digits before the representation becomes unambiguous.
The conversion from float to double is a widening conversion, as specified by the JLS. A widening conversion is defined as an injective mapping of a smaller set into its superset. Therefore the number being represented does not change after a conversion from float to double.
More information regarding your updated question
In your update you added an example which is supposed to demonstrate that the number has changed. However, it only shows that the string representation of the number has changed, which indeed it has due to the additional precision acquired through the conversion to double. Note that your first output is just a rounding of the second output. As specified by Double.toString,
There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double.
Since the adjacent values in the type double are much closer than in float, more digits are needed to comply with that ruling.
The 32bit IEEE-754 floating point number closest to 125.32 is in fact 125.31999969482421875. Pretty close, but not quite there (that's because 0.32 is repeating in binary).
When you cast that to a double, it's the value 125.31999969482421875 that will be made into a double (125.32 is nowhere to be found at this point, the information that it should really end in .32 is completely lost) and of course can be represented exactly by a double. When you print that double, the print routine thinks it has more significant digits than it really has (but of course it can't know that), so it prints to 125.31999969482422, which is the shortest decimal that rounds to that exact double (and of all decimals of that length, it is the closest).
The issue of the precision of floating-point numbers is really language-agnostic, so I'll be using MATLAB in my explanation.
The reason you see a difference is that certain numbers are not exactly representable in fixed number of bits. Take 0.1 for example:
>> format hex
>> double(0.1)
ans =
3fb999999999999a
>> double(single(0.1))
ans =
3fb99999a0000000
So the approximation error of 0.1 in single precision is carried over unchanged when you cast it to a double-precision floating-point number; the extra bits are simply filled with zeros. The result is therefore different from the approximation you would get if you had started directly in double precision.
>> double(single(0.1)) - double(0.1)
ans =
1.490116113833651e-09
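For readers following along in Java rather than MATLAB, the same experiment can be sketched as follows. The class and method names are made up for illustration; Double.doubleToLongBits and Long.toHexString are standard library calls.

```java
class SingleVsDoubleBits {
    // Raw IEEE 754 bit pattern of a double, as lowercase hex.
    static String hexBits(double value) {
        return Long.toHexString(Double.doubleToLongBits(value));
    }

    public static void main(String[] args) {
        double direct = 0.1;           // 0.1 rounded straight to double
        double viaFloat = (double) 0.1f; // 0.1 rounded to float, then widened

        System.out.println(hexBits(direct));   // 3fb999999999999a
        System.out.println(hexBits(viaFloat)); // 3fb99999a0000000
        System.out.println(viaFloat - direct); // roughly 1.49e-9
    }
}
```

The trailing zeros in the second pattern are the 29 fraction bits that the float never had.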
As already explained, all floats can be exactly represented as a double and the reason for your issue is that System.out.println performs some rounding when displaying the value of a float or double but the rounding methodology is not the same in both cases.
To see the exact value of the float, you can use a BigDecimal:
float f = 125.32f;
System.out.println("value of f = " + new BigDecimal(f));
double d = (double) 125.32f;
System.out.println("value of d = " + new BigDecimal(d));
which outputs:
value of f = 125.31999969482421875
value of d = 125.31999969482421875
In Java, a decimal literal is treated as a double by default. If you assign such a literal to a float without the float suffix, that is, if you write 123.45 instead of
123.45f
the literal is taken as a double and the compiler reports an error about a possible loss of precision.
The representation of the values changes because of the contracts of the methods that convert numerical values to a String, namely java.lang.Float#toString(float) and java.lang.Double#toString(double), while the actual value remains the same. The Javadoc of both methods shares a passage that spells out the requirements for the String representation:
There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values
To illustrate the similarity of significant parts for values of both types, the following snippet can be run:
package com.my.sandbox.numbers;
public class FloatToDoubleConversion {
    public static void main(String[] args) {
        float f = 125.32f;
        floatToBits(f);
        double d = (double) f; // widening conversion, the value is unchanged
        doubleToBits(d);
    }
    // Splits a float into sign (1 bit), exponent (8 bits, bias 127) and mantissa (23 bits).
    private static void floatToBits(float floatValue) {
        System.out.println();
        System.out.println("Float.");
        System.out.println("String representation of float: " + floatValue);
        int bits = Float.floatToIntBits(floatValue);
        int sign = bits >>> 31;
        // Mask out the 8 exponent bits, then subtract the bias to get the true exponent.
        int exponent = (bits >>> 23 & ((1 << 8) - 1)) - ((1 << 7) - 1);
        int mantissa = bits & ((1 << 23) - 1);
        // Integer.toBinaryString avoids sign extension to 64 bits for negative ints.
        System.out.println("Bytes: " + Integer.toBinaryString(bits));
        System.out.println("Sign: " + Integer.toBinaryString(sign));
        System.out.println("Exponent: " + Integer.toBinaryString(exponent));
        System.out.println("Mantissa: " + Integer.toBinaryString(mantissa));
        // Reassemble the three fields to confirm nothing was lost.
        System.out.println("Back from parts: " + Float.intBitsToFloat((sign << 31) | (exponent + ((1 << 7) - 1)) << 23 | mantissa));
    }
    // Splits a double into sign (1 bit), exponent (11 bits, bias 1023) and mantissa (52 bits).
    private static void doubleToBits(double doubleValue) {
        System.out.println();
        System.out.println("Double.");
        System.out.println("String representation of double: " + doubleValue);
        long bits = Double.doubleToLongBits(doubleValue);
        long sign = bits >>> 63;
        // Mask out the 11 exponent bits, then subtract the bias to get the true exponent.
        long exponent = (bits >>> 52 & ((1 << 11) - 1)) - ((1 << 10) - 1);
        long mantissa = bits & ((1L << 52) - 1);
        System.out.println("Bytes: " + Long.toBinaryString(bits));
        System.out.println("Sign: " + Long.toBinaryString(sign));
        System.out.println("Exponent: " + Long.toBinaryString(exponent));
        System.out.println("Mantissa: " + Long.toBinaryString(mantissa));
        // Reassemble the three fields to confirm nothing was lost.
        System.out.println("Back from parts: " + Double.longBitsToDouble((sign << 63) | (exponent + ((1 << 10) - 1)) << 52 | mantissa));
    }
}
In my environment, the output is:
Float.
String representation of float: 125.32
Bytes: 1000010111110101010001111010111
Sign: 0
Exponent: 110
Mantissa: 11110101010001111010111
Back from parts: 125.32
Double.
String representation of double: 125.31999969482422
Bytes: 100000001011111010101000111101011100000000000000000000000000000
Sign: 0
Exponent: 110
Mantissa: 1111010101000111101011100000000000000000000000000000
Back from parts: 125.31999969482422
This way, you can see that the sign and the exponent are the same, while the mantissa was extended with trailing zeros and retained its significant part (11110101010001111010111) exactly.
The used extraction logic of floating point number parts: 1 and 2.
Both are what Microsoft refers to as "approximate number data types."
There's a reason. A float has a precision of about 7 decimal digits, and a double about 15. But I have seen it happen many times that a result expected to be exactly 7.0 (for example from 8.0 - 1.0) is displayed as 6.999999999. This is because they are not guaranteed to represent a decimal fraction exactly.
If you need absolute, invariable precision, go with a decimal or an integral type.
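As an illustration in Java: the closest counterpart to a decimal type there is java.math.BigDecimal. That mapping is my assumption, since the answer above seems to have Microsoft's types in mind. The class and method names in this sketch are made up.

```java
import java.math.BigDecimal;

class DecimalDemo {
    // Binary floating point: 0.1 and 0.2 are both approximations, and so is their sum.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // Decimal arithmetic: the operands are built from strings, so they are exact.
    static boolean decimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doubleSumIsExact());  // false
        System.out.println(decimalSumIsExact()); // true
    }
}
```

Note that the BigDecimal values are constructed from strings; new BigDecimal(0.1) would start from the already-inexact double.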
I would like to check whether a result is measurable, that is, whether it has a finite number of decimal places. What do I mean?
double x = 5.0 / 9.0; // x = 0.(5)
x is not measurable.
I want to round x to the second digit (x = 0.56), but in a case like this:
double x = 1.0 / 8.0; // x = 0.125
I don't want to round anything.
So here is my question: how do I decide whether the result is measurable or not?
You cannot. That is the reason why 1.0 / 3 / 100 * 3 * 100 gives you 0.9999...9. You only have so many bits to represent the numbers, so you cannot distinguish between the periodic value 1.0 / 3 and a number whose value actually is 0.3333.....3.
The only fractions that are represented exactly in binary are those whose denominator, in lowest terms, is a power of two. If your input is two integers for the numerator and denominator, find the prime factorisation of both and remove the common factors. Then check that the only remaining factors of the denominator are powers of 2. Say we want to find 56 / 70: this is 2^3 * 7 / (2 * 5 * 7), and removing common factors gives 2^2 / 5, so that will not work. But 63 / 72 = (7 * 3^2) / (2^3 * 3^2) = 7 / 2^3, so it is a terminating binary number.
If you're working in decimal, then powers of 2 and 5 in the denominator are allowed.
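The check described above can be sketched like this. Instead of a full prime factorisation, it divides out the greatest common divisor and then strips the allowed prime factors from the denominator; that shortcut and all names (TerminatingFraction and its methods) are mine, not from the answer.

```java
class TerminatingFraction {
    // Greatest common divisor by the Euclidean algorithm.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return Math.abs(a);
    }

    // Divides n by p for as long as p is a factor of n.
    static long stripFactor(long n, long p) {
        while (n % p == 0) {
            n /= p;
        }
        return n;
    }

    // Denominator of num/den in lowest terms (always positive).
    static long reducedDenominator(long num, long den) {
        if (den == 0) {
            throw new IllegalArgumentException("denominator must not be zero");
        }
        return Math.abs(den) / gcd(num, den);
    }

    // Terminates in binary iff the reduced denominator is a power of two.
    static boolean terminatesInBinary(long num, long den) {
        return stripFactor(reducedDenominator(num, den), 2) == 1;
    }

    // Terminates in decimal iff the reduced denominator has only factors 2 and 5.
    static boolean terminatesInDecimal(long num, long den) {
        return stripFactor(stripFactor(reducedDenominator(num, den), 2), 5) == 1;
    }

    public static void main(String[] args) {
        System.out.println(terminatesInBinary(56, 70));  // false (4/5)
        System.out.println(terminatesInBinary(63, 72));  // true  (7/8)
        System.out.println(terminatesInDecimal(5, 9));   // false (0.555...)
        System.out.println(terminatesInDecimal(1, 8));   // true  (0.125)
    }
}
```

This only works when the numerator and denominator are known as integers; once the quotient has been stored in a double, the information is gone, as the other answer points out.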
I have a problem with numerator, denominator and modulo. 7 / 3 = 2.3333333333 gives me a modulo of 1!? Something must be wrong? I'm taking an introductory, non-object-oriented course, so my code is simple, and I have simplified the code below. (Some lines are in Swedish.)
Calling the method:
// Calls the method and presents the calculation of a fraction from numerator and denominator
int numerator = 7;
int denumerator = 3;
System.out.println("Bråkberäkning med täljare " + numerator + " och nämnare " + denumerator + " ger " + fraction(numerator,denumerator));
And the method:
// Method for calculating a fraction from numerator and denominator
public static String fraction(int numerator, int denumerator) {
    // Calculation: integer quotient and remainder
    int resultat1 = numerator / denumerator;
    int resultat2 = numerator % denumerator;
    return Integer.toString(resultat1) + " rest " + Integer.toString(resultat2);
}
3 goes into 7 twice with 1 left over. The answer is supposed to be 1. That's what modulo means.
7 modulo 3 gives 1. Since 7 = 2*3 + 1.
7 % 3 = 1
Just as expected. If you want the .3333, you could take the remainder and divide it by your denominator to get 1 / 3 = 0.3333
Or do (7.0 / 3.0) % 1 = 0.3333
Ehm 7 % 3 = 1
What would you expect?
Given two positive numbers, a (the dividend) and n (the divisor), a modulo n (abbreviated as a mod n) can be thought of as the remainder, on division of a by n. For instance, the expression "5 mod 4" would evaluate to 1 because 5 divided by 4 leaves a remainder of 1, while "9 mod 3" would evaluate to 0 because the division of 9 by 3 leaves a remainder of 0; there is nothing to subtract from 9 after multiplying 3 times 3. (Notice that doing the division with a calculator won't show you the result referred to here by this operation, the quotient will be expressed as a decimal.) When either a or n is negative, this naive definition breaks down and programming languages differ in how these values are defined. Although typically performed with a and n both being integers, many computing systems allow other types of numeric operands.
More info : http://en.wikipedia.org/wiki/Modulo_operation
You didn't ask a question!
And if your question is just:
"...gives me a modulo of 1!? Must be some wrong?"
No, nothing is wrong: 7/3 = 2 with a remainder of 1, since (3 * 2) + 1 = 7.
You are using integer operands so you get an integer result. That's how the language works.
A modulo operator gives you the remainder of a division. Therefore, it is normal that you get the number 1 as a result.
Also, note that you are using integers... 7/3 != 2.3333333333.
One last thing, be careful with that code. A division by zero would make your program crash. ;)
% for ints does not give the decimal fraction but the remainder of the division. Here the remainder is measured from 6, which is the highest multiple of 3 that does not exceed your number 7: 7 - 6 is 1.
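To put the answers above side by side, here is a small sketch with made-up names. It shows integer division, the remainder, the floating-point quotient, and how Java's % differs from Math.floorMod for a negative dividend (the case the Wikipedia excerpt warns about).

```java
class RemainderDemo {
    // Integer division truncates toward zero.
    static int quotient(int numerator, int denominator) {
        return numerator / denominator;
    }

    // % yields the remainder; its sign follows the dividend.
    static int remainder(int numerator, int denominator) {
        return numerator % denominator;
    }

    public static void main(String[] args) {
        System.out.println(quotient(7, 3));        // 2
        System.out.println(remainder(7, 3));       // 1
        System.out.println(7.0 / 3.0);             // floating-point quotient, about 2.3333
        System.out.println(remainder(-7, 3));      // -1
        System.out.println(Math.floorMod(-7, 3));  // 2
        // quotient(7, 0) would throw ArithmeticException: / by zero
    }
}
```

In every case numerator == quotient * denominator + remainder holds, which is the relationship the answers keep coming back to.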
I'm confused about converting RGB values to the YCbCr color space. I used these equations:
int R, G, B;
double Y = 0.299 * R + 0.587 * G + 0.114 * B;
double Cb = -0.168 * R - 0.3313 * G + 0.5 * B + 128;
double Cr = 0.5 * R - 0.4187 * G - 0.0813 * B + 128;
The expected YCbCr output is normalized to the range 0-255, but I'm confused because one of my sources says it is normalized to the range 0-1.
That part is going well, but I am having a problem computing the LipMap to isolate/detect the lips of the face. I implemented this:
double LipMap = Cr*Cr*(Cr*Cr-n*(Cr/Cb))*(Cr*Cr-n*(Cr/Cb));
n is supposed to be in the range 0-255; the equation for n is: n = 0.95 * (summation(Cr*Cr) / summation(Cr/Cb)),
but another source says: n = 0.95 * (((1/k) * summation(Cr*Cr)) / ((1/k) * summation(Cr/Cb))),
where k is the number of pixels in the face image.
My sources say that it will return a result in the range 0-255, but in my program it always returns large numbers, nowhere near 0-255.
So can anyone help me implement this and solve my problem?
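A side note on the two expressions for n quoted in the question: written out, they are the same quantity, because the factor 1/k cancels between numerator and denominator. The pixel index i below is my own notation, assuming the sums run over all k pixels of the face image.

```latex
\[
n \;=\; 0.95\,\frac{\frac{1}{k}\sum_{i=1}^{k} C_{r,i}^{2}}
                   {\frac{1}{k}\sum_{i=1}^{k} \frac{C_{r,i}}{C_{b,i}}}
  \;=\; 0.95\,\frac{\sum_{i=1}^{k} C_{r,i}^{2}}
                   {\sum_{i=1}^{k} \frac{C_{r,i}}{C_{b,i}}},
\qquad
\mathrm{LipMap} \;=\; C_r^{2}\left(C_r^{2} - n\,\frac{C_r}{C_b}\right)^{2}
\]
```

So the choice between the two sources does not explain the large values; the scale of Cr and Cb going into the sums does.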
From the sources you linked in your comments, it looks like either the equations or the descriptions in the first source are wrong:
If you use RGB values in the Range [0,255] and the given conversion (your Cb conversion differs from that btw.) you should get Cr and Cb values in the same range.
Now if you calculate n = 0.95 * (ΣCr²/Σ(Cr/Cb)) you'll notice that the values for Cr² lie in the range [0,65025], whereas Cr/Cb is in the range [0,255] (assuming Cb=0 is not possible and thus the highest value would be 255/1 = 255).
If you further assume an image with quite high red and low blue components, you'll get way higher values for n than what is stated in that paper:
Constant η fits final value in range 0..255
The second paper states this, which makes much more sense IMHO (although I don't know whether they normalize Cr and Cb to range [0,1] before the calculation or if they normalize the result, which might result in a higher difference between Cr² and Cr/Cb):
Where (Cr)², (Cr/Cb) all are normalized to the range [0 1].
Note that in order to normalize Cr and Cb to range [0,1] you'd either need to divide the result of your equations by 255 or simply use RGB in range [0,1] and add 0.5 instead of 128:
//assumes RGB are in range [0,1]
double Cb = -0.168 * R - 0.3313 * G + 0.5 * B + 0.5;
double Cr = 0.5 * R - 0.4187 * G - 0.0813 * B + 0.5;
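Both variants can be put next to each other as in the sketch below. The class and method names are made up, and the coefficients are the usual JFIF/BT.601 full-range values as far as I know them (0.299, 0.587, 0.114 for Y and -0.1687 for the red term of Cb, which the snippets above round to -0.168), so treat them as an assumption rather than as what either paper uses.

```java
class YCbCrDemo {
    // 8-bit RGB in [0,255] to {Y, Cb, Cr}, nominally in [0,255] (offset 128).
    static double[] toYCbCr255(int r, int g, int b) {
        double y  =  0.299  * r + 0.587  * g + 0.114  * b;
        double cb = -0.1687 * r - 0.3313 * g + 0.5    * b + 128;
        double cr =  0.5    * r - 0.4187 * g - 0.0813 * b + 128;
        return new double[] {y, cb, cr};
    }

    // RGB in [0,1] to {Y, Cb, Cr} in [0,1] (offset 0.5 instead of 128).
    static double[] toYCbCrUnit(double r, double g, double b) {
        double y  =  0.299  * r + 0.587  * g + 0.114  * b;
        double cb = -0.1687 * r - 0.3313 * g + 0.5    * b + 0.5;
        double cr =  0.5    * r - 0.4187 * g - 0.0813 * b + 0.5;
        return new double[] {y, cb, cr};
    }

    public static void main(String[] args) {
        double[] white255 = toYCbCr255(255, 255, 255);
        double[] whiteUnit = toYCbCrUnit(1.0, 1.0, 1.0);
        // White has full luma and neutral chroma in both scalings.
        System.out.println(white255[0] + " " + white255[1] + " " + white255[2]);
        System.out.println(whiteUnit[0] + " " + whiteUnit[1] + " " + whiteUnit[2]);
    }
}
```

With the [0,1] variant, Cr² stays in [0,1] as well, which keeps the terms of the LipMap formula on comparable scales. Note that with the 128 offset, pure red gives Cr = 255.5, so 8-bit implementations normally clamp the result.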