Bijection between Java float and integer keeping order - java

Both int and float in Java are 32-bit values. Is it possible to write a pair of functions
int toInt(float f);
float toFloat(int n);
such that if f1 and f2 are arbitrary non-NaN float values and i1 and i2 are arbitrary int values:
f1 < f2 if and only if toInt(f1) < toInt(f2)
f1 > f2 if and only if toInt(f1) > toInt(f2)
f1 == f2 if and only if toInt(f1) == toInt(f2)
toInt(toFloat(i1)) == i1
toFloat(toInt(f1)) == f1
Edit: I have edited the question to exclude NaN values for float, thanks to the answers clarifying what happens with those.

Yes. IEEE floats and doubles are laid out so that, for non-negative values, comparing the raw binary representations as integers gives the same result as comparing the floating-point values. The functions to convert from float to raw integer and back are java.lang.Float.floatToIntBits and java.lang.Float.intBitsToFloat. JVMs typically treat these functions as intrinsics, so they have an extremely low cost.
The same is true for longs and doubles. Here the conversion functions are java.lang.Double.doubleToLongBits and java.lang.Double.longBitsToDouble.
Note that negative values are stored in sign-magnitude form, so their bit patterns sort in reverse order. If you want to use the normal signed comparison for your integers across the whole range, you have to apply an additional transformation after the conversion to integer.
The only exception to this rule is NaN, which does not permit a total ordering anyway.
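The "additional transformation" mentioned above could look roughly like the following. This is a minimal sketch, not code from the answer: the class name FloatOrder is made up for illustration, and the trick of inverting the 31 magnitude bits of negative values is one common way to do it. It assumes NaN is excluded, as in the question.

```java
final class FloatOrder {
    // Maps a float to an int whose signed order matches the float order.
    // Non-negative floats keep their bit pattern; for negative floats the
    // 31 magnitude bits are inverted so that larger magnitudes sort lower.
    static int toInt(float f) {
        int bits = Float.floatToIntBits(f);
        // (bits >> 31) is 0 for non-negative and all ones for negative values.
        return bits ^ ((bits >> 31) & 0x7FFFFFFF);
    }

    // The transformation never touches the sign bit, so it is its own inverse.
    static float toFloat(int n) {
        return Float.intBitsToFloat(n ^ ((n >> 31) & 0x7FFFFFFF));
    }
}
```

One caveat with this sketch: -0.0f and 0.0f compare equal as floats but map to the distinct ints -1 and 0, so the "f1 == f2 if and only if" requirement does not hold for the two zeros.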

You can use
int n = Float.floatToRawIntBits(f);
float f2 = Float.intBitsToFloat(n);
int n2 = Float.floatToRawIntBits(f2);
assert n == n2; // always
assert f == f2 || Float.isNaN(f);
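For non-negative values, the raw bits as an int have the same sort order as the original float; negative floats sort in reverse because the format is sign-magnitude. NaN values are the other exception: they are not comparable as floats, yet each of them has a definite value as an int.
Note: there are multiple NaN bit patterns, and no NaN is equal to any float value, including itself.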

No, you cannot.
There are 2^32 possible int values, all of which are distinct.
However, there are fewer than 2^32 distinct float values; e.g. the bit patterns 0x7F800001 to 0x7FFFFFFF all represent NaNs.
Therefore, you have more ints than floats and cannot map them one-to-one, as toFloat(i1) would not be capable of producing a distinct float for every int.

I see what you're saying. At first I had a different interpretation of your question. As everyone else has mentioned: yes. The articles linked here and here explain why we should use the methods described by @Peter Lawrey to compare the underlying bit patterns of ints and floats.

The answer from Rüdiger Klaehn covers the normal case, but it lacks some details. The bijection exists only in the domain of nice and clean floats.
Note: an IEEE float is laid out as sign_bit (1 bit), exponent (8 bits), significand (23 bits), and in clean cases the value is (-1)^sign * 2^(exp - 127) * significand. In fact, the 23 bits hold only the fractional part of the actual significand, the integer part being 1.
All is fine for 0 < exp < 255, reading exp as an unsigned byte. These are the normal nonzero floats, and in that domain you have a bijection.
For exp == 255 you have the infinite values if significand == 0 and all the NaNs if significand != 0. OK, you explicitly excluded the latter.
But for exp == 0 there are still weird things: when significand == 0 you have +0 and -0. I am not sure if they are considered equal. If anybody knows, please feel free to edit the post. But as integer values, they will of course be different.
And when exp == 0 and significand != 0 you find denormalized numbers ... which, while not being equal, will be converted to either 0 or the smallest nonzero number.
So if you want a bijection, only use normal numbers having 0 < exp < 255, and avoid NaN, infinities, 0 and denormal numbers, where things are weird.
References :
IEEE floating point
Single-precision floating-point format
Denormal number
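To make the field layout described above concrete, here is a small sketch that pulls the three fields out of a float. The class and method names (FloatFields, sign, exponent, fraction) are invented for this illustration. It also touches the open question about zeros: in Java, the == operator treats +0.0f and -0.0f as equal even though their bit patterns differ.

```java
final class FloatFields {
    // 1 sign bit, 8 exponent bits (biased by 127), 23 fraction bits.
    static int sign(float f)     { return Float.floatToIntBits(f) >>> 31; }
    static int exponent(float f) { return (Float.floatToIntBits(f) >>> 23) & 0xFF; }
    static int fraction(float f) { return Float.floatToIntBits(f) & 0x7FFFFF; }

    public static void main(String[] args) {
        float[] samples = {1.0f, 0.0f, -0.0f, Float.MIN_VALUE, Float.POSITIVE_INFINITY};
        for (float f : samples) {
            System.out.println(f + ": sign=" + sign(f)
                    + " exp=" + exponent(f) + " fraction=" + fraction(f));
        }
        // The two zeros compare equal as floats but differ as raw bits.
        System.out.println(0.0f == -0.0f);
        System.out.println(Float.floatToIntBits(0.0f) == Float.floatToIntBits(-0.0f));
    }
}
```

Float.MIN_VALUE shows up here with exp == 0 and fraction == 1, i.e. it is the smallest denormalized number.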

f1 == f2
is impossible, see this answer for more info. You will need to include a delta if you actually want to APPROXIMATE your equality-check.

Related

Infer smallest possible data type without accuracy loss

I have an input, which is a String. The input will always be a String representation of a decimal number, like one of these examples:
3.14159265
314159265
314159265e5
I have defined an enum called Types that has two members: Types.FLOAT and Types.DOUBLE.
I want to write a function that will return Types.FLOAT for all inputs that can be represented as a float without losing accuracy, and Types.DOUBLE for all others that can be represented as a double without losing accuracy. If the number is too accurate for double, something like null should be returned.
As we all know, float has a size of 32 bits while double has a size of 64 bits.
Casting a double to a float will result in a loss of accuracy, so for example:
3.14159265 --> 3.141592
314159265 --> 314159200
314159265e5 --> 314159200e5
Some things to clear up:
I don't actually want to parse my input into one of these types. I just need the information so I can pass it on to Hive.
Not using the smallest accurate data type is not acceptable.
A floating-point number has two major parts, the significand and the exponent. According to Wikipedia (https://en.wikipedia.org/wiki/IEEE_754), a 32-bit float can represent about 7.22 decimal digits with a binary exponent range of -126 to 127. A double can represent about 15.95 decimal digits with a binary exponent range of -1022 to 1023.
So I would do something like the following (pseudocode):
public static FloatingPointEnum getType(String number){
    // Looks for the digits ignoring possible leading / trailing zeros
    int numDigits = getDigits(number);
    // Counts the leading / trailing zeros
    int exponent = getExponent(number);
    if(numDigits < 7 && exponent > -126 && exponent < 127){
        return _32Bit;
    } else if(numDigits < 15 && exponent > -1022 && exponent < 1023){
        return _64Bit;
    } else {
        return null;
    }
}
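As an alternative to counting digits, one could let the JDK do the rounding and check whether the value survives a round trip. The sketch below is only one possible reading of "without losing accuracy": it assumes a type is acceptable when the shortest decimal string Java prints for the parsed value is numerically equal to the input. The class name SmallestType and the method name infer are hypothetical; the Types enum mirrors the one described in the question.

```java
import java.math.BigDecimal;

final class SmallestType {
    enum Types { FLOAT, DOUBLE }

    // Returns the narrowest type whose printed form equals the input decimal,
    // or null when neither float nor double reproduces it.
    static Types infer(String number) {
        BigDecimal exact = new BigDecimal(number.trim());

        float f = Float.parseFloat(number);
        if (!Float.isInfinite(f)
                && exact.compareTo(new BigDecimal(Float.toString(f))) == 0) {
            return Types.FLOAT;
        }

        double d = Double.parseDouble(number);
        if (!Double.isInfinite(d)
                && exact.compareTo(new BigDecimal(Double.toString(d))) == 0) {
            return Types.DOUBLE;
        }
        return null; // too precise (or too large) for double
    }
}
```

With this reading, "0.5" comes out as FLOAT, "314159265" as DOUBLE (the nearest float is 314159264), and a 30-digit fraction as null. Values that overflow to infinity are treated as not representable.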

Efficient way of finding the number of decimals a double value has

I want to find an efficient way of making sure that the number of decimal places in a double is not more than three.
double num1 = 10.012; //True
double num2 = 10.2211; //False
double num3 = 10.2; //True
Currently, what I do is just split the string on . with a regex and count the digits after it, like below.
String[] split = Double.toString(num).split("\\.");
split[1].length() //num of decimal places
Is there an efficient or better way to do this since I'll be calling this
function a lot?
If you want a solution that will tell you that information in a way that will agree with the eventual result of converting the double to a string, then efficiency doesn't really come into it; you basically have to convert to string and check. The result is that it's entirely possible for a double to contain a value that mathematically has a (say) non-zero value in (say) the hundred-thousandth place, but which when converted to string will not. Such is the joy of IEEE-754 double-precision binary floating point: The number of digits you get from the string representation is only as many as necessary to distinguish the value from its adjacent representable value. From the Double docs:
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument d. Then d must be the double value nearest to x; or if two double values are equally close to x, then d must be one of them and the least significant bit of the significand of d must be 0.
But if you're not concerned about that, and assuming limiting your value range to long is okay, you can do something like this:
private static boolean onlyThreePlaces(double v) {
    double d = (double)((long)(v * 1000)) / 1000;
    return d == v;
}
...which should have less memory overhead than a String-round-trip.
However, I'd be surprised if there weren't a fair number of times when that method and the result of Double.toString(double) didn't match in terms of digits after the decimal, for the reasons given above.
In a comment on the question, you've said (when I asked about the value range):
Honestly I'm not sure. I'm dealing with prices; For starters, I'll assume 0-200K
Using double for financial values is usually not a good idea. If you don't want to use BigDecimal because of memory concerns, pick your precision and use int or long depending on your value range. For instance, if you only need to-the-penny precision, you'd use values multiplied by 100 (e.g., 2000 is Ⓠ20 [or whatever currency you're using, I'm using Ⓠ for quatloos]). If you need precision to thousandths of a penny (as your question suggests), then multiply by 100000 (e.g., 2000000 is Ⓠ20). If you need more precision, pick a larger multiplier. Even if you go to hundred-thousandths of a penny (multiplier: 10000000), with long you have a range of Ⓠ-922,337,203,685 to Ⓠ922,337,203,685.
This has the side-benefit that it makes this check easier: Just a straight %. If your multiplier is 10000000 (hundred-thousandths of a penny), it's just value % 10000 != 0 to identify invalid ones (or value % 10000 == 0 to identify valid ones).
long num1 = 100120000; // 10.012 => true
// 100120000 % 10000 is 0 = valid
long num2 = 102211000; // 10.2211 => false
// 102211000 % 10000 is 1000 = invalid
long num3 = 102000000; // 10.2 => true
// 102000000 % 10000 is 0 = valid
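The scaled-long idea above can be wrapped up as follows. This is a sketch under the same assumption as the example (multiplier 10000000, i.e. seven implied decimal places); the class name FixedPointPrice and its members are invented here, and BigDecimal is used only once, to convert incoming text into the scaled long.

```java
import java.math.BigDecimal;

final class FixedPointPrice {
    static final int SCALE_DIGITS = 7;                 // multiplier 10_000_000
    static final long UNITS_PER_THOUSANDTH = 10_000L;  // 10^(7 - 3)

    // Converts a decimal string to the scaled long; throws ArithmeticException
    // if the input has more than SCALE_DIGITS decimal places or overflows.
    static long toScaled(String price) {
        return new BigDecimal(price).movePointRight(SCALE_DIGITS).longValueExact();
    }

    // True when nothing is set beyond the third decimal place.
    static boolean hasAtMostThreePlaces(long scaled) {
        return scaled % UNITS_PER_THOUSANDTH == 0;
    }
}
```

After the one-time conversion, every later check is a single integer remainder, which is the point of the approach when the function is called a lot.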

Is it possible to get the raw bits in memory of a double?

I was just trying to convert the following methods that I wrote in C/C++ to Java. In short, the code provides a very efficient way of calculating the indices of the left-most and right-most bits of a number that are set to one. The two methods are based on code in Knuth's The Art of Computer Programming, Volume 4.
// Returns index of the left-most bit of x that is one in the binary
// expansion of x. Assumes x > 0 since otherwise lambda(x) is undefined.
// Can be used to calculate floor(log(x, 2)), the number of binary digits
// of x, minus one.
int lambda(unsigned long x) {
    double y = (double) x;
    // Excuse the monstrosity below. I need to have a long that has the raw
    // bits of y in it. Simply (long)y would yield x back since C would
    // convert the double to a long. So we need to cast the address to a
    // (void *) so that C "forgets" what kind of data we are dealing with,
    // and then cast it to a (long *).
    unsigned long xx = *((long *)((void*)&y));
    // The low 52 bits are the significand. The rest are the sign and
    // exponent. Since the number is assumed to be positive, we don't have to
    // worry about the sign bit being 1 and can simply extract the exponent by
    // shifting right 52 bits. The exponent is in "excess-1023" format so we
    // must subtract 1023 after.
    return (int)(xx >> 52) - 1023;
}
// Returns the index of the right-most one bit in the binary expansion of x
int rho(unsigned long x) {
    return lambda(x & -x);
}
As you can see, I need to have a long that has the same bits of a double, but without a void* cast, I am not sure how to do this in Java. Any thoughts? Is it even possible?
There's a static function, doubleToLongBits(), to perform the type conversion.
long xx = Double.doubleToLongBits(y);
return (int) (xx >>> 52) - 1023;
Note the >>> treats the long as an unsigned value when shifting right.
Reading the commentary, though, it sounds like what you want is a simple function of the number of leading zeros.
return 63 - Long.numberOfLeadingZeros(x);
I would guess this is more efficient on most current architectures, but you'd have to profile it to be sure. There's a similar "trailing zeros" method to compute your rho() function.
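Putting both suggestions side by side, a Java version might look like this. It is a sketch rather than a drop-in port: the class name BitIndex and the method lambdaViaDouble are made up, and Java's signed long stands in for the C unsigned long, so inputs are assumed to be positive.

```java
final class BitIndex {
    // Index of the left-most one bit; assumes x > 0.
    static int lambda(long x) {
        return 63 - Long.numberOfLeadingZeros(x);
    }

    // Index of the right-most one bit; assumes x != 0.
    static int rho(long x) {
        return Long.numberOfTrailingZeros(x);
    }

    // Exponent-field variant of lambda, shown for comparison only.
    // It is exact only while x fits in the 53-bit significand of a double;
    // larger values may round up to the next power of two.
    static int lambdaViaDouble(long x) {
        long bits = Double.doubleToLongBits((double) x);
        return (int) (bits >>> 52) - 1023;
    }
}
```

The rounding caveat is visible at Long.MAX_VALUE: the conversion to double rounds it up to 2^63, so lambdaViaDouble reports 63 while lambda reports 62.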

Smallest epsilon so that comparison result change

What is the smallest float value A so that (x < x + A) == true?
I tried with Float.MIN_VALUE but surprisingly(? [1]) it doesn't work (except for values of 0.)
Knowing how the IEEE 754 standard stores float values, I could just add 1 to the mantissa of the float in question, but this seems really hackish. I don't want to put byte arrays and bit operations in my code for such a trivial matter, especially with Java. In addition, if I simply add 1 to the result of Float.floatToIntBits() and the mantissa is all 1s, it will increase the exponent by 1 and set the mantissa to 0. I don't want to implement all the handling of these cases if it is not necessary.
Isn't there some sort of function (hopefully built-in) that, given the float x, returns the smallest float A such that (x < x + A) == true?
If there isn't, what would be the cleanest way to implement it?
I'm using this because of how I'm iterating over a line of vertices
// return the next vertices strictly at the left of pNewX
float otherLeftX = pOppositeToCurrentCave.leftVertexTo(pNewX);
// we add MIN_VALUE so that the next call to leftVertexTo will return the same vertex returned by leftVertexTo(pNewX)
otherLeftX += Float.MIN_VALUE;
while(otherLeftX >= 0 && pOppositeToCurrentCave.hasLeftVertexTo(otherLeftX)) {
    otherLeftX = pOppositeToCurrentCave.leftVertexTo(otherLeftX);
    //stuff
}
Right now because of this problem the first vertex is always skipped because the second call to leftVertexTo(otherLeftX) doesn't return the same value it returned on the first call
[1] Not so surprising. I happened to realize after I noticed the problem that, since the gap between floats is relative, for any number != 0 the MIN_VALUE is so small that it will be rounded away and (x == x + Float.MIN_VALUE) == true
You can try Math.nextUp(x)
Here is the doc:
Returns the floating-point value adjacent to f in the direction of positive infinity. This method is semantically equivalent to nextAfter(f, Float.POSITIVE_INFINITY); however, a nextUp implementation may run faster than its equivalent nextAfter call.
Special Cases:
If the argument is NaN, the result is NaN.
If the argument is positive infinity, the result is positive infinity.
If the argument is zero, the result is Float.MIN_VALUE
Parameters:
f - starting floating-point value
Returns:
The adjacent floating-point value closer to positive infinity.
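A quick way to see the difference between adding Float.MIN_VALUE and calling Math.nextUp is the following sketch. The class name NextUpDemo and the sample value 1000.0f are arbitrary choices for illustration.

```java
final class NextUpDemo {
    public static void main(String[] args) {
        float x = 1000.0f;

        // MIN_VALUE is far below the spacing of floats near 1000,
        // so the sum rounds back to x.
        System.out.println(x + Float.MIN_VALUE == x);          // true

        // nextUp always yields the adjacent larger float.
        System.out.println(x < Math.nextUp(x));                // true

        // Within one binade the step equals Math.ulp(x).
        System.out.println(Math.nextUp(x) - x == Math.ulp(x)); // true
    }
}
```

In the vertex loop from the question, this would presumably mean replacing the `otherLeftX += Float.MIN_VALUE;` line with `otherLeftX = Math.nextUp(otherLeftX);`.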

Loss of precision - int -> float or double

I have an exam question I am revising for and the question is for 4 marks.
"In java we can assign a int to a double or a float". Will this ever lose information and why?
I have put that because ints are normally of fixed length or size, the precision for storing data is finite, whereas storing information in floating point can be infinite; essentially we lose information because of this.
Now I am a little sketchy as to whether or not I am hitting the right areas here. I am very sure it will lose precision but I can't exactly put my finger on why. Can I get some help, please?
In Java Integer uses 32 bits to represent its value.
In Java a FLOAT uses a 23 bit mantissa, so integers greater than 2^23 will have their least significant bits truncated. For example 33554435 (or 0x2000003) will be truncated to around 33554432 +/- 4
In Java a DOUBLE uses a 52 bit mantissa, so it will be able to represent a 32-bit integer without loss of data.
See also "Floating Point" on wikipedia
It's not necessary to know the internal layout of floating-point numbers. All you need is the pigeonhole principle and the knowledge that int and float are the same size.
int is a 32-bit type, for which every bit pattern represents a distinct integer, so there are 2^32 int values.
float is a 32-bit type, so it has at most 2^32 distinct values.
Some floats represent non-integers, so there are fewer than 2^32 float values that represent integers.
Therefore, different int values will be converted to the same float (=loss of precision).
Similar reasoning can be used with long and double.
Here's what JLS has to say about the matter (in a non-technical discussion).
JLS 5.1.2 Widening primitive conversion
The following 19 specific conversions on primitive types are called the widening primitive conversions:
int to long, float, or double
(rest omitted)
Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision -- that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode.
Despite the fact that loss of precision may occur, widening conversions among primitive types never result in a run-time exception.
Here is an example of a widening conversion that loses precision:
class Test {
    public static void main(String[] args) {
        int big = 1234567890;
        float approx = big;
        System.out.println(big - (int)approx);
    }
}
which prints:
-46
thus indicating that information was lost during the conversion from type int to type float because values of type float are not precise to nine significant digits.
No, float and double are fixed-length too - they just use their bits differently. Read more about how exactly they work in the Floating-Point Guide.
Basically, you cannot lose precision when assigning an int to a double, because double has 53 bits of precision (52 stored bits plus an implicit leading 1), which is enough to hold all int values. But float only has 24 bits of precision (23 stored bits plus the implicit 1), so it cannot exactly represent all int values whose magnitude is larger than 2^24.
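This can be checked directly by converting and converting back. The sketch below uses made-up names (WideningLoss, survivesFloat, survivesDouble) and compares through long rather than int, because casting a too-large float back to int saturates at Integer.MAX_VALUE and would hide the loss for that one value.

```java
final class WideningLoss {
    // True when the int converts to float and back without changing.
    static boolean survivesFloat(int n) {
        return (long) (float) n == n;
    }

    // True when the int converts to double and back without changing.
    static boolean survivesDouble(int n) {
        return (long) (double) n == n;
    }

    public static void main(String[] args) {
        System.out.println(survivesFloat(16_777_216));         // 2^24
        System.out.println(survivesFloat(16_777_217));         // 2^24 + 1
        System.out.println(survivesFloat(Integer.MAX_VALUE));
        System.out.println(survivesDouble(Integer.MAX_VALUE));
    }
}
```

2^24 + 1 is the smallest positive int that does not survive the trip through float, while every int survives the trip through double.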
Your intuition is correct, you MAY lose precision when converting int to float. However, it is not as simple as presented in most other answers.
In Java a FLOAT uses a 23 bit mantissa, so integers greater than 2^23 will have their least significant bits truncated. (from a post on this page)
Not true.
Example: here is an integer that is greater than 2^23 that converts to a float with no loss:
int i = 33_554_430 * 64; // is greater than 2^23 (and also greater than 2^24); i = 2_147_483_520
float f = i;
System.out.println("result: " + (i - (int) f)); // Prints: result: 0
System.out.println("with i:" + i + ", f:" + f);//Prints: with i:2147483520, f:2.14748352E9
Therefore, it is not true that integers greater than 2^23 will have their least significant bits truncated.
The best explanation I found is here:
A float in Java is 32-bit and is represented by:
sign * mantissa * 2^exponent
sign * (0 to 33_554_431) * 2^(-125 to +127)
Source: http://www.ibm.com/developerworks/java/library/j-math2/index.html
Why is this an issue?
It leaves the impression that you can determine whether there is a loss of precision from int to float just by looking at how large the int is.
I have especially seen Java exam questions where one is asked whether a large int would convert to a float with no loss.
Also, sometimes people tend to think that there will be loss of precision from int to float:
when an int is larger than: 1_234_567_890 not true (see counter-example above)
when an int is larger than: 2 exponent 23 (equals: 8_388_608) not true
when an int is larger than: 2 exponent 24 (equals: 16_777_216) not true
Conclusion
Conversions from sufficiently large ints to floats MAY lose precision.
It is not possible to determine whether there will be loss just by looking at how large the int is (i.e. without trying to go deeper into the actual float representation).
Possibly the clearest explanation I've seen:
http://www.ibm.com/developerworks/java/library/j-math2/index.html
the ULP or unit of least precision defines the precision available between any two float values. As these values increase the available precision decreases.
For example: between 1.0 and 2.0 inclusive there are 8,388,609 floats, between 1,000,000 and 1,000,001 there are 17. At 10,000,000 the ULP is 1.0, so above this value you soon have multiple integral values mapping to each available float, hence the loss of precision.
There are two reasons that assigning an int to a double or a float might lose precision:
There are certain numbers that just can't be represented as a double/float, so they end up approximated
Large integer numbers may contain too much precision in the least-significant digits
For these examples, I'm using Java.
Use a function like this to check for loss of precision when casting from int to float
static boolean checkPrecisionLossToFloat(int val)
{
    if(val < 0)
    {
        val = -val;
    }
    // 8 is the bit-width of the exponent for single-precision
    return Integer.numberOfLeadingZeros(val) + Integer.numberOfTrailingZeros(val) < 8;
}
Use a function like this to check for loss of precision when casting from long to double
static boolean checkPrecisionLossToDouble(long val)
{
    if(val < 0)
    {
        val = -val;
    }
    // 11 is the bit-width for the exponent in double-precision
    return Long.numberOfLeadingZeros(val) + Long.numberOfTrailingZeros(val) < 11;
}
Use a function like this to check for loss of precision when casting from long to float
static boolean checkPrecisionLossToFloat(long val)
{
    if(val < 0)
    {
        val = -val;
    }
    // 8 + 32
    return Long.numberOfLeadingZeros(val) + Long.numberOfTrailingZeros(val) < 40;
}
For each of these functions, returning true means that casting that integral value to the floating point value will result in a loss of precision.
Casting to float will lose precision if the integral value has more than 24 significant bits.
Casting to double will lose precision if the integral value has more than 53 significant bits.
You can assign an int to a double without losing precision.
