This is likely an easy solution that is simply eluding me. Specifically, I am creating locations on a canvas dynamically using a sin() function for equidistant points on a circle. Once these points are created, I am animating a shape moving from one point to the next by calculating the slope between points and redrawing the shape at each slope step.
Problem is, depending on the coordinate values, the slope step may just be one step from point a to point b. I need the shape to move along the path, not just jump point to point.
What I want to do is force the location coordinates (x, y) to be even numbers, allowing the slope values to always be reducible. So, the simple part of the question is...
How do I check if an int value is even? If it is not, I will simply add 1 to the coordinate value.
int newNumber = someInt % 2 == 0 ? someInt : someInt + 1;
To see if an integer is even:
Check if its value is congruent to 0 modulo 2. That is value MOD 2 == 0. In C-style languages this is usually expressed as value % 2 == 0.
Alternatively, check the value of bit 0. That is value BITWISE-AND 0x01 == 0. In C-style languages this is usually expressed as (value & 0x01) == 0.
If you do not care which direction you round, you can even-ize an integer in a single operation by taking its value bitwise-and a mask of 0xFFFE (of course padded to the width of your integer), which forces bit 0 to zero. That is value := value BITWISE-AND 0xFFFE, or in C-style languages value &= 0xFFFE.
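For instance, here is a minimal Java sketch (the variable name is illustrative) showing the modulo check, the bit test, and the mask trick side by side; note that clearing bit 0 rounds down to the nearest even number, whereas adding 1 as proposed in the question rounds up:
int someInt = 37;

boolean isEvenMod = someInt % 2 == 0;       // modulo check: remainder 0 means even
boolean isEvenBit = (someInt & 0x01) == 0;  // bit check: bit 0 clear means even

int evenized = someInt & ~0x01;             // clear bit 0: 37 -> 36 (Java's 32-bit ~0x01 is the 0xFFFFFFFE mask)

System.out.println(isEvenMod + " " + isEvenBit + " " + evenized); // false false 36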
Do a mod 2 on it. If the remainder is 0, it's an even number.
VB:
Dim even = (3 Mod 2 = 0)
It's hard to give you specifics since you haven't given many, but you should look into the
Modulo operation.
For integers, while using modulo will give you the correct answer, it requires division. Division isn't as fast as bitwise operations. For what you need, a bitwise AND is sufficient.
if(x & 0x1)
{
    std::cout << "x is odd" << std::endl;
}
else
{
    std::cout << "x is even" << std::endl;
}
The key is that every power of two greater than 1 is even, so every bit other than bit 0 contributes an even value. Therefore the only way the binary representation of an integer can be odd is if bit 0 (the least significant bit) is set.
I was wondering about the differences between positive and negative zero in different numeric types.
I understand IEEE-754 floating-point arithmetic and the bit representation of double precision, so the following didn't come as a surprise:
double posz = 0.0;
double negz = -0.0;
System.out.println(Long.toBinaryString(Double.doubleToLongBits(posz)));
System.out.println(Long.toBinaryString(Double.doubleToLongBits(negz)));
// output
>>> 0
>>> 1000000000000000000000000000000000000000000000000000000000000000
What did surprise me, and showed me that I'm clueless about the bit representation of the long type in Java, is that even if I shift right (unsigned, >>>), the binary representation of both positive and negative zero is the same:
long posz = 0L;
long negz = -0L;
for (int i = 63; i >= 0; i--) {
    System.out.print((posz >>> i) & 1);
}
System.out.println();
for (int i = 63; i >= 0; i--) {
    System.out.print((negz >>> i) & 1);
}
// output
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> 0000000000000000000000000000000000000000000000000000000000000000
So I am wondering: what does Java do, at the bit-representation level, when I write the following?
long posz = 0L;
long negz = -0L;
Does the compiler understand that they are both zero and disregard the sign (and so assign 0 to the sign bit), or is there other magic here?
or is there other magic here?
Yes. 2's complement.
2's complement is a bit magical. It accomplishes 2 major objectives. Before getting into that, let's first stew on the notion of negative zero for a moment.
Negative zero is kinda weird. Why does it exist at all?
Negative zero isn't actually a thing. Ask any mathematician "Hey, so, what's up with negative zero?" and they'll just look at you in befuddlement. It's not a thing. Mathematically, 0 and -0 are utterly identical. Not just 'nearly identical', but 100%, fully, in all possible ways, identical. We don't generally want our numbers to be capable of representing both 5.0 as well as 5.00 - as those two are entirely, 100%, identical. If you don't think that a value system ought to waste bits trying to differentiate between 5.0 and 5.00, then it's equally bizarro to want the ability to represent -0.0 and +0.0 as distinct entities.
So, wanting -0 in the first place is kinda weird. All the numeric primitives (long, int, short, byte, and I guess char which is technically numeric too) all cannot represent this number. Instead, long z = -0 boils down to:
Take the constant "0".
Apply the 'negate' operation to this number. - is a unary operator: just like 2+5 makes the system calculate the binary operation of "addition" on elements 2 and 5, -x makes the system calculate the unary operation of "negation" on element x. Applying the negation operation to 0 produces 0. It's no different from writing, say, int x = 5 + 0; - that +0 part doesn't do anything, and the - in front of -0 doesn't do anything either. In contrast to -0.0, where it does do something (gets you negative zero, the double value, instead of positive zero).
Store this result in z (so, just 0 then).
There is no way to tell if that minus is there. They both result in ALL ZERO bits, and hence there is no way for the computer to tell if you initialized that variable with the expression -0 or with +0. Again in contrast to double, where, as you noticed, one bit is different.
So why does double have it then?
Let's stew a bit on the notion of doubles and IEEE-754 math.
A double takes 64 bits. From basic mathematical principles, then, a double is as incapable of representing more than 2^64 different possible values as you are capable of breaking the speed of light or making 1+1=3.
And yet, a double aims to represent all numbers. There are way more numbers between 0 and 1 than 2^64 options (in fact, an infinite amount of numbers exist between 0 and 1), and that's just 0 to 1.
So, how doubles actually work is different. A few less than 2^64 numbers are chosen from the entire number line. Let's call these the blessed numbers.
The blessed numbers are not equally distributed. The closer you are to 0, the more blessed numbers exist; in other words, the distance between 2 blessed numbers increases as you move away from 0. For example, if you start at 1e100 (a 1 with a hundred zeroes) and want to find the next blessed number, it's quite a ways off - the gap is in fact larger than 1.0! 1e100+1 is in fact 1e100 again, because the way double math works is that after every single mathematical operation, the end result is rounded to the nearest blessed number.
Let's try it!
double d = 1e100;
System.out.println(d);
System.out.println(d + 1);
// prints: 1.0E100
// 1.0E100
But that means... a double value doesn't actually represent a single number! What any given double represents is in fact this concept:
An unknown number whose value lies between [D - 𝛿, D + 𝛿], where D is the blessed number closest to the unknown number this value represents, and 𝛿 is half of the distance between D and the nearest blessed number on either side.
Given that 𝛿 is usually incredibly small, this is 'good enough'. But this weirdness does explain why you really, really do not want anything to do with double if accuracy is important (such as with currencies - don't store those in doubles, ever!).
Given that, what does -0.0 represent? Not actually just 0. It represents, specifically: an unknown number whose value lies between [-𝛿, 0], where 0 is real zero (and thus has no sign), and 𝛿 is Double.MIN_VALUE: the smallest non-zero positive number representable with a double.
That's why -0.0 and +0.0 both exist: They are in fact different concepts. Rarely relevant, but sometimes it is. In contrast to e.g. long where 5 just means 5 and not "between 4.5 and 5.5", because longs fundamentally don't recognize that fractional parts exist in the first place. Given that 5 just means 5, then 0 just means 0, and there is no such thing as negative zero in the first place.
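A small sketch of where that distinction does and doesn't show up (this is standard Java behaviour, nothing library-specific assumed):
double pos = 0.0;
double neg = -0.0;

System.out.println(pos == neg);               // true: == treats them as equal
System.out.println(1.0 / pos);                // Infinity
System.out.println(1.0 / neg);                // -Infinity: the sign survives division
System.out.println(Double.compare(pos, neg)); // 1: compare() orders -0.0 below +0.0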
Now we get to 2's complement
2's complement is a cool system. It has two neat properties:
It only has the one zero.
It does not matter whether you treat the bit sequence as signed-by-way-of-2s-complement or as unsigned for the purposes of these operations: addition, subtraction, increment, decrement, zero-check. The modifications you make to the bits to implement those operations are identical.
It DOES matter for greater than, less than, and divide.
2's complement works like this: To negate a number, take all bits and flip them (i.e. do a NOT operation on the bits). Then, add 1.
Let's try it!
int x = 5;
int y = -x;
for (int i = 31; i >= 0; i--) {
    System.out.print((x >>> i) & 1);
}
System.out.println();
for (int i = 31; i >= 0; i--) {
    System.out.print((y >>> i) & 1);
}
System.out.println();
// prints 00000000000000000000000000000101
// 11111111111111111111111111111011
As we can see, the 'flip all bits and add 1' algorithm was applied.
2s complement is, of course, reversible: If you do 'flip all bits and add 1' twice in a row you get the same number out.
Now let's try -0. 0 is 32 0 bits, then flip them all, then add 1:
00000000000000000000000000000000
11111111111111111111111111111111 // flip all
100000000000000000000000000000000 // add 1
00000000000000000000000000000000 // that 1 fell off
and because ints can only store 32 bits, that final '1' falls off of the end. And we're left with zero again.
Now let's go with bytes (a bit smaller) and try to add, say, 200 and 50 together.
11001000 // 200 in binary
00110010 // 50 in binary
-------- +
11111010 // 250 in binary.
Now let's instead go: oh wait, whoops, that was an error - actually these numbers are in 2s complement. That wasn't 200, nono. 11001000 is a bit sequence that actually means (applying the 'flip all bits, add 1' scheme: 00111000) - it's actually -56. So the operation was meant to represent '-56 + 50'. Which is -6. -6 in binary is (write out 6, flip bits, add 1):
00000110
11111001
11111010
hey now, look at that, nothing changed! It's the same result! So, when the computer does x + y, where x and y are numbers, the computer does not care. Whether x is "an unsigned number" or "a signed with 2s complement number", the operation is identical.
That's why 2s complement is applied. It makes math MUCH faster. The CPU doesn't have to futz about with branching out to deal with sign bits.
In this sense it is more correct to say that in Java, int, long, char, byte and short are neither signed nor unsigned - they just are. At least for the purposes of +, -, ++, and --. No, the idea that int is signed is fundamentally a property of e.g. System.out.println(int) - that method chooses to render the bit sequence 11111111111111111111111111111111 as "-1" instead of as 4294967295.
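You can see the same bits rendered both ways with the standard library (a small sketch; Integer.toUnsignedString requires Java 8 or later):
int allOnes = -1;  // bit pattern: thirty-two 1 bits

System.out.println(allOnes);                           // -1          (rendered as signed)
System.out.println(Integer.toUnsignedString(allOnes)); // 4294967295  (same bits, rendered as unsigned)
System.out.println(Integer.toBinaryString(allOnes));   // 11111111111111111111111111111111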
long has no such thing as negative zero. Only float and double have a different representation of positive and negative zero.
I need to find out if int n is a power of two and my approach is to convert n to a hex number and check each bit (0 or 1). However, I've never used hex numbers in Java, could anyone help me out here?
Both converting to a String and replacing using a regular expression are expensive.
A simple way to check for a (positive) power of two is to check the number of bits set:
if (x > 0 && Long.bitCount(x) == 1)
While Long.bitCount looks complicated, the JVM can replace it with a single machine code instruction.
The canonical way (which you will encounter a lot when googling how to test for powers of two, and you will probably encounter it in code) to test for powers of two is
x != 0 && (x & x - 1) == 0
It is often encountered with more parenthesis (precedence paranoia?)
x != 0 && (x & (x - 1)) == 0
It is common to skip the x != 0 check and guarantee zero can't be an input, or (often enough) zero is "as good as" a power of two in some sense.
You can of course change the condition to x > 0 if you don't want to consider -2147483648 a power of two, though in many cases it would be (because it's congruent to 2^31 modulo 2^32, so interpreted as unsigned it's a PoT, and anyway it only has one bit set, so it was a PoT all along).
So what's the deal with x & x - 1? It unsets the lowest set bit, but let's look at it in the context of PoTs.
Let's ignore x < 2 and say x has the form a10^k (i.e. an arbitrary string of bits, followed by a one, followed by k zeroes). Subtracting 1 from it gives a01^k, because the -1 borrows through all the trailing zeroes until it reaches the lowest set bit, unsets that bit, then the borrow dies and the top bits are unchanged.
If you take the bitwise AND of a10^k and a01^k you get a00^k, because something ANDed with itself is itself again (the a), and in the tail there's always a 0 involved in the AND, so it's all zeroes.
Now there are two cases. 0) a=0. Then the result is zero, and we know we started out with x of the form 10^k, which is a power of two. And 1) a!=0. Then the result isn't zero either, because a still appears in it, and x wasn't a power of two because it has at least two set bits (the lowest set bit, which we explicitly looked at, and at least one other somewhere in a).
Well actually there are two more cases, x=0 and x=1 which were ignored. If we allow k to be zero then x=1 is also included. x=0 is annoying though, in x & x - 1 there is that x as the left operand of &, so the result must be zero no matter what happens in the right operand. So it falls out as a special case.
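A minimal Java helper wrapping the canonical test (the method name is just illustrative); it uses the x > 0 form, so 0, negative numbers and Integer.MIN_VALUE are all reported as not powers of two:
static boolean isPowerOfTwo(int x) {
    // x & (x - 1) clears the lowest set bit; for a power of two nothing is left.
    return x > 0 && (x & (x - 1)) == 0;
}

// isPowerOfTwo(1)  -> true     isPowerOfTwo(96) -> false
// isPowerOfTwo(64) -> true     isPowerOfTwo(0)  -> false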
There is no reason to convert to hex to check bits. In fact, that actually makes it harder. The following line produces a binary String for an int variable "num":
String binary = Integer.toString(num, 2);
That String will only have 1s and 0s in it. Then eliminate any 0s:
String ones = binary.replace("0", "");
If the length of the String is more than one, there is more than one bit set, which means the number is not a power of 2. If the length is 0, the number had no bits set, which means it is 0.
I stumbled upon a question that asks whether you ever had to use bit shifting in real projects. I have used bit shifts quite extensively in many projects, however, I never had to use arithmetic bit shifting, i.e., bit shifting where the left operand could be negative and the sign bit should be shifted in instead of zeros. For example, in Java, you would do arithmetic bit shifting with the >> operator (while >>> would perform a logical shift). After thinking a lot, I came to the conclusion that I have never used the >> with a possibly negative left operand.
As stated in this answer arithmetic shifting is even implementation defined in C++, so – in contrast to Java – there is not even a standardized operator in C++ for performing arithmetic shifting. The answer also states an interesting problem with shifting negative numbers that I was not even aware of:
+63 >> 1 = +31 (integral part of the quotient E1/2^E2)
00111111 >> 1 = 00011111
-63 >> 1 = -32
11000001 >> 1 = 11100000
So -63>>1 yields -32 which is obvious when looking at the bits, but maybe not what most programmers would anticipate on first sight. Even more surprising (but again obvious when looking at the bits) is that -1>>1 is -1, not 0.
So, what are concrete use cases for arithmetic right shifting of possibly negative values?
Perhaps the best known is the branchless absolute value:
int m = x >> 31;
int abs = (x + m) ^ m;
Which uses an arithmetic shift to copy the signbit to all bits. Most uses of arithmetic shift that I've encountered were of that form. Of course an arithmetic shift is not required for this, you could replace all occurrences of x >> 31 (where x is an int) by -(x >>> 31).
The value 31 comes from the size of int in bits, which is 32 by definition in Java. So shifting right by 31 shifts out all bits except the signbit, which (since it's an arithmetic shift) is copied to those 31 bits, leaving a copy of the signbit in every position.
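A quick sketch checking the identity against Math.abs, including the logical-shift variant mentioned above (plain Java, nothing assumed beyond the standard library):
int x = -42;

int m = x >> 31;                 // -1 if x is negative, 0 otherwise
int abs1 = (x + m) ^ m;          // branchless absolute value

int n = -(x >>> 31);             // the same mask built with a logical shift plus negation
int abs2 = (x + n) ^ n;

System.out.println(abs1 + " " + abs2 + " " + Math.abs(x)); // 42 42 42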
It has come in handy for me before, in the creation of masks that were then used in '&' or '|' operators when manipulating bit fields, either for bitwise data packing or bitwise graphics.
I don't have a handy code sample, but I do recall using that technique many years ago in black-and-white graphics to zoom in (by extending a bit, either 1 or 0). For a 3x zoom, '0' would become '000' and '1' would become '111' without having to know the initial value of the bit. The bit to be expanded would be placed in the high order position, then an arithmetic right shift would extend it, regardless of whether it was 0 or 1. A logical shift, either left or right, always brings in zeros to fill vacated bit positions. In this case the sign bit was the key to the solution.
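Here is a hedged reconstruction of that idea in Java (my own sketch, not the original code): park the pixel bit in the sign position, let the arithmetic shift replicate it, then logical-shift the copies down.
int bit = 1;                              // the pixel bit to zoom, 0 or 1

int zoomed = (bit << 31) >> 2 >>> 29;     // sign-extend the bit into 3 copies, then move them to the low end

System.out.println(Integer.toBinaryString(zoomed)); // 111 for bit = 1, 0 for bit = 0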
Here's an example of a function that rounds a non-zero input down to the nearest power of two. There are other solutions to this problem that are probably faster, namely any hardware-oriented solution or just a series of right shifts and ORs. This solution uses arithmetic shift to perform a binary search.
unsigned ClosestPowerOfTwo(unsigned num) {
    int mask = 0xFFFF0000;
    mask = (num & mask) ? (mask << 8) : (mask >> 8);
    mask = (num & mask) ? (mask << 4) : (mask >> 4);
    mask = (num & mask) ? (mask << 2) : (mask >> 2);
    mask = (num & mask) ? (mask << 1) : (mask >> 1);
    mask = (num & mask) ? mask : (mask >> 1);
    return (num & mask) ? -mask : -(mask << 1);
}
Indeed logical right shift is much more commonly used. However there are many operations that require an arithmetic shift (or are solved much more elegantly with an arithmetic shift)
Sign extension:
Most of the time you only deal with the available types in C, and the compiler will automatically sign extend when casting/promoting a narrower type to a wider one (like short to int), so you may not notice it, but under the hood a left-then-right shift is used if the architecture doesn't have an instruction for sign extension. For an "odd" number of bits you'll have to do the sign extension manually, so this is much more common. For example, if a 10-bit pixel or ADC value is read into the top bits of a 16-bit register, value >> 6 will move the bits to the lower 10 bit positions and sign extend to preserve the value. If they're read into the low 10 bits with the top 6 bits being zero, you'll use value << 6 >> 6 to sign extend the value before working with it.
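The same trick in Java terms (my own 32-bit sketch, not the answer's code): a 10-bit two's complement sample sitting in the low 10 bits of an int is sign-extended by shifting it up so its bit 9 lands on the sign bit, then arithmetic-shifting back down.
int raw = 0x3FF;                  // 10-bit sample, here all ones (which should mean -1)

int value = (raw << 22) >> 22;    // << 22 puts bit 9 on the sign bit, >> 22 drags the sign back down

System.out.println(value);        // -1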
You also need sign extension when working with signed bit fields:
struct bitfield {
    int x: 15;
    int y: 12;
    int z: 5;
};

int f(bitfield b) {
    return (b.x/8 + b.y/5) * b.z;
}
Demo on Godbolt. The shifts are generated by the compiler, but usually you don't use bitfields (as they're not portable) and operate on raw integer values instead, so you'll need to do the arithmetic shifts yourself to extract the fields.
Another example: sign-extend a pointer to make a canonical address in x86-64. This is used to store additional data in the pointer: char* pointer = (char*)((intptr_t)address << 16 >> 16). You can think of this as a 48-bit bitfield at the bottom
V8 engine's SMI optimization stores the value in the top 31 bits so it needs a right shift to restore the signed integer
Round signed division properly when converting to a multiplication, for example x/12 will be optimized to x*43691 >> 19 with some additional rounding. Of course you'll never do this in normal scalar code because the compiler already does this for you but sometimes you may need to vectorize the code or make some related libraries then you'll need to calculate the rounding yourself with arithmetic shift. You can see how compilers round the division results in the output assembly for bitfield above
Saturated shift or shifts larger than bit width, i.e. the value becomes zero when the shift count >= bit width
uint32_t lsh_saturated(uint32_t x, int32_t n) // returns 0 if n == 32
{
    return (x << (n & 0x1F)) & ((n-32) >> 5);
}

uint32_t lsh(uint32_t x, int32_t n) // returns 0 if n >= 32
{
    return (x << (n & 0x1F)) & ((n-32) >> 31);
}
Bit mask, useful in various cases like branchless selection (i.e. muxer). You can see lots of ways to conditionally do something on the famous bithacks page. Most of them are done by generating a mask of all ones or all zeros. The mask is usually calculated by propagating the sign bit of a subtraction like this (x - y) >> 31 (for 32-bit ints). Of course it can be changed to -(unsigned(x - y) >> 31) but that requires 2's complement and needs more operations. Here's the way to get the min and max of two integers without branching:
min = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1)));
max = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1)));
Another example is m = m & -((signed)(m - d) >> s); in Compute modulus division by (1 << s) - 1 in parallel without a division operator
I am not too sure what you mean, but I'm going to speculate that you want to use the bit shift as an arithmetic function.
One interesting thing I have seen is this property of binary numbers:
int n = 4;
int k = 1;
n = n << k; // is the same as n = n * 2^k
//now n = (4 * 2) i.e. 8
n = n >> k; // is the same as n = n / 2^k
//now n = (8 / 2) i.e. 4
Hope that helps.
But yes, you want to be careful with negative numbers; I would mask and then convert back accordingly.
In C, when writing device drivers, bit shift operators are used extensively since bits are used as switches that need to be turned on and off. Bit shifts allow one to easily and correctly target the right switch.
Many hashing and cryptographic functions make use of bit shifts. Take a look at the Mersenne Twister.
Lastly, it is sometimes useful to use bitfields to contain state information. Bit manipulation functions including bit shift are useful for these things.
I understand what the unsigned right shift operator ">>>" in Java does, but why do we need it, and why do we not need a corresponding unsigned left shift operator?
The >>> operator lets you treat int and long as 32- and 64-bit unsigned integral types, which are missing from the Java language.
This is useful when you shift something that does not represent a numeric value. For example, you could represent a black and white bit map image using 32-bit ints, where each int encodes 32 pixels on the screen. If you need to scroll the image to the right, you would prefer the bits on the left of an int to become zeros, so that you could easily put the bits from the adjacent ints:
int shiftBy = 3;
int[] imageRow = ...
int carry = 0;
// The last shiftBy bits are set to 1, the remaining ones are zero
int mask = (1 << shiftBy)-1;
for (int i = 0 ; i != imageRow.length ; i++) {
    // Cut out the shiftBy bits on the right
    int nextCarry = imageRow[i] & mask;
    // Do the shift, and move the carry into the freed upper bits
    imageRow[i] = (imageRow[i] >>> shiftBy) | (carry << (32 - shiftBy));
    // Prepare the carry for the next iteration of the loop
    carry = nextCarry;
}
The code above does not pay attention to the content of the upper three bits, because the >>> operator makes them zero.
There is no corresponding << operator because left-shift operations on signed and unsigned data types are identical.
>>> is also the safe and efficient way of finding the rounded mean of two (large) integers:
int mid = (low + high) >>> 1;
If integers high and low are close to the largest machine integer, the above will be correct, but
int mid = (low + high) / 2;
can get a wrong result because of overflow.
Here's an example use, fixing a bug in a naive binary search.
Basically this has to do with signed (numeric) shifts versus unsigned shifts (normally pixel-related stuff).
Since a left shift doesn't deal with the sign bit anyhow, a signed and an unsigned version of it would be the same thing, which is why there is no separate <<< operator...
Either way, I have yet to meet anyone that needed to use >>>, but I'm sure they are out there doing amazing things.
As you have just seen, the >> operator automatically fills the
high-order bit with its previous contents each time a shift occurs.
This preserves the sign of the value. However, sometimes this is
undesirable. For example, if you are shifting something that does not
represent a numeric value, you may not want sign extension to take
place. This situation is common when you are working with pixel-based
values and graphics. In these cases you will generally want to shift a
zero into the high-order bit no matter what its initial value was.
This is known as an unsigned shift. To accomplish this, you will use
Java's unsigned shift-right operator, >>>, which always shifts zeros
into the high-order bit.
Further reading:
http://henkelmann.eu/2011/02/01/java_the_unsigned_right_shift_operator
http://www.java-samples.com/showtutorial.php?tutorialid=60
The signed right-shift operator is useful if one has an int that represents a number and one wishes to divide it by a power of two, rounding toward negative infinity. This can be nice when doing things like scaling coordinates for display; not only is it faster than division, but coordinates which differ by the scale factor before scaling will differ by one pixel afterward. If instead of using shifting one uses division, that won't work. When scaling by a factor of two, for example, -1 and +1 differ by two, and should thus differ by one afterward, but -1/2=0 and 1/2=0. If instead one uses signed right-shift, things work out nicely: -1>>1=-1 and 1>>1=0, properly yielding values one pixel apart.
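A tiny sketch of that difference (plain Java): the shift floors toward negative infinity while integer division truncates toward zero.
for (int x = -3; x <= 3; x++) {
    System.out.println(x + " : shift " + (x >> 1) + ", divide " + (x / 2));
}
// -3 : shift -2, divide -1
// -2 : shift -1, divide -1
// -1 : shift -1, divide  0
// (0 through 3 agree for both)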
The unsigned operator is useful in cases where either the input is expected to have exactly one bit set and one wants the result to do so as well, or where one is using a loop to output all the bits in a word and wants it to terminate cleanly. For example:
void processBitsLsbFirst(int n, BitProcessor whatever)
{
    while(n != 0)
    {
        whatever.processBit(n & 1);
        n >>>= 1;
    }
}
If the code were to use a signed right-shift operation and were passed a negative value, it would output 1's indefinitely. With the unsigned-right-shift operator, however, the most significant bit ends up being interpreted just like any other.
The unsigned right-shift operator may also be useful when a computation would, arithmetically, yield a positive number between 0 and 4,294,967,295 and one wishes to divide that number by a power of two. For example, when computing the sum of two int values which are known to be positive, one may use (n1+n2)>>>1 without having to promote the operands to long. Also, if one wishes to divide a positive int value by something like pi without using floating-point math, one may compute ((value*5468522205L) >>> 34) [(1L<<34)/pi is 5468522204.61, which rounded up yields 5468522205]. For dividends over 1686629712, the computation of value*5468522205L would yield a "negative" value, but since the arithmetically-correct value is known to be positive, using the unsigned right-shift would allow the correct positive number to be used.
A normal right shift >> of a negative number will keep it negative. I.e. the sign bit will be retained.
An unsigned right shift >>> will shift the sign bit too, replacing it with a zero bit.
There is no need to have the equivalent left shift because there is only one sign bit and it is the leftmost bit so it only interferes when shifting right.
Essentially, the difference is that one preserves the sign bit, the other shifts in zeros to replace the sign bit.
For positive numbers they act identically.
For an example of using both >> and >>> see BigInteger shiftRight.
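For a very small side-by-side illustration of the two operators (ordinary Java, nothing assumed):
int x = -8;                   // bits: 11111111 11111111 11111111 11111000

System.out.println(x >> 1);   // -4          (the sign bit is copied in)
System.out.println(x >>> 1);  // 2147483644  (a zero is shifted in)
System.out.println(8 >> 1);   // 4
System.out.println(8 >>> 1);  // 4           (identical for positive values)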
In the Java domain, for most typical applications, the way to avoid overflow is to use a cast or BigInteger, such as widening int to long in the previous examples.
int hiint = 2147483647;
System.out.println("mean hiint+hiint/2 = " + ((long) hiint + (long) hiint) / 2);
System.out.println("mean hiint*2/2 = " + ((long) hiint * 2L) / 2);

BigInteger bhiint = BigInteger.valueOf(2147483647);
System.out.println("mean bhiint+bhiint/2 = " + bhiint.add(bhiint).divide(BigInteger.valueOf(2)));
The code review tool I use complains with the below when I start comparing two float values using equality operator. What is the correct way and how to do it? Is there a helper function (commons-*) out there which I can reuse?
Description
Cannot compare floating-point values using the equals (==) operator
Explanation
Comparing floating-point values by using either the equality (==) or inequality (!=) operators is not always accurate because of rounding errors.
Recommendation
Compare the two float values to see if they are close in value.
float a;
float b;
if(a==b)
{
..
}
IBM has a recommendation for comparing two floats, using division rather than subtraction - this makes it easier to select an epsilon that works for all ranges of input.
if (abs(a/b - 1) < epsilon)
As for the value of epsilon, I would use 5.96e-08 as given in this Wikipedia table, or perhaps 2x that value.
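A minimal Java sketch of that division-based test (the method name, the exact epsilon value, and the zero guard are my own choices to tune, not part of the IBM recommendation):
static final float REL_EPSILON = 2 * 5.96e-08f;       // roughly one float ULP at 1.0

static boolean nearlyEqualRelative(float a, float b) {
    if (a == b) return true;                          // catches exact matches, including both being zero
    if (b == 0.0f) return false;                      // avoid dividing by zero; pick a fallback if you need one
    return Math.abs(a / b - 1.0f) < REL_EPSILON;
}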
It wants you to compare them to within the amount of accuracy you need. For example if you require that the first 4 decimal digits of your floats are equal, then you would use:
if(-0.00001 <= a-b && a-b <= 0.00001)
{
..
}
Or:
if(Math.abs(a-b) < 0.00001){ ... }
With the second form, you take the absolute value of the difference of the two numbers and compare it against the desired precision.
Use whichever you think is more readable. I prefer the first one myself, as it clearly shows the precision you are allowing on both sides.
a = 5.43421 and b = 5.434205 will pass the comparison
private static final float EPSILON = <very small positive number>;
if (Math.abs(a-b) < EPSILON)
...
As floating point offers you variable but uncontrollable precision (that is, you can't set the precision other than when you choose between using double and float), you have to pick your own fixed precision for comparisons.
Note that this isn't a true equivalence operator any more, as it isn't transitive. You can easily get a equals b and b equals c but a not equals c.
Edit: also note that if a is negative and b is a very large positive number, the subtraction can overflow and the result will be negative infinity, but the test will still work, as the absolute value of negative infinity is positive infinity, which will be bigger than EPSILON.
Use commons-lang
org.apache.commons.lang.math.NumberUtils#compare
Also commons-math (in your situation more appropriate solution):
http://commons.apache.org/math/apidocs/org/apache/commons/math/util/MathUtils.html#equals(double, double)
The float type is an approximate value - there's an exponent portion and a value portion with finite accuracy.
For example:
System.out.println((0.6 / 0.2) == 3); // false
The risk is that a tiny rounding error can make a comparison false, when mathematically it should be true.
The workaround is to compare floats allowing a minor difference to still be "equal":
static float e = 0.00000000000001f;
if (Math.abs(a - b) < e)
Apache commons-math to the rescue: MathUtils.equals(double x, double y, int maxUlps)
Returns true if both arguments are equal or within the range of allowed error (inclusive). Two float numbers are considered equal if there are (maxUlps - 1) (or fewer) floating point numbers between them, i.e. two adjacent floating point numbers are considered equal.
Here's the actual code from the Commons Math implementation:
private static final int SGN_MASK_FLOAT = 0x80000000;

public static boolean equals(float x, float y, int maxUlps) {
    int xInt = Float.floatToIntBits(x);
    int yInt = Float.floatToIntBits(y);

    if (xInt < 0)
        xInt = SGN_MASK_FLOAT - xInt;
    if (yInt < 0)
        yInt = SGN_MASK_FLOAT - yInt;

    final boolean isEqual = Math.abs(xInt - yInt) <= maxUlps;

    return isEqual && !Float.isNaN(x) && !Float.isNaN(y);
}
This gives you the number of floats that can be represented between your two values at the current scale, which should work better than an absolute epsilon.
I took a stab at this based on the way Java implements equals() for Double: it converts to the IEEE 754 long integer form first and then does a bitwise compare. Double also provides the static doubleToLongBits() to get the integer form. Using bit fiddling you can 'round' the mantissa of the double by adding 1/2 (one bit) and truncating.
In keeping with supercat's observation, the function first tries a simple == comparison and only rounds if that fails. Here is what I came up with, with some (hopefully) helpful comments.
I did some limited testing, but can't say I've tried all edge cases. Also, I did not test performance. It shouldn't be too bad.
I just realized that this is essentially the same solution as the one offered by Dmitri. Perhaps a bit more concise.
static public boolean nearlyEqual(double lhs, double rhs) {
    // This rounds to the 6th mantissa bit from the end. So the numbers must have the same sign and exponent,
    // and the mantissas (as integers) need to be within 32 of each other (the bottom 5 of the 52 bits can differ).
    // To allow 'n' bits of difference, create an additive value of 1<<(n-1) and a mask of 0xffffffffffffffffL<<n.
    // e.g. for 5 bits: additive: 0x10L = 0x1L << 4 and mask: 0xffffffffffffffe0L = 0xffffffffffffffffL << 5
    //int bitsToIgnore = 5;
    //long additive = 1L << (bitsToIgnore - 1);
    //long mask = ~0x0L << bitsToIgnore;
    //return ((Double.doubleToLongBits(lhs)+additive) & mask) == ((Double.doubleToLongBits(rhs)+additive) & mask);
    return lhs == rhs ? true
            : ((Double.doubleToLongBits(lhs) + 0x10L) & 0xffffffffffffffe0L)
              == ((Double.doubleToLongBits(rhs) + 0x10L) & 0xffffffffffffffe0L);
}
The following modification handles the change in sign case where the value is on either side of 0.
return lhs==rhs?true:((Double.doubleToLongBits(lhs)+0x10L) & 0x7fffffffffffffe0L) == ((Double.doubleToLongBits(rhs)+0x10L) & 0x7fffffffffffffe0L);
There are many cases where one wants to regard two floating-point numbers as equal only if they are absolutely equivalent, and a "delta" comparison would be wrong. For example, if f is a pure function, and one knows that q=f(x) and y===x, then one should know that q=f(y) without having to compute it. Unfortunately the == operator has two defects in this regard.
If one value is positive zero and the other is negative zero, they will compare as equal even though they are not necessarily equivalent. For example if f(d)=1/d, a=0 and b=-1*a, then a==b but f(a)!=f(b).
If either value is a NaN, the comparison will always yield false even if one value was assigned directly from the other.
Although there are many cases where checking floating-point numbers for exact equivalence is right and proper, I'm not sure about any cases where the actual behavior of == should be considered preferable. Arguably, all tests for equivalence should be done via a function that actually tests equivalence (e.g. by comparing bitwise forms).
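A sketch of such an equivalence test in Java (the method name is illustrative): Double.doubleToLongBits canonicalizes NaNs, so this treats every NaN as equivalent to every other NaN while still distinguishing +0.0 from -0.0, which is also how Double.equals() behaves.
static boolean equivalent(double a, double b) {
    // Bitwise comparison of the canonical IEEE 754 representations.
    return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
}

// equivalent(Double.NaN, Double.NaN) -> true
// equivalent(0.0, -0.0)              -> false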
First, a few things to note:
The "Standard" way to do this is to choose an constant epsilon, but constant epsilons do not work correctly for all number ranges.
If you want to use a constant epsilon sqrt(EPSILON) the square root of the epsilon from float.h is a generally considered a good value. (this comes from an infamous "orange book" who's name escapes me at the moment).
Floating point division is going to be slow, so you probably want to avoid it for comparisons even if it behaves like picking an epsilon that is custom made for the numbers' magnitudes.
What do you really want to do? Something like this:
Compare how many representable floating point numbers the values differ by.
This code comes from this really great article by Bruce Dawson. The article has since been updated here. The main difference is that the old article breaks the strict-aliasing rule (casting float pointers to int pointers, dereferencing, and casting back). While the C/C++ purist will quickly point out the flaw, in practice this works, and I consider the code more readable. However, the new article uses unions and C/C++ gets to keep its dignity. For brevity I give the code that breaks strict aliasing below.
// Usable AlmostEqual function
bool AlmostEqual2sComplement(float A, float B, int maxUlps)
{
    // Make sure maxUlps is non-negative and small enough that the
    // default NAN won't compare as equal to anything.
    assert(maxUlps > 0 && maxUlps < 4 * 1024 * 1024);

    int aInt = *(int*)&A;
    // Make aInt lexicographically ordered as a twos-complement int
    if (aInt < 0)
        aInt = 0x80000000 - aInt;

    // Make bInt lexicographically ordered as a twos-complement int
    int bInt = *(int*)&B;
    if (bInt < 0)
        bInt = 0x80000000 - bInt;

    int intDiff = abs(aInt - bInt);
    if (intDiff <= maxUlps)
        return true;
    return false;
}
The basic idea in the code above is to first notice that, given the IEEE 754 floating point format {sign-bit, biased-exponent, mantissa}, the numbers are lexicographically ordered if interpreted as signed-magnitude ints: the sign bit becomes the sign bit, and because the exponent comes before the mantissa, it completely outranks the mantissa in determining magnitude, both for the float and for the number interpreted as an int.
So, we interpret the bit representation of the floating point number as a signed-magnitude int. We then convert the signed-magnitude ints to a two's complement ints by subtracting them from 0x80000000 if the number is negative. Then we just compare the two values as we would any signed two's complement ints, and seeing how many values they differ by. If this amount is less than the threshold you choose for how many representable floats the values may differ by and still be considered equal, then you say that they are "equal." Note that this method correctly lets "equal" numbers differ by larger values for larger magnitude floats, and by smaller values for smaller magnitude floats.