Ensure converging to 0, not -1, with binary string - java

I'm using cooperative coevolution to solve a couple of function optimisation problems, and am having issues.
The functions take N parameters, where each parameter is a number, and all the functions minimise when each and every one of the N parameters is 0.
My representation of the 'individuals' for use in cooperative coevolution is a binary string, so I can perform bit-flip mutations (it's imperative I keep this representation).
Due to the fact that the functions converge when the parameters are zero, of course -1 is pretty close. However, in 32-bit binary representation, -1 is a string of 32 1s, where 0 is a string of 32 0s. I such get stuck in a local optimum, where some of the parameters are -1 and others are 0.
So my question is, how does one go about avoiding/getting out of this? Is it acceptable to probabilistically flip all 32 bits with a probability equal to that of my mutation rate?
Thanks in advance guys

Using machine internal representation is considered harmful because it leads to deceptive solutions in genetic algorithms.
There are many articles about MDP (Minimal Deceptive Problem) in genetic algoritms that cover this topic, for example:
http://www.dtic.mil/get-tr-doc/pdf?AD=ADA294072
and great David Goldberg's book which explains deceptive problems and MDP and Building Blocks hypothesis (Genetic Algorithms in Search, Optimization, and Machine Learning, David Goldberg).
Internal representation of signed integer is Two's complement which leads to deceptive chromosome encoding where:
11111111 11111111 11111111 11111111 = -1
00000000 00000000 00000000 00000000 = 0
If you want to use this Two's complement encoding, then I suggest to mask most left bit to make individual always positive number and then convert to float number with any offset you want.
For example:
int a = ... // any value from 0 to 0x7fffffff
float x = ( ((float) a) / 0x7fffffff ) * 100 - 50.0;
// now x is in range: -50.0 .. 50.0

Related

Java - Masking out sign bit causes unexpected behavior

I have the following value:
int x = -51232;
Java integers are 32 bits, so in binary this should be the following:
10000000000000001100100000100000
The sign bit on the left is set to 1 since x is negative.
Then I do the operation
x = (x & Integer.MAX_VALUE);
Integer.MAX_VALUE is 2147483647 and in binary that would be:
01111111111111111111111111111111
0 on the left because the value is positive.
So why does x & Integer.MAX_VALUE yield 2147432416? The AND operator should only retrieve bits that x and Integer.MAX_VALUE have in common, which should be equivalent to -x (since they do not share the same sign bit).
What's going on here?
Your misunderstanding is caused by lack of knowledge regarding how negative integers are represented in binary in Java. You should read about 2's complement.
10000000000000001100100000100000 is not the binary representation of -51232.
11111111111111110011011111100000 is.
And when you run bitwise AND, you get:
11111111111111110011011111100000 (-51232)
01111111111111111111111111111111 (Integer.MAX_VALUE)
--------------------------------
01111111111111110011011111100000 (2147432416)
Here's the binary representation of -51232 next to the binary representation of 51232. You can see that their sum is 232. that's always the case with 2's complement, for any pair of ints x and -x.
00000000000000001100100000100000 (-51232)
11111111111111110011011111100000 (51232)
Integers are stored in two complement: https://en.m.wikipedia.org/wiki/Two%27s_complement.
Thus while the leftmost bit indicate negative numbers, it is not really a sign bit as it would be for floating point numbers in common representation.
The main reason of this representation is that it make it easy to do addition and substraction among other at the hardware level.
One should not that with two complement notation, -1 is represented by all bit set to 1.

java : shift distance for int restricted to 31 bits

Any idea why shift distance for int in java is restricted to 31 bits (5 lower bits of the right hand operand)?
http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.19
x >>> n
I could see a similar question java Bit operations >>> shift but nobody pointed the right answer
The shift distance is restricted to 31 bits because a Java int has 32 bits. Shifting an int number by more than 32 bits would produce the same value (either 0 or 0xFFFFFFFF, depending on the initial value and the shift operation you use).
It's a design decision, but it seems a bit unfortunate at least for some use cases. First some terminology: let's call the approach of defining as zero all shifts amounts larger than the number of bits in the shifted word the saturating approach, and the Java approach of using only the bottom 5 (or 6 for long) bits to define the shift amount as the mod approach.
You can look at the problem by listing the useful shift values. Those are shift amounts that result in unique output values1. If you take >>>, the interesting values are 0 though 32 inclusive. 0 results in an unchanged value, and 32 results in 0. Shifting by more than 32 would again produce the same result as 32, sure - but java doesn't even let you shift by 32: it stops at 31! A shift by 32 will, perhaps unexpectedly, leave your value unchanged.
In many uses of >>> a shift by 32 is not possible, or the Java behavior works. In other cases, however, the natural result is 32, and you must special case zero.
As to why they would choose that design? Well, it probably helped that the common PC hardware at the time (x86, just like today) implements shifts in exactly that way (using only the last 5 bits for 32-bit shifts, and the last 6 for 64-bits). So the shifts can be directly mapped to hardware without any special cases, conditional moves or branches2.
Furthermore, for hardware that doesn't implement those semantics by default, it is easy to get the Java semantics by a simple mask: shiftAmount & 0x1F. That's going to be fast on all hardware. The reverse mapping - implementing saturating shifts on hardware that doesn't support it is more complex: you may need a costly compare and branch, some bit twiddling hacks or predicated moves to handle the > 31 case.
Finally, the mod approach is quite natural for many algorithms. For example, if you are implementing a bitmap structure, addressable per-bit, a good implementation may be to have an array of integers, with each integer representing 32 bits. Internally to index into the Nth bit, you would break N down into two parts - the high 27 bits would find the word in the array the bit is in, and the low 5 bits would pick the bit out of the word. To pick the bit out of the word (e.g., to move it to the LSB), you might do:
int val = (word >>> (index & 0x1F)) & 1
That sets val to 1 if the bit was set, 0 otherwise. However, because of the way the Java >>> operator was specified, you don't need the & 0x1F part at all, because it is already implied in the mod definition! So you can omit it, and indeed the JDK's BitSet uses exactly that trick.
1 Granted, any value without a 1 in the MSB may not produce unique values under >>>, once all the 1s get shifted off, so let's just talk about any value with a leading one.
2 For what it's worth, I checked ARM and the semantics are even weirder: for variable shifts, the bottom eight bits of the shift amount is used. So the shift is a weird hybrid - it is effectively a saturating shift once you exceed 31, but only up to 255, at which point it loops around and suddenly has non-zero values for the next 31 values, etc.

Cast to byte: how do we clip figures

I'm reading Core Java by Horstmann.
This is an example:
byte nx = (byte)300;
System.out.println(nx);
The result is 44. I can't understand why? I suppose 2 variants: 1) everything is ruined and you just get a complete garbage; 2) there is some logic.
I'm inclined to the second variant as the book tells me that it is 44 that is received. So, there is some algorithm behind it.
Could you help me understand.
A cast to byte will only retain the least significant 8 bits. 300 (as an int here) in binary is
00000000 00000000 00000001 00101100
Retaining the last 8 bits throws away the most signficant 1 (and everything else before it) which represents 256, so the remaining value is 300 - 256 = 44
00101100
The byte data type is only 8 bits long, and the decimal number 300 requires 9 bits.
When you cast it, you truncated it and cut off the leftmost bit, leaving the binary representation of the decimal number 44.
If you need an analogy, think of casting from a float like 35.6 to an int. Because ints cannot have decimal places, that cast truncates '.6' off of the float, ignoring it completely in the returned value.
Look at the binary expansion of 300:
100101100
Now chop off all but the last 8 bits (the width of a byte in Java):
00101100
Now convert that back to an integer value:
44
Note that you have to be careful about sign bits. You can't just take the remainder after dividing by 256. For instance:
byte nx = (byte)400;
System.out.println(nx);
will print -112 (not 144). That's because the bit pattern after the cast is
10010000
and the left-most (eighth) bit is treated as the sign bit in the two's complement representation of -112.

Unary "~" operator - What exactly is happening here?

I recently did a Java course (1 week crash course), and we covered some binary mathematics.
This unary ~ operator (tilde I think it's called?) was explained to us thus:
It inverts the bit pattern turning every "0" into a "1" and every "1" into a "0".
e.g. There are 8 bits to a byte. If you have the following byte: 00000000 the inverted value would change to become 11111111.
The above explanation is clear and concise, and totally makes sense to me. Until, that is, I try to implement it.
Given this:
byte x = 3;
byte y = 5;
System.out.println(~x);
System.out.println(~y);
The output is:
-4
-6
I'm very confused about how this happens.
If +3 in binary is 11, then the inversion of this would be 00, which clearly isn't -3.
But as there are 8 bits in a byte, then shouldn't the binary representation of +3 be written as 00000011?
Which would invert to become 11111100. Converted back to decimal value this would be 252.
If however you write the +3 as 011, then it does indeed convert to 100, which is +4, but then how do you know it's a negative number?
How about if you try 0011, which converts to 1100, which if you use the first bit as a sign, then it does indeed become -4.
Ah - so at this point I thought I was getting somewhere.
But then I got to the second value of y = 5.
How do we write this? Using the same logic, +5 converts to binary 0101, which inverts to 1010.
And it's around now that I'm horribly confused. This looks to represent either a signed value of -2, or an unsigned value of +10 decimal? Neither of which are the -6 I'm getting printed out.
Again, if I increase the length up to the 8 digits of a byte, +5 is 00000101, which inverted becomes 11111010. And I really can't find a way to turn this into -6.
Does anyone out there understand this, as I have no idea what is happening here and the more numbers I print out the more confused I become.
Google doesn't seem to come up with anything much on this - maybe it doesn't like looking at little operator signs.. :-(
See this demonstration: -
3 -> 0011
~3 -> 1100 -> -4 (2's complement)
5 -> 0101
~5 -> 1010 -> -6 (2's complement)
Since signed integers are stored as 2's complement, taking 2's complement of 1100 gives you 4. Now since 1100 is a negative number. So, the result is -4. Same is the case with 1010.
1100
0011 - 1's complement
0100 - 2's complement - value = 4 (take negative)
From wikipedia: In two's complement notation, a non-negative number is represented by its ordinary binary representation; in this case, the most significant bit is 0. The two's complement operation is the negation operation, so negative numbers are represented by the two's complement of the absolute value.
To get the two's complement of a binary number, the bits are inverted, or "flipped", by using the bitwise NOT operation; the value of 1 is then added to the resulting value, ignoring the overflow which occurs when taking the two's complement of 0. http://en.wikipedia.org/wiki/Two%27s_complement
So if you have 0101 which is +5 the inverse of that is 1010, which is -5.
You don't really read the 010 as a 5 though, but when you see the 1 at the beginning, you know that to get the number you have to invert the rest of the digits again to get the positive number which you want to negate. If that makes sense.
It's a bit of an alien concept if you have not worked with it before. It's certainly not the way that decimal numbers work, but it is actually simple once you see what happening.
A value of 8 decimal is written as 01010, which negates to 10101. The first digit (1) means it's negative, and then you flip the rest back to get the numeric value: 1010.
One thing to remember is that Two's complement is not the same as ordinary old binary system counting. In normal binary the value of 10101 (which in Two's complement is -8 as above) is of course 21. I guess this is where the confusion comes - how do you tell the difference by looking at them? You must know which representation has been used in order to decide what the value of the number actually is. There is also One's complement which differs slightly.
A good tutorial on binary maths, including One's and Two's complement is given here. http://www.math.grin.edu/~rebelsky/Courses/152/97F/Readings/student-binary
Signed integers are almost universally stored using twos complement. This means inverting the bits (taking the one's complement) and adding one. This way you don't have two representations of integer zero (+0 and -0), and certain signed operations become easier to implement in hardware.
Java uses signed numbers in Two's complement. Your reasoning would be correct in C or other languages when using types as "unsigned int" or "unsigned char".

Calculate lat/long

Can any please help me to find out the algorithm of calculating latitude and longitude from 8 byte value?
8- byte value - A027AFDF5D984840 and It's longitude is 49.1903648
8- byte value - 3AC7253383DD4B40 and It's latitude is 55.7305664
Also if possible then please tell me how to calculate floating point value of 0000000000805A40 as 106.0
This isn't specific to longitude and latitude calculation. Given that you have 8 bytes to work with in your examples, what you are looking to do is convert a byte array to double precision floating point values. The process is the same for all three examples.
To find the floating point value, you need to deal with the byte array at the bit level. An explanation of the process is here. Or check out Wikipedia.
To get you started, in your first example, you will be dealing with the bytes in the order of 40, 48, 98, 5D and so on. Breaking down the first two bytes, 40 and 48 (0100 0000 0100 1000), you have enough information to get the sign bit (0) and the exponent bit range (100 0000 0100 -> 1028). From here, continue listing out the 52 remaining bits to determine the fractional portion of the number. If you follow the calculations at the links above, you will see the floating point values you expect from your examples.
As a side note, some programming languages provide a method of doing this conversion for you. Depending on what language you are using, there is no need to reinvent the wheel here.
EDIT: Example
To convert the byte array to a double precision floating point value, we will need three pieces of information from the byte array: the sign bit (S), the exponent (E) and the fraction (F). The first thing to do is create your 64 bit representation of the 8 byte array. As I mentioned above, you will use the bytes in "reverse" order (little-endian if you want to do reading on the topic). I will use only the first four bytes as they will be enough to illustrate the process.
40 48 98 5D ==> 01000000 01001000 10011000 01011101
I will refer to the bits above in the order of 0 being leftmost.
Sign bit:
This is the 0th bit in the array above. In this case, S is 0. The sign bit is exactly what you'd think it would be, it determines whether the floating point result will be negative or positive.
Exponent:
Bits 1-11 determine the exponent. In the example, the exponent E is 100 0000 0100 ==> 1028.
Fraction:
The remaining bits, 12-63 determine the fraction. I only illustrated bits 12-31 to show how the process works: 1000 10011000 01011101....
The fractional bits are not converted to a decimal value. Instead, you need to iterate through them paying attention to the bits which are set (1, not 0). The index of the bits is what is important here. Consider the fraction bits indexed starting at 1 increasing from left to right to 20. Again, in the full example (bits 12-63), this indexing would be 1 to 52.
The fraction is found by summing each i'th bit that is set using this expression: 2^-i. For this example this means we are dealing with indexes 1,5,8,9,14,16,17,18,20.
The first four indexes will give us enough precision for the purposes of the example:
F = (2^-1) + (2^-5) + (2^-8) + (2^-9) + ... = 0.5 + 0.03125 + 0.00390625 + 0.001953125 = 0.537109375
The final value V is found by applying the formula:
V = -1^S * 2^(E-1023) * (1 + F) = -1^0 + 2^(1028-1023) * 1.537109375 = 49.1875
This is a good approximation of your goal value of 49.1903648. If you were to continue using the full fractional bit range in the manner I showed you, your final value will match.
Lastly, since you mentioned you are using Java, have you taken a look at using the ByteBuffer class and the getDouble function here?

Categories