Representing relative addressing in byte code using Java


I'm trying to represent relative addressing mode in Java. As we know, with relative addressing the operand must take a value between -128 and +127, as relative addressing uses 2's complement to represent whether the operand should be added or subtracted from the base value.
However I'm struggling to convert this into its hexadecimal representation.
2's complement form uses the most significant bit to determine whether the bit pattern in question is positive or negative, hence the reason for the range of -128 to +127.
Let's say we have the relative address operand *+3A, which states that the user wants to add 3A to the value held in the program counter. This is simple enough as 3A converts to 00111010 binary.
But then how do I go about representing *-3A in Java? If I convert 3A to decimal, which is 58 and then negate it to get -58 and then convert that back to hexadecimal, will that work?
I'm not sure if this is just something that's really simple and I'm making it into something bigger than it actually is.

We have a couple things going on.
Two's Complement is the representation for negative integers in all modern machines. To negate an integer, you:
take the bitwise complement, and then
add one, ignoring the carry-bit.
So this needs to be done in the context of a specific word-size. If your allowable range is -128 to 127, you are dealing with an 8-bit "word".
The second issue is how you encode the result for human (or compiler) consumption. In either case, simply using a signed decimal integer is probably ideal.
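A minimal sketch of the encoding described above, assuming an 8-bit operand. The class name RelativeOperand, its method names, and the program counter value are hypothetical and not taken from the question. Masking with 0xFF keeps the low 8 bits, which is the two's complement pattern, and the cast to byte sign-extends it back.

```java
class RelativeOperand {
    // Encode a signed offset in [-128, 127] as its 8-bit two's complement pattern (0..255).
    static int encode(int offset) {
        if (offset < -128 || offset > 127) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return offset & 0xFF;
    }

    // Decode an 8-bit pattern (0..255) back to a signed offset.
    static int decode(int pattern) {
        return (byte) pattern; // the cast to byte sign-extends bit 7
    }

    // Two-digit hexadecimal form of the encoded operand.
    static String toHex(int offset) {
        return String.format("%02X", encode(offset));
    }

    public static void main(String[] args) {
        System.out.println(toHex(+0x3A)); // 3A
        System.out.println(toHex(-0x3A)); // C6

        int pc = 0x1000; // hypothetical program counter value
        System.out.println(Integer.toHexString(pc + decode(0xC6))); // fc6
    }
}
```

So negating 58 and keeping only the low byte does work: *-3A becomes the operand byte C6.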

Related

Rounding error from BigDecimal to byte array in Java

How should one properly handle a conversion rounding error with BigDecimal in Java:
BigDecimal -> byte[] -> BigDecimal
I have a custom datatype 32 bytes in length (yep, 32 bytes not 32 bits) and I need to decode the fractional part of BigDecimal into byte[].
I understand that I will lose some accuracy. Are there any established techniques to implement such a conversion?
NOTE:
It is fixed point datatype of form MxN, where M % 8 == N % 8 == 0

Your fixed-point fractional part can be interpreted as the numerator, n, of a fraction n/(2^256). I suggest, therefore, computing the BigDecimal value representing 1/(2^256) (this is exactly representable as a BigDecimal) and storing a reference to it in a final static field.
To convert to a byte[], then, use the two-arg version of BigDecimal.divideToIntegralValue() to divide the fractional part of your starting number by 1/(2^256), using the MathContext argument to specify the rounding mode you want. Presumably you want either RoundingMode.HALF_EVEN or RoundingMode.HALF_UP. Then get the BigInteger unscaled value of the result (which should be numerically equal to the scaled value, since an integral value should have scale 0) via BigDecimal.unscaledValue(). BigInteger.toByteArray() will then give you a byte[] closely related to what you're after.*
To go the other way, you can pretty much reverse the process. BigInteger has a constructor that accepts a byte[] that, again, is very closely related to your representation. Using that constructor, convert your byte[] to a BigInteger, and thence to BigDecimal via the appropriate constructor. Multiply by that stored 1/(2^256) value to get the fractional part you want.
* The biggest trick here may involve twiddling signs appropriately. If your BigDecimals may be negative, then you probably want to first obtain their absolute values before converting to byte[]. More importantly, the byte[]s produced and consumed by BigInteger use a two's complement representation (i.e. with a sign bit), whereas I suppose you'll want an unsigned, pure binary representation. That mainly means that you'll need to allow for an extra bit -- and therefore a whole extra byte -- when you convert. Be also aware of byte order; check the docs of BigInteger for the byte order it uses, and adjust as appropriate.
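A rough sketch of the round trip, under several assumptions: the fractional part is 32 bytes (256 bits), unsigned, big-endian, and the input lies in [0, 1). The class FixedPointFraction and its members are hypothetical names. Note that this sketch swaps in multiply() plus setScale() for the rounding step instead of divideToIntegralValue(), and uses the BigInteger(int signum, byte[] magnitude) constructor to avoid the sign-bit issue on decoding.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

class FixedPointFraction {
    static final int FRACTION_BYTES = 32;                 // assumed width of the fractional part
    static final int FRACTION_BITS = 8 * FRACTION_BYTES;  // 256 bits

    // 2^256 as a BigDecimal.
    static final BigDecimal TWO_POW_BITS =
            new BigDecimal(BigInteger.ONE.shiftLeft(FRACTION_BITS));

    // 1/2^256 is exactly 5^256 / 10^256, so it has a finite decimal expansion.
    static final BigDecimal ONE_OVER_TWO_POW_BITS =
            new BigDecimal(BigInteger.valueOf(5).pow(FRACTION_BITS), FRACTION_BITS);

    // Encode a fraction in [0, 1) as 32 unsigned big-endian bytes, rounding half-even.
    static byte[] encode(BigDecimal fraction) {
        if (fraction.signum() < 0 || fraction.compareTo(BigDecimal.ONE) >= 0) {
            throw new IllegalArgumentException("expected a value in [0, 1)");
        }
        BigInteger n = fraction.multiply(TWO_POW_BITS)
                .setScale(0, RoundingMode.HALF_EVEN)
                .toBigIntegerExact();
        if (n.bitLength() > FRACTION_BITS) {
            throw new ArithmeticException("value rounds up to 1.0");
        }
        // toByteArray() is big-endian two's complement: it may carry one extra
        // leading zero byte for the sign, or be shorter than 32 bytes.
        byte[] raw = n.toByteArray();
        byte[] out = new byte[FRACTION_BYTES];
        int count = Math.min(raw.length, FRACTION_BYTES);
        System.arraycopy(raw, raw.length - count, out, FRACTION_BYTES - count, count);
        return out;
    }

    // Decode 32 unsigned big-endian bytes back to an exact BigDecimal fraction.
    static BigDecimal decode(byte[] bytes) {
        BigInteger n = new BigInteger(1, bytes); // signum 1: treat the bytes as a magnitude
        return new BigDecimal(n).multiply(ONE_OVER_TWO_POW_BITS);
    }

    public static void main(String[] args) {
        byte[] half = encode(new BigDecimal("0.5"));
        System.out.println(half.length);                       // 32
        System.out.println(decode(half).stripTrailingZeros()); // 0.5
    }
}
```

Values such as 0.1 that are not exact binary fractions come back with an error of at most half of 1/(2^256).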

Why does byte representation in Java store data in decimal?

I was debugging a problem where I created a byte array from binary representation of strings such as below.
While debugging the array just to see how it is stored, I could see them stored internally as decimals. Why is that?
When it gets interpreted as bytecode I am assuming it will get converted as binary. Then why not store in binary in the first place.
String[] binArray = {"10101", "11100", "11010", "00101"};
byte[] bytes = new byte[binArray.length];
for (int i = 0; i < binArray.length; i++) {
    bytes[i] = Byte.parseByte(binArray[i], 2);
}
I may be missing something here. Hence request your guidance.

There seems to be a very general misunderstanding.
In some sense, all data is stored "in binary form". Particularly, the integral numerical values, like byte, short, int etc., are internally all stored in binary form. This internal representation is known as the Two's complement form.
(For floating point numbers, the representation is a bit more complicated: It's the IEEE 754 representation - still, they are stored in binary form)
The key issue that has likely led to your question is: When you just print a number, or convert it to a string, with
System.out.println(someByte);
or
String s = String.valueOf(someByte);
then by default, the decimal form is printed. Mainly because this is the most "natural" and most readable form for humans.
You can pass your bytes to Integer.toBinaryString to create the string representation of their binary representation:
System.out.println(Integer.toBinaryString(someByte));
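As an illustration, here is a small sketch that prints one byte in decimal, binary, and hexadecimal. The class ByteDisplay and the helper toBits are hypothetical names. One caveat worth knowing: a byte is widened to int before Integer.toBinaryString sees it, so negative bytes print as 32 digits unless you mask with 0xFF first.

```java
class ByteDisplay {
    // Eight-character binary string of a byte's bit pattern.
    static String toBits(byte b) {
        // Mask with 0xFF so a negative byte is not sign-extended to 32 bits.
        String s = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte b = Byte.parseByte("10101", 2);
        System.out.println(b);                          // 21
        System.out.println(toBits(b));                  // 00010101
        System.out.println(Integer.toHexString(b & 0xFF)); // 15

        byte negative = -4;
        System.out.println(Integer.toBinaryString(negative).length()); // 32
        System.out.println(toBits(negative));           // 11111100
    }
}
```

The stored bits are the same in every case; only the textual rendering differs.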

Well, the JVM, just like any program doesn't store numeric values as "decimal" or "hexadecimal". It's just a bit-pattern in memory. Your debugger displays the value in decimal format. This is just for convenience, since most people prefer decimal to binary format for readability. For the computer itself it's just a bit-string of 8 bits length, stored as binary-value.

While debugging the array just to see how it is stored, I could see them stored internally as decimals. Why is that?
You are seeing the byte value displayed in decimal. The debugger is not displaying the internal value.
When it gets interpreted as bytecode I am assuming it will get converted as binary. Then why not store in binary in the first place.
It is stored in binary.

Why is a double always 8 bytes and an int always 4 bytes, even if the int has more digits?

I don't understand how an int 63823, takes up less space than a double 1.0. Is there not more information stored in the int, in this particular instance?

I don't understand how an int 63823, takes up less space than a double 1.0. Is there not more information stored in the int, in this particular instance?
Good question. What you're seeing when you see 63823 and 1.0 is a representation of the underlying data, you are not seeing the underlying data. It is specially formatted so that you can read it, but it is not how the machine sees it.
Java uses very special formats for representing int and double. You need to look at those representations to understand why 63823 takes thirty-two bits when represented as a Java int and 1.0 takes sixty-four bits when represented as a Java double.
In particular, 63823 as an int in Java is represented as:
00000000000000001111100101001111
and 1.0 as a double is represented in Java as:
0011111111110000000000000000000000000000000000000000000000000000
If you want to explore more, I recommend Two's Complement and What Every Computer Scientist Should Know About Floating-Point Arithmetic.
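To reproduce the two bit patterns yourself, a sketch along these lines should work. The class BitPatterns and its helpers are hypothetical names; Integer.toBinaryString, Long.toBinaryString, and Double.doubleToLongBits are standard library calls.

```java
class BitPatterns {
    // Left-pad a binary string with zeros to the given width.
    static String pad(String bits, int width) {
        return String.format("%" + width + "s", bits).replace(' ', '0');
    }

    // All 32 bits of an int.
    static String intBits(int value) {
        return pad(Integer.toBinaryString(value), 32);
    }

    // All 64 bits of a double, in IEEE 754 layout.
    static String doubleBits(double value) {
        return pad(Long.toBinaryString(Double.doubleToLongBits(value)), 64);
    }

    public static void main(String[] args) {
        System.out.println(intBits(63823));  // 32 characters
        System.out.println(doubleBits(1.0)); // 64 characters
        System.out.println(Integer.BYTES + " bytes vs " + Double.BYTES + " bytes"); // 4 bytes vs 8 bytes
    }
}
```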

Not exactly. The double 1.0 represents more information because, by the definition of a double as a 64 bit float, there are more values that it could be. To use your example, if you had a special data type that could only have two values, 63823 and 98321234213474932, then it would only take 1 bit to represent the number 63823, though it would be far less useful than an int.
In terms of implementation, it's often a lot easier and faster to work with fixed-size data types, so that you can allocate a fixed chunk of memory (that's what a variable is) without having to know its value and constantly reallocate space. Examples of types with a different approach would be String and BigInteger, which do allocate space to accommodate their values. Note that both are immutable in Java -- that's not a coincidence.

These primitive datatypes need to be defined somewhere for you to use them. It is not a flexible container where you can stuff in whatever you want, rather more like a bottle which takes the same space no matter if full or empty. And they also have a maximum they can contain.

The zeros that are not shown also count. Approximately, ignoring the fact that the numbers are actually stored in binary and not in decimal, when you write both numbers with the implied zero digits included, you get:
1.0 = 1.00000000000000000*10^0000
63823 = 0000063823
As you can see, 1.0 is twice as long as 63823. Therefore it requires twice as much storage.

The int and double don't have decimal digits at all. The decimal representation of this int has 5 decimal digits after removing leading zeros. The int itself has room for 32 binary digits. The double has a sign bit, an 11-bit exponent, and 52 stored mantissa bits (53 bits of precision counting the implicit leading bit).
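For reference, the IEEE 754 double layout is 1 sign bit, 11 exponent bits, and 52 stored fraction bits. The sketch below pulls those three fields out of a double; the class DoubleFields and its method names are hypothetical.

```java
class DoubleFields {
    // Sign bit: 0 for positive, 1 for negative.
    static int sign(double value) {
        return (int) (Double.doubleToLongBits(value) >>> 63);
    }

    // The 11-bit exponent field, still biased by 1023.
    static int biasedExponent(double value) {
        return (int) ((Double.doubleToLongBits(value) >>> 52) & 0x7FF);
    }

    // The 52 stored fraction bits (the leading 1 is implicit for normal numbers).
    static long fraction(double value) {
        return Double.doubleToLongBits(value) & ((1L << 52) - 1);
    }

    public static void main(String[] args) {
        System.out.println(sign(1.0));           // 0
        System.out.println(biasedExponent(1.0)); // 1023
        System.out.println(fraction(1.0));       // 0
    }
}
```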

Unary "~" operator - What exactly is happening here?

I recently did a Java course (1 week crash course), and we covered some binary mathematics.
This unary ~ operator (tilde I think it's called?) was explained to us thus:
It inverts the bit pattern turning every "0" into a "1" and every "1" into a "0".
e.g. There are 8 bits to a byte. If you have the following byte: 00000000 the inverted value would change to become 11111111.
The above explanation is clear and concise, and totally makes sense to me. Until, that is, I try to implement it.
Given this:
byte x = 3;
byte y = 5;
System.out.println(~x);
System.out.println(~y);
The output is:
-4
-6
I'm very confused about how this happens.
If +3 in binary is 11, then the inversion of this would be 00, which clearly isn't -3.
But as there are 8 bits in a byte, then shouldn't the binary representation of +3 be written as 00000011?
Which would invert to become 11111100. Converted back to decimal value this would be 252.
If however you write the +3 as 011, then it does indeed convert to 100, which is +4, but then how do you know it's a negative number?
How about if you try 0011, which converts to 1100, which if you use the first bit as a sign, then it does indeed become -4.
Ah - so at this point I thought I was getting somewhere.
But then I got to the second value of y = 5.
How do we write this? Using the same logic, +5 converts to binary 0101, which inverts to 1010.
And it's around now that I'm horribly confused. This looks to represent either a signed value of -2, or an unsigned value of +10 decimal? Neither of which are the -6 I'm getting printed out.
Again, if I increase the length up to the 8 digits of a byte, +5 is 00000101, which inverted becomes 11111010. And I really can't find a way to turn this into -6.
Does anyone out there understand this, as I have no idea what is happening here and the more numbers I print out the more confused I become.
Google doesn't seem to come up with anything much on this - maybe it doesn't like looking at little operator signs.. :-(

See this demonstration: -
3 -> 0011
~3 -> 1100 -> -4 (2's complement)
5 -> 0101
~5 -> 1010 -> -6 (2's complement)
Signed integers are stored in two's complement, so a pattern whose leading bit is 1, such as 1100, is negative. To find its magnitude, take the two's complement of 1100, which gives 0100 (4), so the value is -4. The same applies to 1010, which works out to -6.
1100
0011 - 1's complement
0100 - 2's complement - value = 4 (take negative)
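The same demonstration can be checked in Java with a short sketch. The class BitwiseNot and the helper low8 are hypothetical names. Keep in mind that ~ promotes a byte to int before inverting, so the result is a 32-bit int; low8 shows only its lowest eight bits.

```java
class BitwiseNot {
    // Lowest eight bits of a value as a binary string.
    static String low8(int value) {
        return String.format("%8s", Integer.toBinaryString(value & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte x = 3;
        byte y = 5;
        System.out.println(low8(x) + " -> " + low8(~x) + " = " + ~x); // 00000011 -> 11111100 = -4
        System.out.println(low8(y) + " -> " + low8(~y) + " = " + ~y); // 00000101 -> 11111010 = -6

        // In two's complement, inverting every bit always yields -v - 1.
        System.out.println(~x == -x - 1); // true
    }
}
```

Since negation is "invert and add one", plain inversion lands one below the negated value, which is why ~3 is -4 rather than -3.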

From wikipedia: In two's complement notation, a non-negative number is represented by its ordinary binary representation; in this case, the most significant bit is 0. The two's complement operation is the negation operation, so negative numbers are represented by the two's complement of the absolute value.
To get the two's complement of a binary number, the bits are inverted, or "flipped", by using the bitwise NOT operation; the value of 1 is then added to the resulting value, ignoring the overflow which occurs when taking the two's complement of 0. http://en.wikipedia.org/wiki/Two%27s_complement
So if you have 0101, which is +5, inverting the bits gives 1010 and adding one gives 1011, which is -5. (The plain inversion 1010, which is what ~ produces, is -6.)
You don't read the remaining bits as a magnitude directly: when you see the 1 at the beginning, you know the number is negative, and to get its magnitude you invert all the bits and add one again. If that makes sense.
It's a bit of an alien concept if you have not worked with it before. It's certainly not the way that decimal numbers work, but it is actually simple once you see what's happening.
A value of 10 decimal is written in five bits as 01010, which negates to 10110 (invert to get 10101, then add one). The first digit (1) means it's negative, and inverting and adding one again gives back the magnitude: 01010.
One thing to remember is that two's complement is not the same as ordinary unsigned binary counting. In unsigned binary the value of 10110 (which in five-bit two's complement is -10, as above) is of course 22. I guess this is where the confusion comes from: how do you tell the difference by looking at them? You must know which representation has been used in order to decide what the value of the number actually is. There is also one's complement, which differs slightly.
A good tutorial on binary maths, including One's and Two's complement is given here. http://www.math.grin.edu/~rebelsky/Courses/152/97F/Readings/student-binary

Signed integers are almost universally stored using two's complement. Negating a value means inverting the bits (taking the one's complement) and adding one. This way you don't have two representations of integer zero (+0 and -0), and certain signed operations become easier to implement in hardware.

Java uses signed numbers in Two's complement. Your reasoning would be correct in C or other languages when using types as "unsigned int" or "unsigned char".
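Java has no unsigned byte type, but since Java 8 the helper Byte.toUnsignedInt reads a byte's bit pattern as an unsigned value, which recovers the 252 the question expected. A small sketch, with UnsignedView as a hypothetical class name:

```java
class UnsignedView {
    public static void main(String[] args) {
        byte inverted3 = (byte) ~3; // bit pattern 11111100
        byte inverted5 = (byte) ~5; // bit pattern 11111010

        // Signed reading (Java's default for byte).
        System.out.println(inverted3); // -4
        System.out.println(inverted5); // -6

        // Unsigned reading of the same eight bits.
        System.out.println(Byte.toUnsignedInt(inverted3)); // 252
        System.out.println(Byte.toUnsignedInt(inverted5)); // 250
    }
}
```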

Why do integers in Java not use all 32 or 64 bits?

I was looking into 32-bit and 64-bit. I noticed that the range of integer values that can be stored in 32 bits is ±4,294,967,295, but the Java int is also 32-bit (if I am not mistaken) and it stores values up to ±2,147,483,648. Same thing for long: it stores values up to ±2^63, but 64 bits can hold 2^64 values. How come these values are different?

Integers in Java are signed, so one bit is reserved to represent whether the number is positive or negative. The representation is called "two's complement notation." With this approach, the maximum positive value represented by n bits is given by
(2 ^ (n - 1)) - 1
and the corresponding minimum negative value is given by
-(2 ^ (n - 1))
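These two formulas can be checked against Java's own constants. The sketch below assumes a bit width between 1 and 63 so that the shift stays inside a long; SignedRange and its methods are hypothetical names.

```java
class SignedRange {
    // Largest value of an n-bit two's complement integer: 2^(n-1) - 1.
    static long max(int bits) {
        checkWidth(bits);
        return (1L << (bits - 1)) - 1;
    }

    // Smallest value of an n-bit two's complement integer: -(2^(n-1)).
    static long min(int bits) {
        checkWidth(bits);
        return -(1L << (bits - 1));
    }

    // This sketch only supports widths that fit comfortably in a long.
    private static void checkWidth(int bits) {
        if (bits < 1 || bits > 63) {
            throw new IllegalArgumentException("bits must be in 1..63");
        }
    }

    public static void main(String[] args) {
        System.out.println(min(8) + " .. " + max(8));          // -128 .. 127
        System.out.println(max(32) == Integer.MAX_VALUE);       // true
        System.out.println(min(32) == Integer.MIN_VALUE);       // true
    }
}
```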
The "off-by-one" aspect to the positive and negative bounds is due to zero. Zero takes up a slot, leaving an even number of negative numbers and an odd number of positive numbers. If you picture the represented values as marks on a circle—like hours on a clock face—you'll see that zero belongs more to the positive range than the negative range. In other words, if you count zero as sort of positive, you'll find more symmetry in the positive and negative value ranges.
To learn this representation, start small. Take, say, three bits and write out all the numbers that can be represented:
0
1
2
3
-4
-3
-2
-1
Can you write the three-bit sequence that defines each of those numbers? Once you understand how to do that, try it with one more bit. From there, you imagine how it extends up to 32 or 64 bits.
That sequence forms a "wheel," where each value is formed by adding one to the previous, with the noted wraparound from 3 to -4. That wraparound effect (which can also occur with subtraction) is called "modular arithmetic."
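If you want to check your answers to the three-bit exercise, a sketch like the following prints each bit pattern next to its signed value. ThreeBitWheel, toSigned, and bitsOf are hypothetical names, and the width parameter is assumed to be between 1 and 31.

```java
class ThreeBitWheel {
    // Interpret the low `bits` bits of `pattern` as a two's complement number.
    static int toSigned(int pattern, int bits) {
        int mask = (1 << bits) - 1;
        int p = pattern & mask;
        boolean negative = (p & (1 << (bits - 1))) != 0; // top bit set
        return negative ? p - (1 << bits) : p;
    }

    // The low `bits` bits of `pattern` as a zero-padded binary string.
    static String bitsOf(int pattern, int bits) {
        int mask = (1 << bits) - 1;
        String s = Integer.toBinaryString(pattern & mask);
        return String.format("%" + bits + "s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        // Walks the wheel: 000 -> 0 up to 011 -> 3, then 100 -> -4 up to 111 -> -1.
        for (int p = 0; p < 8; p++) {
            System.out.println(bitsOf(p, 3) + " -> " + toSigned(p, 3));
        }
        // Wraparound: one step past 3 lands on -4.
        System.out.println(toSigned(3 + 1, 3)); // -4
        // The same thing happens at 32 bits.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
    }
}
```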

In 32 bits you can store 2^32 distinct values. Whether you call these values 0 to 4294967295 or -2147483648 to +2147483647 is up to you. This difference is called "signed type" versus "unsigned type". Java supports only a signed int type. Other languages also have an unsigned 32-bit type.
No language will have a 32-bit type for ±4294967295, because the "-" part would require another bit.
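Although int is always signed in Java, the Integer class has offered static helpers since Java 8 that read the same 32 bits as an unsigned value. A brief sketch, with SignedVsUnsigned as a hypothetical class name:

```java
class SignedVsUnsigned {
    public static void main(String[] args) {
        int allOnes = -1; // all 32 bits set

        // Same 32 bits, two different names for the value.
        System.out.println(allOnes);                           // -1
        System.out.println(Integer.toUnsignedString(allOnes)); // 4294967295

        // Widening to long keeps the unsigned reading.
        System.out.println(Integer.toUnsignedLong(Integer.MIN_VALUE)); // 2147483648

        // Parsing an unsigned literal gives back the signed bit pattern.
        System.out.println(Integer.parseUnsignedInt("4294967295")); // -1
    }
}
```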

That's because Java ints are signed, so you need one bit for the sign.
