Java file operations [duplicate] - java

This question already has answers here:
What does << mean in Java?
(8 answers)
Closed 8 years ago.
I'm doing an assignment using Java, and our lecturers provided an example code for us. I'm unable to understand some of the code so can anyone explain?
private long magic;
private int minor;
magic = file.readUnsignedShort() << 16 | file.readUnsignedShort();
minor = file.readUnsignedShort();
I don't understand why he used "readUnsignedShort" for both of them and why he added
"<< 16 | file.readUnsignedShort()" for magic. Any help will be much appreciated.

A short uses 16 bits, an int uses 32 and a long uses 64 bits.
Apart from char, Java has no unsigned integer types, so for a short, int or long a most significant bit of 1 means the value is negative.
Splitting the code you have:
file.readUnsignedShort() <- reads 16 bits
<< 16 <- moves them 16 positions "to the left", adding zeros (it's like multiplying by 2^16)
| file.readUnsignedShort(); <- in those 16 zeros, puts the next 16 bits read, using the OR operation, which works as follows:
xxxx0000 | 0000YYYY = xxxxYYYY

Okay, let's start at the beginning.
RandomAccessFile.readUnsignedShort() reads 16 bits of information from the file, without interpreting it as a 2s complement representation.
So in the first line you do this: <first two bytes of the file> << 16 | <third and fourth bytes of the file>
<<16 means shift the number left by 16 bits. For unsigned numbers, this is the equivalent of multiplying by 65536.
| means do a bitwise or. Because we know the first number was shifted, in our case this is equivalent to just adding the numbers.
So this is what it does, why it does it is anybody's guess. You should ask your teacher really, but my guess is that it might have something to do with the fact that this is the only way to read a 32-bit unsigned number via RandomAccessFile.
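A minimal sketch of the same read, for illustration only. The class name MagicReader, the sample bytes and the use of DataInputStream over an in-memory array (instead of a RandomAccessFile) are assumptions; both classes implement DataInput, so readUnsignedShort() behaves the same way. The sketch also shows one caveat: the shift and the OR are evaluated as int, so when the top bit of the first short is set the result is already negative before it is widened to long. Casting to long before shifting avoids that.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class MagicReader {
    // Same expression as in the question: evaluated as int, then widened to long.
    static long magicAsInt(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUnsignedShort() << 16 | in.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Widen the first short to long before shifting, so the value stays non-negative.
    static long magicAsLong(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return (long) in.readUnsignedShort() << 16 | in.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file content: CA FE BA BE
        byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(Long.toHexString(magicAsInt(data)));  // ffffffffcafebabe
        System.out.println(Long.toHexString(magicAsLong(data))); // cafebabe
    }
}
```

If the magic number never has its top bit set, both variants return the same value.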

This operation is like concat for bits.
A short is 16 bits long and an int is 32 bits long. Concatenating two shorts gives you an int.
The << operator is a bit shift; here it shifts the bits sixteen positions to the left.
Example:
NUMBER1 1111 1111 1111 1111
NUMBER2 0000 0000 0000 0000
NUMBER1 << 16 | NUMBER2 = 1111 1111 1111 1111 0000 0000 0000 0000
(thanks for hints)

magic = file.readUnsignedShort() << 16 | file.readUnsignedShort();
<< is the left bit shift operator.
| is the bitwise OR operator.

Related

How does BitSet's set method work with bits shifting to the left?

Java's BitSet class has a method set that sets a single bit to 1 (=true). The method source code is as follows:
public void set(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
    int wordIndex = wordIndex(bitIndex);
    expandTo(wordIndex);
    words[wordIndex] |= (1L << bitIndex); // Restores invariants
    checkInvariants();
}
Apart from the checks, the method's core is: words[wordIndex] |= (1L << bitIndex). I can clearly see that the left side of the assignment is the specific word that holds the relevant bit. However, I don't understand how the right side (shifting 1L left by the bit index) sets the requested bit, and only that bit, to 1. Could you please explain?
1L << bitIndex produces a long, whose bits are all 0s, except for one of the bits. The position of the "1" bit is determined by bitIndex. For example, if bitIndex is 10, then the 11th least significant bit is 1.
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000
Because the 1 is shifted to the left 10 times. In general, the (bitIndex mod 64 + 1)th least significant bit is the "1" bit.
This mask is then bitwise OR'ed with whatever is in words[wordIndex]. Every bit in words[wordIndex] stays the same because they are OR'ed with 0, except for the one place where it is a "1" in the mask. That bit in words[wordIndex] will become "1" no matter what it was originally, due to how OR works.
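The mask-and-OR step can be tried out with a stripped-down sketch. TinyBitSet is a hypothetical class written for this illustration, not the real java.util.BitSet: it has a fixed size, no bounds checks and no expandTo. It assumes the word index is bitIndex >> 6, i.e. 64 bits per long.

```java
class TinyBitSet {
    private final long[] words;

    TinyBitSet(int nbits) {
        // One long per 64 bits, rounded up.
        words = new long[(nbits + 63) >>> 6];
    }

    void set(int bitIndex) {
        // bitIndex >> 6 selects the word. For a long, the shift distance uses
        // only the low 6 bits of bitIndex, so 1L << bitIndex is 1L << (bitIndex mod 64).
        words[bitIndex >> 6] |= (1L << bitIndex);
    }

    boolean get(int bitIndex) {
        return (words[bitIndex >> 6] & (1L << bitIndex)) != 0;
    }

    long word(int wordIndex) {
        return words[wordIndex];
    }

    public static void main(String[] args) {
        TinyBitSet bits = new TinyBitSet(128);
        bits.set(10);
        bits.set(74); // 74 = 64 + 10, so this lands in word 1 at position 10
        System.out.println(Long.toBinaryString(bits.word(0))); // 10000000000
        System.out.println(bits.word(0) == bits.word(1));      // true
    }
}
```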
There are two parts of interest: 1L << bitIndex and words[wordIndex] |=.
The first part – 1L << bitIndex – shifts a bit some number of spots. Here's a table showing how that plays out with a few different numbers of shifting.
statement    decimal    binary
1L << 0      1          1
1L << 1      2          10
1L << 2      4          100
1L << 3      8          1000
1L << 4      16         10000
The second part – words[wordIndex] |= – takes whatever value was already at words[wordIndex] and uses bitwise "or" to include whatever bit was isolated from above. If that bit was already set (for that word), this operation wouldn't change anything. If the bit was 0, the "or" operation (|=) would set the bit to 1.
Remember that the left side is being ORed (|) with the mask, and that the rightmost bit (the least significant bit, LSB) is bit 0. OR sets every bit in the target that is 1 in the mask and leaves the other bits unchanged. Here are some examples.
int bits = 1;
System.out.println(Integer.toBinaryString(bits));
bits |= 0b100_0000_0000; // set bit 10 to 1.
System.out.println(Integer.toBinaryString(bits));
prints
1
10000000001
^
bit 10
or do it like this
bits = 1;
// 10 is the bitIndex to set, so 1 is shifted left to that position;
// the result is the same as before
bits |= 1 << 10;
System.out.println(Integer.toBinaryString(bits));
prints
10000000001
^
bit 10

Java - How to Extract Certain Portions of Bits Using Bit Mask?

This is the first time I am looking at bitwise operations and bit mask. I have seen some examples using C and C++, but I am looking for clarification using Java.
I have the following 32-bit string
0000 0000 1010 0110 0011 1000 0010 0000
I want to extract bits 21 through 25 (00101) and ignore everything else. In the end, I want my result to be the integer 5 (0101 in binary).
Using a bit mask, what approach could I use to get just that portion of the bits as my result?
If you really have a String, take the two substrings (00 and 101), concatenate them, and use Integer.parseInt with a radix of two:
int bits = Integer.parseInt(str.substring(7, 9) + str.substring(10, 13), 2);
If you have an int, use a logical right shift, then use & with a bit mask (0b signifies binary notation).
int bits = (n >>> 21) & 0b11111;
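The shift-then-mask idea generalizes to any field. Below is a small sketch; the class BitField and the method extract are names made up for this example, and bit 0 is assumed to be the least significant bit.

```java
class BitField {
    // Returns `count` bits of `n`, starting at bit `low` (bit 0 = least significant).
    static int extract(int n, int low, int count) {
        // (1 << count) - 1 is a mask of `count` one-bits; 32 needs a special case
        // because an int shift distance of 32 wraps around to 0.
        int mask = (count == 32) ? -1 : (1 << count) - 1;
        return (n >>> low) & mask;
    }

    public static void main(String[] args) {
        int n = 0b0000_0000_1010_0110_0011_1000_0010_0000;
        System.out.println(extract(n, 21, 5)); // 5
    }
}
```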

Two Ways to Interpret Integer Overflow

I have read here (https://stackoverflow.com/a/27762490/4415632) that when integer overflow occurs, the most significant bits are simply cut off.
However, I have also read here (https://stackoverflow.com/a/27747180/3808877) that when overflow occurs, "the value becomes the minimum value of the type, and start counting up again." Which one is correct, or are both answers correct? If so, can anyone show me why those two interpretations are equivalent to each other?
Both are correct; it depends on context. One is the result of casting and one is the result of overflow. Those are different operations. For example, if you cast Long.MAX_VALUE to an int, that is a cast operation:
System.out.println((int) Long.MAX_VALUE); // <-- -1
If you overflow an int by adding one to Integer.MAX_VALUE then
System.out.println(Integer.MAX_VALUE + 1); // <-- Integer.MIN_VALUE
Both interpretations are correct, because they are actually the same.
Let's look at the maths to see why.
Java stores values in byte, short, char, int and long in a format called two's complement.
In case of byte, short, int and long it is signed, in case of char it is unsigned.
One of the attributes of the two's complement format is that for most operations it does not matter whether the value is interpreted as signed or unsigned as the resulting bit pattern would be the same.
To shorten things, I'll explain it using byte, but the other types work along the same scheme.
A byte has 8 bits. The topmost bit is interpreted as sign bit. So, the bit pattern goes like this:
snnn nnnn
Each group of 4 bits is called a nibble; the grouping here is purely for readability. As a side note, a nibble can be represented by one hexadecimal digit.
So there are 8 bits in a byte, and each bit can be 0 or 1. This leaves us with 2^8 = 256 different values that can be stored in a byte.
Here are some sample values:
0000 0000 -> 0
0000 0001 -> 1
0000 0010 -> 2
0100 0000 -> 64
0111 1111 -> 127
1000 0000 -> -128
1111 1110 -> -2
1111 1111 -> -1
The 2's complement value of signed numbers which are negative, i.e. the sign bit is set, is created by taking the positive value of the 8 bits and subtracting the range, i.e. in case of a byte by subtracting 256.
Now let's see what happens if you take -1 and add 1.
1111 1111 -1 / 255
+ 0000 0001 1
--------------
= 1 0000 0000 0 / 256 intermediate result
= 0000 0000 0 / 0 result after dropping excess leading bits
There is an overflow. The result would need 9 bits now, but the byte only has 8 bits, so the most significant bit is lost.
Let's look at another example, -1 plus -1.
1111 1111 -1 / 255
+ 1111 1111 -1 / 255
--------------
= 1 1111 1110 -2 / 510 intermediate result
= 1111 1110 -2 / 254 result after dropping excess leading bits
Or this, 127 plus 5.
0111 1111 127
+ 0000 0101 5
--------------
= 1000 0100 -124 / 132
As we can see, the leading bits are dropped and this actually is what leads to the effect that causes it to overflow by "starting to count from the minimum value again".
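The three additions above can be checked in Java. This is a sketch with a hypothetical helper class ByteWrap; it relies only on the fact that a + b on bytes is computed as int and that the cast back to byte keeps the low 8 bits.

```java
class ByteWrap {
    static byte add(byte a, byte b) {
        // The sum is an int; the cast drops everything above bit 7.
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte r = add((byte) 127, (byte) 5);
        System.out.println(r);                         // -124 (signed view)
        System.out.println(r & 0xFF);                  // 132  (unsigned view of the same bits)
        System.out.println(add((byte) -1, (byte) 1));  // 0
        System.out.println(add((byte) -1, (byte) -1)); // -2
    }
}
```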
I add another option: a processor trap. Some processors will generate a trap on integer overflows. When available, this feature usually can be enabled in user mode by setting a bit in the processor status register.
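For comparison, Java's + operator never traps on int overflow; it wraps silently. The closest standard-library equivalent is Math.addExact, which throws ArithmeticException instead of wrapping. This is a library-level check, not a processor trap, and the helper class CheckedAdd below is a hypothetical name used only for this sketch.

```java
class CheckedAdd {
    // Returns true if a + b does not fit in an int.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1);            // -2147483648 (wraps silently)
        System.out.println(overflows(Integer.MAX_VALUE, 1));  // true
        System.out.println(overflows(1, 2));                  // false
    }
}
```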

Binary presentation of negative integer in Java

Please, help me to understand binary presentation of negative integers.
For example we have 5.
Binary presentation of 5 is 00000000.00000000.00000000.00000101.
And as I understand binary presentation of -5 should be like 10000000.00000000.00000000.00000101.
But output is 11111111.11111111.11111111.11111011.
I have 2 questions:
1) Why are there so many 1 bits?
2) What I really can't understand is the last 3 bits, 011. It looks like 3. Even with +1 or -1 it would be 100 or 010.
Thanks
Your understanding of what those negative numbers should look like is flawed. Java uses two's complement for negative numbers and the basic rule is to take the positive, invert all bits then add one. That gets you the negative.
Hence five is, as you state:
0000...00000101
Inverting that gives you:
1111...11111010
Then adding one gives:
1111...11111011
The bit pattern you have shown for -5 is what's called sign/magnitude, where you negate a number simply by flipping the leftmost bit. That's allowed in C implementations as one of the three possibilities(a), but Java uses two's complement only (for its negative integers).
(a) But keep in mind that both C and C++ have since moved to remove the other two encoding types: C++20 and C23 allow only two's complement.
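The invert-then-add-one rule can be verified directly in Java. A minimal sketch follows; the class TwosComplement and its helpers are hypothetical names for this example.

```java
class TwosComplement {
    // Negate by inverting all bits (~) and adding one.
    static int negate(int x) {
        return ~x + 1;
    }

    // 32-digit binary string; Integer.toBinaryString drops leading zeros, so pad them back.
    static String bits(int x) {
        return String.format("%32s", Integer.toBinaryString(x)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits(5));          // ends in 00000101
        System.out.println(bits(~5));         // ends in 11111010
        System.out.println(bits(negate(5)));  // ends in 11111011
        System.out.println(negate(5));        // -5
    }
}
```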
And as I understand binary presentation of -5 should be like 10000000.00000000.00000000.00000101.
That would be right if Java used a Sign and Magnitude representation for integers. However, Java uses Two's Complement representation, so the rest of the bits are changed in accordance with the rules of that representation.
The idea behind two's complement representation is that when you add a number in such representation to another value dropping the extra bit on the most significant end, the result would be as if you subtracted a positive number of the same magnitude.
You can illustrate this with decimal numbers. In a two-digit representation, 99 would behave like -1, 98 like -2, 97 like -3, and so on. For example, 23 + 99 = [1]22; dropping the top digit leaves 22, so 99 behaved like -1. Likewise 23 + 98 = [1]21, so 98 behaved like -2.
This works the same way with two's complement representation of binary numbers, except you would drop the extra bit at the top.
http://en.wikipedia.org/wiki/Two%27s_complement
The way negative numbers are stored is that the most significant bit (e.g. the bit representing 2^31 for a 32 bit number) is regarded as negative. So if you stored all 1s, you would add up
(-2^31) + 2^30 + 2^29 + ... + 2^1 + 2^0
which makes -1.
Small negative numbers will be mostly ones under this representation.
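The weighted sum above can be written out as a short sketch. WeightedBits and value are hypothetical names; the method assumes a string of exactly 32 binary digits with the most significant bit first.

```java
class WeightedBits {
    // Sums the bit weights, counting bit 31 as -2^31 and every other bit i as +2^i.
    static long value(String bits) {
        if (bits.length() != 32) {
            throw new IllegalArgumentException("expected 32 binary digits");
        }
        long sum = 0;
        for (int i = 0; i < 32; i++) {
            if (bits.charAt(i) == '1') {
                int exponent = 31 - i; // leftmost character is bit 31
                long weight = 1L << exponent;
                sum += (exponent == 31) ? -weight : weight;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // For negative ints, Integer.toBinaryString always returns 32 digits.
        System.out.println(value(Integer.toBinaryString(-1))); // -1
        System.out.println(value(Integer.toBinaryString(-5))); // -5
    }
}
```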
Here is an example for 2's complement:
If you have -30, and want to represent it in 2's complement, you take the binary representation of 30:
0000 0000 0000 0000 0000 0000 0001 1110
Invert the digits.
1111 1111 1111 1111 1111 1111 1110 0001
And add one.
1111 1111 1111 1111 1111 1111 1110 0010
Converted back into hex, this is 0xFFFFFFE2. And indeed, suppose you have this code:
#include <stdio.h>
int main() {
int myInt;
myInt = 0xFFFFFFE2;
printf("%d\n",myInt);
return 0;
}
That should yield an output of -30. Try it out if you like.
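Since the question is about Java, here is a rough Java counterpart of the C program above. The class name HexLiteral is made up for this sketch. In Java, an int hex literal with the top bit set denotes the negative two's complement value directly.

```java
class HexLiteral {
    static int literal() {
        return 0xFFFFFFE2; // valid int literal; its bit pattern is that of -30
    }

    // Parses up to 8 hex digits as a 32-bit pattern and returns it as a signed int.
    static int parse(String hex) {
        return Integer.parseUnsignedInt(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(literal());                 // -30
        System.out.println(parse("FFFFFFE2"));         // -30
        System.out.println(Integer.toHexString(-30));  // ffffffe2
    }
}
```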
With two's complement it's true that a MSB of 1 indicates a negative number. But the remaining bits are not the binary representation of its value. On the other hand, if the MSB is 0 the remaining bits represent the binary value. But it cannot be said that the number is positive then. Zero is neither positive nor negative.
This picture helped me to understand the principle when I started to learn that there are more representations of numbers than with 0..9:
          0
         000
   -1 111   001 1
 -2 110       010 2
   -3 101   011 3
         100
         -4

Understanding Java bytes

So at work yesterday, I had to write an application to count the pages in an AFP file. So I dusted off my MO:DCA spec PDF and found the structured field BPG (Begin Page) and its 3-byte identifier. The app needs to run on an AIX box, so I decided to write it in Java.
For maximum efficiency, I decided that I would read the first 6 bytes of each structured field and then skip the remaining bytes in the field. This would get me:
0: Start of field byte
1-2: 2-byte length of field
3-5: 3-byte sequence identifying the type of field
So I check the field type and increment a page counter if it's BPG, and I don't if it's not. Then I skip the remaining bytes in the field rather than read through them. And here, in the skipping (and really in the field length) is where I discovered that Java uses signed bytes.
I did some googling and found quite a bit of useful information. Most useful, of course, was the instruction to do a bitwise & to 0xff to get the unsigned int value. This was necessary for me to get a length that could be used in the calculation for the number of bytes to skip.
I now know that at 128, we start counting backwards from -128. What I want to know is how the bitwise operation works here--more specifically, how I arrive at the binary representation for a negative number.
If I understand the bitwise & properly, your result is equal to a number where only the common bits of your two numbers are set. So assuming byte b = -128, we would have:
b & 0xff // 128
1000 0000 -128
1111 1111 255
---------
1000 0000 128
So how would I arrive at 1000 0000 for -128? How would I get the binary representation of something less obvious like -72 or -64?
In order to obtain the binary representation of a negative number you calculate two's complement:
Get the binary representation of the positive number
Invert all the bits
Add one
Let's do -72 as an example:
0100 1000 72
1011 0111 All bits inverted
1011 1000 Add one
So the binary (8-bit) representation of -72 is 10111000.
What is actually happening to you is the following: your file has a byte with the bit pattern 10111000. When interpreted as an unsigned byte (which is probably what you want), this is 184.
In Java, when this value is held in a byte and then used as an int (for example because of implicit promotion in an expression), it is interpreted as a signed byte and sign-extended to 11111111 11111111 11111111 10111000. This is an integer with value -72.
By ANDing with 0xff you retain only the lowest 8 bits, so your integer is now 00000000 00000000 00000000 10111000, which is 184.
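A short sketch of the sign extension and the mask, using the bit pattern 1011 1000 from the example. UnsignedByte is a hypothetical helper class; Byte.toUnsignedInt is the standard-library equivalent of the & 0xFF idiom.

```java
class UnsignedByte {
    // Implicit widening from byte to int copies the sign bit into the upper 24 bits.
    static int signExtended(byte b) {
        return b;
    }

    // Masking keeps only the low 8 bits, giving the value 0..255.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0b1011_1000;
        System.out.println(b);                                     // -72
        System.out.println(Integer.toHexString(signExtended(b)));  // ffffffb8
        System.out.println(unsigned(b));                           // 184
        System.out.println(Byte.toUnsignedInt(b));                 // 184
    }
}
```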
What I want to know is how the bitwise operation works here--more specifically, how I arrive at the binary representation for a negative number.
The binary representation of a negative number is that of the corresponding positive number bit-flipped with 1 added to it. This representation is called two's complement.
I guess the magic here is that the byte is stored in a bigger container, likely a 32 bit int. And if the byte was interpreted as being a signed byte it gets expanded to represent the same number in the 32 bit int, that is if the most significant bit (the first one) of the byte is a 1 then in the 32 bit int all the bits left of that 1 are also turned to 1 (that's due to the way negative numbers are represented, two's complement).
Now, if you & 0xFF that int you cut off those 1's and end up with a "positive" int representing the byte value you've read.
Not sure what you really want :) I assume you are asking how to extract a signed multi-byte value? First, look at what happens when you sign extend a single byte:
byte[] b = new byte[] { -128 };
int i = b[0];
System.out.println(i); // prints -128!
So the sign is correctly extended to 32 bits without doing anything special. The byte 1000 0000 extends correctly to 1111 1111 1111 1111 1111 1111 1000 0000.
You already know how to suppress sign extension by ANDing with 0xFF. For multi-byte values, you want only the sign of the most significant byte to be extended, and you want to treat the less significant bytes as unsigned (the example assumes network byte order and a 16-bit int value):
byte[] b = new byte[] { -128, 1 }; // 0x80, 0x01
int i = (b[0] << 8) | (b[1] & 0xFF);
System.out.println(i); // prints -32767!
System.out.println(Integer.toHexString(i)); // prints ffff8001
You need to suppress the sign extension of every byte except the most significant one, so to extract a signed 32-bit int to a 64-bit long:
byte[] b = new byte[] { -54, -2, -70, -66 }; // 0xca, 0xfe, 0xba, 0xbe
long l = ( b[0] << 24) |
((b[1] & 0xFF) << 16) |
((b[2] & 0xFF) << 8) |
((b[3] & 0xFF) );
System.out.println(l); // prints -889275714
System.out.println(Long.toHexString(l)); // prints ffffffffcafebabe
Note: on intel based systems, bytes are often stored in reverse order (least significant byte first) because the x86 architecture stores larger entities in this order in memory. A lot of x86 originated software does use it in file formats, too.
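As an alternative to shifting by hand, java.nio.ByteBuffer can assemble the bytes in either order. The sketch below reuses the four sample bytes from above; the class name ByteOrderDemo and the helper readInt are assumptions for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Reads the first four bytes as an int in the given byte order.
    static int readInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] b = {-54, -2, -70, -66}; // 0xca, 0xfe, 0xba, 0xbe
        // Big-endian (network order): first byte is the most significant.
        System.out.println(Integer.toHexString(readInt(b, ByteOrder.BIG_ENDIAN)));    // cafebabe
        // Little-endian (x86 memory order): first byte is the least significant.
        System.out.println(Integer.toHexString(readInt(b, ByteOrder.LITTLE_ENDIAN))); // bebafeca
    }
}
```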
To get the unsigned byte value you can use either:
int u = b & 0xFF;
or
int u = b < 0 ? b + 256 : b;
For bytes with bit 7 set:
unsigned_value = signed_value + 256
Mathematically when you compute with bytes you compute modulo 256. The difference between signed and unsigned is that you choose different representatives for the equivalence classes, while the underlying representation as a bit pattern stays the same for each equivalence class. This also explains why addition, subtraction and multiplication have the same result as a bit pattern, regardless of whether you compute with signed or unsigned integers.
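The modulo-256 view can be illustrated with a small sketch. Mod256 and its methods are hypothetical names; the example picks -72 and 184, which are two representatives of the same equivalence class.

```java
class Mod256 {
    // The unsigned representative (0..255) of a signed byte value.
    static int unsignedRepresentative(byte b) {
        return Math.floorMod(b, 256);
    }

    // Multiplies as int and keeps only the low 8 bits of the product.
    static byte multiplyLow8(int a, int b) {
        return (byte) (a * b);
    }

    public static void main(String[] args) {
        System.out.println(unsignedRepresentative((byte) -72)); // 184
        // -72 and 184 are congruent modulo 256, so the low 8 bits of the products agree.
        System.out.println(multiplyLow8(-72, 3)); // 40
        System.out.println(multiplyLow8(184, 3)); // 40
    }
}
```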
