I am playing around with how to manipulate bytes from an inputted hex number. The data is the hex value
0x022DA822 == 10001011011010100000100010 in binary. After I run the following code:
byte mask = (byte) data;
mask will be 100010, only those last bits. How come it only shows the last 6 bits, the 22 at the end of the hex?
Does it mask the first 20 bits by default?
Your cast is causing a loss of data. A byte can hold (you guessed it) one byte of data, so its range is [-128, 127]; note that the most significant bit is reserved as the sign bit. So when you write (byte) data, you are converting your hex data into a variable of type byte, which has a smaller range than your int, and thus only the last byte of your data, the low-order 8 bits (0x22 here), can be stored in the byte. You see 6 bits rather than 8 only because the two leading zeros of 00100010 are not printed.
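Here is a minimal sketch of what that narrowing cast does, using the value from the question; Integer.toBinaryString drops leading zeros, which is why 6 bits show instead of 8:
int data = 0x022DA822;
byte mask = (byte) data;                           // narrowing: keeps only the low 8 bits (0x22)
System.out.println(Integer.toBinaryString(data));  // 10001011011010100000100010
System.out.println(Integer.toBinaryString(mask));  // 100010 (the leading zeros of 00100010 are not printed)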
I am currently going through the Java I/O tutorial and having a hard time understanding the read() method of the FileInputStream class. I know that per the documentation the read() method reads a byte of data from the stream and returns an integer representing the byte (between 0 and 255), or -1 if it reaches the end of the file.
A byte in Java has a range between -128 and 127, so how come, when I edit xanadu.txt and add the ASCII symbol "ƒ" (which has a decimal value of 131), Java does not complain by throwing an error that the value 131 is out of the range defined by byte (-128 to 127)? When I try to test this using literals, I get two different results.
The following works:
byte b = 120;
int c = b;
System.out.println((char)c);
Output: x
But this does NOT work (even though it works when added to xanadu.txt):
byte b = 131;
int c = b;
System.out.println((char)c);
Output: error: incompatible types: possible lossy conversion from int to byte
byte b = 131;
I tried explicitly casting to byte (how is this possible?):
byte b = (byte)131;
int c = b;
System.out.println((char)c);
Output: テ
I am a total newbie when it comes to I/O streams; somebody please help me understand it.
EDIT: It turns out my knowledge of the concepts of type casting was lacking, specifically in understanding the difference between "widening" and "narrowing". Reading up more about these concepts helped me understand why the explicit (aka narrowing) cast works.
Allow me to explain. Look at the third code block, where I am explicitly casting the literal 131 to the type byte. If we convert the literal 131 into the binary form of a 32-bit signed two's-complement integer, we get 00000000 00000000 00000000 10000011, which is 32 bits or 4 bytes. Recall that the Java data type byte can only hold an 8-bit signed two's-complement integer, so 131 is out of range, and thus we get the error "possible lossy conversion from int to byte". But when we explicitly cast it to byte, we are 'chopping off', or, to use the correct term, 'narrowing', the binary down to an 8-bit integer. The resulting binary is 10000011, which is -125 in decimal. Since -125 is in the range -128 to 127, byte has no issue accepting and storing it.

Now, when I try to store the value of the byte in int c, implicit or "widening" casting takes place, where -125 in the 8-bit binary form 10000011 is converted into the equivalent -125 in the 32-bit binary form 11111111 11111111 11111111 10000011. Finally, System.out is printing the value of (char)c, which is another explicit or "narrowing" cast, shrinking from 32-bit signed to 16-bit unsigned. When the cast is complete, we get 11111111 10000011 in binary form. When Java interprets this binary as a character, it prints テ.
I can conclude by saying that it helps to convert everything into binary form and go from there. But make sure you understand character encoding and two's complement first.
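Here is a small sketch tracing those exact steps with print statements, so each bit pattern can be checked (Integer.toBinaryString drops leading zeros):
byte b = (byte) 131;   // narrowing: keeps the low 8 bits, 10000011 == -125
int c = b;             // widening: sign-extends -125 to 32 bits
char ch = (char) c;    // narrowing: keeps the low 16 bits, treated as unsigned
System.out.println(b);                           // -125
System.out.println(Integer.toBinaryString(c));   // 11111111111111111111111110000011
System.out.println(Integer.toBinaryString(ch));  // 1111111110000011
System.out.println((int) ch);                    // 65411
System.out.println(ch);                          // テ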
I don't know where you got the value 131 from, but as far as I am concerned, LATIN SMALL LETTER F WITH HOOK (ƒ) is not in the original ASCII character set but in extended ASCII, with a decimal value of 159. It is also encoded in UTF-16 (how Java chars are encoded) as hex 0x192 (decimal value 402).
First, ensure that your text files are encoded in extended ASCII, and not UTF-8 (which is the most likely encoding). Then you can use a FileInputStream to read the file, and you will get 159.
Note that 159 is outside the range of the Java byte type. This is fine, because read returns an int. If the text file is encoded in UTF-8, however, ƒ is encoded in 2 bytes, and read will be reading one byte at a time.
Your second code block doesn't work because, as you said, byte goes from -128 to 127, so 131 obviously doesn't fit.
Your third code block forces 131 into a byte, which causes overflow, and the value "wraps back around" to -125. b and c are both -125. When you cast this to a char, it becomes 65411 because the conversion sign-extends the value to 16 bits first and then treats those bits as an unsigned integer.
The reason this all works when you use FileInputStream.read, instead of doing these conversions yourself, is that read actually returns an int, not a byte; the int it returns is always in the range -1 to 255. This is why we say "read returns a byte", even though its actual return type is int.
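Here is a minimal sketch of that contract; it uses a ByteArrayInputStream so the example is self-contained, but FileInputStream.read behaves the same way:
import java.io.ByteArrayInputStream;
import java.io.InputStream;

InputStream in = new ByteArrayInputStream(new byte[] { (byte) 131 });  // simulates a file holding the single byte 0x83
int value = in.read();
System.out.println(value);   // 131, not -125: read() masks the byte with 0xff instead of sign-extending it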
byte b = 131; // byte is an 8-bit type, but 131 needs more than 8 bits
int c = b; // int is a 32-bit type
System.out.println((char)c); // char is a 16-bit type
Output: error: incompatible types: possible lossy conversion from int to byte
byte b = 131;
The two's-complement bit pattern of 131 is 10000011, i.e. 2^7 + 2^1 + 2^0, and the 2^7 bit is the sign bit.
131 won't fit in a signed byte without an overflow in the two's-complement representation used for signed types. The highest bit, the sign bit, is set, and it gets sign-extended when the byte is cast back to int.
The Java compiler notices that 131 won't fit properly in a byte, which leads to the error.
I have been reading about Unicode encoding and Java 9 compact Strings for the last two days, and I am getting along quite well, but there is something I don't understand.
About the byte data type
1). It is 8-bit storage with a range from -128 to 127.
Questions
1). Why didn't Java implement it as unsigned, like char is an unsigned 16-bit type? I mean, it would then have a range of 0 to 255; from 0 to 127 I can only hold an ASCII value, but what happens if I set the value 200? An extended-ASCII value would overflow to -56.
2). Does the negative value mean something? I mean, I have tried a simple example using Java 11:
final char value = (char) 200; // in a byte this would overflow
final String stringValue = new String(new char[]{value});
System.out.println(stringValue); // prints the same value as on Java 8
I have checked the String.value variable, and I see a byte array where
System.out.println(value[0]); // -56
The same question as before arises: does the -56 (the negative value) mean something? In other languages this overflow is detected so the value returns to 200. How can Java know that the -56 value is the same as 200 in char?
I have tried harder examples, like code point 128048, and I see in the String.value variable an array of bytes like this:
0 = 61
1 = -40
2 = 48
3 = -36
I know this code point takes 4 bytes, and I get how the char[] is transformed into a byte[], but I don't know how String handles this byte[] data.
Sorry if this question is simple, and sorry for any typos; English is not my native language. Thanks a lot.
Why didn't Java implement it as unsigned, like char is an unsigned 16-bit type? I mean, it would then have a range of 0 to 255; from 0 to 127 I can only hold an ASCII value, but what happens if I set the value 200? An extended-ASCII value would overflow to -56.
Java’s primitive data types were settled with Java 1.0 a quarter century ago. The compact strings were introduced in Java 9, less than two years ago. This new feature, which is merely an implementation detail, did not justify fundamental changes to Java’s type system.
Besides that, you are looking at one interpretation of the data stored in a byte. For the sake of representing iso-latin-1 units, it is entirely irrelevant whether interpreting the same data as Java’s built-in signed byte would result in a positive or negative number.
Likewise, Java’s I/O API allows reading a file into a byte[] array and writing byte[] arrays back to files, and these two operations are already sufficient to copy a file losslessly, regardless of its file format, which would only matter when interpreting its content.
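For instance, a lossless copy needs nothing more than the following sketch (the file names are made up for the example; any file format works, because the bytes are moved but never interpreted):
import java.nio.file.Files;
import java.nio.file.Path;

byte[] data = Files.readAllBytes(Path.of("in.bin"));  // read the whole file as raw bytes
Files.write(Path.of("out.bin"), data);                // write the same bytes back out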
So the following works since Java 1.1:
byte[] bytes = "È".getBytes("iso-8859-1");
System.out.println(bytes[0]);
System.out.println(bytes[0] & 0xff);
-56
200
The two numbers, -56 and 200 are just different interpretations of the bit pattern 11001000 whereas the iso-latin-1 interpretation of a byte containing the bit pattern 11001000 is the character È.
A char value is also just an interpretation of a two byte quantity, i.e. as UTF-16 code unit. Likewise, a char[] array is a sequence of bytes in the computer’s memory with a standard interpretation.
We can also interpret other byte sequences this way.
StringBuilder sb = new StringBuilder().appendCodePoint(128048);
byte[] array = new byte[4];
StandardCharsets.UTF_16LE.newEncoder()
.encode(CharBuffer.wrap(sb), ByteBuffer.wrap(array), true);
System.out.println(Arrays.toString(array));
will print the value you’ve seen, [61, -40, 48, -36].
The advantage of using a byte[] array inside the String class is that the interpretation can now be chosen: iso-latin-1 when all characters are representable in that encoding, utf-16 otherwise.
The possible numeric interpretations are irrelevant to the string. However, when you ask "How can Java know that the -56 value is the same as 200", you should ask yourself: how does it know that the bit pattern 11001000 of a byte is -56 in the first place?
System.out.println(value[0]);
performs an actually expensive operation, compared to ordinary computer arithmetic: the conversion of a byte (or an int) to a String. This conversion operation is often overlooked, as it has been defined as the default way of printing a byte, but it is no more natural than a conversion to a String that interprets the value as an unsigned quantity. For further reading, I recommend reading up on two's complement.
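If you want the unsigned interpretation explicitly, the standard library has offered it since Java 8; a small sketch:
byte b = (byte) 200;                        // stores the bit pattern 11001000
System.out.println(b);                      // -56  (the default, signed interpretation)
System.out.println(b & 0xff);               // 200  (unsigned interpretation, done by hand)
System.out.println(Byte.toUnsignedInt(b));  // 200  (the same thing, via the standard library)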
This is because not all bytes in a string are interpreted the same way. It depends on the string's character encoding.
Example:
if a string is a UTF-8 string, its code units will be 8 bits in size.
in a UTF-16 string, its code units will be 16 bits in size.
etc...
This means that if the string is represented as UTF-8, its characters will be made by reading 1 byte at a time; if 16-bit, its characters will be made by reading 2 bytes at a time.
Look at this code: a single byte array data is transformed to string using UTF-8 and UTF-16.
byte[] data = new byte[] {97, 98, 99, 100};
System.out.println(new String(data, StandardCharsets.UTF_8));
System.out.println(new String(data, StandardCharsets.UTF_16));
The output of this code is:
abcd // 4 bytes = 4 chars, 1 byte per char
慢捤 // 4 bytes = 2 chars, 2 bytes per char
Going back to the question, what motivated the developers to do this was reducing the memory footprint of strings. Not all strings use all 16 bits a char offers.
I have read that a byte array in Java is used to represent binary data. I am not able to understand this. How can a byte array represent binary data (which can be transferred over the network and reconstructed back into its original form)?
A byte can have (integer) values from -128 to 127; so how does a byte array represent binary data?
A byte can have (integer) values from -128 to 127, so how does a byte array represent binary data?
Each byte (octet) is a sequence of eight bits, and having a sequence of bytes lets us represent binary data of any length (though the length is limited to 8-bit increments).
Memory of most modern computers is addressed as a sequence of bytes, network interfaces send packets containing sequences of bytes, hard drives store sequences of bytes (but are addressable only in much larger blocks, say, 4096 bytes).
There is rarely a need to access data bit by bit, and when it is needed it can be done with bitwise operators, so no data type for a sequence of bits is provided by default.
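For example, here is a sketch of the usual shift-and-mask idiom for looking at a byte bit by bit (the value is arbitrary):
byte b = (byte) 0b10110010;
for (int i = 7; i >= 0; i--) {   // walk the bits from most to least significant
    int bit = (b >> i) & 1;      // shift the wanted bit to position 0 and mask off the rest
    System.out.print(bit);
}
System.out.println();            // prints 10110010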
So to conclude:
1 byte == 8 bits, and a byte array == a stream of bits, and hence represents binary data?
Yes. For example: a byte array of length 2 is a stream of 16 bits of binary data.
I have an array of 9 bytes in Java, but my function needs to return an array of size 10. The difference I need to pad with nibbles.
If a nibble is half of a byte, can I simply add a (byte) 0 to the end of the array, or will adding 2 nibbles instead make a difference? Is there a difference between 2 nibbles and a (byte) 0 in this case?
If a nibble is half of a byte, can I simply add a (byte) 0 to the end of the array, or will adding 2 nibbles instead make a difference? Is there a difference between 2 nibbles and a (byte) 0 in this case?
No, there is no difference, since a byte (8 bits) is made up of 2 nibbles (4 bits each).
Can I even put 8.5 bits into an array of bytes and then check the array for "length" and get back 8.5 as the value?
For 8.5 bits? That will be returned as a length of 2 bytes. If you mean 8.5 bytes, then it's returned as 9 bytes.
Remember: the smallest data type is a byte, not a nibble! By declaring a variable of type byte (even with no value assigned) you have automatically made 8 bits of 0000 0000. Nibbles just make it easier to visualise the bits within a byte's value (when written in hex notation).
For example, the single byte value 0xF0 can be visualised as two nibbles, 1111 and 0000. Now for a byte 0xBA, if you wanted just one of its nibbles, you would have to make a new byte of either 0x0B or 0x0A...
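A small sketch of packing two nibbles into one byte (the nibble values are chosen arbitrarily):
int hi = 0xB, lo = 0xA;                          // two nibbles
byte packed = (byte) ((hi << 4) | (lo & 0x0F));  // high nibble shifted up, low nibble masked in
System.out.printf("0x%02X%n", packed & 0xff);    // 0xBA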
I have an array of 9 bytes in Java, but my function needs to return an array of size 10. The difference I need to pad with nibbles.
Your function returns 80 bits. With 9 bytes you have a total of 72 bits, so you pad with an extra 8 bits, which means adding one extra byte of zero value (e.g. 0x00).
In the case of 8 and a half bytes, you should still create 9 bytes (72 bits) but only update the first 68 bits (8.5 bytes).
The best logic is to just declare a byte array with a length of 10 (i.e. 80 zero bits). Then you can update as many or as few bits as you need within this total allocated space. This way "I have an array of 9 bytes" becomes "I create an array of 10 bytes and only update 9 or even 8.5 bytes", which leaves you with automatic padding of the unused 1 or 1.5 bytes.
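A sketch of that padding, in case you already have the 9 bytes and just need the 10-byte result:
byte[] nine = new byte[9];                          // the 9 bytes you actually have
byte[] padded = java.util.Arrays.copyOf(nine, 10);  // copies them into a 10-byte array; the 10th byte is automatically 0x00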
So your question is whether nibble 0000 adjacent to nibble 0000 equals byte 00000000?
Yes.
When you write down the individual bits like that it should be obvious I think.
The need for padding with partial bytes will only arise when you are storing data with partial bytes. For example, let's assume I limited my character set to only a-z, A-Z, 0-9, and the space character: 63 symbols in total, which would enable me to use only 6 bits to encode each character.
How can I read bits from a file? I wrote bits to a file with something like this:
File plik=new File("bitowo");
FileOutputStream fos=new FileOutputStream(plik);
byte[] test =new byte[2];
test[0]=(byte)01101000;
test[1]=(byte)10101010;
fos.write(test);
fos.close();
and "bitowo" has only 2 bytes but how can i read from file "bitowo" bit after bit ?
You can't read bit by bit. You can read byte by byte and then shift each byte bit by bit.
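Here is a sketch of that approach, using the file name from the question (exception handling beyond try-with-resources is omitted):
import java.io.FileInputStream;

try (FileInputStream fis = new FileInputStream("bitowo")) {
    int b;
    while ((b = fis.read()) != -1) {      // read one byte (0..255) at a time
        for (int i = 7; i >= 0; i--) {    // then walk its bits, most significant first
            System.out.print((b >> i) & 1);
        }
        System.out.println();
    }
}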
This:
test[0]=(byte)01101000;
test[1]=(byte)10101010;
Does not do what you think it does. Specifically, it does not write two bytes with the bit patterns that the code seems to suggest.
The number 01101000 will be interpreted as an octal integer literal, because it starts with 0. In decimal, that would be the number 295424. When you cast that to a byte, only the lower 8 bits are kept, and those happen to be 0. So the first byte in your file is 0.
The number 10101010 will be interpreted as a decimal integer literal (the number ten million, one hundred and one thousand and ten). Again, by casting it to byte, only the lower 8 bits are kept, so the second byte in your file will contain the value 18 (decimal).
If you're using Java 7, you can use binary literals in your code by prefixing the digits with 0b:
test[0]=(byte)0b01101000;
test[1]=(byte)0b10101010;
To read the two bytes back, just open the file with a FileInputStream and read two bytes from it.