How byte array in Java is used to represent a binary data? - java

I have read that byte array in Java is used to represent a binary data. I am not able to understand this. How byte array can represent a binary data (and which can be transferred over the network and can be constructed back to original form).
Byte can have (integer) values from -128 to 127; so how does a byte array represent a binary data?

Byte can be (integer) values -128 to 127, so how does a byte
array represent a binary data?
Each byte (octet) is a sequence of eight bits, and having sequence of bytes lets us represent binary data of any length (though it's limited to per 8-bits increments).
Memory of most modern computers is addressed as a sequence of bytes, network interfaces send packets containing sequences of bytes, hard drives store sequences of bytes (but are addressable only in much larger blocks, say, 4096 bytes).
There is rarely need to access data bit-by-bit, and when needed it can be done with bitwise operators, so no data type for sequence of bits is provided by default.
So to conclude:
1 Byte == 8 bits, and Byte Array == stream of bits,
and hence represent binary data?
Yes. For example: A Byte Array of length 2 bytes is a stream of 16 bits of binary data.

Related

Java 11 Compact Strings magic behind char[] to byte[]

I been reading about encoding Unicode Java 9 compact Strings in the last two days i am getting quite well. But there is something that i dont understand.
About byte data type
1). Is a 8-bit storage ranges from -128 to 127
Questions
1). Why Java didn't implement it like char unsigned 16 bits? i mean it would be in a range of 0.256 because from 0 to 127 only can i hold a Ascii value but what would happen if i set the value 200 a extended ascii would overflow to -56.
2). Does the negative value mean something i mean i have try a simple example using Java 11
final char value = (char)200;//in byte would overflow
final String stringValue = new String(new char[]{value});
System.out.println(stringValue);//THE SAME VALUE OF JAVA 8
I have checked the String.value variable and i see a byte array of
System.out.println(value[0]);//-56
The same questions like before arise does the -56 mean something i mean the (negative value) in other languages this overflow is detected to return to the value 200? How can Java know that -56 value is the same as 200 in char.
I have try hardest examples like codepoint 128048 and i see in String.value variable a array of bytes like this.
0 = 61
1 = -40
2 = 48
3 = -36
I know this codepoint takes 4 bytes but i get it how is transformed char[] to byte[] but i dont know how String handle this byte[] data.
Sorry if this question is simple and sorry any typing english is not my natural language thanks a lot.
Why Java didn't implement it like char unsigned 16 bits? i mean it would be in a range of 0.256 because from 0 to 127 only can i hold a Ascii value but what would happen if i set the value 200 a extended ascii would overflow to -56.
Java’s primitive data types were settled with Java 1.0 a quarter century ago. The compact strings were introduced in Java 9, less than two years ago. This new feature, which is merely an implementation detail, did not justify fundamental changes at Java’s type system.
Besides that, you are looking at one interpretation of the data stored in a byte. For the sake of representing iso-latin-1 units, it is entirely irrelevant whether interpreting the same data as Java’s built-in signed byte would result in a positive or negative number.
Likewise Java’s I/O API allows reading a file into a byte[] array and write byte[] arrays back to files and these two operations are already sufficient to copy a file losslessly, regardless of its file format which would be relevant when interpreting its content.
So the following works since Java 1.1:
byte[] bytes = "È".getBytes("iso-8859-1");
System.out.println(bytes[0]);
System.out.println(bytes[0] & 0xff);
-56
200
The two numbers, -56 and 200 are just different interpretations of the bit pattern 11001000 whereas the iso-latin-1 interpretation of a byte containing the bit pattern 11001000 is the character È.
A char value is also just an interpretation of a two byte quantity, i.e. as UTF-16 code unit. Likewise, a char[] array is a sequence of bytes in the computer’s memory with a standard interpretation.
We can also interpret other byte sequences this way.
StringBuilder sb = new StringBuilder().appendCodePoint(128048);
byte[] array = new byte[4];
StandardCharsets.UTF_16LE.newEncoder()
.encode(CharBuffer.wrap(sb), ByteBuffer.wrap(array), true);
System.out.println(Arrays.toString(array));
will print the value you’ve seen, [61, -40, 48, -36].
The advantage of using a byte[] array inside the String class is, that now, the interpretation can be chosen, to use iso-latin-1 when all characters are representable with this encoding or utf-16 otherwise.
The possible numeric interpretations are irrelevant to the string. However, when you ask “How can Java know that -56 value is the same as 200”, you should ask yourself, how does it know that the bit pattern 11001000 of a byte is -56 in the first place?
System.out.println(value[0]);
bears an actually expensive operation, compared to ordinary computer arithmetic, the conversion of a byte (or an int) to a String. This conversion operation is often overlooked as it has been defined as the default way of printing a byte, but is not more natural than a conversion to a String interpreting the value as an unsigned quantity. For further reading, I recommend Two's complement.
This is because not all bytes in a string are interpreted the same. This depends to the string's character encoding.
Example:
if a string is an UTF-8 string, its characters will be 8-bits in size.
in an UTF-16 string, its characters will be 16-bits in size.
etc...
This means, if the string is to be represented as UTF-8, the characters will be made by reading 1 byte at a time; if 16-bits, the characters will made by reading 2 bytes at a time.
Look at this code: a single byte array data is transformed to string using UTF-8 and UTF-16.
byte[] data = new byte[] {97, 98, 99, 100};
System.out.println(new String(data, StandardCharsets.UTF_8));
System.out.println(new String(data, StandardCharsets.UTF_16));
The output of this code is:
abcd // 4 bytes = 4 chars, 1 byte per char
慢捤 // 4 bytes = 2 chars, 2 byte per char
Going back to the question, what motivated the developers to do so is to reduce memory footprint on strings. Not all strings uses all the 16-bits a char offers.
EDIT: Code here

What is Java Byte Array and How should it be used?

What does it mean by byte array ? I mean it holds the 0s and 1s just like how data is hold in memory ?
For example
String a = "32";
byte [] arr = a.getBytes() ;
What does exist now inside arr array,why and when to use it?
By byte array, it literally means an array where each item is of the byte primitive data type. If you do not know the difference between a byte and a common int (Integer), the main difference is the bit width: bytes are 8-bit and integers are 32-bit. You can read up on that here.
If you do not know what an array is, an array is basically a sequence of items (in your case a sequence of bytes, declared as byte[]).
The function a.getBytes() takes a, which is a String, and returns an array of bytes. This can be done because the human-readable characters in a String can be represented as 8-bit numbers, where the mapping between number and character is determined by the CharSet. Examples of two common CharSets are ASCII and UTF-8. Now, arr is an array of bytes, where each byte in the array represents each character in the original string a. In both ASCII and UTF-8, the String "32" is represented by the bytes 51 and 50 in decimal, and 0x33 and 0x32 in hexadecimal.
Byte arrays are commonly used in applications that read and write data byte-wise, such as socket connections that send data in byte streams through TCP or UDP protocols.
Hope I could help!

Bit masking java, only showing last 6 bites of a hex

I am playing around on how to manipulate bytes from an inputted Hex number. Data is a Hex:
0x022DA822 == 10001011011010100000100010. After I run the following code:
byte mask= (byte) data;
mask will = 100010, only those last bits. How come it only shows the last 6 bits or 22 in the hex?
Does it mask the first 20 bits by default?
Your cast is causing a loss of data. A byte can hold (you guessed it), one byte of data. Thus the range of a byte is [-128, 127]. Note that the most significant bit is reserved as the sign bit. So basically when you are saying: (byte)data, you are converting your hex data into a variable of type byte, which has a smaller range than your hex string. And thus only the last byte of your data can be stored in the byte.

Java- gzip member trailer

This is part of a larger assignment that I've mostly got done except for this one part, which is a bit embarrassing because it sounds really simply on paper.
So basically, I've got a large amount of compressed data. I've been keeping track of the length using a CRC32
CRC32 checksum = new CRC32();
...
//read input into buffer
checksum.update(buff, 0, bytesRead);
So it updates everytime more info is read in. I've also kept track of the uncompress length using
uncompressedLength += manage.read(buff);
So it is an int value that has the number of bytes of the original file. This is a little Endian machine.
From what I can tell, what I need is four byte CRC, which I used
public byte[] longToBytes(long x) {
ByteBuffer buffer = ByteBuffer.allocate(8);
buffer.putLong(x);
return buffer.array();
}
byte[] c = longToBytes(checksum.getValue());
BUT this is 8 bytes. CRC32.getValue returns a long. Can I convert it to an int in this case without losing information I need?
And then the ISIZE is supposed to be...the four byte compressed length modulo 2^32. I've got the variable uncompresedLength which is an int. I think I just have to convert it to bytes and that's all?
I've been hexdumping the result from gzip and the result from my program and my header and data are right, I'm just missing my trailer.
As for why I'm doing this manually, it's because of an assignment. Trust me, I'd love to just use GZIPOoutputStream if I could.
CRC32 has 32 bits... the class returns long because of the super interface.
uncompressed length should be long, since nowadays files larger than 2G isn't uncommon.
so in both cases, you need to convert the lowest 32 bits of a long to 4 bytes.
static byte[] lower4bytes(long v)
{
return new byte[] {
(byte)(v ),
(byte)(v>> 8),
(byte)(v>>16),
(byte)(v>>24)
};
}
To write an integer in little-endian form, simply write the low byte of the integer (i.e. modulo 256 or anded with 0xff), then shift it down eight bits or divide by 256, then write the resulting low byte, and repeat that two more times. You'll write four bytes. Since you only write four, you will automatically be writing the length modulo 232.

How to determine if 8bit WAV File is signed or unsigned, using Java and without javax.sound

I need to know whether a ".wav" of 8bits, is signed or unsigned PCM, by only reading file. I cannot use "javax.sound.sampled.*" or AudioSystem libraries.
8 bit (or lower) WAV files are always unsigned. 9 bit or higher are always signed:
Each sample is contained in an integer i. The size of i is the smallest number of bytes required to contain the specified sample size. The least significant byte is stored first. The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.
For example, if the sample size (recorded in nBitsPerSample) is 12 bits, then each sample is stored in a two-byte integer. The least significant four bits of the first (least significant) byte is set
to zero.
The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:
Multimedia Programming Interface
and Data Specifications 1.0 - IBM/Microsoft, August 1991
In the wav File, 8-bit samples are stored as unsigned bytes, ranging from 0 to 255.
The 16-bit samples are stored as signed integers in 2's-complement.

Categories