This is an excerpt of code from a music tuner application. A byte[] array is created, audio data is read into the buffer, and then the for loop iterates through buffer and combines the values at indices i and i+1 to create an array of 16-bit numbers that is half the length.
byte[] buffer = new byte[2*1200];
targetDataLine.read(buffer, 0, buffer.length);
for ( int i = 0; i < n; i+=2 ) {
int value = (short)((buffer[i]&0xFF) | ((buffer[i+1]&0xFF) << 8)); //**Don't understand**
a[i >> 1] = value;
}
So far, what I have is this:
From a different SO post, I learned that every byte being stored in a larger type must be & with 0xFF, due to its conversion to a 32-bit number. I guess the leading 24 bits are filled with 1s (though I don't know why it isn't filled with zeros... wouldn't leading with 1s change the value of the number? 000000000010 (2) is different from 111111110010 (-14), after all.), so the purpose of 0xff is to only grab the last 8 bits (which is the whole byte).
When buffer[i+1] is shifted left by 8 bits, this makes it so that, when ORing, the eight bits from buffer[i+1] are in the most significant positions, and the eight bits from buffer[i] are in the least significant eight bits. We wind up with a 16-bit number that is of the form buffer[i+1] + buffer[i]. (I'm using + but I understand it's closer to concatenation.)
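For anyone following along, here is a minimal sketch of the two points above. The class and method names (SignExtensionDemo, toSample) are made up for illustration and are not part of the tuner code. It shows that widening a negative byte fills the upper 24 bits with 1s precisely so that the value stays the same (-14 stays -14), and that & 0xFF then keeps only the original eight bits.

```java
class SignExtensionDemo {
    // Combine two little-endian bytes (low byte first) into one signed 16-bit sample.
    static int toSample(byte low, byte high) {
        return (short) ((low & 0xFF) | ((high & 0xFF) << 8));
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF2;      // bit pattern 1111 0010, which is -14 as a signed byte
        int widened = b;           // sign-extended to 0xFFFFFFF2, still -14
        int masked = b & 0xFF;     // upper 24 bits cleared: 0x000000F2, i.e. 242
        System.out.println(widened + " " + masked);  // prints "-14 242"

        // Low byte 0x34, high byte 0x12: the sample is 0x1234
        System.out.println(toSample((byte) 0x34, (byte) 0x12));  // prints 4660
        // Low byte 0xFE, high byte 0xFF: the 16-bit pattern 0xFFFE is -2
        System.out.println(toSample((byte) 0xFE, (byte) 0xFF));  // prints -2
    }
}
```

Without the masks, a negative low byte would be sign-extended and its leading 1s would overwrite the high byte during the OR.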
First, why are we ORing buffer[i] | buffer[i+1] << 8? This seems to destroy the original sound information unless we pull it back out in the same way; while I understand that OR will combine them into one value, I don't see how that value can be useful or used in calculations later. And the only way this data is accessed later is as its literal values:
diff += Math.abs(a[j]-a[i+j]);
If I have 101 and 111, added together I should get 12, or 1100. Yet 101 | 111 << 3 gives 111101, which is equal to 61. The closest I got to understanding was that 101 (5) | 111000 (56) is the same as adding 5+56=61. But the order matters -- doing the reverse 101 <<3 | 111 is completely different. I really don't understand how the data can remain useful, when it is OR'd in this way.
The other problem I'm having is that, because Java uses signed bytes, the eighth position doesn't indicate the value, but the sign. If I'm ORing two binary signed numbers, then in the resulting 16-bit number, the bit at 2⁷ is now acting as a value instead of a placeholder. If I had a negative byte before running the OR, then in my final value post-operation, it would now erroneously be acting as though the original number had a positive 2⁷ in it. 0xff doesn't get rid of this, because it preserves the eighth bit (the sign bit), so shouldn't this be a problem?
For example, 1111 (-1) and 0101, when OR'd, might give 01011111. But 1111 wasn't representing POSITIVE 1111, it was representing the signed version; yet in the final answer, it now is acting as a positive 2³.
UPDATE: I marked the accepted answer, but it took that + a little extra work to figure out where I went wrong. For anyone who may read this in the future:
As far as the signing goes, the code I have uses signed bytes. My first guess as to why this doesn't mess anything up was that all of the values received might be positive, except that this doesn't make sense, given that a waveform's amplitude varies over [-1,1]. I planned to play around with this to try and figure it out. I also suspected that, if there are negative values, the extra 1s left behind when ORing wouldn't affect the computation much, because diff += makes diff really large and a few extra 1s shouldn't change the outcome of the comparisons the code relies on. All of this was wrong. I gave it some more thought and it's really simple, actually -- the only reason this was such a problem is that I didn't know about endianness, and then once I read about it, I misunderstood exactly how it is implemented. Endianness is explained in the next bullet point.
Regarding the order in which the bits are placed, destroying the sound, etc. The code I'm using sets bigEndian=false, meaning that the byte order goes from least significant byte to most significant byte. For this reason, combining the two indices of buffer requires taking the second index, placing its bits first, and placing the first index as second (so we are now in big-endian byte order). One of the problems I had was the impression that "endian-ness" determines the bit order. I thought 10010101 big-endian would become 10101001 little-endian. Turns out this is not the case -- the bits in each byte remain in their original order; the difference is that the bytes are ordered "backward". So 10110101 11100001 big-endian becomes 11100001 10110101 little-endian -- same bit order within each byte; however, different byte order.
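As a cross-check on the byte-order point, the sketch below compares the manual shift-and-OR with java.nio.ByteBuffer set to ByteOrder.LITTLE_ENDIAN. The class and method names (EndianDemo, manual, viaByteBuffer) are hypothetical; the two bytes are the ones from the example above, stored low byte first.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Manual little-endian decode: buf[i] is the low byte, buf[i + 1] the high byte.
    static short manual(byte[] buf, int i) {
        return (short) ((buf[i] & 0xFF) | ((buf[i + 1] & 0xFF) << 8));
    }

    // The same decode using the standard library's byte-order handling.
    static short viaByteBuffer(byte[] buf, int i) {
        return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getShort(i);
    }

    public static void main(String[] args) {
        // 11100001 10110101 in little-endian storage represents the 16-bit value 0xB5E1.
        byte[] buf = {(byte) 0xE1, (byte) 0xB5};
        System.out.println(Integer.toHexString(manual(buf, 0) & 0xFFFF));        // prints "b5e1"
        System.out.println(Integer.toHexString(viaByteBuffer(buf, 0) & 0xFFFF)); // prints "b5e1"
    }
}
```

Only the order of the two bytes changes between the representations; the bits inside each byte are untouched.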
Finally, I'm not sure why, but the accepted answer is correct: targetDataLine.read() may place the bits into a byte array only (not just in my code, but in all Java code using targetDataLine -- read() only accepts arguments where the destination var is a byte array), but the data is in fact one short split into two bytes. It is for this reason that every two indices must be combined together.
Coming back to the signing, it should be obvious by now why this isn't an issue. This is the commenting that I now have in the code, which more coherently explains what it took all of this^ to explain before:
/* The Javadoc explains that the targetDataLine will only read to a byte-typed array.
However, because the sample size is 16-bit, it is actually storing 16-bit numbers
there (shorts), auto-parsing them every eight bits. Additionally, because it is storing
them in little-endian, bits [2^0,2^7] are stored in index[i] in normal order (powers 76543210)
while bits [2^8,2^15] are stored in index[i+1]. So, together they currently read as [7-6-5-4-3-2-1-0 15-14-13-12-11-10-9-8],
which is a problem. In the next for loop, we take care of this and re-organize the bytes by swapping every pair (remember the bits are ok, but the bytes are out of order).
Also, although the array is signed, this will not matter when we combine bytes, because the sign-bit (2^15) will be placed
back at the beginning like it normally is; although 2^7 currently exists as the most significant bit in its byte,
it is not a sign-indicating bit,
because it is really the middle of the short which was split. */
This is combining the byte stream from input in low bytes first byte order to a stream of shorts in internal byte order.
With sign extension, it is more a question of the sign encoding of the original byte stream. If the original byte stream is unsigned (coding values from 0 to 255), then the & 0xFF overcomes the otherwise unwanted effects of Java treating the values as signed. So an educated guess is that the external byte stream encodes unsigned bytes.
Judging whether the code is plausible needs information on what external encoding is being treated and what internal encoding is used. E.g. (wild guess, could be totally wrong!): the two byte chunks read could belong to 2 channels of a stereo sound encoding and are put into a single short for ease of internal processing. You should look at the encoding being read and the use of the converted data within the application.
I have a Huffman coding project. In the first step, we obtain the code of each character from the Huffman tree, for example: a = 01, b = 101, c = 111. These codes are Strings, and I want to save them in binary form in a file with a .cmp extension. For example, for the text abc the encoding is 01101111. How can I write these bits to a .cmp file as binary values, and afterwards read them back and decode them?
Hopefully you know that bytes and integers consist of bits, so you just need to build a little queue of bits: a single integer containing the bits and another integer that tracks the number of bits in the first integer, accumulating bits using the shift and or operators. Once you have accumulated a byte, write it out and shift it out of your queue. E.g. to put n bits in: buf |= val << bits; bits += n;, and then to pull bits out if you have enough: while (bits >= 8) { write_out(buf & 0xff); buf >>= 8; bits -= 8; }. Make sure that your integer is large enough to handle the largest value of n you will have. I.e., buf needs to be able to hold maxn+7 bits, since the while loop will never leave more than 7 bits in the buffer.
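A rough Java sketch of that bit queue, under a few assumptions: the class name BitQueueDemo and its methods are invented for illustration, the bytes go to an in-memory ByteArrayOutputStream rather than a .cmp file, and bits are packed starting from the least significant position as in the snippet above.

```java
import java.io.ByteArrayOutputStream;

class BitQueueDemo {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int buf = 0;   // pending bits; the oldest bit sits in the least significant position
    private int bits = 0;  // number of pending bits in buf

    // Append the n low bits of val to the queue, writing out whole bytes as they fill up.
    void put(int val, int n) {
        buf |= val << bits;
        bits += n;
        while (bits >= 8) {
            out.write(buf & 0xFF);
            buf >>>= 8;
            bits -= 8;
        }
    }

    // Append a Huffman code given as a string of '0'/'1' characters, one bit at a time.
    void putCode(String code) {
        for (char c : code.toCharArray()) {
            put(c == '1' ? 1 : 0, 1);
        }
    }

    // Pad the last partial byte with zero bits and return everything written so far.
    byte[] finish() {
        if (bits > 0) {
            out.write(buf & 0xFF);
            buf = 0;
            bits = 0;
        }
        return out.toByteArray();
    }

    // Read back bitCount bits in the order they were written.
    static String readBits(byte[] data, int bitCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bitCount; i++) {
            sb.append((data[i / 8] >> (i % 8)) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitQueueDemo w = new BitQueueDemo();
        w.putCode("01");   // a
        w.putCode("101");  // b
        w.putCode("111");  // c
        byte[] data = w.finish();
        System.out.println(readBits(data, 8));  // prints "01101111"
    }
}
```

Because the last byte may be padded, a decoder needs to know how many bits are valid; storing the bit count (or the character count) ahead of the data is one option.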
If you want to work with bit streams, it is easier to use an existing library, for instance JBBP (Java Binary Block Parser), which has a JBBPBitOutputStream class providing bit write operations (there is also a JBBPBitInputStream class to read bits from streams).
In my Android app I receive 8 bytes of data from a sensor
via Bluetooth Smart using Android BluetoothGatt. The data contains values for temperature, pressure and humidity. The values are split up in the following way.
PRESSURE:
Byte 1 + Byte 2 + first 4 Bits of Byte 3, other 4 bits are 0
TEMPERATURE:
Byte 4 + Byte 5 + first 4 bits of Byte 6, other 4 bits are 0
HUMIDITY:
Byte 7 + Byte 8
Now at the moment I have a Byte Array that contains the 8 Byte.
My Problem is that I don't know how to extract or isolate the bits for temperature, pressure and humidity as described above.
Does anyone have an idea how to solve this?
You need to use bit manipulation operations to extract the values.
For example
int pressure = ((data[0] & 0xff) << 16) | ((data[1] & 0xff) << 8) | (data[2] & 0xff); // data is the received byte array
(the 16s and 8s may need to be 12 and 4 or 0 4 and 12 depending on exactly what it means).
The operations you need are <<, which shifts the bits to the left by the given number of positions, and & 0xff, which turns the signed byte into a non-negative int (0 to 255) before you shift it. Otherwise the sign extension will mess things up.
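To make that concrete, here is a sketch under one possible reading of the layout, which is an assumption and not confirmed by the question: each 20-bit value is stored most significant byte first, and "first 4 bits" means the high nibble of the third byte. The names SensorDecode, read20 and read16, and the sample packet, are invented for the example.

```java
class SensorDecode {
    // 20-bit field: two full bytes plus the high nibble of the third byte (assumed layout).
    static int read20(byte[] d, int off) {
        return ((d[off] & 0xFF) << 12)
             | ((d[off + 1] & 0xFF) << 4)
             | ((d[off + 2] & 0xFF) >> 4);
    }

    // 16-bit field: two full bytes, most significant byte first (assumed).
    static int read16(byte[] d, int off) {
        return ((d[off] & 0xFF) << 8) | (d[off + 1] & 0xFF);
    }

    public static void main(String[] args) {
        // Made-up packet: pressure 0x12345, temperature 0xABCDE, humidity 0x0102
        byte[] packet = {0x12, 0x34, 0x50, (byte) 0xAB, (byte) 0xCD, (byte) 0xE0, 0x01, 0x02};
        int pressure = read20(packet, 0);     // Byte 1, Byte 2, high nibble of Byte 3
        int temperature = read20(packet, 3);  // Byte 4, Byte 5, high nibble of Byte 6
        int humidity = read16(packet, 6);     // Byte 7, Byte 8
        System.out.println(Integer.toHexString(pressure));     // prints "12345"
        System.out.println(Integer.toHexString(temperature));  // prints "abcde"
        System.out.println(Integer.toHexString(humidity));     // prints "102"
    }
}
```

If the sensor uses the opposite byte order or the low nibble, only the shift amounts and the order of the terms change.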
I have a spec in which one column specifies a 16-bit offset to a structure and another column specifies a 24-bit offset to a structure. I am implementing the spec in Java.
I am not clear on what a 16-bit or 24-bit offset means, or how to perform such an operation in Java.
It sounds like you have single piece of data that is segmented into bit series.
Bits 16-23 are the "16 bit offset"
Bits 24-? are the "24 bit offset"
The data is probably an int (32-bit signed integer) or a long (64-bit signed integer) with certain parts of the bit sequence allocated to storing separate smaller pieces of data.
One way to simply get at the values is to use a bit mask and right shift, like this:
int mask = 0xff0000; // only bits 16-23 are 1
int data = 0; // placeholder: put the packed value you read here
int value = data & mask; // zero other bits
value >>= 16; // shift the value down to the end
An offset is a relative address in some stream and/or storage medium.
A 16bit offset is an offset that's stored in a 16 bit variable/slot.
So if some file format specification says that "the next field is the 16 bit offset", that means you must read the next 2 bytes and treat them as a relative address.
What exactly that addresses depends on the specification: it might be bytes, it might be "entries" or anything else.
Note also that Java doesn't have any built-in 24-bit data type, so you'll have to work around that using int, which has 32 bits.
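Reading such fields from a byte array could look roughly like this. The byte order (most significant byte first) and the unsigned interpretation are assumptions that the actual spec has to confirm, and the names OffsetReader, readUInt16 and readUInt24 are invented for the example.

```java
class OffsetReader {
    // Read an unsigned 16-bit value stored most significant byte first (assumed order).
    static int readUInt16(byte[] b, int pos) {
        return ((b[pos] & 0xFF) << 8) | (b[pos + 1] & 0xFF);
    }

    // Read an unsigned 24-bit value into an int, since Java has no 24-bit type.
    static int readUInt24(byte[] b, int pos) {
        return ((b[pos] & 0xFF) << 16) | ((b[pos + 1] & 0xFF) << 8) | (b[pos + 2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] record = {0x01, 0x02, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(readUInt16(record, 0));  // prints 258
        System.out.println(readUInt24(record, 2));  // prints 16777215
    }
}
```

The returned int is then used as the relative address, in whatever unit the spec defines (bytes, entries, and so on).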
It probably means the boundary of the data starts 16 bits or 24 bits after the start of that structure.
How you access it depends on how you got access to the structure to begin with.
If it's just something you read into a byte array, some data stored at an offset of 16 bits could be accessed by ignoring the first two elements of the array (2 bytes = 16 bits), or the first three for 24 bits. If it's a long or an int, it depends on whether you can use the >> and << shift operators.
I have code that stores values in the range 0..255 in a Java byte to save space in large data collections (10^9 records spread over a couple hundred arrays).
Without additional measures on recovery, the larger values are interpreted as being negative (because the Java integer types use two's complement representation).
I got this helpful hint from starblue in response to a related question, and I'm wondering if this technique is safe to rely on:
int iOriginal = 128, iRestore;
byte bStore = (byte) iOriginal; // reading this value directly would yield -128
iRestore = 0xff & bStore;
Yes, it's safe, indeed it's the most effective way of converting a byte into an (effectively) unsigned integer.
The byte half of the and operation will be sign-extended to an int, i.e. whatever was in bit 7 will be expanded into bits 8-31.
Masking with & 0xff keeps only the bottom eight bits, which gives you an int that has zero in every bit from 8 - 31, and must therefore be in the range 0 ... 255.
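A quick way to convince yourself is to round-trip every value from 0 to 255. The sketch below does that; the class and method names (UnsignedByteDemo, restore) are made up. It also compares the mask against Byte.toUnsignedInt, which has been in the standard library since Java 8 and does the same thing.

```java
class UnsignedByteDemo {
    // Recover the original 0..255 value from a stored byte.
    static int restore(byte stored) {
        return stored & 0xFF;
    }

    public static void main(String[] args) {
        for (int original = 0; original <= 255; original++) {
            byte stored = (byte) original;  // values 128..255 become negative here
            if (restore(stored) != original || Byte.toUnsignedInt(stored) != original) {
                throw new AssertionError("round trip failed for " + original);
            }
        }
        System.out.println(restore((byte) 128));  // prints 128
    }
}
```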