I am attempting to convert 16 bit audio into 12 bit audio. However, I am quite inexperienced with such conversions and believe my approach is possibly incorrect or flawed.
The use case, as context for the code snippets below, is an Android app that the user speaks into; that audio is transmitted to an IoT device for immediate playback. The IoT device expects mono 12 bit audio at an 8k sample rate, little endian, unsigned, with the data stored in the first twelve bits (0-11) and the final four bits (12-15) set to zero. Audio data needs to be received in packets of 1000 bytes.
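To illustrate my understanding of that layout (the sample value here is made up purely for illustration), a 12 bit sample such as 0x0ABC would travel as two little-endian bytes:
short sample = 0x0ABC; // bits 12-15 are zero; the data lives in bits 0-11
byte first = (byte) (sample & 0xFF); // 0xBC - transmitted first (little endian)
byte second = (byte) ((sample >> 8) & 0xFF); // 0x0A - high nibble is always zero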
The audio is being created in the Android app through the use of AudioRecord. The instantiation of which is as follows:
int bufferSize = 1000;
this.audioRecord = new AudioRecord(
MediaRecorder.AudioSource.MIC,
8000,
AudioFormat.CHANNEL_IN_MONO,
AudioFormat.ENCODING_PCM_16BIT,
bufferSize
);
In a while loop, the AudioRecord is read from in 1000 byte packets, and the bytes are modified to the specifications in the use case. Not sure this part is relevant, but for completeness:
byte[] buffer = new byte[1000];
audioRecord.read(buffer, 0, buffer.length);
byte[] modifiedBytes = convert16BitTo12Bit(buffer);
Then the modifiedBytes are sent off to the device.
Here are the methods which modify the bytes. Basically, to conform to the specifications, I am shifting the bits in each 16 bit set (tossing the least significant 4) and adding zeroes to the final four spots. I do this through BitSet.
/**
* Takes a byte array representing 16 bit audio and converts it to 12 bit audio through bit
* manipulation. Packets must be 1000 bytes long, or no manipulation will occur and the
* input will be returned unchanged.
*/
private byte[] convert16BitTo12Bit(byte[] input) {
if (input.length == 1000) {
for (int i = 0; i < input.length; i += 2) {
Log.d(TAG, "convert16BitTo12Bit: pass #" + (i / 2));
byte[] chunk = new byte[2];
System.arraycopy(input, i, chunk, 0, 2);
if (!isEmptyByteArray(chunk)) {
byte[] modifiedBytes = convertChunk(chunk);
System.arraycopy(
modifiedBytes,
0,
input,
i,
modifiedBytes.length
);
}
}
return input;
}
Log.d(TAG, "convert16BitTo12Bit: Failed - input is not 1000 in length; it is " + input.length);
return input;
}
/**
* Converts 2 bytes of 16 bit audio into 12 bit audio. If the input is not 2 bytes, the input
* will be returned without manipulation.
*/
private byte[] convertChunk(byte[] chunk) {
if (chunk.length == 2) {
BitSet bitSet = BitSet.valueOf(chunk);
Log.d(TAG, "convertChunk: bitSet starts as " + bitSet.toString());
modifyBitSet(bitSet);
Log.d(TAG, "convertChunk: bitSet ends as " + bitSet.toString());
return bitSet.toByteArray();
}
Log.d(TAG, "convertChunk: Failed = chunk is not 2 in length; it is " + chunk.length);
return chunk;
}
/**
* Removes the first four bits and shifts the rest to leave the final four bits as 0.
*/
private void modifyBitSet(BitSet bitSet) {
for (int i = 4; i < bitSet.length(); i++) {
bitSet.set(i - 4, bitSet.get(i));
}
if (bitSet.length() > 8) {
bitSet.clear(12, 16);
} else {
bitSet.clear(4, 8);
}
}
/**
* Returns true if the byte array input contains all zero bits.
*/
private boolean isEmptyByteArray(byte[] input) {
BitSet bitSet = BitSet.valueOf(input);
return bitSet.isEmpty();
}
Unfortunately, this approach produces subpar results. The audio is quite noisy and it is difficult to make out what someone is saying (but you can hear that words are being spoken).
I have also been playing around with just saving the bytes to a file and playing it back on Android through AudioTrack. I noticed that if I just remove the first four bits and do not shift anything, the audio actually sounds good, as such:
private void modifyBitSet(BitSet bitSet) {
bitSet.clear(0, 4);
}
However, when played through the device, it sounds even worse, and I don't even think I can make out any words.
Clearly, my approach is not working here. The central question is: how would one convert a 16 bit chunk into 12 bit audio while maintaining audio quality, given the requirement that the final four bits must be zero? Additionally, given my larger approach of using AudioRecord to obtain the audio, would such a solution fit this use case?
Please let me know if there is anything more I can provide to clarify these questions and my intent.
Given that the audio is 16 bits but must be changed to 12 with four zeros at the end, four bits somewhere do have to be tossed.
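For instance (just a sketch, assuming each pair of bytes is first combined into a little-endian short; not the full solution):
// Sketch only: combine two little-endian bytes and toss the low four bits.
static short dropLowNibble(byte low, byte high) {
short sample = (short) ((low & 0xFF) | (high << 8));
return (short) (sample >> 4); // arithmetic shift preserves the sign
}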
Yes, of course, and there is no other way, is there?
This is something quick that I came up with just now. It is certainly not fully tested, though; I have only tried it with inputs of 2 and 4 bytes. I'll leave it to you to test further.
//Reminder :: Convert as many as possible.
//Reminder :: To calculate the required size for store:
//(bytes.length & 1) == 0 ? Math.round((bytes.length * 6) / 8F) : Math.round(((bytes.length - 1) * 6) / 8F)
//Return :: Number of bytes written into store.
public static final int convert16BitTo12Bit(final byte[] bytes, final byte[] store)
{
final int size = bytes.length;
int storeIndex = 0;
//Copy the first 2 bytes into store.
store[storeIndex++] = bytes[0];
store[storeIndex] = bytes[1];
if(size < 4) {
store[storeIndex] = (byte)(store[storeIndex] & 0xF0);
return 2;
}
int result; //Reassigned on each loop iteration, so these cannot be final.
byte tmp;
// 11111111 11110000 00000000 00000000
//+ 11111111 11110000 (<< 12)
//= 11111111 11111111 11111111 00000000 (1)
//-----------------------------------------
// 11111111 00000000 00000000 00000000 (1)
//+ 11111111 11110000 (<< 16)
//= 11111111 11111111 11110000 00000000 (2)
//-----------------------------------------
// 11110000 00000000 00000000 00000000 (2)
//+ 1111 11111111 0000 (<< 20)
//= 11111111 11111111 00000000 00000000 (3)
//-----------------------------------------
// 00000000 00000000 00000000 00000000 (3)
//+ 11111111 11110000 (<< 24)
//= 11111111 11110000 00000000 00000000
for(int i=2, shiftBits = 12; i < size; i += 2) {
if(shiftBits == 24) {
//Copy 2 bytes from bytes[] into store[] and move on.
store[storeIndex] = bytes[i];
//Never store byte 0 (Garbage).
tmp = (byte)(bytes[i + 1] & 0xF0); //Bit order: 11110000.
if(tmp != 0) store[++storeIndex] = tmp;
shiftBits = 12; //Reset
} else if(shiftBits == 20) {
result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
| (((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));
store[storeIndex] = (byte)((result >> 24) & 0xFF);
tmp = (byte)((result >> 16) & 0xFF);
//Never store byte 0 (Garbage).
if(tmp != 0) store[++storeIndex] = tmp;
shiftBits = 24;
} else if(shiftBits == 16) {
result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
| (((bytes[i] & 0xFF) << 16) | ((bytes[i + 1] & 0xFF) << 8));
store[storeIndex] = (byte)((result >> 16) & 0xFF);
tmp = (byte)((result >> 8) & 0xF0);
//Never store byte 0 (Garbage).
if(tmp != 0) store[++storeIndex] = tmp;
shiftBits = 20;
} else {
result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
| (((bytes[i] & 0xFF) << 12) | ((bytes[i + 1] & 0xFF) << 4));
store[storeIndex] = (byte)((result >> 16) & 0xFF);
tmp = (byte)((result >> 8) & 0xFF);
//Never store byte 0 (Garbage).
if(tmp != 0) store[++storeIndex] = tmp;
shiftBits = 16;
}
}
return ++storeIndex;
}
Explanations
result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
| (((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));
What this does is basically merge two integers into one.
((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
The first part builds an integer whose bits sit at a fixed, constant position.
(((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));
The second part places the two current input bytes at bit positions that vary from iteration to iteration.
(...) | (...)
The pipe (vertical bar) in the middle ORs these two integers into one.
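As a tiny standalone illustration of the merge idea (the values are arbitrary and not taken from the algorithm's actual state):
int constantPart = 0xAB << 8; // 0x0000AB00 - bits at a fixed position
int movingPart = 0xCD; // 0x000000CD - bits whose position varies per iteration
int merged = constantPart | movingPart; // 0x0000ABCD - OR combines the disjoint bit ranges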
Usage
Using this method is pretty straightforward.
byte[] buffer = new byte[1000];
byte[] store;
if((buffer.length & 1) == 0) { //Even.
store = new byte[Math.round((buffer.length * 6) / 8F)];
} else { //Odd.
store = new byte[Math.round(((buffer.length - 1) * 6) / 8F)];
}
audioRecord.read(buffer, 0, buffer.length);
int convertedByteSize = convert16BitTo12Bit(buffer, store);
System.out.println("size: " + convertedByteSize);
I have discovered a solution that produces clear audio. First, it is important to restate the requirements for the use case: 12 bit unsigned mono audio, read by the device in little endian, in packets of 1000 bytes.
The initialization and configuration of the AudioRecord as described in the question is fine.
Once the 1000 bytes of audio are read from AudioRecord, they can be put into a ByteBuffer marked as little endian for modification, and then viewed as a ShortBuffer so the manipulation can occur on the 16 bit level:
// Audio specifications of device is in little endian.
ByteBuffer byteBuffer = ByteBuffer.wrap(input).order(ByteOrder.LITTLE_ENDIAN);
// Turn into a ShortBuffer so bitwise manipulation can occur on the 16 bit level.
ShortBuffer shortBuffer = byteBuffer.asShortBuffer();
Next, in a loop, take each short and modify it to 12 bit unsigned:
for (int i = 0; i < shortBuffer.capacity(); i++) {
short currentShort = shortBuffer.get(i);
shortBuffer.put(i, convertShortTo12Bit(currentShort));
}
This can be accomplished by shifting the 16 bits four places to the right, which turns the sample into 12 bit signed. Then, to convert to unsigned, add 2048. As a safety step, we also mask off the most significant four bits (bits 12-15), as required by the device; given the shifting and adding, though, no bits should actually remain there:
private static short convertShortTo12Bit(short input) {
int inputAsInt = input;
inputAsInt >>>= 4; // drop the least significant four bits
inputAsInt += 2048; // 12 bit signed -> 12 bit unsigned
input = (short) (inputAsInt & 0B0000111111111111); // keep only bits 0-11 (safety mask)
return input;
}
If one wishes to return 12 bits to 16 bits, do the reverse for each short (subtract 2048 and shift four spaces to the left).
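For completeness, here is a minimal sketch of that reverse conversion (the method name convertShortTo16Bit is mine; I only used this for local playback testing):
private static short convertShortTo16Bit(short input) {
int inputAsInt = input & 0B0000111111111111; // only bits 0-11 carry data
inputAsInt -= 2048; // 12 bit unsigned -> 12 bit signed
inputAsInt <<= 4; // scale back up to the 16 bit range
return (short) inputAsInt;
}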
I'm trying to convert this snippet from C# to Java. The C# snippet correctly returns the value 3259945; the Java code incorrectly returns -16855. I'm completely useless at bit manipulation and have no idea where to even start. Can anyone help?
If people need the input variables I'll try to get the buffer byte array as a hex string so I can put it up. The startIndex I'm using is 26.
C# snippet:
Int64 mantissa = ((Int64)(buffer[startIndex] & 0x7F) << (8 * 2))
| ((Int64)buffer[startIndex + 3] << (8 * 1))
| ((Int64)buffer[startIndex + 2] << (8 * 0));
Java Snippet:
long mantissa = ((long)(buffer[startIndex] & 0x7F) << (8 * 2))
| ((long)buffer[startIndex + 3] << (8 * 1))
| ((long)buffer[startIndex + 2] << (8 * 0));
As mentioned in the comments, in .NET a byte is unsigned (0 to 255) and in Java it is signed (-128 to 127). To normalize it, you need to use the & 0xFF mask.
long mantissa = ((long)(buffer[startIndex] & 0x7F) << (8 * 2))
| ((long)(buffer[startIndex + 3] & 0xFF) << (8 * 1))
| ((long)(buffer[startIndex + 2] & 0xFF) << (8 * 0));
In the first case, you don't need this mask because the sign bit has been cleared by 0x7F.
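A quick demonstration of the difference on a single byte (the value 0x9C is arbitrary):
byte b = (byte) 0x9C; // 156 as a .NET byte; -100 as a Java byte
long unmasked = (long) b; // sign-extended: -100
long masked = b & 0xFF; // normalized: 156
System.out.println(unmasked + " vs " + masked); // prints "-100 vs 156"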
I'm trying to learn bit shifting/masking. Here is my code:
int health = 511; // max 512, 9 bits
int aimAngle = 510; // max 512, 9 bits
int test = 511; // max 512, 9 bits
boolean bool = false; // max 1, 1 bit
int packed;
packed = health | aimAngle << 9 | test << 18 | (bool?1:0) << 19;
Debug.log("health: " + ((packed ) & 0b111111111));
Debug.log("aimAngle: " + ((packed >> 9) & 0b111111111));
Debug.log("test: " + ((packed >> 18) & 0b111111111));
Debug.log("bool: " + ((packed >> 19) & 0b1));
I'm getting all the values correctly except bool. It's always 1. What is wrong? Can't I shift zero to the beginning?
test is up to nine bits long (note that a maximum value of 512 would actually need ten bits; 511 fits in nine). You shift it left 18 places, so it occupies bits 18 to 26. You need to shift bool to bit 27 to avoid the overlap, not to bit 19.
Bit 19 of packed is the second bit of test, which is a 1; that is why your bool always reads back as 1.
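A corrected packing sketch (assuming all three values really fit in nine bits, i.e. a maximum of 511, and using System.out.println in place of your Debug.log):
int health = 511; // bits 0-8
int aimAngle = 510; // bits 9-17
int test = 511; // bits 18-26
boolean bool = false; // bit 27
int packed = health | (aimAngle << 9) | (test << 18) | ((bool ? 1 : 0) << 27);
System.out.println("health: " + (packed & 0b111111111)); // 511
System.out.println("aimAngle: " + ((packed >> 9) & 0b111111111)); // 510
System.out.println("test: " + ((packed >> 18) & 0b111111111)); // 511
System.out.println("bool: " + ((packed >> 27) & 0b1)); // 0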
So I'm trying to understand base64 encoding better, and I came across this implementation on Wikipedia:
private static final String codes =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; // the base64 alphabet indexed below
private static String base64Encode(byte[] in) {
StringBuffer out = new StringBuffer((in.length * 4) / 3);
int b;
for (int i = 0; i < in.length; i += 3) {
b = (in[i] & 0xFC) >> 2;
out.append(codes.charAt(b));
b = (in[i] & 0x03) << 4;
if (i + 1 < in.length) {
b |= (in[i + 1] & 0xF0) >> 4;
out.append(codes.charAt(b));
b = (in[i + 1] & 0x0F) << 2;
if (i + 2 < in.length) {
b |= (in[i + 2] & 0xC0) >> 6;
out.append(codes.charAt(b));
b = in[i + 2] & 0x3F;
out.append(codes.charAt(b));
} else {
out.append(codes.charAt(b));
out.append('=');
}
} else {
out.append(codes.charAt(b));
out.append("==");
}
}
return out.toString();
}
And I'm following along and I get to the line:
b = (in[i] & 0xFC) >> 2;
and I don't get it...why would you bitwise and 252 to a number then shift it right 2...wouldn't it be the same if you just shifted the byte itself without doing the bitwise operation? example:
b = in[i] >> 2;
Say my in[i] was the letter e...represented as 101 or in binary 01100101. If I shift that 2 to the right I get 011001 or 25. If I bitwise & it I get
01100101
11111100
--------
01100100
but then the shift is going to chop off the last 2 anyway...so why bother doing it?
Can somebody clarify for me please. Thanks.
In in[i] >> 2, in[i] is converted to an int first. If it was a negative byte (with the high bit set), it will be converted to a negative int (with the now-highest 24 bits set as well).
In (in[i] & 0xFC) >> 2, in[i] is converted to an int as above, and then & 0xFC makes sure the extra bits are all reset to 0.
You're partially right, in that (in[i] & 0xFF) >> 2 would give the same result. & 0xFF is a common way to convert a byte to a non-negative int in the range 0 to 255.
The only way to know for sure why the original developer used 0xFC, and not 0xFF, is to ask them - but I speculate that it's to make it more obvious which bits are being used.
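To make that concrete, here is a tiny demonstration with a byte whose high bit is set (without the mask, the negative result would make codes.charAt throw StringIndexOutOfBoundsException):
byte b = (byte) 0b10010101; // -107 in Java
int withoutMask = b >> 2; // sign-extended first: -27
int withMask = (b & 0xFC) >> 2; // high bits cleared first: 37
System.out.println(withoutMask + " vs " + withMask); // prints "-27 vs 37"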
I have a small bug here somewhere in my code! I must be blind, because I really can't seem to find it or figure it out. I have a list of byte arrays. I'm parsing out the first 2 elements as well as the very last element in each array. If I get the value -16, -11 or -7, I want to keep the values. For some reason, the last value in one of the arrays is not being deleted. Why is this happening?
Have you tried debugging it?
When you have b = -110, it passes this condition:
if(b!= -15 && i + 2 < srec.length() && (Character.digit(srec.charAt(i + 2), 16) << 4) + Character.digit(srec.charAt(i + 3), 16) != -15
&& (Character.digit(srec.charAt(i + 2), 16) << 4) + Character.digit(srec.charAt(i + 3), 16) != -11)
b != -15 -> true
i + 2 = 124 < srec.length() = 142 -> true
(Character.digit(srec.charAt(i + 2), 16) << 4) + Character.digit(srec.charAt(i + 3), 16) = -7, and -7 != -15 and also -7 != -11,
hence data.add(b) is executed.
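For example (using a made-up hex pair "F9"): the two digits decode to 249 as an int, which only reads as -7 once narrowed to a byte:
int decoded = (Character.digit('F', 16) << 4) + Character.digit('9', 16);
System.out.println(decoded); // 249
System.out.println((byte) decoded); // -7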