For the past 2 days I've been trying to manipulate 16 bit PCM data on Android with little success. I'm currently using WAV recorder to capture audio. In the onPeriodicNotification(AudioRecord recorder) method before the buffer is written with the randomAccessWriter I send the buffer to a custom class, to manipulate the samples, and save the samples back into the buffer. The method in my custom class is as follows:
As the buffer is a byte array I first convert them into shorts, now one short represents a frame (there's only one channel). I will be implementing FFT algorithms, once I get past this hurdle, that need the input to be a float array - so I convert each short into a float. Now, the randomAccessWriter that writes the data into the WAV file accepts a byte array and is expecting each frame to be 2 bytes. Therefore I convert each float back into a short and use a ByteBuffer to reconstruct a byte array, which is then returned. When I run my recorder app, with the buffer being sent through the above code, everything is fine.
I try using a simple voice modulation algorithm to test if the recording is modified, the algorithm is placed where the TODO comment is:
Now if I used the above code on my iPhone the audio samples would be transformed, although the data is natively 32bit floats. However, on Android when I re-run the recorder app, with the above code inserted, all that's produced is white noise. Until I can successfully modify the samples with the above code, I can't proceed with my FFT algorithms.
Why is this occurring? I would be grateful if someone with knowledge on the topic could shed light on the topic.
SOLVED - By Bjorn Roche
Underlying cause: the recording delivers its data as little-endian bytes, whereas a Java ByteBuffer reads shorts as big-endian by default; applying a function to samples decoded with the wrong byte order produces white noise. The code below shows how to take in a little-endian byte array, convert it to a float array, and convert it back to a little-endian byte array. While the samples are floats you can do whatever you please with them; I'll now be using my FFT algorithms:
public byte[] manipulateSamples(byte[] data,
int samplingRate,
int numFrames,
short numChannels) {
// Convert byte[] to short[] (16 bit) to float[] (32 bit) (End result: Big Endian)
ShortBuffer sbuf = ByteBuffer.wrap(data).asShortBuffer();
short[] audioShorts = new short[sbuf.capacity()];
sbuf.get(audioShorts);
float[] audioFloats = new float[audioShorts.length];
for (int i = 0; i < audioShorts.length; i++) {
audioFloats[i] = ((float)Short.reverseBytes(audioShorts[i])/0x8000);
}
// Do your tasks here.
// Convert float[] to short[] to byte[] (End result: Little Endian)
audioShorts = new short[audioFloats.length];
for (int i = 0; i < audioFloats.length; i++) {
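// Clamp first so that +1.0 does not overflow the short range and wrap to -32768
float clamped = Math.max(-1.0f, Math.min(audioFloats[i], 32767f / 0x8000));
audioShorts[i] = Short.reverseBytes((short) (clamped * 0x8000));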
}
byte byteArray[] = new byte[audioShorts.length * 2];
ByteBuffer buffer = ByteBuffer.wrap(byteArray);
sbuf = buffer.asShortBuffer();
sbuf.put(audioShorts);
data = buffer.array();
return data;
}
Your problem is that Java reads multi-byte values such as shorts as big-endian by default (for example through ByteBuffer), but the sample data in a WAV file is little-endian.
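As an alternative to swapping bytes by hand, the byte order can be set on the ByteBuffer itself. The following is a minimal sketch, assuming 16-bit mono PCM; the class and method names (PcmEndianDemo, toFloats, toBytes) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PcmEndianDemo {
    /** Decodes 16-bit little-endian PCM bytes into floats in [-1, 1). */
    static float[] toFloats(byte[] pcm) {
        ByteBuffer bb = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = bb.getShort() / 32768f;
        }
        return out;
    }

    /** Encodes floats back into 16-bit little-endian PCM, clipping out-of-range values. */
    static byte[] toBytes(float[] samples) {
        ByteBuffer bb = ByteBuffer.allocate(samples.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (float s : samples) {
            int v = Math.round(s * 32768f);
            // Clip instead of letting the cast wrap around
            v = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
            bb.putShort((short) v);
        }
        return bb.array();
    }
}
```

Any processing (gain, FFT and so on) would then happen on the float array between the two calls.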
Off Topic: Let me start by saying Java is completely new to me. I've been programming for over 15 years and never have had a need for it beyond modifying others' codebases, so please forgive my ignorance and possibly improper terminology. I'm also not very familiar with RF, so if I'm way left field here, please let me know!
I'm building an SDR (Software Defined Radio) radio transmitter, and while I can successfully transmit on a frequency, when I send the stream (either from the device's microphone or bytes from a tone generator), what is coming through my handheld receiver sounds like static.
I believe this to be due to my receiver being set up to receive NFM (Narrowband Frequency Modulation) and WFM (Wideband Frequency Modulation) while the transmission coming from my SDR is sending raw, unmodulated data.
My question is: how do I modulate audio bytes (i.e. an InputStream) so that the resulting bytes are modulated in FM (Frequency Modulation) or AM (Amplitude Modulation), which I can then transmit through the SDR?
I can't seem to find a class or package that handles modulation (eventually I'm going to have to modulate WFM, FM, AM, SB, LSB, USB, DSB, etc.) despite there being quite a few open-source SDR codebases, but if you know where I can find this, that basically answers this question. Everything I've found so far has been for demodulation.
This is a class I've built around Xarph's answer here on StackOverflow. It simply returns a byte array containing a simple, unmodulated audio signal, which can then be used to play sound through speakers. It can also be transmitted over an SDR, but because the result is not properly modulated it doesn't come through correctly on the receiver's end, which is what I'm having trouble figuring out.
public class ToneGenerator {
public static byte[] generateTone() {
return generateTone(60, 1000, 8000);
}
public static byte[] generateTone(double duration) {
return generateTone(duration, 1000, 8000);
}
public static byte[] generateTone(double duration, double freqOfTone) {
return generateTone(duration, freqOfTone, 8000);
}
public static byte[] generateTone(double duration, double freqOfTone, int sampleRate) {
double dnumSamples = duration * sampleRate;
dnumSamples = Math.ceil(dnumSamples);
int numSamples = (int) dnumSamples;
double sample[] = new double[numSamples];
byte generatedSnd[] = new byte[2 * numSamples];
for (int i = 0; i < numSamples; ++i) { // Fill the sample array
sample[i] = Math.sin(freqOfTone * 2 * Math.PI * i / (sampleRate));
}
// convert to 16 bit pcm sound array
// assumes the sample buffer is normalized.
int idx = 0;
int i = 0 ;
int ramp = numSamples / 20 ; // Amplitude ramp as a percent of sample count
for (i = 0; i< ramp; ++i) { // Ramp amplitude up (to avoid clicks)
double dVal = sample[i];
// Ramp up to maximum
final short val = (short) ((dVal * 32767 * i/ramp));
// in 16 bit wav PCM, first byte is the low order byte
generatedSnd[idx++] = (byte) (val & 0x00ff);
generatedSnd[idx++] = (byte) ((val & 0xff00) >>> 8);
}
for (; i < numSamples - ramp; ++i) { // Max amplitude for most of the samples
double dVal = sample[i];
// scale to maximum amplitude
final short val = (short) ((dVal * 32767));
// in 16 bit wav PCM, first byte is the low order byte
generatedSnd[idx++] = (byte) (val & 0x00ff);
generatedSnd[idx++] = (byte) ((val & 0xff00) >>> 8);
}
for (; i < numSamples; ++i) { // Ramp amplitude down
double dVal = sample[i];
// Ramp down to zero
final short val = (short) ((dVal * 32767 * (numSamples-i)/ramp ));
// in 16 bit wav PCM, first byte is the low order byte
generatedSnd[idx++] = (byte) (val & 0x00ff);
generatedSnd[idx++] = (byte) ((val & 0xff00) >>> 8);
}
return generatedSnd;
}
}
An answer to this doesn't necessarily need to be code, actually theory and an understanding of how FM or AM modulation works when it comes to processing a byte array and converting it to the proper format would probably be more valuable since I'll have to implement more modes in the future.
There is a lot that I don't know about radio. But I think I can say a couple things about the basics of modulation and the problem at hand given the modicum of physics that I have and the experience of coding an FM synthesizer.
First off, I think you might find it easier to work with the source signal's PCM data points if you convert them to normalized floats (ranging from -1f to 1f), rather than working with shorts.
The target frequency of the receiver, roughly 530-1700 kHz for AM broadcast radio, is far higher than the sample rate of the source sound (presumably 44.1 kHz). Assuming you have a way to output the resulting data, the math would involve taking a PCM value from your signal, scaling it appropriately (IDK how much) and multiplying the value against the PCM data points generated by your carrier signal that correspond to the same time interval.
For example, if the carrier signal were 882 kHz, 20 full carrier cycles would elapse during each source sample (882 kHz / 44.1 kHz = 20), so you would multiply every carrier value generated in that interval by the same source signal value before moving on to the next one. Again, my ignorance: the tech may have some sort of smoothing algorithm for the transition between the source signal data points. I really don't know whether it does, or at what stage it occurs.
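To make the multiplication concrete, here is a rough sketch of amplitude modulation in Java. It is an illustration under stated assumptions, not a tested transmitter: the audio is already normalized floats, each source sample is simply held for the duration of its interval (no smoothing), and the output is a real-valued signal at a hypothetical output rate outRate that must be well above twice the carrier frequency. The names AmSketch, modulate and depth are invented for this example. The envelope is written as (1 + depth * audio) so that it never goes negative, which is the conventional form of AM.

```java
class AmSketch {
    /**
     * Produces an AM signal: (1 + depth * audio) * cos(2*pi*fc*t), scaled to stay within [-1, 1].
     * audioRate and outRate are in Hz; depth is the modulation index in (0, 1].
     */
    static float[] modulate(float[] audio, int audioRate, int outRate,
                            double carrierHz, double depth) {
        int n = (int) ((long) audio.length * outRate / audioRate);
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            // Sample-and-hold: every output sample in the interval reuses the same source value
            float a = audio[(int) ((long) i * audioRate / outRate)];
            double carrier = Math.cos(2 * Math.PI * carrierHz * i / outRate);
            out[i] = (float) ((1.0 + depth * a) * carrier / (1.0 + depth));
        }
        return out;
    }
}
```

With a 44.1 kHz source and an 882 kHz carrier, each source value would cover 20 carrier cycles, as in the example above.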
For FM, we have carrier signals in the MHz range, so we are talking orders of magnitude more data being generated per each source signal value than with AM. I don't know the exact algorithm used but here is a simple conceptual way to implement frequency modulation of a sine that I used with my FM synthesizer.
Let's say you have a table with 1000 data points that represents a single sine wave cycle ranging between -1f and 1f. Let's say you have a cursor that repeatedly traverses the table. If the cursor advanced exactly 1 data point per frame at 44100 fps and delivered the values at that rate, the resulting tone would be 44.1 Hz, yes? But you can also traverse the table via intervals larger than 1, for example 1.5. When the cursor lands in between two table values, one can use linear interpolation to determine the value to output. The cursor increment of 1.5 would result in the sine wave being pitched at 66.15 Hz.
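The table-and-cursor idea can be sketched as follows. This is a minimal illustration with invented names (WavetableOsc, next, frequencyHz), assuming a non-negative increment smaller than the table length.

```java
class WavetableOsc {
    final float[] table;
    double cursor = 0.0;

    /** Fills the table with one full sine cycle. */
    WavetableOsc(int size) {
        table = new float[size];
        for (int i = 0; i < size; i++) {
            table[i] = (float) Math.sin(2 * Math.PI * i / size);
        }
    }

    /** Returns the interpolated value at the cursor, then advances the cursor by increment. */
    float next(double increment) {
        int i0 = (int) cursor;
        int i1 = (i0 + 1) % table.length;          // wrap around at the end of the table
        double frac = cursor - i0;
        float value = (float) (table[i0] + frac * (table[i1] - table[i0]));
        cursor += increment;
        while (cursor >= table.length) {
            cursor -= table.length;
        }
        return value;
    }

    /** Pitch produced by a constant increment at the given playback rate. */
    static double frequencyHz(double increment, int tableSize, double sampleRate) {
        return increment * sampleRate / tableSize;
    }
}
```

For a 1000-point table played at 44100 fps, frequencyHz gives 44.1 Hz for an increment of 1 and 66.15 Hz for an increment of 1.5.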
What I think is happening with FM is that this cursor increment is continuously varied, and the amount it is varied depends on some sort of scaling from the source signal translated into a range of increments.
The specifics of the scaling are unknown to me. But suppose a signal is being transmitted with a carrier of 10 MHz that ranges ~1% (roughly from 9.9 MHz to 10.1 MHz). The normalized source signal would be mapped so that a PCM value of -1 matches an increment that traverses the carrier wave more slowly, producing the lower frequency, and +1 matches an increment that traverses it faster, producing the higher frequency. So, if an increment of +1 delivers 10 MHz, maybe a source wave PCM value of -1 elicits a cursor increment of +0.99, a PCM value of -0.5 elicits an increment of +0.995, a value of +0.5 elicits an increment of +1.005, and a value of +1 elicits a cursor increment of 1.01.
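The same mapping can be written with a running phase instead of a table cursor: each source value shifts the instantaneous frequency by a proportional amount, and the phase accumulates that frequency sample by sample. The sketch below is only an illustration under assumptions: the audio has already been resampled to the output rate, the output is a real-valued signal, and the names FmSketch, modulate and deviationHz are invented here. Many SDR transmitters expect complex baseband (I/Q) samples rather than a real signal at the carrier frequency, so treat this as a picture of the mechanism rather than something to feed to hardware as-is.

```java
class FmSketch {
    /**
     * Frequency-modulates a sine carrier. An audio value of +1 raises the
     * instantaneous frequency by deviationHz, a value of -1 lowers it by the same amount.
     */
    static float[] modulate(float[] audio, double sampleRate,
                            double carrierHz, double deviationHz) {
        float[] out = new float[audio.length];
        double phase = 0.0;
        for (int i = 0; i < audio.length; i++) {
            out[i] = (float) Math.sin(phase);
            double instantaneousHz = carrierHz + deviationHz * audio[i];
            // Advancing the phase by a varying amount is the "varying cursor increment"
            phase += 2 * Math.PI * instantaneousHz / sampleRate;
            if (phase >= 2 * Math.PI) {
                phase -= 2 * Math.PI;
            }
        }
        return out;
    }
}
```

For the 10 MHz example, carrierHz would be 10e6 and deviationHz 0.1e6, so that source values of -1 and +1 land on 9.9 MHz and 10.1 MHz.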
This is pure speculation on my part as to the relationship between the source PCM values and how they are used to modulate the carrier frequency. But maybe it helps give a concrete image of the basic mechanism?
(I use something similar, employing a cursor to iterate over wav PCM data points at arbitrary increments, in AudioCue (a class for playing back audio data based on the Java Clip), for real time frequency shifting. Code line 1183 holds the cursor that iterates over the PCM data that was imported from the wav file, with the variable idx holding the cursor increment amount. Line 1317 is where we fetch the audio value after incrementing the cursor. Code line 1372 has the method readFractionalFrame(), which performs the linear interpolation. Real time volume changes are also implemented, and I use smoothing on the values that are provided from the public input hooks.)
Again, IDK if any sort of smoothing is used between source signal values or not. In my experience a lot of the tech involves filtering and other tricks of various sorts that improve fidelity or processing calculations.
I need to construct a Java byte array out of mixed data types, but I don't know how to do this. These are my types:
byte version = 1; // at offset 0
short message_length = // the size of the byte[] message I am constructing here, at offset 1
short sub_version = 15346; // at offset 3
byte message_id = 2; // at offset 5
int flag1 = 10; // at offset 6
int flag2 = 0; // at offset 10
int flag3 = 0; // at offset 14
int flag4 = 0; // at offset 18
String message = "the quick brown fox jumps over the lazy dog"; // at offset 22
I know for the String, I can use
message.getBytes("US-ASCII");
I know for the int values, I can use
Integer.byteValue();
I know for the short values, I can use
Short.byteValue();
And the byte values are already bytes, I am just not sure of how to combine all of these into a single byte array. I have read about
System.arraycopy();
Is this the correct process, I just convert all the data to bytes, and start "concatenating" the byte array with arraycopy?
I am communicating with some distant server I have no control over, and this is the message process they require.
Wrap a DataOutputStream around a ByteArrayOutputStream. This way you can write all the primitive types like int and short directly to the DataOutputStream, which converts them to bytes and forwards them to the ByteArrayOutputStream, from which you can then retrieve the whole thing as one byte array:
ByteArrayOutputStream bOut = new ByteArrayOutputStream();
DataOutputStream dOut = new DataOutputStream(bOut);
dOut.writeByte(version);
dOut.writeShort(message_length);
dOut.writeShort(sub_version);
dOut.writeByte(message_id);
dOut.writeInt(flag1);
dOut.writeInt(flag2);
dOut.writeInt(flag3);
dOut.writeInt(flag4);
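// Encode explicitly; message.length() counts chars, which only equals the byte count for ASCII text
byte[] text = message.getBytes(StandardCharsets.US_ASCII); // java.nio.charset.StandardCharsets
dOut.write(text);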
dOut.flush();
byte[] result = bOut.toByteArray();
The best thing about this is that you can do the exact opposite (extracting values from a byte array) with DataInputStream and ByteArrayInputStream, completely analogously to the above code.
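A round trip through both stream pairs might look like the sketch below. It assumes that message_length is the total packet size including the 22-byte header (as the question's comment suggests) and that the text is plain ASCII; the class name MessageCodec and its methods are invented for illustration. Both DataOutputStream and DataInputStream use big-endian byte order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MessageCodec {
    static final int HEADER_SIZE = 22; // 1 + 2 + 2 + 1 + 4 * 4 bytes, assuming exactly four flags

    static byte[] encode(byte version, short subVersion, byte messageId,
                         int[] flags, String message) {
        byte[] text = message.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream bOut = new ByteArrayOutputStream();
        try (DataOutputStream dOut = new DataOutputStream(bOut)) {
            dOut.writeByte(version);
            dOut.writeShort(HEADER_SIZE + text.length); // total size, assumed to include the header
            dOut.writeShort(subVersion);
            dOut.writeByte(messageId);
            for (int flag : flags) {
                dOut.writeInt(flag);
            }
            dOut.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bOut.toByteArray();
    }

    /** Reads the packet back and returns only the trailing text. */
    static String decodeMessage(byte[] packet) {
        try (DataInputStream dIn = new DataInputStream(new ByteArrayInputStream(packet))) {
            dIn.readByte();                              // version
            int totalLength = dIn.readUnsignedShort();   // message_length
            dIn.skipBytes(HEADER_SIZE - 3);              // sub_version, message_id and the flags
            byte[] text = new byte[totalLength - HEADER_SIZE];
            dIn.readFully(text);
            return new String(text, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Computing the length from the encoded text avoids hard-coding a value that has to be kept in sync with the message by hand.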
If by a 'mixed type' you mean a class with different member field types, then one approach is to make your class serializable and use SerializationUtils from Apache Commons Lang:
byte[] data = SerializationUtils.serialize(yourObject);
All, I wanted to post my own solution to my problem here. I did a quick Google search on how to insert a short into java byte array. One of the results talked about a Java ByteBuffer. After some reading, I determined this was the best and quickest way for me to get the results I needed. One section in the Java API that really made me interested in the ByteBuffer was this:
Methods in this class that do not otherwise have a value to return are specified to return the buffer upon which they are invoked. This allows method invocations to be chained. The sequence of statements
bb.putInt(0xCAFEBABE);
bb.putShort(3);
bb.putShort(45);
can, for example, be replaced by the single statement
bb.putInt(0xCAFEBABE).putShort(3).putShort(45);
So, that is what I did:
byte version = 1;
short message_length = 72;
short sub_version = 15346;
byte message_id = 2;
int flag1 = 10;
int flag2 = 0;
int flag3 = 0;
int flag4 = 0;
String message = "the quick brown fox jumps over the lazy dog";
ByteBuffer messageBuffer = ByteBuffer.allocate(message_length);
messageBuffer.put(version).putShort(message_length).putShort(sub_version).put(message_id).putInt(flag1).putInt(flag2).putInt(flag3).putInt(flag4).put(message.getBytes());
byte[] myArray = messageBuffer.array();
That was fast and easy, and just what I needed. Thank you all who took the time to read and reply.
Certainly you can concatenate these values with System.arraycopy, as you've suggested.
You can also append your bytes onto a ByteArrayOutputStream.
The key is to understand exactly what the receiving system is expecting. How does it know where one field ends and the next begins? How does it know what type it's reading at a given position in the stream? There are lots of ways they could have chosen to do that - with length headers in the protocol; with type headers; with null-termination of strings; with a set order of fields and their lengths; and so on.
Whatever method you choose, write unit tests that check for edge cases like negative numbers, very large numbers, non-ASCII text and so on. It's easy to get stung when everything has been working fine, then suddenly the server chokes on a Unicode character or a negative number that it interprets as a very large number.
One other option -- perhaps slight overkill for your needs, but flexible and with high performance -- is Google's protocol buffers library.
I am not so proficient in Java, so please keep it quite simple. I will, though, try to understand everything you post. Here's my problem.
I have written code to record audio from an external microphone and store that in a .wav. Storing this file is relevant for archiving purposes. What I need to do is a FFT of the stored audio.
My approach to this was loading the wav file as a byte array and transforming that, with the problem that 1. There's a header in the way I need to get rid of, but I should be able to do that and 2. I got a byte array, but most if not all FFT algorithms I found online and tried to patch into my project work with complex / two double arrays.
I tried to work around both these problems and finally was able to plot my FFT array as a graph, when I found out it was just giving me back "0"s. The .wav file is fine though, I can play it back without problems. I thought maybe converting the bytes into doubles was the problem for me, so here's my approach to that (I know it's not pretty)
byte ByteArray[] = Files.readAllBytes(wav_path);
String s = new String(ByteArray);
double[] DoubleArray = toDouble(ByteArray);
// build 2^n array, fill up with zeroes
boolean exp = false;
int i = 0;
int pow = 0;
while (!exp) {
pow = (int) Math.pow(2, i);
if (pow > ByteArray.length) {
exp = true;
} else {
i++;
}
}
System.out.println(pow);
double[] Filledup = new double[pow];
for (int j = 0; j < DoubleArray.length; j++) {
Filledup[j] = DoubleArray[j];
System.out.println(DoubleArray[j]);
}
for (int k = DoubleArray.length; k < Filledup.length; k++) {
Filledup[k] = 0;
}
This is the function I'm using to convert the byte array into a double array:
public static double[] toDouble(byte[] byteArray) {
ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
double[] doubles = new double[byteArray.length / 8];
for (int i = 0; i < doubles.length; i++) {
doubles[i] = byteBuffer.getDouble(i * 8);
}
return doubles;
}
The header still is in there, I know that, but that should be the smallest problem right now. I transformed my byte array to a double array, then filled up that array to the next power of 2 with zeroes, so that the FFT can actually work (it needs an array of 2^n values). The FFT algorithm I'm using gets two double arrays as input, one being the real, the other being the imaginary part. I read, that for this to work, I'd have to keep the imaginary array empty (but its length being the same as the real array).
Worth mentioning: I'm recording at 44100 Hz (44.1 kHz), 16 bit and mono.
If necessary, I'll post the FFT I'm using.
If I try to print the values of the double array, I get kind of weird results:
...
-2.0311904060823147E236
-1.3309975624948503E241
1.630738286366793E-260
1.0682002560745842E-255
-5.961832069690704E197
-1.1476447092561027E164
-1.1008407401197794E217
-8.109566204271759E298
-1.6104556241572942E265
-2.2081172620352248E130
NaN
3.643749694745671E-217
-3.9085815506127892E202
-4.0747557114875874E149
...
I know that somewhere the problem lies with me overlooking something very simple I should be aware of, but I can't seem to find the problem. My question finally is: How can I get this to work?
There's a header in the way I need to get rid of […]
You need to use javax.sound.sampled.AudioInputStream to read the file if you want to "skip" the header. This is useful to learn anyway, because you would need the data in the header to interpret the bytes if you did not know the exact format ahead of time.
I'm recording at 44100 Hz (44.1 kHz), 16 bit and mono.
So, this almost certainly means the data in the file is encoded as 16-bit integers (short in Java nomenclature).
Right now, your ByteBuffer code makes the assumption that it's already 64-bit floating point and that's why you get strange results. In other words, you are reinterpreting the binary short data as if it were double.
What you need to do is read in the short data and then convert it to double.
For example, here's a rudimentary routine that does what you're trying to do (supporting 8-, 16-, 32- and 64-bit signed integer PCM):
import javax.sound.sampled.*;
import javax.sound.sampled.AudioFormat.Encoding;
import java.io.*;
import java.nio.*;
static double[] readFully(File file)
throws UnsupportedAudioFileException, IOException {
AudioInputStream in = AudioSystem.getAudioInputStream(file);
AudioFormat fmt = in.getFormat();
byte[] bytes;
try {
if(fmt.getEncoding() != Encoding.PCM_SIGNED) {
throw new UnsupportedAudioFileException();
}
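// read the data fully; a single read() call is not guaranteed to fill the buffer
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] chunk = new byte[fmt.getFrameSize() * 4096];
int n;
while ((n = in.read(chunk)) != -1) {
out.write(chunk, 0, n);
}
bytes = out.toByteArray();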
} finally {
in.close();
}
int bits = fmt.getSampleSizeInBits();
double max = Math.pow(2, bits - 1);
ByteBuffer bb = ByteBuffer.wrap(bytes);
bb.order(fmt.isBigEndian() ?
ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
double[] samples = new double[bytes.length * 8 / bits];
// convert sample-by-sample to a scale of
// -1.0 <= samples[i] < 1.0
for(int i = 0; i < samples.length; ++i) {
switch(bits) {
case 8: samples[i] = ( bb.get() / max );
break;
case 16: samples[i] = ( bb.getShort() / max );
break;
case 32: samples[i] = ( bb.getInt() / max );
break;
case 64: samples[i] = ( bb.getLong() / max );
break;
default: throw new UnsupportedAudioFileException();
}
}
return samples;
}
The FFT algorithm I'm using gets two double arrays as input, one being the real, the other being the imaginary part. I read, that for this to work, I'd have to keep the imaginary array empty (but its length being the same as the real array).
That's right. The real part is the audio sample array from the file, the imaginary part is an array of equal length, filled with 0's e.g.:
double[] realPart = mySamples;
double[] imagPart = new double[realPart.length];
myFft(realPart, imagPart);
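If the FFT needs a power-of-two length, the sample array can be zero-padded in one step before the imaginary array is allocated. A small sketch, with invented names (FftPadding, nextPowerOfTwo, padToPowerOfTwo), valid for lengths up to 2^30:

```java
import java.util.Arrays;

class FftPadding {
    /** Smallest power of two that is >= n (for 1 <= n <= 2^30). */
    static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    /** Copies the samples into a power-of-two sized array; the tail is filled with zeros. */
    static double[] padToPowerOfTwo(double[] samples) {
        return Arrays.copyOf(samples, nextPowerOfTwo(samples.length));
    }
}
```

The real part would then be padToPowerOfTwo(mySamples) and the imaginary part a fresh array of the same length.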
More info... "How do I use audio sample data from Java Sound?"
The samples in a wave file are not going to be already 8-byte doubles that can be directly copied as per your posted code.
You need to look up (partially from the WAVE header format and from the RIFF specification) the data type, format, length and endianness of the samples before converting them to doubles.
Try 2 byte little-endian signed integers as a likely possibility.
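For a file that is known to be 16-bit mono PCM, a hand-rolled reader could walk the RIFF chunks until it finds the "data" chunk and decode its payload as little-endian shorts. This is a sketch under that assumption only: it does not inspect the "fmt " chunk, and the names WavSamples and read16BitMono are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class WavSamples {
    /** Decodes the "data" chunk of a RIFF/WAVE byte array as 16-bit little-endian samples in [-1, 1). */
    static double[] read16BitMono(byte[] wav) {
        ByteBuffer bb = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // skip "RIFF", the overall size and "WAVE"
        while (pos + 8 <= wav.length) {
            boolean isData = wav[pos] == 'd' && wav[pos + 1] == 'a'
                    && wav[pos + 2] == 't' && wav[pos + 3] == 'a';
            int size = bb.getInt(pos + 4);
            if (size < 0) {
                throw new IllegalArgumentException("chunk too large for this sketch");
            }
            if (isData) {
                int start = pos + 8;
                int end = (int) Math.min((long) wav.length, (long) start + size);
                double[] out = new double[(end - start) / 2];
                for (int i = 0; i < out.length; i++) {
                    out[i] = bb.getShort(start + 2 * i) / 32768.0;
                }
                return out;
            }
            pos += 8 + size + (size & 1); // chunks are padded to an even length
        }
        throw new IllegalArgumentException("no data chunk found");
    }
}
```

The input would be the result of Files.readAllBytes on the .wav file; the header no longer needs to be stripped by hand because the chunk walk skips it.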
Just now, I managed to overcomplicate a type conversion (I still don't understand these types perfectly).
I transferred values in the range 0 - 1023 as 4 bytes (int) from Arduino to Processing via serial link. I soon realised that I could just as well send a short (2 bytes) to make the communication twice as fast (and I need it very fast).
So this is what I have in C++ on arduino:
// variable to store the value coming from the sensor
unsigned short sensorValue = 0;
//Time when I last sent the buffer (serial link seems to need some rest)
unsigned long last_time_sent = millis();
//Buffer to save data I've collected
byte buffer[256];
//Position in buffer
byte buffer_pos = 0;
while(1) {
//Get 0 - 1023
sensorValue = analogRead(sensorPin);
//(Try to) convert short to two bytes. I don't even know which is first and which is last
for(byte i=0; i<2; i++) {
//Some weird bit-shifting, all saved in buffer with an offset
buffer[i+buffer_pos] = (byte)(sensorValue >> ((2-i) * 8));
}
//Iterate buffer position
buffer_pos+=2;
//Currently, I always send the data
if(true||millis()-last_time_sent>30||buffer_pos+2>=255)
Serial.write(buffer, buffer_pos);
//Temporary delay for serial link to rest
delay(50);
}
Now, in Processing, the java code looks like that:
void serialEvent(Serial uselessParameter) {
while (myPort.available() >= 2) {
//java doesn't allow unsigned variables
short number = 0;
for(byte i=0; i<2; i++) {
byte received = (byte)myPort.read();
println("Byte received: "+Integer.toString((int)received));
number |= myPort.read() << (2-i)*8;
}
//Save data for further rendering
graph.add(number); //Array of integers, java doesn't let me make array of short
}
//Clean old data
while(graph.size()>MAX_GRAPH_SIZE)
graph.remove(0);
}
I think I've something wrong on the arduino side, because I see this in output:
Byte received: 0
Byte received: -1
Resulting 2 byte number: -256
The Arduino should send values of about 681 (I have a 1-digit display to check the values approximately).
On the C++ side, use htons() to turn your shorts from host endianness to network endianness (htons: Host TO Network Short).
On the Java side, read from the socket, write into a ByteArrayOutputStream; once you have all the data, wrap this ByteArrayOutputStream's underlying byte array into a ByteBuffer.
Read the shorts using the ByteBuffer's .getShort().
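For just two bytes, the value can also be reassembled by hand. The sketch below assumes the sender writes the high byte first (network byte order) and uses invented names (SerialWord, fromBigEndian, toBigEndian). Masking with 0xFF matters because Java bytes are signed, so a received 0xA9 would otherwise be treated as -87.

```java
class SerialWord {
    /** Combines two received bytes (high byte first) into an unsigned 16-bit value. */
    static int fromBigEndian(int highByte, int lowByte) {
        return ((highByte & 0xFF) << 8) | (lowByte & 0xFF);
    }

    /** Splits a 16-bit value into high byte, low byte; mirrors what the sender is assumed to do. */
    static byte[] toBigEndian(int value) {
        return new byte[] {(byte) (value >> 8), (byte) value};
    }
}
```

The result is an int rather than a short, which sidesteps the lack of unsigned types for readings in the 0 to 1023 range.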
I'm using jLayer to decode MP3 data, with this call:
SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);
This call returns the decoded data as a short[] array:
output.getBuffer();
When I call AudioTrack write() with that method, it plays fine as I loop through the file:
at.write(output.getBuffer(), 0, output.getBuffer().length);
However, when I convert the short[] array to byte[] array using any of the methods in this answer: https://stackoverflow.com/a/12347176/1176436 the sound gets distorted and jittery:
at.write(output.getBuffer(), 0, output.getBuffer().length);
becomes:
byte[] array = ShortToByte_Twiddle_Method(output.getBuffer());
at.write(array, 0, array.length);
Am I doing anything wrong and what can I do to fix it? Unfortunately I need the pcm data to be in a byte array for another 3rd party library I'm using. The file is 22kHz if that matters and this is how at is being instantiated:
at = new AudioTrack(AudioManager.STREAM_MUSIC, 22050, AudioFormat.CHANNEL_OUT_STEREO,
AudioFormat.ENCODING_PCM_16BIT, 10000 /* 10 second buffer */,
AudioTrack.MODE_STREAM);
Thank you so much in advance.
Edit: This is how I'm instantiating the AudioTrack variable now. So for 44kHz files, the value that is getting sent is 44100, while for 22kHz files, the value is 22050.
at = new AudioTrack(AudioManager.STREAM_MUSIC, decoder.getOutputFrequency(),
decoder.getOutputChannels() > 1 ? AudioFormat.CHANNEL_OUT_STEREO : AudioFormat.CHANNEL_OUT_MONO,
AudioFormat.ENCODING_PCM_16BIT, 10000 /* 10 second buffer */,
AudioTrack.MODE_STREAM);
This is decode method:
public byte[] decode(InputStream inputStream, int startMs, int maxMs) throws IOException {
ByteArrayOutputStream outStream = new ByteArrayOutputStream(1024);
float totalMs = 0;
boolean seeking = true;
try {
Bitstream bitstream = new Bitstream(inputStream);
Decoder decoder = new Decoder();
boolean done = false;
while (!done) {
Header frameHeader = bitstream.readFrame();
if (frameHeader == null) {
done = true;
} else {
totalMs += frameHeader.ms_per_frame();
if (totalMs >= startMs) {
seeking = false;
}
if (!seeking) {
// logger.debug("Handling header: " + frameHeader.layer_string());
SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);
short[] pcm = output.getBuffer();
for (short s : pcm) {
outStream.write(s & 0xff);
outStream.write((s >> 8) & 0xff);
}
}
if (totalMs >= (startMs + maxMs)) {
done = true;
}
}
bitstream.closeFrame();
}
return outStream.toByteArray();
} catch (BitstreamException e) {
throw new IOException("Bitstream error: " + e);
} catch (DecoderException e) {
throw new IOException("Decoder error: " + e);
}
}
This is how it sounds (wait a few seconds): https://vimeo.com/60951237 (and this is the actual file: http://www.tonycuffe.com/mp3/tail%20toddle.mp3)
Edit: I would have loved to have split the bounty, but instead I have given the bounty to Bill and the accepted answer to Neil. Both were a tremendous help. For those wondering, I ended up rewriting the Sonic native code which helped me move along the process.
As @Bill Pringlemeir says, the problem is that your conversion method doesn't actually convert. A short is a 16 bit number; a byte is an 8 bit number. The method you have chosen doesn't convert the contents of the shorts (i.e. go from 16 bits to 8 bits for the contents), it changes the way in which the same collection of bits is stored. As you say, you need something like this:
SampleBuffer output = (SampleBuffer) decoder.decodeFrame(frameHeader, bitstream);
byte[] array = MyShortToByte(output.getBuffer());
at.write(array, 0, array.length);
@Bill Pringlemeir's approach is equivalent to dividing all the shorts by 256 to ensure they fit in the byte range:
byte[] MyShortToByte(short[] buffer) {
int N = buffer.length;
ByteBuffer byteBuf = ByteBuffer.allocate(N);
for (int i = 0; i < N; i++) {
byte b = (byte)(buffer[i]/256); /*convert to byte. */
byteBuf.put(b);
}
return byteBuf.array();
}
This will work, but will probably give you very quiet, edgy tones. If you can afford the processing time, a two pass approach will probably give better results:
byte[] MyShortToByte(short[] buffer) {
int N = buffer.length;
short min = 0;
short max = 0;
for (int i=0; i<N; i++) {
if (buffer[i] > max) max = buffer[i];
if (buffer[i] < min) min = buffer[i];
}
int scaling = 1+(max-min)/256; // 1+ ensures we stay within range and guarantee no divide by zero if sequence is pure silence ...
ByteBuffer byteBuf = ByteBuffer.allocate(N);
for (int i=0; i<N; i++) {
byte b = (byte)(buffer[i]/scaling); /*convert to byte. */
byteBuf.put(b);
}
return byteBuf.array();
}
Again, beware of the signed/unsigned issue. The above works signed->signed and unsigned->unsigned, but not between the two. It may be that you are reading signed shorts (-32768 to 32767) but need to output unsigned bytes (0 to 255), ...
If you can afford the processing time, a more precise (smoother) approach would be to go via floats (this also gets round the signed/unsigned issue):
byte[] MyShortToByte(short[] buffer) {
int N = buffer.length;
float f[] = new float[N];
float min = 0.0f;
float max = 0.0f;
for (int i=0; i<N; i++) {
f[i] = (float)(buffer[i]);
if (f[i] > max) max = f[i];
if (f[i] < min) min = f[i];
}
float scaling = 1.0f+(max-min)/256.0f; // +1 ensures we stay within range and guarantee no divide by zero if sequence is pure silence ...
ByteBuffer byteBuf = ByteBuffer.allocate(N);
for (int i=0; i<N; i++) {
byte b = (byte)(f[i]/scaling); /*convert to byte. */
byteBuf.put(b);
}
return byteBuf.array();
}
The issue is with your short to byte conversion. The byte conversion link preserves all information including the high and low byte portions. When you are converting from 16bit to 8bit PCM samples, you must discard the lower byte. My Java skills are weak, so the following may not work verbatim. See also: short to byte conversion.
int N = buffer.length;
ByteBuffer byteBuf = ByteBuffer.allocate(N);
for (int i = 0; i < N; i++) {
/* byte b = (byte)((buffer[i]>>8)&0xff); convert to byte. native endian */
byte b = (byte)(buffer[i]&0xff); /*convert to byte; swapped endian. */
byteBuf.put(b);
}
That is the following conversion,
AAAA AAAA SBBB BBBB -> AAAA AAAA, +1 if S==1 and positive else -1 if S==1
A is a bit that is kept. B is a discarded bit and S is a bit that you may wish to use for rounding. The rounding is not needed, but it may sound a little better. Basically, 16 bit PCM is higher resolution than 8 bit PCM. You lose those bits when the conversion is done. The short to byte routine tries to preserve all information.
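A sketch of the conversion with rounding might look like this, assuming signed 16-bit input and signed 8-bit output (the names PcmDownconvert and to8BitSigned are invented for illustration). Adding 128 before the shift is the "+1 if S is set" step, and the clamp handles the one case where rounding up would overflow.

```java
class PcmDownconvert {
    /** Keeps the high byte of each sample, rounding on the top discarded bit. */
    static byte[] to8BitSigned(short[] samples) {
        byte[] out = new byte[samples.length];
        for (int i = 0; i < samples.length; i++) {
            // Add half of the discarded range, then drop the low byte (arithmetic shift keeps the sign)
            int rounded = (samples[i] + 128) >> 8;
            // Rounding can push the largest inputs to 128, which does not fit in a signed byte
            out[i] = (byte) Math.min(rounded, Byte.MAX_VALUE);
        }
        return out;
    }
}
```

If the playback API expects unsigned 8-bit samples centred on 128, which I believe is the case for Android's ENCODING_PCM_8BIT, an offset of 128 would still have to be added to each result.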
Of course, you must tell the sound library that you are using 8-bit PCM. My guess,
at = new AudioTrack(AudioManager.STREAM_MUSIC, 22050, AudioFormat.CHANNEL_OUT_STEREO,
AudioFormat.ENCODING_PCM_8BIT, 10000 /* 10 second buffer */,
AudioTrack.MODE_STREAM);
If you can only use 16bit PCM to play audio, then you have to do the inverse and convert the 8bit PCM from the library to 16bit PCM for playback. Also note, that typically, 8bit samples are often NOT straight PCM but u-law or a-law encoded. If the 3rd party library uses these formats, the conversion is different but you should be able to code it from the wikipedia links.
NOTE: I have not included the rounding code as overflow and sign handling will complicate the answer. You must check for overflow (i.e., 0x7f + 1 gives 0x80, so 127 + 1 wraps around to -128). However, I suspect the library is not straight 8bit PCM.
See Also: Alsa PCM overview, Multi-media wiki entry on PCM - Ultimately Android uses ALSA for sound.
Other factors that must be correct for a PCM raw buffer are sample rate, number of channels (stereo/mono), PCM format including bits, companding, little/big endian and sample interleaving.
EDIT: After some investigation, the JLayer decoder typically returns big endian 16bit values. The Sonic filter takes bytes but treats them as 16bit little endian underneath. Finally, the AudioTrack class expects 16 bit little endian underneath. I believe that for some reason the JLayer mp3 decoder will sometimes return 16bit little endian values. The decode() method in the question does a byte swap of the 16 bit values. Also, the posted audio sounds as if the bytes are swapped.
public byte[] decode(InputStream inputStream, int startMs, int maxMs, boolean swap) throws IOException {
...
short[] pcm = output.getBuffer();
for (short s : pcm) {
if(swap) {
outStream.write(s & 0xff);
outStream.write((s >> 8) & 0xff);
} else {
outStream.write((s >> 8) & 0xff);
outStream.write(s & 0xff);
}
}
...
For 44k mp3s, you call the routine with swap = true;. For the 22k mp3 swap = false. This explains all the reported phenomena. I don't know why the JLayer mp3 decoder would sometimes output big endian and other times little endian. I imagine it depends on the source mp3 and not the sample rate.
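The two branches of the swap flag correspond to the two byte orders a ByteBuffer can write. As a minimal sketch (ShortPacking and pack are invented names), the whole short[] can be packed in one call with the order chosen explicitly:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ShortPacking {
    /** Packs 16-bit samples into bytes using the requested byte order. */
    static byte[] pack(short[] pcm, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.allocate(pcm.length * 2).order(order);
        // The short view inherits the byte order set on the buffer above
        bb.asShortBuffer().put(pcm);
        return bb.array();
    }
}
```

ByteOrder.LITTLE_ENDIAN writes the low byte first, like the swap branch above, and ByteOrder.BIG_ENDIAN writes the high byte first.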