I have two files. One is an MP3 being decoded to PCM into a stream, and the other is a WAV that is also being read as PCM. The samples are held in a short data type.
Audio stats: 44,100 samples/sec * 16 bits per sample * 2 channels = 1,411,200 bits/sec
I have X seconds of silence that I need to apply to the beginning of the mp3 pcm data and I am doing it like this:
private short[] mp3Buffer = null;
private short[] wavBuffer = null;
private short[] mixedBuffer = null;
double silenceSamples = (audioInfo.rate * padding) * 2;
for (int i = 0; i < minBufferSize; i++){
    if (silenceSamples > 0 ){
        mp3Buffer[i] = 0; //Add 0 to the buffer as silence
        mixedBuffer[i] = (short)((mp3Buffer[i] + stereoWavBuffer[i])/2);
        silenceSamples = silenceSamples - 0.5;
    }
    else
        mixedBuffer[i] = (short)((mp3Buffer[i] + stereoWavBuffer[i])/2);
}
The audio is always off. Sometimes it's a second or two too fast, sometimes a second or two too slow. I don't think it's a problem with the timing: I start the AudioRecord (WAV) first, then set a start timer -> start the MediaPlayer (already prepared) -> end timer, and set the difference as the "padding" variable. I am also skipping the 44 bytes of the WAV header.
Any help would be much appreciated.
I'm assuming you want to align two sources of audio by inserting padding at the start of one of the streams? There are a few things wrong here.
mp3Buffer[i] = 0; //Add 0 to the buffer as silence
This does not add silence to the beginning; it just sets the entry at offset [i] in the array to 0. The next line:
mixedBuffer[i] = (short)((mp3Buffer[i] + stereoWavBuffer[i])/2);
Then just overwrites this value.
If you are wanting to align the streams in some way, the best way to go about it is not to insert silence at the beginning of either stream, but to just begin mixing in one of the streams at an offset from the other. Also it would be better to mix them into a 32 bit float and then normalise. Something like:
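// Cast to int in case padding is a double or long; * 2 because stereo has two samples per frame
int silenceSamples = (int) (audioInfo.rate * padding) * 2;
float[] mixedBuffer = new float[minBufferSize + silenceSamples];
for (int i = 0; i < minBufferSize + silenceSamples; i++) {
    if (i < silenceSamples) {
        // Before the offset: only the WAV stream contributes
        mixedBuffer[i] = (float) stereoWavBuffer[i];
    } else if (i < minBufferSize) {
        // Both streams overlap: sum them (normalise afterwards)
        mixedBuffer[i] = (float) (stereoWavBuffer[i] + mp3Buffer[i - silenceSamples]);
    } else {
        // The WAV data has run out: only the MP3 stream remains
        mixedBuffer[i] = (float) mp3Buffer[i - silenceSamples];
    }
}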
To normalise the data you need to run through the mixedBuffer and find the largest absolute value (Math.abs(...)), and then multiply all the values in the array by 32767/largestValue. This will give you a buffer where the largest value fits back into a short without clipping. Then iterate through your float array, moving each value back into a short array.
I'm not sure what your minBufferSize is - this will need to be large enough to get all your data mixed.
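The normalisation step described above could be sketched roughly as follows. This is only a minimal illustration: the class and method names (MixNormalizeSketch, normalize) are made up for the example, and it assumes the mixed data is already sitting in a float[] as in the loop above.

```java
final class MixNormalizeSketch {

    /**
     * Scales the mixed samples so the largest absolute value maps to 32767,
     * then converts them back to 16-bit samples.
     */
    static short[] normalize(float[] mixed) {
        // Pass 1: find the largest absolute sample value
        float peak = 0f;
        for (float v : mixed) {
            peak = Math.max(peak, Math.abs(v));
        }

        short[] out = new short[mixed.length];
        if (peak == 0f) {
            return out; // pure silence, nothing to scale
        }

        // Pass 2: scale every sample by 32767 / peak and round to a short
        float scale = 32767f / peak;
        for (int i = 0; i < mixed.length; i++) {
            out[i] = (short) Math.round(mixed[i] * scale);
        }
        return out;
    }
}
```

Note that this also amplifies a quiet mix up to full scale. If that is not wanted, one option is to scale only when the peak exceeds 32767.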
While reading this example of music visualization in Java, I was wondering where the author got the variable eightBitByteArray from.
Can someone explain how to create this array, or what it stands for?
for (int t = 0; t < eightBitByteArray.length;) {
    for (int channel = 0; channel < numChannels; channel++) {
        int low = (int) eightBitByteArray[t];
        t++;
        int high = (int) eightBitByteArray[t];
        t++;
        int sample = getSixteenBitSample(high, low);
        toReturn[channel][sampleIndex] = sample;
    }
    sampleIndex++;
}
I don't see any reference to eightBitByteArray in that link but I would assume it is just a byte[] being that each element has eight bits, it has a length field and the variable name says "ByteArray".
I was also confused with this tutorial and have found a complete example here. The eightBitByteArray appears to be buffer used to read audio data from the TargetDataLine.
I would guess it was renamed to this for clarity (because sample size was chosen to be 16 bits, but a byte can only store 8 and the name explicitly reflects that).
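To make the byte-pair idea concrete, here is a rough sketch of how two 8-bit values can be combined into one 16-bit sample. The tutorial's own getSixteenBitSample is not shown in the question, so this body is a guess at what it does, assuming signed little-endian PCM (low byte first). The class name and the decode helper are made up for the example.

```java
final class SixteenBitSampleSketch {

    /**
     * Combines a high and a low byte (both already widened to int) into one
     * signed 16-bit sample. The low byte is masked so its sign bit does not
     * leak into the upper bits.
     */
    static int getSixteenBitSample(int high, int low) {
        return (high << 8) | (low & 0xFF);
    }

    /** Splits interleaved little-endian 16-bit PCM bytes into per-channel samples. */
    static int[][] decode(byte[] eightBitByteArray, int numChannels) {
        int frames = eightBitByteArray.length / (2 * numChannels);
        int[][] samples = new int[numChannels][frames];
        int t = 0;
        for (int frame = 0; frame < frames; frame++) {
            for (int channel = 0; channel < numChannels; channel++) {
                int low = eightBitByteArray[t++];
                int high = eightBitByteArray[t++];
                samples[channel][frame] = getSixteenBitSample(high, low);
            }
        }
        return samples;
    }
}
```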
I have a file of around 4-5 GB (nearly a billion lines). From every line of the file, I have to parse the array of integers and the additional integer info and update my custom data structure. My class to hold such information looks like
class Holder {
private int[][] arr = new int[1000000000][5]; // assuming that max array size is 5
private int[] meta = new int[1000000000];
}
A sample line from the file looks like
(1_23_4_55) 99
Every index in the arr & meta corresponds to the line number in the file. From the above line, I extract the array of integers first and then the meta information. In that case,
--pseudo_code--
arr[line_num] = new int[]{1, 23, 4, 55}
meta[line_num]=99
Right now, I am using a BufferedReader object and its readLine method to read each line, and character-level operations to parse the integer array and meta information from each line and populate the Holder instance. But it takes almost half an hour to complete this entire operation.
I used both Java Serialization and Externalizable (writing the meta and arr) to serialize and deserialize this HUGE Holder instance. With both of them, serializing takes almost half an hour and deserializing also takes almost half an hour.
I would appreciate your suggestions on dealing with this kind of problem, and would love to hear your side of the story if you have one.
P.S. Main memory is not a problem. I have almost 50 GB of RAM in my machine. I have also increased the BufferedReader size to 40 MB (of course, I can increase this up to 100 MB, considering that disk access runs at approx. 100 MB/sec). Cores and CPU are not a problem either.
EDIT I
The code that I am using to do this task is provided below (after anonymizing a few details):
public class BigFileParser {

    // Accumulates the value as a negative number, then flips the sign on return
    private int parsePositiveInt(final String s) {
        int num = 0;
        int sign = -1;
        final int len = s.length();
        final char ch = s.charAt(0);
        if (ch == '-')
            sign = 1;
        else
            num = '0' - ch;
        int i = 1;
        while (i < len)
            num = num * 10 + '0' - s.charAt(i++);
        return sign * num;
    }

    private void loadBigFile() {
        long startTime = System.nanoTime();
        Holder holder = new Holder();
        String line;
        try {
            Reader fReader = new FileReader("/path/to/BIG/file");
            // Buffer size in chars: 40960 is 40 KB, not 40 MB
            BufferedReader bufferedReader = new BufferedReader(fReader, 40960);
            String tempTerm;
            int i, meta, ascii, len;
            boolean consumeNextInteger;
            // GNU Trove primitive int array list
            TIntArrayList arr;
            char c;
            while ((line = bufferedReader.readLine()) != null) {
                consumeNextInteger = true;
                tempTerm = "";
                arr = new TIntArrayList(5);
                for (i = 0, len = line.length(); i < len; i++) {
                    c = line.charAt(i);
                    ascii = c - 0;
                    // 95 is the ascii value of _ char
                    if (consumeNextInteger && ascii == 95) {
                        arr.add(parsePositiveInt(tempTerm));
                        tempTerm = "";
                    } else if (ascii >= 48 && ascii <= 57) { // '0' - '9'
                        tempTerm += c;
                    } else if (ascii == 9) { // '\t'
                        arr.add(parsePositiveInt(tempTerm));
                        consumeNextInteger = false;
                        tempTerm = "";
                    }
                }
                meta = parsePositiveInt(tempTerm);
                holder.update(arr, meta);
            }
            bufferedReader.close();
            long endTime = System.nanoTime();
            System.out.println("#time -> " + (endTime - startTime) * 1.0
                    / 1000000000 + " seconds");
        } catch (IOException exp) {
            exp.printStackTrace();
        }
    }
}

public class Holder {
    private static final int SIZE = 500000000;
    private TIntArrayList[] arrs;
    private TIntArrayList metas;
    private int idx;

    public Holder() {
        arrs = new TIntArrayList[SIZE];
        metas = new TIntArrayList(SIZE);
        idx = 0;
    }

    public void update(TIntArrayList arr, int meta) {
        arrs[idx] = arr;
        metas.add(meta);
        idx++;
    }
}
It sounds like the time taken for file I/O is the main limiting factor, given that serialization (binary format) and your own custom format take about the same time.
Therefore, the best thing you can do is to reduce the size of the file. If your numbers are generally small, then you could get a huge boost from using Google protocol buffers, which will encode small integers generally in one or two bytes.
Or, if you know that all your numbers are in the 0-255 range, you could use a byte[] rather than int[] and cut the size (and hence load time) to a quarter of what it is now. (assuming you go back to serialization or just write to a ByteChannel)
It simply can't take that long. You're working with some 6e9 ints, which means 24 GB. Writing 24 GB to the disk takes some time, but nothing like half an hour.
I'd put all the data in a single one-dimensional array and access it via methods like int getArr(int row, int col) which transform row and col onto a single index. According to how the array gets accessed (usually row-wise or usually column-wise), this index would be computed as N * row + col or N * col + row to maximize locality. I'd also store meta in the same array.
Writing a single huge int[] into memory should be pretty fast, surely no half an hour.
Because of the data amount, the above doesn't work as you can't have a 6e9 entries array. But you can use a couple of big arrays instead and all of the above applies (compute a long index from row and col and split it into two ints for accessing the 2D-array).
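A rough sketch of that "couple of big arrays" idea is below. The class name ChunkedIntStore and the chunkBits parameter are invented for the illustration. It assumes row-major access and that meta is stored as one extra column of each row.

```java
final class ChunkedIntStore {
    private final int chunkBits;   // each chunk holds 2^chunkBits ints
    private final int chunkMask;
    private final int cols;
    private final int[][] chunks;

    ChunkedIntStore(long rows, int cols, int chunkBits) {
        this.cols = cols;
        this.chunkBits = chunkBits;
        this.chunkMask = (1 << chunkBits) - 1;
        long total = rows * cols;
        // Round up so the last, partly used chunk is allocated too
        int numChunks = (int) ((total + chunkMask) >>> chunkBits);
        this.chunks = new int[numChunks][1 << chunkBits];
    }

    int get(long row, int col) {
        long idx = row * cols + col;           // long index into the logical 1D array
        return chunks[(int) (idx >>> chunkBits)][(int) (idx & chunkMask)];
    }

    void set(long row, int col, int value) {
        long idx = row * cols + col;
        chunks[(int) (idx >>> chunkBits)][(int) (idx & chunkMask)] = value;
    }
}
```

For the data in the question, something like new ChunkedIntStore(1_000_000_000L, 6, 27) would give about 45 chunks of 512 MB each, with columns 0-4 holding the array values and column 5 holding meta. Those numbers are only an example.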
Make sure you aren't swapping. Swapping is the most probable reason for the slow speed I can think of.
There are several alternative Java file I/O libraries. This article is a little old, but it gives an overview that's still generally valid. He's reading about 300 MB per second with a 6-year-old Mac. So for 4 GB you have under 15 seconds of read time. Of course my experience is that Mac I/O channels are very good. YMMV if you have a cheap PC.
Note there is no advantage above a buffer size of 4K or so. In fact you're more likely to cause thrashing with a big buffer, so don't do that.
The implication is that parsing characters into the data you need is the bottleneck.
I have found in other apps that reading into a block of bytes and writing C-like code to extract what I need goes faster than the built-in Java mechanisms like split and regular expressions.
If that still isn't fast enough, you'd have to fall back to a native C extension.
If you randomly pause it you will probably see that the bulk of the time goes into parsing the integers, and/or all the new-ing, as in new int[]{1, 23, 4, 55}. You should be able to just allocate the memory once and stick numbers into it at better than I/O speed if you code it carefully.
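As an illustration of "allocate once and stick numbers into it", here is a minimal sketch that parses lines of the form (1_23_4_55) 99 straight out of a byte buffer into preallocated arrays, with no String or per-line object creation. The names (ByteLineParserSketch, parse, MAX_COLS) are made up. It assumes non-negative integers, at most five values per line, and ASCII input.

```java
final class ByteLineParserSketch {
    static final int MAX_COLS = 5; // assumed maximum number of values per line

    /**
     * Parses buf[0..len) and fills the preallocated arrays:
     * arr is row-major (row * MAX_COLS + col), counts[row] is the number of
     * array values on that line, meta[row] is the trailing integer.
     * Returns the number of lines parsed.
     */
    static int parse(byte[] buf, int len, int[] arr, int[] counts, int[] meta) {
        int row = 0, col = 0, num = 0;
        boolean hasDigits = false;
        for (int i = 0; i < len; i++) {
            byte b = buf[i];
            if (b >= '0' && b <= '9') {
                num = num * 10 + (b - '0');
                hasDigits = true;
            } else if (b == '_' || b == ')') {
                // End of one array element
                if (hasDigits) {
                    arr[row * MAX_COLS + col++] = num;
                }
                num = 0;
                hasDigits = false;
            } else if (b == '\n') {
                // End of line: whatever number is pending is the meta value
                if (hasDigits || col > 0) {
                    meta[row] = num;
                    counts[row] = col;
                    row++;
                }
                col = 0;
                num = 0;
                hasDigits = false;
            }
            // '(', spaces, tabs and '\r' are simply skipped
        }
        if (hasDigits || col > 0) { // last line without a trailing newline
            meta[row] = num;
            counts[row] = col;
            row++;
        }
        return row;
    }
}
```

In a real loader the buffer would be refilled from the file in large blocks, and a line that straddles two blocks needs its parser state carried over. That part is left out here.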
But there's another way - why is the file in ASCII?
If it were in binary, you could just slurp it up.
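For what "slurping" a binary file might look like, here is a small sketch that writes an int[] as raw 4-byte values and reads it back through a FileChannel. The class and method names are invented. It uses Java's default big-endian byte order on both sides and assumes the data fits in a single int[] (a bigger data set would need several arrays or files).

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class BinaryIntFileSketch {
    private static final int BUF_BYTES = 64 * 1024;

    static void writeInts(Path file, int[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer bb = ByteBuffer.allocate(BUF_BYTES);
            int i = 0;
            while (i < data.length) {
                bb.clear();
                IntBuffer ib = bb.asIntBuffer();
                int n = Math.min(ib.remaining(), data.length - i);
                ib.put(data, i, n);   // fills the byte buffer through the int view
                bb.limit(n * 4);
                while (bb.hasRemaining()) {
                    ch.write(bb);
                }
                i += n;
            }
        }
    }

    static int[] readInts(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int[] out = new int[(int) (ch.size() / 4)];
            ByteBuffer bb = ByteBuffer.allocate(BUF_BYTES);
            int i = 0;
            while (i < out.length) {
                bb.clear();
                bb.limit((int) Math.min(bb.capacity(), (long) (out.length - i) * 4));
                while (bb.hasRemaining()) {
                    if (ch.read(bb) < 0) {
                        throw new EOFException("file ended early");
                    }
                }
                bb.flip();
                IntBuffer ib = bb.asIntBuffer();
                int n = ib.remaining();
                ib.get(out, i, n);    // bulk copy, no per-number parsing
                i += n;
            }
            return out;
        }
    }
}
```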
I'm making a rhythm game and I need a quick way to get the length of an ogg file. The only way I could think would be to stream the file really fast without playing it but if I have hundreds of songs this would obviously not be practical. Another way would be to store the length of the file in some sort of properties file but I would like to avoid this. I know there must be some way to do this as most music players can tell you the length of a song.
The quickest way to do it is to seek to the end of the file, then back up to the last Ogg page header you find and read its granulePosition (which is the total number of samples per channel in the file). That's not foolproof (you might be looking at a chained file, in which case you're only getting the last stream's length), but should work for the vast majority of Ogg files out there.
If you need help with reading the Ogg page header, you can read the JOrbis source code... The short version is to look for "OggS", read a byte (the version, which should be 0), read a byte (the header type; on the last page only the end-of-stream flag, 0x04, should be set), then read a 64-bit little-endian value.
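A rough sketch of the seek-to-the-end approach follows. The names (OggTailSketch, lastGranulePosition, durationSeconds) are made up for the example. It reads only the tail of the file; since an Ogg page is at most 65,307 bytes, a 128 KB tail is enough to contain the last page header of a well-formed file. It does not verify the page checksum, so a stray "OggS" inside the audio data could in principle fool it, and the sample rate has to come from elsewhere (for Vorbis, the identification header at the start of the file).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OggTailSketch {
    private static final int TAIL_BYTES = 128 * 1024;

    /** Returns the granule position of the last page found, or -1 if there is none. */
    static long lastGranulePosition(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long fileLen = raf.length();
            int n = (int) Math.min(TAIL_BYTES, fileLen);
            byte[] tail = new byte[n];
            raf.seek(fileLen - n);
            raf.readFully(tail);

            ByteBuffer bb = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
            // Header layout: "OggS" (4), version (1), header type (1), granule position (8)
            for (int i = n - 14; i >= 0; i--) {
                if (tail[i] == 'O' && tail[i + 1] == 'g'
                        && tail[i + 2] == 'g' && tail[i + 3] == 'S') {
                    long granule = bb.getLong(i + 6);
                    if (granule >= 0) {
                        return granule; // a value of -1 marks a page where no packet ends
                    }
                }
            }
            return -1;
        }
    }

    /** Samples per channel divided by the sample rate gives seconds. */
    static double durationSeconds(long granulePosition, int sampleRate) {
        return (double) granulePosition / sampleRate;
    }
}
```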
I implemented the solution described by ioctlLR and it seems to work:
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;

// Returns the duration in milliseconds, or -1 if no Ogg page or Vorbis header was found
double calculateDuration(final File oggFile) throws IOException {
    int rate = -1;
    long length = -1; // the granule position is a 64-bit value
    // Read the whole file; a bare stream.read(t) is not guaranteed to fill the array
    byte[] t = Files.readAllBytes(oggFile.toPath());
    int size = t.length;
    ByteBuffer bb = ByteBuffer.wrap(t).order(ByteOrder.LITTLE_ENDIAN);
    // 4 bytes for "OggS", 2 bytes (version, header type), 8 bytes for length
    for (int i = size - 8 - 2 - 4; i >= 0 && length < 0; i--) {
        // Looking for length (value after last "OggS")
        if (t[i] == (byte) 'O'
                && t[i + 1] == (byte) 'g'
                && t[i + 2] == (byte) 'g'
                && t[i + 3] == (byte) 'S') {
            length = bb.getLong(i + 6);
        }
    }
    for (int i = 0; i < size - 8 - 2 - 4 && rate < 0; i++) {
        // Looking for rate (first value after "vorbis", a 4-byte version and a 1-byte channel count)
        if (t[i] == (byte) 'v'
                && t[i + 1] == (byte) 'o'
                && t[i + 2] == (byte) 'r'
                && t[i + 3] == (byte) 'b'
                && t[i + 4] == (byte) 'i'
                && t[i + 5] == (byte) 's') {
            rate = bb.getInt(i + 11);
        }
    }
    if (length < 0 || rate <= 0) {
        return -1;
    }
    // Multiply as a double so long files do not overflow
    return length * 1000.0 / rate;
}
Beware: finding the rate this way will work only for Ogg Vorbis files!
Feel free to edit my answer, it may not be perfect.
In Java you can create a SourceDataLine like so:
AudioFormat af = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100.0f, 16, 1, 2, 44100.0f, false);
SourceDataLine sdl = AudioSystem.getSourceDataLine(af);
After which you can open and then write data to it:
byte[] data = new byte[1024];
fillwithsounds(data);
sdl.open();
sdl.start();
sdl.write(data, 0, 1024);
This all works fine for mono data.
What I'd like to do is to be able to write stereo data, and I can't find any documentation online on how I need to change my byte array to be able to write stereo data.
It seems like I need to increase the number of channels when I create the AudioFormat, to make it stereo, and then I need to halve the frame rate (otherwise Java throws an IllegalArgumentException).
I don't understand why this is, though, or what the new format should be for the data that I feed to the SourceDataLine.
Perhaps somebody with a little more experience in audio formats than I could shed some light on this problem. Thanks in advance!
The format I use for stereo is as follows:
audioFmt = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
44100, 16, 2, 4, 44100, false);
You probably want to double the bytes per frame instead of halving your bits-encoding. I'm not sure what 8-bit encoding sounds like, but it is definitely going to be noisier than 16-bit encoding!
The resulting file is twice as long. You can then take the two-byte pairs that make the 16-bit sample and copy them into the next two positions, for "mono" playback (both stereo channels identical).
Given:
frame = F
little end byte = A
big end byte = B
AB = 16-bit little-endian encoding
left channel = L
right channel = R
Your original mono:
F1A, F1B, F2A, F2B, F3A, F3B ...
Stereo using the above format:
F1AL, F1BL, F1AR, F1BR, F2AL, F2BL, F2AR, F2BR, F3AL, F3BL, F3AR, F3BR ...
I could very well have the order of left and right mixed up. But I hope you get the idea!
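The byte layout above can be produced with a few lines of code. This is just a sketch with made-up names (MonoToStereoSketch, monoToStereo16); it assumes 16-bit little-endian mono input and the 16-bit, 2-channel, 4-bytes-per-frame format shown earlier.

```java
final class MonoToStereoSketch {

    /**
     * Duplicates each 16-bit mono sample into the left and right channel,
     * turning 2-byte frames into 4-byte frames at full 16-bit resolution.
     */
    static byte[] monoToStereo16(byte[] mono) {
        int frames = mono.length / 2;
        byte[] stereo = new byte[frames * 4];
        for (int f = 0; f < frames; f++) {
            byte low = mono[2 * f];
            byte high = mono[2 * f + 1];
            int o = 4 * f;
            stereo[o] = low;       // first channel, low byte
            stereo[o + 1] = high;  // first channel, high byte
            stereo[o + 2] = low;   // second channel, low byte
            stereo[o + 3] = high;  // second channel, high byte
        }
        return stereo;
    }
}
```

The output array is twice the size of the input, so the number of bytes passed to write(...) has to grow accordingly.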
I found out the solution just now, and found Andrew Thompson's comment to explain exactly what I needed.
I figured that I'd have to write each frame twice, what caught me up was the fact that Java wouldn't let me just use the frame size I had for my mono channel. (It threw an IllegalArgumentException)
So I halved the framerate to satisfy Java, but I didn't remember to modify the byte array.
I've implemented code that takes the "2 bytes per frame, 1 channel" byte[] and converts it into a "1 byte per sample, 2 channel" byte[] (8-bit stereo, which again comes to 2 bytes per frame).
private static final float LEFT = -1;
private static final float CENTER = 0;
private static final float RIGHT = 1;

private byte[] monoToStereo(byte[] data, float pan){
    byte[] output = new byte[data.length];
    for (int i = 0; i < (data.length - 2); i+=2){
        int currentvalue = (data[i+1]*256 + data[i])/(256*2);
        if (pan == LEFT || pan == CENTER){
            output[i] = (byte) currentvalue;
        } else {
            output[i] = 0;
        }
        if (pan == RIGHT || pan == CENTER){
            output[i+1] = (byte) currentvalue;
        } else {
            output[i+1] = 0;
        }
    }
    return output;
}
Using this, I can get stereo audio to playback (although there is soft static, I can clearly hear the original track)
I have a list of sampled data from the WAV file. I would like to pass these values into a library and get the frequency of the music played in the WAV file. For now, I will have one frequency in the WAV file, and I would like to find a library that is compatible with Android. I understand that I need to use an FFT to get the frequency domain. Are there any good libraries for that? I found that KissFFT is quite popular, but I am not sure how compatible it is with Android. Is there an easier, good library that can perform the task I want?
EDIT:
I tried to use JTransforms to get the FFT of the WAV file but always failed to get the correct frequency of the file. Currently, the WAV file contains a sine curve of 440 Hz, the musical note A4. However, I got 441 as the result. Then I tried to get the frequency of G4 and got 882 Hz, which is incorrect. The frequency of G4 is supposed to be 783 Hz. Could it be due to not having enough samples? If yes, how many samples should I take?
// DFT
DoubleFFT_1D fft = new DoubleFFT_1D(numOfFrames);
double max_fftval = -1;
int max_i = -1;
double[] fftData = new double[numOfFrames * 2];
for (int i = 0; i < numOfFrames; i++) {
    // copying audio data to the fft data buffer, imaginary part is 0
    fftData[2 * i] = buffer[i];
    fftData[2 * i + 1] = 0;
}
fft.complexForward(fftData);
// For real input the upper half of the spectrum mirrors the lower half,
// so only the bins below the Nyquist frequency are scanned
for (int i = 0; i < numOfFrames; i += 2) {
    // complex numbers -> vectors, so we compute the length of the vector, which is sqrt(realpart^2+imaginarypart^2)
    double vlen = Math.sqrt((fftData[i] * fftData[i]) + (fftData[i + 1] * fftData[i + 1]));
    //fd.append(Double.toString(vlen));
    // fd.append(",");
    if (max_fftval < vlen) {
        // if this length is bigger than our stored biggest length
        max_fftval = vlen;
        max_i = i;
    }
}
//double dominantFreq = ((double)max_i / fftData.length) * sampleRate;
double dominantFreq = (max_i / 2.0) * sampleRate / numOfFrames;
fd.append(Double.toString(dominantFreq));
Can someone help me out?
EDIT2: I managed to fix the problem mentioned above by increasing the number of samples to 100000. However, sometimes I am getting the overtones as the frequency. Any idea how to fix it? Should I use the Harmonic Product Spectrum or autocorrelation algorithms?
I realised my mistake. If I take more samples, the accuracy will increase. However, this method is still not complete as I still have some problems in obtaining accurate results for piano/voice sounds.
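To show why more samples help: the bins of an N-point transform are spaced sampleRate / N apart, so the reported peak can only land on multiples of that spacing. The results of 441 and 882 are what you would get with about 100 frames at 44,100 Hz (spacing 441 Hz), though the question does not state either number, so treat that as an assumption. The sketch below uses a plain, slow DFT instead of JTransforms so that it is self-contained; all names are made up.

```java
final class DftResolutionSketch {

    /** Returns the centre frequency of the strongest bin below Nyquist (DC excluded). */
    static double dominantFrequency(double[] samples, double sampleRate) {
        int n = samples.length;
        int bestBin = 0;
        double bestPower = -1;
        for (int k = 1; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += samples[t] * Math.cos(angle);
                im -= samples[t] * Math.sin(angle);
            }
            double power = re * re + im * im;
            if (power > bestPower) {
                bestPower = power;
                bestBin = k;
            }
        }
        return bestBin * sampleRate / n; // bin index times the bin spacing
    }

    /** Generates n samples of a sine wave at the given frequency. */
    static double[] sine(double freq, double sampleRate, int n) {
        double[] s = new double[n];
        for (int t = 0; t < n; t++) {
            s[t] = Math.sin(2 * Math.PI * freq * t / sampleRate);
        }
        return s;
    }

    public static void main(String[] args) {
        double rate = 44100;
        // 100 frames: bins are 441 Hz apart
        System.out.println(dominantFrequency(sine(440, rate, 100), rate));
        System.out.println(dominantFrequency(sine(783, rate, 100), rate));
        // 4410 frames: bins are 10 Hz apart
        System.out.println(dominantFrequency(sine(440, rate, 4410), rate));
    }
}
```

With 100 frames, a 440 Hz tone lands in the 441 Hz bin and a 783 Hz tone in the 882 Hz bin; with 4410 frames the spacing drops to 10 Hz and 440 Hz falls exactly on a bin. Overtones being picked up for piano or voice is a separate problem that finer resolution alone does not solve.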