Android Audio - Streaming sine-tone generator odd behaviour - java

First-time poster here. I usually like to find the answer myself (be it through research or trial and error), but I'm stumped here.
What I'm trying to do:
I'm building a simple Android audio synthesizer. Right now, I'm just playing a sine tone in real time, with a slider in the UI that changes the tone's frequency as the user adjusts it.
How I've built it:
Basically, I have two threads: a worker thread and an output thread. The worker thread simply fills a buffer with the sine-wave data every time its tick() method is called. Once the buffer is filled, it alerts the output thread that the data is ready to be written to the audio track. The reason I am using two threads is that AudioTrack.write() blocks, and I want the worker thread to begin processing its data as soon as possible (rather than waiting for the audio track to finish writing). The slider in the UI simply changes a variable in the worker thread, so any changes to the frequency (via the slider) will be read by the worker thread's tick() method.
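For illustration, a minimal sketch of that kind of handoff, assuming a volatile _running flag and the tick() and _audioTrackOut members shown below (the blocking queue is my own illustration, not necessarily the poster's actual mechanism):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final BlockingQueue<short[]> ready = new ArrayBlockingQueue<>(2);

Thread worker = new Thread(() -> {
    try {
        while (_running) ready.put(tick()); // blocks only if the output thread falls behind
    } catch (InterruptedException ignored) { }
});

Thread output = new Thread(() -> {
    try {
        while (_running) {
            short[] buf = ready.take();               // wait for the next filled buffer
            _audioTrackOut.write(buf, 0, buf.length); // blocking write
        }
    } catch (InterruptedException ignored) { }
});
worker.start();
output.start();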
What works:
Almost everything: the threads communicate well, and there don't seem to be any gaps or clicks in the playback. Despite the large buffer size (thanks, Android), the responsiveness is OK. The frequency variable does change, as do the intermediate values used during the buffer calculations in tick() (verified by Log.i()).
What doesn't work:
For some reason, I can't seem to get a continuous change in audible frequency. When I adjust the slider, the frequency changes in steps, often as wide as fourths or fifths. Theoretically, I should be hearing changes as small as 1 Hz, but I'm not. Oddly enough, it seems as if changing the slider is making the sine wave step through intervals of the harmonic series; however, I can verify that the frequency variable is NOT snapping to integer multiples of the default frequency.
My Audio track is set up as such:
_buffSize = AudioTrack.getMinBufferSize(_sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
_audioTrackOut = new AudioTrack(AudioManager.STREAM_MUSIC, _sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT, _buffSize, AudioTrack.MODE_STREAM);
The worker thread's buffer is being populated (via tick()) as such:
public short[] tick()
{
    short[] outBuff = new short[_outBuffSize / 2]; // (buffer size in bytes) / 2
    for (int i = 0; i < _outBuffSize / 2; i++)
    {
        outBuff[i] = (short) (Short.MAX_VALUE * ((float) Math.sin(_currentAngle)));
        // Update _angleIncrement, as the frequency may have changed by now
        _angleIncrement = (float) (2.0f * Math.PI) * _freq / _sampleRate;
        _currentAngle = _currentAngle + _angleIncrement;
    }
    return outBuff;
}
The audio data is being written like this:
_audioTrackOut.write(fromWorker, 0, fromWorker.length);
Any help would be greatly appreciated. How can I get more gradual changes in frequency? I'm pretty confident that my logic in tick() is sound, as Log.i() verifies that _angleIncrement and _currentAngle are being updated properly.
Thank you!
Update:
I found a similar problem here: Android AudioTrack buffering problems
The solution proposed there is that one must be able to produce samples fast enough for the AudioTrack, which makes sense. I lowered my sample rate to 22050 Hz and ran some empirical tests: I can fill my buffer (via tick()) in approximately 6 ms in the worst case. This is more than adequate. At 22050 Hz, the AudioTrack gives me a buffer size of 2048 samples (4096 bytes), so each filled buffer lasts for ~0.0928 seconds of audio, which is much longer than it takes to create the data (1-6 ms). So I know I don't have any problems producing samples fast enough.
I should also note that for about the first three seconds of the application's lifecycle, it works fine: a smooth sweep of the slider produces a smooth sweep in the audio output. After that it gets really choppy (the pitch only changes in steps of about 100 Hz), and after that, it stops responding to slider input at all.
I also fixed one bug, though I don't think it has an effect: AudioTrack.getMinBufferSize() returns the smallest allowable buffer size in BYTES, and I was using this number as the length of the buffer in tick(). I now use half this number (two bytes per sample).

I've found it!
It turns out the problem has nothing to do with buffers or threading.
It sounds fine for the first couple of seconds because the angle is still relatively small. As the program runs and _currentAngle (a float) grows, precision is lost and Math.sin(_currentAngle) begins to produce unreliable values.
So, I replaced Math.sin() with FloatMath.sin().
I also replaced
_currentAngle = _currentAngle + _angleIncrement;
with
_currentAngle = ((_currentAngle + _angleIncrement) % (2.0f * (float) Math.PI));
so the angle always stays below 2*PI.
Works like a charm! Thanks very much for your help, praetorian droid!
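For completeness, here is a sketch of the corrected tick() with both changes applied, under the same field names as above. (Note: FloatMath was deprecated in API 22 and removed in API 23; on current Android, (float) Math.sin() with the same 2*PI wrap behaves equivalently, since the wrap is what preserves float precision.)

private static final float TWO_PI = (float) (2.0 * Math.PI);

public short[] tick()
{
    short[] outBuff = new short[_outBuffSize / 2]; // getMinBufferSize() is in bytes; 2 bytes per 16-bit sample
    for (int i = 0; i < outBuff.length; i++)
    {
        outBuff[i] = (short) (Short.MAX_VALUE * FloatMath.sin(_currentAngle));
        _angleIncrement = TWO_PI * _freq / _sampleRate;              // frequency may have changed
        _currentAngle = (_currentAngle + _angleIncrement) % TWO_PI;  // keep the angle small and precise
    }
    return outBuff;
}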

Related

Consistent popping sound while playing audio through a SourceDataLine

I'm writing a basic synth at the moment and have run into a bit of a strange problem: I get a constant popping sound while playing an array of bytes, representing 16-bit mono audio, through a SourceDataLine.
The pops play at a constant rate and, from what I can hear, a constant pitch. The pops do differ slightly in frequency, though (again, from what I can hear): some notes have low-passed-sounding pops, and others sound high-passed. The pops aren't overpowering, though; you can still hear the desired sound behind them.
Nothing changes the rate of the pops except the sample rate: not note pitch, not the SourceDataLine buffer size, not the number of bytes I write to it at a time.
Lowering the sample rate decreases the rate of the pops, and vice versa.
To test my side of the program, I printed out the data being written to the SourceDataLine for about half a second and looked through around 15 cycles of the played sine wave, and it was completely fine; no sudden jumps, clipping, or anything else.
The only two things I use the sample rate for are some basic math to help my sampler sample at the correct frequency (calculated once per note, and definitely working, since the pitch is perfect) and creating the SourceDataLine.
Here's how I'm starting the SourceDataLine (Taken from multiple parts of the main method):
AudioFormat format = new AudioFormat(AudioEnvironment.SAMPLE_RATE, AudioEnvironment.BIT_DEPTH, 1, true, true);
SourceDataLine line = AudioSystem.getSourceDataLine(format);
line.open(format, 8000);
line.start();
My data is correctly in big-endian, tested by me changing the endian flag in the constructor and getting my ears blasted with white noise.
After the program has set everything up, it constantly writes data to the SourceDataLine in this infinite loop:
while (true) {
    for (Channel channel : channelSystem.getChannels()) {
        if (channel.pitch != 0) {
            wave.sample(channel, buffer);
            line.write(buffer, 0, AudioEnvironment.SUB_BUFFER_SIZE * 2);
        }
    }
}
(A Channel is a class I created that contains all the data for a single note, though obviously the program is not set up for polyphony yet; buffer is an array of bytes; wave.sample() is where I sample my data into buffer; and AudioEnvironment.SUB_BUFFER_SIZE * 2 is the size of buffer.)
I don't necessarily need an example of how to fix this in code, but an explanation of why this might be happening would be great.
EDIT: I should probably also add that I've tried putting a print statement in the infinite write loop to print the number of available bytes in the SourceDataLine, and it stays around 500-2000, occasionally climbing to around 5000, but never near 8000, so the buffer is never running out of data.
Well as it turns out, the problem was completely unrelated to what I thought it might be.
Turns out there was a single equation I had written in my sampler that was just blatantly wrong.
After 2048 samples had been played, I would just kinda loop back to the beginning of the waveform, causing the popping.
I honestly have no idea why I wrote that in, but hey, it works now.
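To illustrate the kind of bug described (hypothetical names, not the poster's actual code): wrapping the sampler's position at a fixed 2048 samples instead of wrapping the phase resets the wave mid-cycle, and the resulting discontinuity is heard as a pop every 2048 samples. This also matches the symptom that only the sample rate changed the pop rate (44100 / 2048 ≈ 21.5 pops per second).

static final int FIXED_LENGTH = 2048;

// WRONG: resets the phase every FIXED_LENGTH samples regardless of the
// waveform's period, so the sine jumps discontinuously at each wrap -> pop.
static double buggySample(long n, double freq, double sampleRate) {
    long i = n % FIXED_LENGTH;
    return Math.sin(2 * Math.PI * freq * i / sampleRate);
}

// RIGHT: wrap the phase itself, which stays continuous across buffer boundaries.
static double correctSample(long n, double freq, double sampleRate) {
    double phase = (2 * Math.PI * freq * n / sampleRate) % (2 * Math.PI);
    return Math.sin(phase);
}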

Java Audio Byte Buffer takes varying times to fill

I am opening a TargetDataLine to accept audio input for a given format.
I start and open the line, and I have a buffer which fills with bytes. This runs on a constant loop until an external parameter is changed.
Now, for a fixed sample rate and buffer size, I would expect this to always take the same amount of time to fill, i.e., if my buffer size were 48000 for an 8-bit stream and my sample rate were 48 kHz, I would expect my buffer to always take one second to fill. However, I am finding this varies greatly.
The following is the code I have used:
DataLine.Info info1 = new DataLine.Info(TargetDataLine.class, format1);
try (TargetDataLine line = (TargetDataLine) m1.getLine(info1)) {
    line.open(format1);
    line.start();
    while (!pauseInput) {
        long time1 = System.currentTimeMillis();
        int numBytesRead1 = line.read(buffer1, 0, buffer1.length);
        //chan1double = deinterleaveAudio(buffer1, chan1selectedchannel, chan1totalchannels);
        long time2 = System.currentTimeMillis();
        System.out.println(threadName + " Capture time = " + (time2 - time1));
    }
    line.stop();
}
The commented line is a process I want to run each time the buffer is full. I realise I cannot place it there, as it will interrupt the stream, so I need to find a different way to call it; hence I have commented it out.
For testing purposes I have a buffer size of 4096 bytes. My audio format is 48 kHz, 16-bit, so I would expect my byte buffer to fill in about 42.7 ms: 4096 bytes is 2048 samples (two bytes per sample), and 2048 / 48000 ≈ 0.0427 s. However, measuring each pass with System.currentTimeMillis(), I am getting times of 123 ms and 250 ms, varying between the two.
Is there something I am missing out here that I have not done?
EDIT: I have copied just this code into a brand-new application that doesn't even have a GUI attached, purely to output to the console and see what is happening, making sure there are no background threads to interfere, and sure enough the same thing happens. 95% of the time, a buffer with a predicted fill time of 250 ms fills within 255-259 ms. However, occasionally this drops to 127 ms (which is physically impossible unless there is some weird buffering going on). Is this a bug in Java somewhere?
I don't think it is a good idea to rely on the timing this way. It depends on many things, e.g., the buffer size, the mixer, etc. Moreover, your application is sharing the line's buffer with the mixer. If you have real-time processing to do, store your data in a circular buffer whose length is good enough to hold the amount of data you need. In another thread, read the desired amount of data from the circular buffer and do your processing at a constant time interval. That way you may sometimes overlap or miss some bytes between two consecutive processings, but you always have the expected amount of bytes.
When you open the line, you can specify the line's buffer size using open(format, bufferSize), or you can check the actual buffer size by calling DataLine.getBufferSize(). Then you need to choose the size of the buffer you pass to TargetDataLine.read(). It has to be smaller than the line's buffer size; I would make it 1/4th, 1/8th, 1/16th, or so of the line's buffer size. Another idea is to check the available bytes with DataLine.available() before calling read(). Note that read() is a blocking call (though it doesn't block the line's buffer), i.e., it will not return until the requested number of bytes has been read.
For low latency direct communication between your application and audio interface, you may consider ASIO.
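A minimal sketch of the suggested decoupling (all names hypothetical): a capture thread keeps the line drained into a circular buffer, while a processing thread copies out a fixed-size window at its own fixed interval, tolerating small overlaps or gaps between consecutive windows.

class RingBuffer {
    private final byte[] data;
    private int writePos = 0;

    RingBuffer(int size) { data = new byte[size]; }

    // Called from the capture thread after every line.read().
    synchronized void write(byte[] src, int len) {
        for (int i = 0; i < len; i++) {
            data[writePos] = src[i];
            writePos = (writePos + 1) % data.length;
        }
    }

    // Called from the processing thread: copies the most recent out.length bytes.
    synchronized void readLatest(byte[] out) {
        int start = (writePos - out.length + data.length) % data.length;
        for (int i = 0; i < out.length; i++) {
            out[i] = data[(start + i) % data.length];
        }
    }
}

// Capture thread:    int n = line.read(chunk, 0, chunk.length); ring.write(chunk, n);
// Processing thread: every ~42 ms, ring.readLatest(window); process(window);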
For anyone looking at the same issue, I have been given an answer which half explains what is happening.
The thread scheduler decides when the code can run, and this can cause the measured fill time to vary by 10-20 ms. In earlier days this was as much as 70 ms.
This does not mean the stream is missing samples, just that this buffer will not provide a continuous stream. So any application looking at processing this data in real time and passing it on to an audio output stream needs to be aware of this extra potential latency.
I am still looking into the reason for the short buffer fill time every four or five passes. I was told it could be because the TargetDataLine's buffer size differs from my buffer size, so that only the remainder of the line's buffer gets read on that pass; however, I have made them exactly the same and still have no luck.

When using createBufferStrategy() in java, does it help to have more than 2 buffers? Is there a downside?

It seems like most people recommend just using 2 or 3. Is that just because more than 3 takes up too much processing power or something (forgive me I'm kind of new to this)? In what kind of programs would you use more than 3 buffers?
2 or 3 works fine for my program, I'm just curious.
Actually, it's pretty easy to understand once you know the benefits of buffer strategies.
Let's take a brief look at what happens in all three cases.
With single buffering you have only one buffer to write image data to. In contrast, double buffering has two: the front and the back buffer.
Commonly, the rendering and logical processes of an application are split and run in parallel (for example, the monitor and the graphics card). Let's say the rendering process displays an image on the monitor every 15 ms. Imagine the logical process is currently performing some image manipulation (drawing a circle) but has not finished yet (the circle is only half drawn). With single buffering you would now see a half circle on the screen, because the rendering process displays an unfinished image.
With double buffering, the logical process writes only to the back buffer, and only once it has finished drawing does it mark the back buffer as complete. Then the contents of the back and front buffers are swapped, and the rendering process displays a finished image; you never see any artifacts.
So the advantage of double buffering is that the user sees no artifacts and experiences no flickering and the like.
However, this comes at the cost of increased running time (the swap operation) and especially increased space (2x the whole image).
Now, while triple buffering comes at even more cost (3x the image space), it speeds the process up. Here you have two back buffers and one front buffer.
Imagine you use double buffering and are currently swapping the back buffer to the front because you have just finished a drawing operation. This can take some time, during which your graphics card (which is very fast, faster than the software code exchanging the buffers) could begin the next drawing operation, but it can't, because the buffer is blocked.
With triple buffering the graphics card can simply start drawing into the other back buffer, since one back buffer is always free and not involved in any swapping.
Now that we know how things work, it is also clear why you do not see buffer solutions with more than 3 buffers (at least not in common applications): it simply has no benefit in the general case.
Larger buffer counts do appear when dealing with 3D virtual-reality stuff (stereoscopic images), for example: you could use double buffering for the left and right channels respectively, ending up with quad buffering in total.
As a last side note, vsync means synchronizing the swap of the back and front buffers with the refresh rate of your monitor, to minimize tearing effects.
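For reference, a sketch of the standard render loop for this API, along the lines of the java.awt.image.BufferStrategy javadoc; the buffer count passed to createBufferStrategy() is the only thing that changes between double and triple buffering:

import java.awt.Canvas;
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferStrategy;

// Assumes the canvas is already displayable (added to a visible Frame)
// and canvas.createBufferStrategy(2) or (3) was called once during setup.
static void renderFrame(Canvas canvas) {
    BufferStrategy strategy = canvas.getBufferStrategy();
    do {
        do {
            Graphics g = strategy.getDrawGraphics(); // draws into a back buffer
            g.setColor(Color.BLACK);
            g.fillRect(0, 0, canvas.getWidth(), canvas.getHeight());
            // ... draw the frame here ...
            g.dispose();
        } while (strategy.contentsRestored()); // buffer was restored: redraw
        strategy.show(); // flip or blit the back buffer to the front
    } while (strategy.contentsLost()); // buffer was lost: start over
}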

Noise cancelling program

If you were to write a program that takes microphone input, reverses it (puts it out of phase by making 1s 0s and 0s 1s), and plays it back out of the speakers, could that cancel out sound? Wave physics says that if crests align with troughs, destructive interference occurs, so could that be utilized here to achieve a lessened noise, if not cancel it out "completely"? I can imagine this wouldn't work, due either to complications in reversing the audio or because it takes too long to reverse and play back, so that the sound wave has already passed. If I had to pick a language to do this in, it would have to be either C++ or Java (I'm at least competent in both).
Yes, it will cancel out sound. That's more or less how surround sound works: by subtracting the left/right channels and playing that through the third speaker, then inverting the samples and playing those out of the fourth, you get interesting spatial effects.
Also, you wouldn't simply want to toggle all the bits; you'd get noise. Instead you want to negate each sample.
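As a small illustration of that point on signed 16-bit PCM (a hypothetical helper, not from the question): phase inversion is arithmetic negation, and the one value without a positive counterpart, Short.MIN_VALUE, needs clamping.

// Invert the phase of signed 16-bit PCM samples by negation.
// Short.MIN_VALUE (-32768) has no positive counterpart in 16 bits,
// so clamp it to Short.MAX_VALUE to avoid overflow.
static void invert(short[] samples) {
    for (int i = 0; i < samples.length; i++) {
        short s = samples[i];
        samples[i] = (s == Short.MIN_VALUE) ? Short.MAX_VALUE : (short) -s;
    }
}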
With a small sample buffer you'd be fast enough to cancel out waves of certain frequencies. While those waves attack and decay you'll be lagging behind, but as long as a wave sustains you can effectively cancel it out.
With bigger sample buffers the delay obviously increases, since it takes longer to fill the buffer with samples. The size of the buffer determines how often a device interrupt occurs, at which point the program copies the input samples to an output buffer while applying an operation to them.
Typically, recordings are made at 44.1 kHz, meaning that many samples per second. If you set the buffer to, say, 256 samples, you get notified 44100/256 times a second that there are 256 samples to be processed.
At 256 samples you'd lag behind by 256/44100 = 0.0058 seconds, or 5.8 milliseconds. Sound travels at around 340 m/s, so in that time the sound wave will have moved 1.97 meters (340 m/s * 5.8 ms). That wavelength corresponds to a frequency of 172 Hz (44100/256). This means you can only effectively cancel out frequencies lower than that, because anything higher "moves" more than once during those 5.8 ms and is thus above the maximum "sample rate", if you will.
For 64 samples, the frequency would be 44100/64 = 689 Hz, and this is the maximum frequency! That means you could cancel out bass and the fundamental frequency of the human voice, but not the harmonics.
A typical OS has its clock frequency set to 500, 1000, or 2000 Hz, meaning at best you could use a sample buffer of around two to three samples, giving you a maximum frequency of 500, 1000, or 2000 Hz. Telephones usually have a maximum frequency of about 3500 Hz.
You could get the system clock up to around 32 kHz and poll an ADC directly to reach such frequencies, but you'd probably need to solder one to your LPT port and run a custom OS, which means Java is out of the question, or use a pre-fab real-time embedded system that runs Java (see the comment by @zapl for links).
One thing I forgot to mention is that you need to take into account the positions of the sound source, the microphone, and the speaker. Ideally all three are in the same place, so there is no delay. But this is almost never the case, which means you'd get an interference pattern: there will be spots in the room where the sound is cancelled, and spots in between where it is not.
You cannot do this in software with C++ or even assembly; the latency of just mirroring the output to the speakers would be more than 6 ms on most computers. Even if you had a latency of only 0.1 ms, the resulting sound (assuming it is perfectly mixed) would at best sound like it was sampled at 10 kHz (not very good).

Some signal processing /FFT questions

I need some help confirming some basic DSP steps. I'm in the process of implementing some smartphone accelerometer sensor signal processing software, but I've not worked in DSP before.
My program collects accelerometer data in real time at 32 Hz. The output should be the principal frequencies of the signal.
My specific questions are:
From the real-time stream, I am collecting a 256-sample window with 50% overlap, as I've read in the literature. That is, I add in 128 samples at a time to fill up a 256-sample window. Is this a correct approach?
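(For concreteness, one hedged sketch of that framing, with names of my own invention: keep the most recent half of the previous window and append the 128 new samples.)

// Hypothetical 50%-overlap framing: each call shifts in 128 new samples
// and returns the resulting 256-sample analysis window.
static final double[] window = new double[256];

static double[] nextFrame(double[] newSamples) { // newSamples.length == 128
    System.arraycopy(window, 128, window, 0, 128);     // keep the most recent half
    System.arraycopy(newSamples, 0, window, 128, 128); // append the new half
    return window.clone(); // copy, so the caller can window/FFT it safely
}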
The first figure below shows one such 256-sample window. The second figure shows the sample window after I applied a Hann/Hamming window function. I've read that applying a window function is a typical approach, so I went ahead and did it. Should I be doing so?
The third figure shows the power spectrum (?) from the output of an FFT library. I am really just cobbling together bits and pieces I've read. Am I correct in understanding that the spectrum goes up to half the sampling rate (in this case 16 Hz, since my sampling rate is 32 Hz), and that the value of each spectrum point is spectrum[i] = sqrt(real[i]^2 + imaginary[i]^2)? Is this right?
Assuming what I did in question 3 is correct, is my understanding right that the third figure shows principal frequencies of about 3.25 Hz and 8.25 Hz? I know from collecting the data that I was running at about 3 Hz, so the spike at 3.25 Hz seems right. So there must be some noise or other factors causing the (erroneous) spike at 8.25 Hz. Are there any filters or other methods I can use to smooth away this and other spikes? If not, is there a way to tell "real" spikes from erroneous ones?
Making a decision on sample size and overlap is always a compromise between frequency accuracy and timeliness: the bigger the sample, the more FFT bins and hence the better the absolute accuracy, but the longer it takes. I'm guessing you want regular updates on the frequency you're detecting and absolute accuracy is not too important, so a 256-sample FFT seems a pretty good choice. Having an overlap will give a higher resolution on the same data, but at the expense of processing: again, 50% seems fine.
Applying a window stops frequency artifacts appearing due to the abrupt start and finish of the sample (doing nothing is effectively applying a rectangular window). A Hamming window is fairly standard, as it gives a good compromise between sharp signals and low side lobes: some windows reject side lobes better (multiples of the detected frequency) but spread the detected signal over more bins, and others the opposite. On a small sample size, with the amount of noise you have on your signal, I don't think it really matters much: you might as well stick with a Hamming window.
Exactly right, with one nitpick: the square root of the sum of the squares of the real and imaginary parts is the magnitude spectrum (the power spectrum is its square). Your assumption about the Nyquist frequency is true: your scale will go up to 16 Hz. I assume you are using a real FFT algorithm, which returns 128 complex values (an FFT gives 256 values back, but because you are giving it a real signal, half will be an exact mirror image of the other half), so each bin is 16/128 = 0.125 Hz wide. It is also common to show the spectrum on a log scale, but that's irrelevant if you're just peak detecting.
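As a sketch of that computation (fft() here is a placeholder for whatever library routine is actually in use): window the frame, transform it, and keep the magnitudes of the first N/2 bins.

// Hann-window a 256-sample frame and return its magnitude spectrum.
// fft(re, im) is a stand-in for an in-place complex FFT from any library.
static double[] magnitudeSpectrum(double[] frame, double sampleRate) {
    int n = frame.length; // e.g., 256
    double[] re = new double[n];
    double[] im = new double[n];
    for (int i = 0; i < n; i++) {
        double hann = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
        re[i] = frame[i] * hann; // windowed input; imaginary part stays zero
    }
    fft(re, im); // hypothetical: transforms re/im in place
    double[] mag = new double[n / 2]; // bins 0 .. Nyquist (sampleRate / 2)
    for (int k = 0; k < n / 2; k++) {
        mag[k] = Math.sqrt(re[k] * re[k] + im[k] * im[k]); // bin k is at k * sampleRate / n Hz
    }
    return mag;
}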
The 8 Hz spike really is there: my guess is that a phone in the pocket of a moving person is more than a first-order system, so you are going to have other frequency components, but you should be able to detect the primary one. You could filter it out, but that's pointless if you are taking an FFT: just ignore those bins if you are sure they are erroneous.
You seem to be getting on fine. The only suggestion I would make is to develop some longer time heuristics on the results: look at successive outputs and reject short-term detected signals. Look for a principal component and see if you can track it as it moves around.
To answer a few of your questions:
Yes, you should be applying a window function. The idea is that when you start and stop sampling a real-world signal, what you're doing anyway is applying a sharp rectangular window. Hann and Hamming windows are much better at suppressing frequencies you don't want, so this is a good approach.
Yes, the strongest frequencies are around 3 and 8 Hz. I don't think the 8 Hz spike is erroneous. With such a short data set you almost certainly can't control the exact frequencies your signal will have.
Some insight on question 4 (from staring at accelerometer signals of people running for months of my life):
Are you running this analysis on a single accelerometer axis, or are you combining the axes to get the magnitude of acceleration? If you are interested in the overall magnitude of acceleration, you should combine x, y, and z as mag_acc = sqrt((x - 0g_offset)^2 + (y - 0g_offset)^2 + (z - 0g_offset)^2); this signal should sit at 1 g when the device is still. If you are only looking at a single axis, you will get components both from the dominant running motion and from the phone's changing orientation (because the contribution from gravity shifts between axes). So if the phone's orientation moves around while you run, depending on how you hold it, that can contribute a significant amount to the signal, whereas the magnitude is far less sensitive to orientation changes. A person running should show a really clean dominant frequency at the person's step rate.
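In code, the combination described above is simply (a hypothetical helper; the 0 g offsets are whatever your device reports per axis when at rest):

// Orientation-independent magnitude of acceleration; ~1 g when the device is still.
static double magnitude(double x, double y, double z,
                        double x0, double y0, double z0) {
    double dx = x - x0, dy = y - y0, dz = z - z0;
    return Math.sqrt(dx * dx + dy * dy + dz * dz);
}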
