I'm writing a basic synth at the moment and have run into a bit of a strange problem. I get a constant popping sound while playing an array of bytes, representing 16 bit mono audio, through a SourceDataLine.
The pops play at a constant rate, and from what I can hear, pitch. The pops do slightly differ in frequencies though (again, from what I can hear), some notes have low-passed sounding pops, and others sound high-passed. The pops are not overriding though, you can still hear the desired sound in the background.
Nothing changes the rate of the pops, not note pitch, not the SourceDataLine buffer size, not the number of bytes I write to it at a time, except sample rate.
Lowering the sample rate decreases the rate of the pops and vice-versa.
To test my side of the program, I printed out the data being written to the SourceDataLine for about half a second and looked through around 15 cycles of the played sine wave, and it was completely fine; no sudden jumps, clipping, or anything else.
I use the sample rate in only two places: in some basic math that lets my sampler sample at the correct frequency (calculated once per note, and definitely working, since the pitch is perfect), and when creating the SourceDataLine.
Here's how I'm starting the SourceDataLine (Taken from multiple parts of the main method):
AudioFormat format = new AudioFormat(AudioEnvironment.SAMPLE_RATE, AudioEnvironment.BIT_DEPTH, 1, true, true);
SourceDataLine line = AudioSystem.getSourceDataLine(format);
line.open(format, 8000);
line.start();
My data is correctly in big-endian, tested by me changing the endian flag in the constructor and getting my ears blasted with white noise.
After the program has set everything up, it constantly writes data to the SourceDataLine in this infinite loop:
while (true) {
    for (Channel channel : channelSystem.getChannels()) {
        if (channel.pitch != 0) {
            wave.sample(channel, buffer);
            line.write(buffer, 0, AudioEnvironment.SUB_BUFFER_SIZE * 2);
        }
    }
}
(A Channel is a class I created that contains all the data for a single note (Though obviously the program is not set up correctly for polyphony at the moment), buffer is an array of bytes, wave.sample() is where I sample my data into buffer, and AudioEnvironment.SUB_BUFFER_SIZE * 2 is the size of buffer)
I don't necessarily need an example of how to fix this in code, but an explanation of why this might be happening would be great.
EDIT: Something I should also probably add is that I've tried putting a print statement in the infinite write loop to print out the number of available bytes in the SourceDataLine, and it stays constantly around 500 - 2000, occasionally getting up to around 5000, but never near 8000, so the buffer is never running out of data.
Well as it turns out, the problem was completely unrelated to what I thought it might be.
Turns out there was a single equation I had written in my sampler that was just blatantly wrong.
After 2048 samples had been played, I would just kinda loop back to the beginning of the waveform, causing the popping.
I honestly have no idea why I wrote that in, but hey, it works now.
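For readers who want to see the effect in isolation, here is a minimal sketch of this kind of bug. It is a hypothetical reconstruction, not the original sampler: the class and method names, the 44.1 kHz sample rate and the 440 Hz pitch are all assumptions. It compares a sine whose phase keeps running with one that jumps back to the start of the waveform every 2048 samples, and prints the largest jump between neighbouring samples in each case.

```java
import java.util.function.LongToDoubleFunction;

class WrapPopDemo {
    static final int SAMPLE_RATE = 44_100; // assumed for illustration
    static final int WRAP = 2048;          // the loop point mentioned in the answer

    /** Sample n of a sine whose phase keeps running. */
    static double continuous(long n, double freq) {
        return Math.sin(2.0 * Math.PI * freq * n / SAMPLE_RATE);
    }

    /** Sample n of a sine that restarts at phase 0 every WRAP samples (the bug). */
    static double wrapped(long n, double freq) {
        return Math.sin(2.0 * Math.PI * freq * (n % WRAP) / SAMPLE_RATE);
    }

    /** Largest absolute difference between consecutive samples. */
    static double maxJump(LongToDoubleFunction osc, int count) {
        double max = 0.0;
        double prev = osc.applyAsDouble(0);
        for (long n = 1; n < count; n++) {
            double cur = osc.applyAsDouble(n);
            max = Math.max(max, Math.abs(cur - prev));
            prev = cur;
        }
        return max;
    }

    public static void main(String[] args) {
        double freq = 440.0; // assumed note pitch
        System.out.println("continuous: " + maxJump(n -> continuous(n, freq), 10_000));
        System.out.println("wrapped:    " + maxJump(n -> wrapped(n, freq), 10_000));
    }
}
```

In this sketch the discontinuity repeats every 2048 samples, roughly 21.5 times per second at 44.1 kHz, so its rate depends only on the sample rate. That would be consistent with the observation in the question that nothing but the sample rate changed the rate of the pops.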
I'm implementing an application which records and analyzes audio in real time (or at least as close to real time as possible), using the JDK Version 8 Update 201. While performing a test which simulates typical use cases of the application, I noticed that after several hours of recording audio continuously, a sudden delay of somewhere between one and two seconds was introduced. Up until this point there was no noticeable delay. It was only after this critical point of recording for several hours when this delay started to occur.
What I've tried so far
To check if my code for timing the recording of the audio samples is wrong, I commented out everything related to timing. This left me essentially with this update loop which fetches audio samples as soon as they are ready (Note: Kotlin code):
while (!isInterrupted) {
    val audioData = read(sampleSize, false)
    listener.audioFrameCaptured(audioData)
}
This is my read method:
fun read(samples: Int, buffered: Boolean = true): AudioData {
    //Allocate a byte array in which the read audio samples will be stored.
    val bytesToRead = samples * format.frameSize
    val data = ByteArray(bytesToRead)
    //Calculate the maximum amount of bytes to read during each iteration.
    val bufferSize = (line.bufferSize / BUFFER_SIZE_DIVIDEND / format.frameSize).roundToInt() * format.frameSize
    val maxBytesPerCycle = if (buffered) bufferSize else bytesToRead
    //Read the audio data in one or multiple iterations.
    var bytesRead = 0
    while (bytesRead < bytesToRead) {
        bytesRead += (line as TargetDataLine).read(data, bytesRead, min(maxBytesPerCycle, bytesToRead - bytesRead))
    }
    return AudioData(data, format)
}
However, even without any timing on my side the problem was not resolved. Therefore, I went on to experiment a bit and let the application run using different audio formats, which led to very confusing results (I'm going to use a PCM signed 16 bit stereo audio format with little endian and a sample rate of 44100.0 Hz as default, unless specified otherwise):
The critical amount of time that has to pass before the delay appears seems to be different depending on the machine used. On my Windows 10 desktop PC it is somewhere between 6.5 and 7 hours. On my laptop (also using Windows 10) however, it is somewhere between 4 and 5 hours for the same audio format.
The amount of audio channels used seems to have an effect. If I change the amount of channels from stereo to mono, the time before the delay starts to appear is doubled to somewhere between 13 and 13.5 hours on my desktop.
Decreasing the sample size from 16 bits to 8 bits also results in a doubling of the time before the delay starts to appear. Somewhere between 13 and 13.5 hours on my desktop.
Changing the byte order from little endian to big endian has no effect.
Switching from stereomix to a physical microphone has no effect either.
I tried opening the line using different buffer sizes (1024, 2048 and 3072 sample frames) as well as its default buffer size. This also didn't change anything.
Flushing the TargetDataLine after the delay has started to occur results in all bytes being zero for approximately one to two seconds. After this I get non-zero values again. The delay, however, is still there. If I flush the line before the critical point, I don't get those zero-bytes.
Stopping and restarting the TargetDataLine after the delay appeared also does not change anything.
Closing and reopening the TargetDataLine, however, does get rid of the delay until it reappears after several hours from there on.
Automatically flushing the TargetDataLine's internal buffer every ten minutes does not help to resolve the issue. Therefore, a buffer overflow in the internal buffer does not seem to be the cause.
Using a parallel garbage collector to avoid application freezes also does not help.
The used sample rate seems to be important. If I double the sample rate to 88200 Hertz, the delay starts occurring somewhere between 3 and 3.5 hours of runtime.
If I let it run under Linux using my "default" audio format, it still runs fine after about 9 hours of runtime.
Conclusions that I've drawn:
These results let me come to the conclusion that the time for which I can record audio before this issue starts to happen is dependent on the machine on which the application is run and dependent on the byte rate (i.e. frame size and sample rate) of the audio format. This seems to hold true (although I can't completely confirm this as of now) because if I combine the changes made in 2 and 3, I would assume that I can record audio samples for four times as long (which would be somewhere between 26 and 27 hours) as when using my "default" audio format before the delay starts to appear. As I didn't find the time to let the application run for this long yet, I can only tell that it did run fine for about 15 hours before I had to stop it due to time constraints on my side. So, this hypothesis is still to be confirmed or denied.
According to the result of bullet point 13, it seems like the whole issue only appears when using Windows. Therefore, I think that it might be a bug in the platform specific parts of the javax.sound.sampled API.
Even though I think I might have found a way to change when this issue starts to happen, I'm not satisfied with the result. I could periodically close and reopen the line to avoid the problem from starting to appear at all. However, doing this would result in some arbitrary small amount of time where I wouldn't be able to capture audio samples. Furthermore, the Javadoc states that some lines can't be reopened at all after being closed. Therefore, this is not a good solution in my case.
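The byte-rate hypothesis above can be turned into a quick estimate. The following is only a sketch of that arithmetic: the class and method names are made up for illustration, and it assumes (as the hypothesis does, without confirmation) that the onset time scales inversely with the byte rate of the format. It takes the 6.5 to 7 hours observed on the desktop for the default format and scales it to 8 bit mono.

```java
class OnsetEstimate {
    /** Bytes per second of a PCM format. */
    static long byteRate(double sampleRate, int bitsPerSample, int channels) {
        return Math.round(sampleRate * (bitsPerSample / 8) * channels);
    }

    /** Scales an observed onset time by the inverse ratio of the byte rates (assumed model). */
    static double scaledHours(double observedHours, long observedRate, long otherRate) {
        return observedHours * observedRate / otherRate;
    }

    public static void main(String[] args) {
        long stereo16 = byteRate(44_100, 16, 2); // the "default" format: 176400 bytes/s
        long mono8 = byteRate(44_100, 8, 1);     // changes 2 and 3 combined: 44100 bytes/s

        System.out.printf("8 bit mono: %.1f to %.1f hours%n",
                scaledHours(6.5, stereo16, mono8),
                scaledHours(7.0, stereo16, mono8));
    }
}
```

Under that assumption the 15 hour run mentioned above would not yet have reached the predicted onset, so it neither confirms nor refutes the hypothesis.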
Ideally, this whole issue shouldn't be happening at all. Is there something I am completely missing or am I experiencing limitations of what is possible with the javax.sound.sampled API? How can I get rid of this issue at all?
Edit: By suggestion of Xtreme Biker and gidds I created a small example application. You can find it inside this Github repository.
I have (rather) vast experience with Java audio interfacing.
Here are a few points that may be useful in guiding you towards a proper solution:
It's not a matter of JVM version - the Java audio system has barely been upgraded since Java 1.3 or 1.5.
The Java audio system is a poor man's wrapper around whatever audio interface API the operating system has to offer. On Linux it's the PulseAudio library; on Windows it's the DirectSound audio API (if I'm not mistaken about the latter).
Again, the audio system API is kind of a legacy API - some of the features are not working or not implemented, other behaviors are straight out weird, as they are dependent on an obsolete design (I can provide examples, if required).
It's not a matter of garbage collection - if your definition of "delay" is what I understand it to be (audio data is delayed by 1-2 seconds, meaning you start hearing stuff 1-2 seconds later), well, the garbage collector cannot cause blank data to magically be captured by the target data line and then append data as usual at a byte offset worth 2 seconds.
What's most likely happening here is either the hardware or driver providing you with 2 seconds worth of garbled data at some point, and then, streams the rest of the data as usual, resulting in the "delay" you are experiencing.
The fact that it works perfectly on linux means it's not a hardware issue, but rather a driver related one.
To affirm that suspicion, you can try capturing audio via FFmpeg for the same duration and see if the issue is reproduced.
If you are using specialized audio capturing hardware, better approach your hardware manufacturer and ask them about the issue you are facing on Windows.
In any case, when writing an audio capturing application from scratch I'd strongly suggest keeping away from the Java audio system if possible. It's nice for POCs, but it's an unmaintained legacy API. JNA is always a viable option (I've used it on Linux with ALSA/PulseAudio to control audio hardware attributes the Java audio system could not alter), so you could look for audio capturing examples in C++ for Windows and translate them to Java. It'll give you fine-grained control over audio capture devices, much more than what the JVM provides OOTB. If you want to have a look at a living/breathing usable JNA example, check out my JNA AAC encoder project.
Again, if you use special capturing hardware, there's a good chance the manufacturer already provides its own low-level C API for interfacing with the hardware, and you should consider having a look at it as well.
If that's not the case, maybe you and your company/client should consider using specialized capturing hardware (doesn't have to be that expensive).
I am opening a targetdataline to accept audio input for a given format.
I start and open the line, and I have a buffer which fills with bytes. This runs on a constant loop until an external parameter is changed.
Now for a fixed sample rate and buffer size, I would expect this to always take the same amount of time to fill, i.e. if my buffer size was 48000 for an 8 bit stream, and my sample rate was 48kHz, I would expect my buffer to always take 1 second to fill. However, I am finding that this varies greatly.
The following is the code I have used:
DataLine.Info info1 = new DataLine.Info(TargetDataLine.class, format1);
try (TargetDataLine line = (TargetDataLine) m1.getLine(info1)) {
    line.open(format1);
    line.start();
    while (!pauseInput){
        long time1 = System.currentTimeMillis();
        int numBytesRead1 = line.read(buffer1, 0, buffer1.length);
        //chan1double = deinterleaveAudio(buffer1, chan1selectedchannel, chan1totalchannels);
        long time2 = System.currentTimeMillis();
        System.out.println(threadName + " Capture time = " + (time2-time1));
    }
    line.stop();
}
The commented line is a process I want to run each time the buffer is full. I realise I cannot place this here as it will interrupt the stream, so I need to find a different way to call this, hence I have commented it out.
For testing purposes I have a buffer size of 4096. My audio format is 48kHz 16-bit, so I would expect my byte buffer to be filled in 42.6ms ((1/48000) * 2048; this is multiplied by half the buffer size as each sample is two bytes). However, using currentTimeMillis() to measure each pass, it is coming back with 123ms and 250ms and varying between those times.
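The expected fill time can be computed from the format instead of by hand. This is a small sketch of that arithmetic; the class and method names are invented for illustration, and the format is assumed to be mono, which is what the figure in the question implies (a stereo stream would fill the same buffer in half the time).

```java
import javax.sound.sampled.AudioFormat;

class FillTime {
    /** Time in milliseconds that the format needs to produce bufferBytes of audio. */
    static double expectedFillMillis(int bufferBytes, AudioFormat format) {
        // Frame size is in bytes (all channels), frame rate is in frames per second.
        double bytesPerSecond = format.getFrameSize() * (double) format.getFrameRate();
        return 1000.0 * bufferBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        // 48 kHz, 16 bit, mono (assumed), signed, little endian
        AudioFormat format = new AudioFormat(48_000f, 16, 1, true, false);
        System.out.printf("%.2f ms%n", expectedFillMillis(4096, format));
    }
}
```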
Is there something I am missing out here that I have not done?
EDIT: I have copied just the code into a brand new application that doesn't even have a GUI or anything attached to it. It purely outputs to the console so I can see what is happening, making sure there are no background threads to interfere, and sure enough the same happens. 95% of the time the buffer with a predicted fill time of 250ms fills within 255-259ms. However, occasionally this will drop to 127ms (which is physically impossible unless there is some weird buffer thing going on). Is this a bug in Java somewhere?
I don't think it is a good idea to rely on timing in such a way. It depends on many things, e.g. bufferSize, mixer, etc. Moreover, your application is sharing the line's buffer with the mixer. If you need real-time processing, store your data in a circular buffer with a length that is good enough to hold the amount of data that you need. In another thread, read the desired amount of data from the circular buffer, and do your processing at a constant time interval. Thus, sometimes, you may overlap or miss some bytes between two consecutive processing passes, but you always have the expected amount of bytes.
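A circular buffer of the kind described above could look roughly like this. It is a minimal sketch, not a tuned implementation: the class name and methods are invented for illustration, it overwrites the oldest bytes when full, and it uses plain synchronized methods so that a capture thread can write while a processing thread reads.

```java
class ByteRingBuffer {
    private final byte[] data;
    private int writePos; // index where the next byte will be written
    private int size;     // number of valid bytes currently stored

    ByteRingBuffer(int capacity) {
        data = new byte[capacity];
    }

    /** Appends len bytes from src, overwriting the oldest data once the buffer is full. */
    synchronized void write(byte[] src, int off, int len) {
        for (int i = 0; i < len; i++) {
            data[writePos] = src[off + i];
            writePos = (writePos + 1) % data.length;
        }
        size = Math.min(data.length, size + len);
    }

    /** Copies the most recent dst.length bytes into dst; returns false if too few are stored. */
    synchronized boolean readLatest(byte[] dst) {
        if (dst.length > size) {
            return false;
        }
        int start = Math.floorMod(writePos - dst.length, data.length);
        for (int i = 0; i < dst.length; i++) {
            dst[i] = data[(start + i) % data.length];
        }
        return true;
    }

    public static void main(String[] args) {
        ByteRingBuffer ring = new ByteRingBuffer(8);
        ring.write(new byte[] {1, 2, 3, 4, 5, 6}, 0, 6);
        ring.write(new byte[] {7, 8, 9, 10}, 0, 4); // wraps around and overwrites 1 and 2
        byte[] latest = new byte[4];
        System.out.println(ring.readLatest(latest) + " " + java.util.Arrays.toString(latest));
    }
}
```

Because readLatest always returns the newest window, two consecutive calls may overlap or skip bytes, which is the trade-off the answer describes.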
When you open the line, you can specify the line's buffer size by using open(format, bufferSize) or you can check the actual buffer size by calling DataLine.getBufferSize(). Then you need to specify the size of the short buffer that you provide when you retrieve data through TargetDataLine.read(). Your short buffer size has to be smaller than the line's buffer size. I would consider a short buffer size of 1/4th, 1/8th, 1/16th or so of the line's buffer size. Another idea is checking the available bytes with DataLine.available() before calling read(). Note that read() is a blocking call (but it doesn't block the line's buffer), i.e., it will be stuck until the requested amount of bytes has been read.
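As a rough sketch of that sizing advice: the helper below picks a read buffer that is a fraction of the line's buffer and a whole number of frames. The class name, the helper names and the divisor of 8 are assumptions for illustration; readOnce needs a real capture device, so main only exercises the size calculation.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

class ReadChunk {
    /** A fraction of the line buffer, rounded down to a whole number of frames. */
    static int chunkBytes(int lineBufferBytes, int divisor, AudioFormat format) {
        int frameSize = format.getFrameSize();
        int chunk = lineBufferBytes / divisor;
        chunk -= chunk % frameSize;         // read() requires an integral number of frames
        return Math.max(frameSize, chunk);  // never smaller than one frame
    }

    /** Opens the line with a requested buffer size and performs one blocking read. */
    static int readOnce(TargetDataLine line, AudioFormat format, int requestedBufferBytes)
            throws LineUnavailableException {
        line.open(format, requestedBufferBytes);
        line.start();
        // The line may not honour the request exactly, so ask for the actual size.
        byte[] chunk = new byte[chunkBytes(line.getBufferSize(), 8, format)];
        int read = line.read(chunk, 0, chunk.length); // blocks until the chunk is filled
        line.stop();
        line.close();
        return read;
    }

    public static void main(String[] args) {
        AudioFormat format = new AudioFormat(48_000f, 16, 2, true, false);
        System.out.println(chunkBytes(8190, 4, format));
    }
}
```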
For low latency direct communication between your application and audio interface, you may consider ASIO.
For anyone looking at the same issue, I have been given an answer which half explains what is happening.
The thread scheduler decides when the code can run, and this can cause this to vary by 10-20ms. In the earlier days this was as much as 70ms.
This does not mean the stream is missing samples, just that this buffer will not provide a continuous stream. So any application looking at processing this data in real time and passing it on to be written to an audio output stream needs to be aware of this extra potential latency.
I am still looking at the reason for the short buffer fill time, every four or five passes. I was told it could be to do with the targetDataLine buffer size being different to my buffer size and just the remainder of that buffer being written on that pass, however I have changed this to be exactly the same and still no luck.
If you were to write a program that takes microphone input, reverses it (sets it out of phase by making 1's 0's and 0's 1's), and plays it back out of the speakers, could that cancel out sound? Wave physics says that if crests align with troughs, destructive interference occurs, so can that be utilized here to lessen the noise, if not cancel it out "completely"? I can imagine that this wouldn't work, either because of complications in reversing the audio, or because it takes too long to reverse and play back, so that the sound wave has already passed. If I had to pick a language to do this in, it would have to be either C++ or Java (I'm at least competent in both).
Yes it will cancel out sound. That's more or less how Surround Sound works: by subtracting the left/right channels, playing that in the 3rd speaker, and inverting the samples, playing those out of the 4th you get interesting spatial effects.
Also you wouldn't simply want to toggle all bits, you'd get noise; instead you want to negate.
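For signed 16 bit PCM, negating could be sketched as follows. The class and method names are made up for illustration, and the samples are assumed to have already been decoded into a short array. The only special case is the most negative value, which has no positive counterpart and is clamped.

```java
class PhaseInvert {
    /** Negates every sample in place, clamping -32768 to +32767. */
    static void invertInPlace(short[] samples) {
        for (int i = 0; i < samples.length; i++) {
            int negated = -samples[i]; // computed as int, so -(-32768) is 32768
            samples[i] = (short) Math.min(negated, Short.MAX_VALUE);
        }
    }

    public static void main(String[] args) {
        short[] samples = {0, 1000, -1000, Short.MIN_VALUE};
        invertInPlace(samples);
        System.out.println(java.util.Arrays.toString(samples));
    }
}
```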
With a small sample buffer you'd be fast enough to cancel out waves of certain frequencies. When these attack and decay, you'll be lagging, but as long as the wave sustains you can effectively cancel it out.
With bigger sample buffers, obviously the delay increases, since it takes longer to fill the buffer with samples. The size of the buffer determines how often a device interrupt occurs where the program would copy the input samples to an output buffer while applying an operation to them.
Typically recordings are made at 44.1kHz, meaning that many samples per second. If you set the buffer to say 256 samples, you would get notified 44100/256 times a second that there are 256 samples to be processed.
At 256 samples you'd lag behind 256/44100 = 0.0058 seconds or 5.8 milliseconds. Sound travels at around 340 m/s, so the sound wave would have moved 1.97 meters (340 * 5.8ms). This wavelength corresponds with the frequency 172 Hz (44100/256). That means that you can only effectively cancel out frequencies that have a lower frequency than that, because those of a higher frequency 'move' more than once during 5.8ms and are thus above the maximum 'sample rate', if you will.
For 64 samples, the frequency would be 44100/64 = 689 Hz. And, this is the maximum frequency! That means you could cancel out bass and the base frequency of the human voice, but not the harmonics.
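The figures above follow from three small formulas, sketched here so other buffer sizes can be tried. The class and method names are invented for illustration; the 340 m/s speed of sound is the value used in the answer.

```java
class BufferLatency {
    static final double SPEED_OF_SOUND = 340.0; // m/s, as used above

    /** Time it takes to fill a buffer of the given size, in seconds. */
    static double latencySeconds(int bufferSamples, double sampleRate) {
        return bufferSamples / sampleRate;
    }

    /** Distance the sound wave travels while the buffer fills, in meters. */
    static double distanceMeters(int bufferSamples, double sampleRate) {
        return SPEED_OF_SOUND * latencySeconds(bufferSamples, sampleRate);
    }

    /** Number of buffers per second, the limiting frequency in the answer's argument. */
    static double maxFrequencyHz(int bufferSamples, double sampleRate) {
        return sampleRate / bufferSamples;
    }

    public static void main(String[] args) {
        for (int n : new int[] {256, 64}) {
            System.out.printf("%d samples: %.2f ms, %.2f m, %.0f Hz%n",
                    n,
                    1000.0 * latencySeconds(n, 44_100),
                    distanceMeters(n, 44_100),
                    maxFrequencyHz(n, 44_100));
        }
    }
}
```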
A typical OS has its clock frequency set to either 500, 1000, or 2000 Hz, meaning at best you could use a sample buffer of around two to three samples, giving you a maximum frequency of 500, 1000, or 2000 Hz. Telephones usually have a maximum frequency of about 3500 Hz.
You could get the system clock up to around 32kHz, and poll an ADC directly to reach such frequencies. However, you'd probably need to solder one to your LPT port and run a custom OS, which means Java is out of the question, or use a pre-fab real-time embedded system that runs Java (see the comment by @zapl for links).
One thing I forgot to mention is that you will need to take into account the position of the sound source, the microphone, and the speaker. Ideally all 3 are in the same place, so there is no delay. But this is almost never the case, which means you'd get an interference pattern: there will be spots in the room where the sound is cancelled, in between spots where it is not.
You cannot do this in software, with C++, or even assembly - the latency of just mirroring the output on the speakers would be more than 6 ms on most computers. Even if you had a latency of only 0.1 ms, the resulting sound (assuming it is perfectly mixed) would at best sound like it was sampled at 10kHz (not very good).
I want to play certain parts of a wav file, like playing the first ten seconds and then playing from the 50th to the 60th second and so on. I know how to play an entire wave file in Java using the start method of the SourceDataLine class. Could anybody give me some pointers as to how I can seek to a particular time position in the audio and play from there?
Find the length of a frame, in bytes, from the AudioFormat
Find the length in bytes of a second, by multiplying the frame size by the frame rate.
skip() that amount of bytes.
Play until you reach the second byte offset (the end position), calculated using the same formula.
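The steps above could be sketched like this. It is one possible reading of them, not a definitive implementation: the class name, helper names and the 1024 frame copy buffer are assumptions, and error handling is kept to a minimum.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.SourceDataLine;

class SegmentPlayer {
    /** Byte offset of a time position: frames per second * frame size * seconds. */
    static long secondsToBytes(AudioFormat format, double seconds) {
        long frames = (long) (seconds * format.getFrameRate());
        return frames * format.getFrameSize();
    }

    /** skip() may skip fewer bytes than requested, so repeat until done. */
    static void skipFully(AudioInputStream in, long bytes) throws IOException {
        while (bytes > 0) {
            long skipped = in.skip(bytes);
            if (skipped <= 0) break; // end of stream
            bytes -= skipped;
        }
    }

    /** Plays the part of the file between fromSec and toSec. */
    static void play(File wav, double fromSec, double toSec) throws Exception {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat format = in.getFormat();
            SourceDataLine line = AudioSystem.getSourceDataLine(format);
            line.open(format);
            line.start();
            long start = secondsToBytes(format, fromSec);
            long remaining = secondsToBytes(format, toSec) - start;
            skipFully(in, start);
            byte[] buf = new byte[format.getFrameSize() * 1024];
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n <= 0) break; // end of stream
                line.write(buf, 0, n);
                remaining -= n;
            }
            line.drain(); // wait until the queued audio has been played
            line.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 3) {
            play(new File(args[0]), Double.parseDouble(args[1]), Double.parseDouble(args[2]));
        } else {
            System.out.println("usage: SegmentPlayer <file.wav> <fromSeconds> <toSeconds>");
        }
    }
}
```

Calling play(file, 0, 10) followed by play(file, 50, 60) would cover the example from the question.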
As far as I can see, nothing happens when you just call start. You are responsible for pushing the bytes of your choice into the line. So open a RandomAccessFile, seek to the appropriate offset, and execute a loop that transports the file data to the SourceDataLine.
First-time poster here. I usually like to find the answer myself (be it through research or trial-and-error), but I'm stumped here.
What I'm trying to do:
I'm building a simple Android audio synthesizer. Right now, I'm just playing a sine tone in real time, with a slider in the UI that changes the tone's frequency as the user adjusts it.
How I've built it:
Basically, I have two threads - a worker thread and an output thread. The worker thread simply fills a buffer with the sine wave data every time its tick() method is called. Once the buffer is filled, it alerts the output thread that the data is ready to be written to the audio track. The reason I am using two threads is because audiotrack.write() blocks, and I want the worker thread to be able to begin processing its data as soon as possible (rather than waiting for the audio track to finish writing). The slider on the UI simply changes a variable in the worker thread, so that any changes to the frequency (via the slider) will be read by the worker thread's tick() method.
What works:
Almost everything; The threads communicate well, there don't seem to be any gaps or clicks in the playback. Despite the large buffer size (thanks android), the responsiveness is OK. The frequency variable does change, as do the intermediate values used during the buffer calculations in the tick() method (verified by Log.i()).
What doesn't work:
For some reason, I can't seem to get a continuous change in audible frequency. When I adjust the slider, the frequency changes in steps, often as wide as fourths or fifths. Theoretically, I should be hearing changes as minute as 1Hz, but I'm not. Oddly enough, it seems as if changes to the slider are causing the sine wave to play through intervals in the harmonic series; however, I can verify that the frequency variable is NOT snapping to integral multiples of the default frequency.
My Audio track is set up as such:
_buffSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
_audioTrackOut = new AudioTrack(AudioManager.STREAM_MUSIC, _sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT, _buffSize, AudioTrack.MODE_STREAM);
The worker thread's buffer is being populated (via tick()) as such:
public short[] tick()
{
    short[] outBuff = new short[_outBuffSize/2]; // (buffer size in Bytes) / 2
    for (int i = 0; i < _outBuffSize/2; i++)
    {
        outBuff[i] = (short) (Short.MAX_VALUE * ((float) Math.sin(_currentAngle)));
        //Update angleIncrement, as the frequency may have changed by now
        _angleIncrement = (float) (2.0f * Math.PI) * _freq / _sampleRate;
        _currentAngle = _currentAngle + _angleIncrement;
    }
    return outBuff;
}
The audio data is being written like this:
_audioTrackOut.write(fromWorker, 0, fromWorker.length);
Any help would be greatly appreciated. How can I get more gradual changes in frequency? I'm pretty confident that my logic in tick() is sound, as Log.i() verifies that the variables angleIncrement and currentAngle are being updated properly.
Thank you!
Update:
I found a similar problem here: Android AudioTrack buffering problems
The solution proposed that one must be able to produce samples fast enough for the audioTrack, which makes sense. I lowered my sample rate to 22050Hz, and ran some empirical tests - I can fill my buffer (via tick()) in approximately 6ms in the worst case. This is more than adequate. At 22050Hz, the audioTrack gives me a buffer size of 2048 samples (or 4096 Bytes). So, each filled buffer lasts for ~0.0928 seconds of audio, which is much longer than it takes to create the data (1~6 ms). So, I know that I don't have any problems producing samples fast enough.
I should also note that for about the first 3 seconds of the application's lifecycle, it works fine - a smooth sweep of the slider produces a smooth sweep in the audio output. After this, it starts to get really choppy (sound only changes about every 100 Hz), and after that, it stops responding to slider input at all.
I also fixed one bug, but I don't think it has an effect. AudioTrack.getMinBufferSize() returns the smallest allowable buffer size in BYTES, and I was using this number as the length of the buffer in tick() - I now use half this number (2 Bytes per sample).
I've found it!
It turns out the problem has nothing to do with buffers or threading.
It sounds fine in the first couple of seconds, because the angle of the computation is relatively small. As the program runs and the angle grows, Math.sin(_currentAngle) begins to produce unreliable values.
So, I replaced Math.sin() with FloatMath.sin().
I also replaced
_currentAngle = _currentAngle + _angleIncrement;
with
_currentAngle = ((_currentAngle + _angleIncrement) % (2.0f * (float) Math.PI));, so the angle is always < 2*PI.
Works like a charm! Thanks very much for your help, praetorian droid!
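A likely reason the wrap matters: a float has limited resolution, and the gap between neighbouring float values grows with the magnitude of the number. Once the accumulated angle is large, small changes in the per-sample increment are rounded away, which would show up as stepped pitch. The sketch below illustrates this under assumed values (the class name, 440 Hz and 22050 Hz are for illustration only) and shows the wrapped accumulator from the fix.

```java
class WrappedPhase {
    static final float TWO_PI = (float) (2.0 * Math.PI);

    /** Advances the phase by one sample and keeps it inside [0, 2*PI). */
    static float advance(float angle, float freq, float sampleRate) {
        float increment = TWO_PI * freq / sampleRate;
        return (angle + increment) % TWO_PI;
    }

    public static void main(String[] args) {
        // Spacing between adjacent float values at small and large magnitudes.
        System.out.println("step near 1:     " + Math.ulp(1.0f));
        System.out.println("step near 10000: " + Math.ulp(10_000.0f));

        float angle = 0f;
        for (int i = 0; i < 1_000_000; i++) {
            angle = advance(angle, 440f, 22_050f);
        }
        // Still a small number, so it keeps its full resolution.
        System.out.println("angle after 1,000,000 samples: " + angle);
    }
}
```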