Real time audio processing in Android - java

I am using AudioRecord.read to capture PCM data into a byte array.
However, I found that the AudioRecord object must be initialized with a buffer of at least 3904 bytes when the sampling rate is 44100 Hz.
Since I need to perform an FFT on the data, I increased the buffer to 4096.
As a result, the callback registered with setPositionNotificationPeriod(500) runs every 40-60 ms; decreasing the period further makes no difference.
I'm wondering whether this is the fastest callback time achievable with the configuration below:
Sampling Rate: 44100
Channel: Mono
Encoding: PCM 16 BIT
BufferSize: 4096
(I'm not sure whether it is 4096 or 2048, since I read 4096 bytes each time, which fills only 2048 two-byte samples.)
Even though 40-60 ms would be acceptable, the FFT I then perform blocks each callback for around 200-300 ms, and there is still a lot of noise affecting the accuracy.
I'm using this source code: FFT in Java and the Complex class.
Is there any other option for computing the FFT that is fast, reliable, and consumes less memory?
I found that the classes above allocate too many objects and produce loads of garbage collection messages.
To conclude, I have three questions:
Is the initial bufferSize equal to the number of bytes that I can read from the .read method?
Is 40-60 ms the lower limit for capturing audio data at a 44100 Hz sampling rate?
Could you suggest an FFT library that would give me better FFT performance? (Would it be better to use a C library?)
Sorry for my bad English, and thank you for spending your time on my question.
P.S. I tried this on iOS, where I can use just 512 samples at a 44100 Hz sampling rate, so every callback takes only around 10 ms.
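For reference, the callback interval implied by the buffer size can be computed directly. This is a quick sanity check (assuming each 4096-byte read holds 2048 16-bit mono samples):

```java
public class CallbackInterval {
    public static void main(String[] args) {
        int sampleRate = 44100;      // Hz
        int bytesPerRead = 4096;     // bytes delivered per callback
        int bytesPerSample = 2;      // PCM 16-bit mono
        int samplesPerRead = bytesPerRead / bytesPerSample; // 2048 samples
        // Time needed to capture one buffer's worth of samples.
        double intervalMs = 1000.0 * samplesPerRead / sampleRate;
        System.out.printf("%.1f ms%n", intervalMs); // ~46.4 ms, inside the observed 40-60 ms
    }
}
```

So the 40-60 ms interval is simply the time it takes the hardware to produce 2048 samples at 44100 Hz; a faster callback requires a smaller buffer, which the minimum-buffer-size restriction prevents.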

Regarding question #3: maybe not as fast as a native library, but I've started using these classes, and they seem fine for real-time work (although I am reading from files, not the microphone): FFTPack.
The most common native library is KissFFT, which you can find compiled for Android as part of libGDX.
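If you want to stay in pure Java, one way to avoid the garbage-collection pressure is an in-place, iterative radix-2 FFT that works on two preallocated double arrays and allocates nothing per call. This is a minimal sketch of that technique (not the FFTPack or KissFFT code mentioned above; class and method names are made up):

```java
public class InPlaceFft {
    // In-place iterative radix-2 FFT; re/im must have the same power-of-two length.
    // No allocation happens here, so repeated calls create no garbage.
    public static void fft(double[] re, double[] im) {
        int n = re.length;
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            double wRe = Math.cos(ang), wIm = Math.sin(ang);
            for (int i = 0; i < n; i += len) {
                double curRe = 1, curIm = 0;
                for (int k = 0; k < len / 2; k++) {
                    int a = i + k, b = i + k + len / 2;
                    double tRe = re[b] * curRe - im[b] * curIm;
                    double tIm = re[b] * curIm + im[b] * curRe;
                    re[b] = re[a] - tRe; im[b] = im[a] - tIm;
                    re[a] += tRe;        im[a] += tIm;
                    double nextRe = curRe * wRe - curIm * wIm;
                    curIm = curRe * wIm + curIm * wRe;
                    curRe = nextRe;
                }
            }
        }
    }

    public static void main(String[] args) {
        // FFT of a unit impulse: the spectrum should be all ones.
        double[] re = new double[8];
        double[] im = new double[8];
        re[0] = 1;
        fft(re, im);
        System.out.println(java.util.Arrays.toString(re));
    }
}
```

Reusing the same two arrays for every 2048-sample window keeps the per-callback allocation at zero, which should silence the garbage-collection messages the question describes.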

Related

Sudden delay while recording audio over long time periods inside the JVM

I'm implementing an application which records and analyzes audio in real time (or at least as close to real time as possible), using JDK 8 Update 201. While performing a test which simulates typical use cases of the application, I noticed that after several hours of continuous recording, a sudden delay of between one and two seconds was introduced. Up until that critical point there was no noticeable delay.
What I've tried so far
To check whether my code for timing the recording of the audio samples was wrong, I commented out everything related to timing. This left me essentially with this update loop, which fetches audio samples as soon as they are ready (note: Kotlin code):
while (!isInterrupted) {
    val audioData = read(sampleSize, false)
    listener.audioFrameCaptured(audioData)
}
This is my read method:
fun read(samples: Int, buffered: Boolean = true): AudioData {
    // Allocate a byte array in which the read audio samples will be stored.
    val bytesToRead = samples * format.frameSize
    val data = ByteArray(bytesToRead)
    // Calculate the maximum number of bytes to read during each iteration.
    val bufferSize = (line.bufferSize / BUFFER_SIZE_DIVIDEND / format.frameSize).roundToInt() * format.frameSize
    val maxBytesPerCycle = if (buffered) bufferSize else bytesToRead
    // Read the audio data in one or multiple iterations.
    var bytesRead = 0
    while (bytesRead < bytesToRead) {
        bytesRead += (line as TargetDataLine).read(data, bytesRead, min(maxBytesPerCycle, bytesToRead - bytesRead))
    }
    return AudioData(data, format)
}
However, even without any timing on my side the problem was not resolved. Therefore, I went on to experiment a bit and let the application run using different audio formats, which led to very confusing results (I'm going to use PCM signed 16-bit stereo, little endian, with a sample rate of 44100.0 Hz as the default, unless specified otherwise):
The critical amount of time that has to pass before the delay appears seems to be different depending on the machine used. On my Windows 10 desktop PC it is somewhere between 6.5 and 7 hours. On my laptop (also using Windows 10) however, it is somewhere between 4 and 5 hours for the same audio format.
The number of audio channels used seems to have an effect. If I change from stereo to mono, the time before the delay starts to appear doubles, to somewhere between 13 and 13.5 hours on my desktop.
Decreasing the sample size from 16 bits to 8 bits also doubles the time before the delay starts to appear, to somewhere between 13 and 13.5 hours on my desktop.
Changing the byte order from little endian to big endian has no effect.
Switching from stereomix to a physical microphone has no effect either.
I tried opening the line using different buffer sizes (1024, 2048 and 3072 sample frames) as well as its default buffer size. This also didn't change anything.
Flushing the TargetDataLine after the delay has started to occur results in all bytes being zero for approximately one to two seconds. After this I get non-zero values again. The delay, however, is still there. If I flush the line before the critical point, I don't get those zero-bytes.
Stopping and restarting the TargetDataLine after the delay appeared also does not change anything.
Closing and reopening the TargetDataLine, however, does get rid of the delay until it reappears after several hours from there on.
Automatically flushing the TargetDataLine's internal buffer every ten minutes does not help to resolve the issue either. Therefore, an overflow of the internal buffer does not seem to be the cause.
Using a parallel garbage collector to avoid application freezes also does not help.
The sample rate used seems to be important. If I double it to 88200 Hz, the delay starts occurring somewhere between 3 and 3.5 hours of runtime.
If I let it run under Linux using my "default" audio format, it still runs fine after about 9 hours of runtime.
Conclusions that I've drawn:
These results lead me to conclude that the time for which I can record audio before this issue appears depends on the machine on which the application runs and on the byte rate (i.e. frame size times sample rate) of the audio format. This seems to hold true (although I can't completely confirm it as of now), because if I combine the changes made in points 2 and 3, I would expect to be able to record for four times as long (somewhere between 26 and 27 hours) as with my "default" audio format before the delay starts to appear. As I haven't found the time to let the application run that long yet, I can only say that it ran fine for about 15 hours before I had to stop it due to time constraints on my side. So this hypothesis is still to be confirmed or denied.
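As a quick arithmetic check of this byte-rate hypothesis (my own observation, not from the question): at the "default" format's byte rate, the observed 6.5-7 hour window is strikingly close to the time it takes a 32-bit counter of bytes to overflow, which would also explain why halving the byte rate doubles the time:

```java
public class OverflowCheck {
    public static void main(String[] args) {
        // "Default" format: 44100 Hz, 16-bit (2 bytes/sample), stereo (2 channels).
        long byteRate = 44100L * 2 * 2; // 176400 bytes per second
        // Hours until a cumulative byte count reaches 2^32.
        double hours = Math.pow(2, 32) / byteRate / 3600.0;
        System.out.printf("%.2f hours%n", hours); // ~6.76 h, inside the observed 6.5-7 h window
    }
}
```

Doubling the sample rate to 88200 Hz halves this to about 3.4 hours, and mono or 8-bit audio doubles it to about 13.5 hours, both matching the measurements above; this is only circumstantial evidence, though, not a confirmed cause.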
According to the result of bullet point 13, it seems the whole issue only appears when using Windows. Therefore, I think it might be a bug in the platform-specific parts of the javax.sound.sampled API.
Even though I think I might have found a way to change when this issue starts to happen, I'm not satisfied with the result. I could periodically close and reopen the line to prevent the problem from appearing at all. However, doing this would leave some small, arbitrary amount of time during which I couldn't capture audio samples. Furthermore, the Javadoc states that some lines can't be reopened at all after being closed. Therefore, this is not a good solution in my case.
Ideally, this whole issue shouldn't be happening at all. Is there something I am completely missing or am I experiencing limitations of what is possible with the javax.sound.sampled API? How can I get rid of this issue at all?
Edit: By suggestion of Xtreme Biker and gidds I created a small example application. You can find it inside this Github repository.
I have rather vast experience with Java audio interfacing.
Here are a few points that may be useful in guiding you towards a proper solution:
It's not a matter of the JVM version - the Java audio system has barely been upgraded since Java 1.3 or 1.5.
The Java audio system is a poor man's wrapper around whatever audio interface API the operating system has to offer. On Linux it's the PulseAudio library; for Windows, it's the DirectShow audio API (if I'm not mistaken about the latter).
Again, the audio system API is a kind of legacy API - some of the features are not working or not implemented, and other behaviors are outright weird, as they depend on an obsolete design (I can provide examples if required).
It's not a matter of garbage collection - if your definition of "delay" is what I understand it to be (audio data is delayed by 1-2 seconds, meaning you start hearing things 1-2 seconds later), then the garbage collector cannot cause blank data to be magically captured by the target data line and then append data as usual at a byte offset two seconds in.
What's most likely happening here is either the hardware or driver providing you with 2 seconds worth of garbled data at some point, and then, streams the rest of the data as usual, resulting in the "delay" you are experiencing.
The fact that it works perfectly on linux means it's not a hardware issue, but rather a driver related one.
To affirm that suspicion, you can try capturing audio via FFmpeg for the same duration and see if the issue is reproduced.
If you are using specialized audio capture hardware, it's better to approach your hardware manufacturer and ask about the issue you are facing on Windows.
In any case, when writing an audio capturing application from scratch I'd strongly suggest keeping away from the Java audio system if possible. It's nice for POCs, but it's an unmaintained legacy API. JNA is always a viable option (I've used it on Linux with ALSA/PulseAudio to control audio hardware attributes the Java audio system could not alter), so you could look for audio capturing examples in C++ for Windows and translate them to Java. It'll give you fine-grained control over audio capture devices, much more than what the JVM provides OOTB. If you want to have a look at a living, breathing, usable JNA example, check out my JNA AAC encoder project.
Again, if you use specialized capture hardware, there's a good chance the manufacturer already provides its own low-level C API for interfacing with it, and you should consider having a look at that as well.
If that's not the case, maybe you and your company/client should consider using specialized capture hardware (it doesn't have to be that expensive).

Writing a big text file fast

How can I write a big text file very fast on Android?
I made some tests using a PrintWriter, a BufferedWriter and a FileWriter, but there is no significant time difference, and writing takes about three times as long as reading.
Android, just like every single other device with a storage unit, has limited writing speed.
The only thing you can do to speed up file writing and reading is get a better storage unit (which isn't as easy on an Android device as you can't just screw it open and replace the unit).
So when it comes to file reading and writing, you're limited by hardware.
Each flash memory has a specific block and page size. Because of how writes are done on flash memory, it matters whether you write in small chunks.
Here is a benchmark with dd run in adb shell:
sailfish:/storage/emulated/0 $ dd if=/dev/zero of=test bs=4k count=$((32*1024))
32768+0 records in
32768+0 records out
134217728 bytes transferred in 0.521 secs (257615600 bytes/sec)
Another one for 1k block:
sailfish:/storage/emulated/0 $ dd if=/dev/zero of=test bs=1k count=$((128*1024))
131072+0 records in
131072+0 records out
134217728 bytes transferred in 1.369 secs (98040707 bytes/sec)
Writing data in small chunks makes it almost 3x slower.
I encourage you to benchmark your memory to see if you hit a hardware limitation. Experiment with write buffer sizes. If your memory is genuinely slower during writes, there is nothing more you can do.
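On the Java side, the main controllable factor mirrors the dd result above: write in large chunks through a buffered stream instead of many tiny writes. A minimal sketch (the 256 KiB buffer size and file name are arbitrary choices to experiment with):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BigFileWrite {
    // Write `chunk` to `path` `times` times through a buffer of `bufferSize` bytes,
    // so the OS sees large sequential writes rather than many small ones.
    public static void writeRepeated(String path, byte[] chunk, int times, int bufferSize)
            throws IOException {
        try (BufferedOutputStream out =
                 new BufferedOutputStream(new FileOutputStream(path), bufferSize)) {
            for (int i = 0; i < times; i++) {
                out.write(chunk);
            }
        } // close() flushes the remaining buffered bytes
    }

    public static void main(String[] args) throws IOException {
        byte[] line = "some long line of text\n".getBytes(StandardCharsets.UTF_8);
        writeRepeated("big.txt", line, 100_000, 256 * 1024);
    }
}
```

Varying the buffer size (4 KiB, 64 KiB, 256 KiB) and timing the result is a simple way to find the point where you hit the hardware limit rather than a Java overhead.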
For faster file read/write take a look at java.nio package.
This is a good article about Java NIO; there you can find a comparison with the java.io package.
Give Okio a spin. Not sure if it's actually faster though.

High performance file IO in Android

I'm creating an app which communicates with an ECG monitor. Data is read at a rate of 250 samples per second. Each package from the ECG monitor contains 80 bytes, and packages are received 40 times per second.
I've tried using a RandomAccessFile, but packages were lost both in sync
(RandomAccessFile(outputFile, "rws")) and async (RandomAccessFile(outputFile, "rw")) mode.
In a recent experiment I tried using a MappedByteBuffer. This should be extremely performant, but when I create the buffer I have to specify a size, map(FileChannel.MapMode.READ_WRITE, 0, 10485760) for a 10 MB buffer, which results in a file that is always 10 MB in size. Is it possible to use a MappedByteBuffer where the file size is only the actual amount of data stored?
Or is there another way to achieve this? Is it naive to write to a file this often?
On a side note this wasn't an issue at all on iOS - this can be achieved with no buffering at all.
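For scale, 40 packets of 80 bytes per second is only about 3.2 KB/s, which is far below what a plain FileChannel with a reused buffer can sustain, and the file then grows to exactly the amount of data written with no preallocated mapping. A minimal sketch (file name and dummy packet contents are made up):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class EcgWriter {
    // Append `packets` 80-byte packages to `path`, reusing one direct buffer.
    public static void writePackets(String path, int packets) throws IOException {
        ByteBuffer packet = ByteBuffer.allocateDirect(80);
        try (FileChannel ch = FileChannel.open(Path.of(path),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < packets; i++) {
                packet.clear();
                for (int b = 0; b < 80; b++) packet.put((byte) i); // dummy sample data
                packet.flip();
                while (packet.hasRemaining()) ch.write(packet);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        writePackets("ecg.bin", 40); // one second's worth: file ends up 40 * 80 = 3200 bytes
    }
}
```

This avoids the fixed-size problem of map() entirely; whether it also avoids the lost packages depends on what was blocking in the original code, which the question doesn't show.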

Noise cancelling program

If you were to write a program that takes microphone input, reverses it (sets it out of phase by making 1s 0s and 0s 1s), and plays it back out of the speakers, could that cancel out sound? Wave physics says that if crests align with troughs, destructive interference occurs, so could that be utilized here to achieve a lessened, if not "completely" cancelled, noise? I can imagine this wouldn't work, either due to complications in reversing the audio, or because it takes too long to reverse and play back, so that the sound wave has already passed. If I had to pick a language for this, it would have to be either C++ or Java (I'm at least competent in both).
Yes it will cancel out sound. That's more or less how Surround Sound works: by subtracting the left/right channels, playing that in the 3rd speaker, and inverting the samples, playing those out of the 4th you get interesting spatial effects.
Also, you wouldn't simply want to toggle all bits - that would give you noise; instead you want to negate each sample.
With a small sample buffer you'd be fast enough to cancel out waves of certain frequencies. When these attack and decay, you'll be lagging, but as long as the wave sustains you can effectively cancel it out.
With bigger sample buffers, obviously the delay increases, since it takes longer to fill the buffer with samples. The size of the buffer determines how often a device interrupt occurs where the program would copy the input samples to an output buffer while applying an operation to them.
Typically recordings are made at 44.1kHz, meaning that many samples per second. If you set the buffer to say 256 samples, you would get notified 44100/256 times a second that there are 256 samples to be processed.
At 256 samples you'd lag behind 256/44100 = 0.0058 seconds or 5.8 milliseconds. Sound travels at around 340 m/s, so the sound wave would have moved 1.97 meters (340 * 5.8ms). This wavelength corresponds with the frequency 172 Hz (44100/256). That means that you can only effectively cancel out frequencies that have a lower frequency than that, because those of a higher frequency 'move' more than once during 5.8ms and are thus above the maximum 'sample rate', if you will.
For 64 samples, the frequency would be 44100/64 = 689 Hz. And, this is the maximum frequency! That means you could cancel out bass and the base frequency of the human voice, but not the harmonics.
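The latency arithmetic above can be written out directly (a quick check of the numbers, nothing more):

```java
public class CancelLatency {
    public static void main(String[] args) {
        int sampleRate = 44100;    // samples per second
        int bufferSamples = 256;   // samples per processing buffer
        double delaySec = (double) bufferSamples / sampleRate;  // ~0.0058 s of lag
        double distanceM = 340 * delaySec;                      // ~1.97 m travelled meanwhile
        double maxFreqHz = (double) sampleRate / bufferSamples; // ~172 Hz effective ceiling
        System.out.printf("%.4f s, %.2f m, %.0f Hz%n", delaySec, distanceM, maxFreqHz);
    }
}
```

Swapping in 64 for bufferSamples reproduces the 689 Hz figure from the previous paragraph.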
A typical OS has its clock frequency set to 500, 1000, or 2000 Hz, meaning at best you could use a sample buffer of around two to three samples, giving you a maximum frequency of 500, 1000, or 2000 Hz. Telephones usually have a maximum frequency of about 3500 Hz.
You could get the system clock up to around 32kHz, and poll an ADC directly to reach such frequencies. However, you'd probably need to solder one to your LPT and run a custom OS, which means Java is out of the question, or use a pre-fab real-time embedded system that runs Java (see the comment by #zapl for links).
One thing I forgot to mention is that you will need to take into account the positions of the sound source, the microphone, and the speaker. Ideally all three are in the same place, so there is no delay. But this is almost never the case, which means you'd get an interference pattern: there will be spots in the room where the sound is cancelled, and spots in between where it is not.
You cannot do this in software, with C++, or even assembly - the latency of just mirroring the output on the speakers would be more than 6 ms on most computers. Even if you had a latency of only 0.1 ms, the resulting sound (assuming it is perfectly mixed) would at best sound like it was sampled at 10 kHz (not very good).

GZIP Compressing http response before sending to the client

I have gzipped the response using a filter. The data has been compressed from 50 MB to 5 MB; however, this didn't result in much saving of time. The time taken has been reduced from 12 seconds to 10 seconds. Is there anything else which can be done to reduce the time?
Initially, the data transfer over the network took 9 seconds; now it takes approximately 6 seconds after compression, plus 1 second to decompress.
What else can be done?
For the filter itself, the possible measures are few:
There are different compression levels; the higher the compression, the slower. The default level of GZIPOutputStream should be fast enough.
GZIPOutputStream has constructors that accept a buffer size.
Use buffered streaming, and avoid byte-wise int read() calls.
Code review for plausibility: the original Content-Length header must be removed.
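The first two measures can be combined in one place: GZIPOutputStream exposes its Deflater as a protected field, so a tiny subclass is the usual way to set an explicit compression level along with a larger stream buffer. A minimal sketch (the class name, 64 KiB buffer, and sample payload are made-up illustrations):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

public class TunedGzip {
    // GZIPOutputStream with a chosen buffer size and deflater compression level.
    static class LevelGzipOutputStream extends GZIPOutputStream {
        LevelGzipOutputStream(OutputStream out, int bufferSize, int level) throws IOException {
            super(out, bufferSize);
            def.setLevel(level); // `def` is the protected Deflater field
        }
    }

    // Compress `data` in one pass; BEST_SPEED trades ratio for CPU time.
    public static byte[] gzip(byte[] data, int level) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream gz = new LevelGzipOutputStream(sink, 64 * 1024, level)) {
            gz.write(data);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "example response body ".repeat(1000).getBytes();
        byte[] packed = gzip(payload, Deflater.BEST_SPEED);
        System.out.println(payload.length + " -> " + packed.length + " bytes");
    }
}
```

Since in this case the network transfer dominates (6 of the 10 seconds), Deflater.BEST_SPEED versus BEST_COMPRESSION is the knob worth measuring: a faster level shortens the 1-second compression step, a stronger one shrinks the 6-second transfer.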
For the static content:
.bmp files are a waste of space.
.pdf files can be optimized with respect to fonts and repeated images.
.docx is a zip format, so the image files inside it might be optimized too.
For dynamic content generation:
Fixed documents can be stored compressed (xxxxxx.yyy.gz) with a timestamp, so the generation time is eliminated. This is only of interest after measuring the real bottleneck, which is likely the network.
The code for delivery should be fast. In general, chain streams; try not to write to a ByteArrayOutputStream, but immediately to a BufferedOutputStream wrapping the original output stream. Check that buffering is not done twice; some wrapping streams check whether the wrapped stream is an instance of a buffered stream.
Production environment:
Maybe you even need throttling (slowing down delivery) in order to serve multiple simultaneous requests.
You may need to do the delivery on another server.
Buy speed from the provider. Ask the provider whether the throughput was too high and whether they slowed things down.