I'm wondering how I can, in Java, preferably using DataLine, capture audio from the microphone and play it directly through the speakers, even if there is some delay.
Basically, I want to take the audio from the microphone, store a buffer of a finite number of samples, be able to modify each sample in some way, and play it back through the speakers with as little time as possible between each sample being recorded and played. Sort of like writing a Java program to use my computer as an effects pedal; is this possible? (Assume I already know how to modify the samples.) Just to be clear, I don't want to record a finite number of samples from the microphone, stop recording, modify them, and then play; I want it to be continuously recording and playing.
This is a matter of reading from a TargetDataLine into a byte buffer and then writing it to a SourceDataLine in a loop indefinitely.
The resulting latency will depend heavily on the size of the audio buffer you use: the larger the buffer, the larger the latency.
Take a look at the AudioLoop example here.
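A minimal sketch of that read/write loop (the class name, buffer size, and format here are illustrative choices of mine, not taken from the AudioLoop example):

```java
import javax.sound.sampled.*;

public class MicLoopback {
    public static void main(String[] args) throws LineUnavailableException {
        // 16-bit signed big-endian mono PCM at 44.1 kHz
        AudioFormat format = new AudioFormat(44100f, 16, 1, true, true);

        TargetDataLine mic = AudioSystem.getTargetDataLine(format);
        SourceDataLine speakers = AudioSystem.getSourceDataLine(format);
        mic.open(format);
        speakers.open(format);
        mic.start();
        speakers.start();

        byte[] buffer = new byte[1024]; // smaller buffer -> lower latency
        while (true) {
            int n = mic.read(buffer, 0, buffer.length);
            // ...modify the samples in buffer here (the "effects pedal" step)...
            speakers.write(buffer, 0, n);
        }
    }
}
```

With 1024 bytes of 16-bit mono at 44.1 kHz, each pass moves 512 samples, about 11.6 ms of audio, which sets the floor on latency.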
I'm writing a basic synth at the moment and have run into a bit of a strange problem: I get a constant popping sound while playing an array of bytes, representing 16-bit mono audio, through a SourceDataLine.
The pops play at a constant rate and, from what I can hear, at a constant pitch. They do differ slightly in frequency content, though (again, from what I can hear): some notes have low-passed-sounding pops and others sound high-passed. The pops do not drown out the signal; you can still hear the desired sound in the background.
Nothing changes the rate of the pops except the sample rate: not note pitch, not the SourceDataLine buffer size, not the number of bytes I write at a time.
Lowering the sample rate decreases the rate of the pops, and vice versa.
To test my side of the program, I printed out the data being written to the SourceDataLine for about half a second and looked through around 15 cycles of the played sine wave, and it was completely fine; no sudden jumps, clipping, or anything else.
The only two things I use the sample rate for are some basic math to help my sampler sample at the correct frequency (calculated only once per note, and definitely working, as the pitch is perfect) and creating the SourceDataLine.
Here's how I'm starting the SourceDataLine (Taken from multiple parts of the main method):
AudioFormat format = new AudioFormat(AudioEnvironment.SAMPLE_RATE, AudioEnvironment.BIT_DEPTH, 1, true, true);
SourceDataLine line = AudioSystem.getSourceDataLine(format);
line.open(format, 8000);
line.start();
My data is correctly big-endian; I tested this by changing the endianness flag in the constructor and getting my ears blasted with white noise.
After the program has set everything up, it constantly writes data to the SourceDataLine in this infinite loop:
while (true) {
    for (Channel channel : channelSystem.getChannels()) {
        if (channel.pitch != 0) {
            wave.sample(channel, buffer);
            line.write(buffer, 0, AudioEnvironment.SUB_BUFFER_SIZE * 2);
        }
    }
}
(A Channel is a class I created that contains all the data for a single note (though obviously the program is not yet set up for polyphony), buffer is a byte array, wave.sample() fills buffer with sampled data, and AudioEnvironment.SUB_BUFFER_SIZE * 2 is the size of buffer.)
I don't necessarily need an example of how to fix this in code, but an explanation of why this might be happening would be great.
EDIT: Something I should also probably add is that I've tried putting a print statement in the infinite write loop to print out the number of available bytes in the SourceDataLine, and it stays constantly around 500 - 2000, occasionally getting up to around 5000, but never near 8000, so the buffer is never running out of data.
Well as it turns out, the problem was completely unrelated to what I thought it might be.
Turns out there was a single equation I had written in my sampler that was just blatantly wrong.
After 2048 samples had been played, I would just kinda loop back to the beginning of the waveform, causing the popping.
I honestly have no idea why I wrote that in, but hey, it works now.
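The faulty equation itself isn't shown, but the class of bug is easy to demonstrate. A hypothetical sketch (the 440 Hz tone and all names are mine) contrasting a phase that wraps modulo 2π with one that is hard-reset every 2048 samples, which jumps mid-cycle and produces a pop:

```java
public class PhaseWrapDemo {
    static final double SAMPLE_RATE = 44100.0;

    // Correct: the phase advances continuously and wraps modulo 2*pi,
    // so consecutive samples always join smoothly.
    static double[] smooth(double freq, int n) {
        double[] out = new double[n];
        double phase = 0, step = 2 * Math.PI * freq / SAMPLE_RATE;
        for (int i = 0; i < n; i++) {
            out[i] = Math.sin(phase);
            phase = (phase + step) % (2 * Math.PI);
        }
        return out;
    }

    // Buggy (the kind of mistake described above): the phase is forced back
    // to zero every 2048 samples, jumping mid-cycle.
    static double[] buggy(double freq, int n) {
        double[] out = new double[n];
        double phase = 0, step = 2 * Math.PI * freq / SAMPLE_RATE;
        for (int i = 0; i < n; i++) {
            if (i > 0 && i % 2048 == 0) phase = 0; // BUG: audible pop here
            out[i] = Math.sin(phase);
            phase = (phase + step) % (2 * Math.PI);
        }
        return out;
    }
}
```

The discontinuity at sample 2048 in the buggy version is the periodic click; its repetition rate scales with the sample rate, matching the symptom in the question.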
If you were to write a program that takes microphone input, reverses it (sets it out of phase by making 1s 0s and 0s 1s), and plays it back out of the speakers, could that cancel out sound? Wave physics says that if crests align with troughs, destructive interference occurs, so could that be used here to at least lessen the noise, if not cancel it out "completely"? I can imagine this might not work, either because of complications in reversing the audio, or because it takes too long to reverse and play back, so that the sound wave has already passed. If I had to pick a language to do this in, it would be either C++ or Java (I'm at least competent in both).
Yes, it will cancel out sound. That's more or less how surround sound works: by subtracting the left and right channels and playing the result through the third speaker, then inverting those samples and playing them through the fourth, you get interesting spatial effects.
Also, you wouldn't want to simply toggle all the bits; that would give you noise. Instead, you want to negate each sample.
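A sketch of per-sample negation for 16-bit big-endian PCM (the helper is my own, not from any library):

```java
public class Invert {
    // Negate each 16-bit big-endian sample (phase inversion). This is NOT
    // the same as toggling all bits: in two's complement, ~s == -s - 1.
    static void invertSamples(byte[] buf) {
        for (int i = 0; i + 1 < buf.length; i += 2) {
            short s = (short) ((buf[i] << 8) | (buf[i + 1] & 0xFF));
            short neg = (short) -s; // caution: -(-32768) overflows back to -32768
            buf[i] = (byte) (neg >> 8);
            buf[i + 1] = (byte) neg;
        }
    }
}
```

Applying it twice restores the original buffer, which is an easy sanity check.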
With a small sample buffer you'd be fast enough to cancel out waves of certain frequencies. When these attack and decay, you'll be lagging, but as long as the wave sustains you can effectively cancel it out.
With bigger sample buffers, obviously the delay increases, since it takes longer to fill the buffer with samples. The size of the buffer determines how often a device interrupt occurs where the program would copy the input samples to an output buffer while applying an operation to them.
Typically recordings are made at 44.1kHz, meaning that many samples per second. If you set the buffer to say 256 samples, you would get notified 44100/256 times a second that there are 256 samples to be processed.
At 256 samples you'd lag behind by 256/44100 = 0.0058 seconds, or 5.8 milliseconds. Sound travels at around 340 m/s, so the sound wave would have moved 1.97 meters (340 m/s × 5.8 ms). That wavelength corresponds to a frequency of 172 Hz (44100/256). This means you can only effectively cancel out frequencies lower than that, because higher frequencies complete more than one cycle during those 5.8 ms and are thus above the maximum 'sample rate', if you will.
For 64 samples, the frequency would be 44100/64 = 689 Hz, and that is the maximum frequency. That means you could cancel out bass and the fundamental frequency of the human voice, but not the harmonics.
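The arithmetic above, collected into two helper functions (the names are mine):

```java
public class CancelMath {
    // Latency introduced by buffering: bufferSamples / sampleRate seconds.
    static double latencySeconds(int bufferSamples, double sampleRate) {
        return bufferSamples / sampleRate;
    }

    // Highest frequency you could hope to cancel: one full buffer must fit
    // within a single wave period, so f_max = sampleRate / bufferSamples.
    static double maxCancellableHz(int bufferSamples, double sampleRate) {
        return sampleRate / bufferSamples;
    }
}
```

Plugging in the numbers from the text: a 256-sample buffer at 44.1 kHz gives ~5.8 ms of lag and a 172 Hz ceiling; a 64-sample buffer gives a 689 Hz ceiling.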
A typical OS has its clock frequency set to 500, 1000, or 2000 Hz, meaning at best you could use a sample buffer of around two to three samples, giving you a maximum frequency of 500, 1000, or 2000 Hz. Telephones usually have a maximum frequency of about 3500 Hz.
You could get the system clock up to around 32kHz, and poll an ADC directly to reach such frequencies. However, you'd probably need to solder one to your LPT and run a custom OS, which means Java is out of the question, or use a pre-fab real-time embedded system that runs Java (see the comment by #zapl for links).
One thing I forgot to mention is that you will need to take into account the positions of the sound source, the microphone, and the speaker. Ideally all three are in the same place, so there is no delay. But this is almost never the case, which means you'd get an interference pattern: there will be spots in the room where the sound is cancelled, and spots in between where it is not.
You cannot do this in software, with C++, or even assembly: the latency of just mirroring the output on the speakers would be more than 6 ms on most computers. Even if you had a latency of only 0.1 ms, the resulting sound (assuming it is perfectly mixed) would at best sound like it was sampled at 10 kHz (not very good).
The basic idea is to create an application that records audio on one device and sends it over WLAN, using sockets, to another device that plays it. In a nutshell: a LAN voice-chat program.
I am recording live audio from the mic using an AudioRecord object, reading the recorded data into a byte array, then writing the byte array to a TCP socket. The receiving device reads that byte array from the socket and writes it to the buffer of an AudioTrack object.
It's like:
AudioRecord --> byte array --> socket --> LAN --> socket --> byte array --> AudioTrack
The process is repeated using while loops.
Although the audio does play, it stutters between frames; i.e., when I say "Hello" the receiver hears "He--ll--o". The audio is complete, but there is lag between the buffer blocks.
As far as I know, the lag is due to delay in LAN transmission.
How do I improve it?
What approach should I use so that it is as smooth as commercial voice-chat applications like Skype and GTalk?
Sounds like you need a longer buffer somewhere to deal with the variance of the audio transmission over the LAN. To handle this, you could create an intermediary buffer between the socket byte array and the AudioTrack. Your buffer can be x times the size of the buffer used in the AudioTrack object. So something like this:
Socket bytes -> Audio Buffer -> Buffer to get fed to Audio Track -> Audio Track
When audio starts recording, don't play anything back until the longer buffer completely fills up. After that, you can feed blocks the size of your AudioTrack buffer to your AudioTrack object.
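A minimal sketch of that prime-then-play jitter buffer (the class and the priming depth are my own invention; tune primeBlocks to your network's variance):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class JitterBuffer {
    private final BlockingQueue<byte[]> queue;
    private final int primeBlocks;
    private boolean primed = false;

    JitterBuffer(int primeBlocks, int capacity) {
        this.primeBlocks = primeBlocks;
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Network thread: enqueue each block read from the socket.
    void offerBlock(byte[] block) throws InterruptedException {
        queue.put(block);
    }

    // Playback thread: returns null until the buffer is primed, then
    // hands out AudioTrack-sized blocks in order.
    byte[] takeBlock() throws InterruptedException {
        if (!primed) {
            if (queue.size() < primeBlocks) return null; // still filling
            primed = true;
        }
        return queue.take();
    }
}
```

The trade-off is the usual one: a deeper prime absorbs bigger network stalls but adds a fixed playback delay of primeBlocks × block duration.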
I want to play certain parts of a WAV file, like playing the first ten seconds, then playing from the 50th to the 60th second, and so on. I know how to play an entire WAV file in Java using the start method of the SourceDataLine class. Could anybody give me some pointers on how to seek to a particular time position in the audio and play from there?
Find the length of a frame, in bytes, from the AudioFormat
Find the length in bytes of a second, by multiplying the frame size by the frame rate.
skip() that number of bytes.
Play until you reach the byte offset of the end time, calculated using the same formula.
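The byte arithmetic in those steps might be sketched as follows (the helper name is mine):

```java
import javax.sound.sampled.AudioFormat;

public class SeekMath {
    // Bytes spanning a time interval: frames = seconds * frameRate,
    // bytes = frames * frameSize (frameSize already includes all channels).
    static long bytesForSeconds(AudioFormat fmt, double seconds) {
        long frames = (long) (seconds * fmt.getFrameRate());
        return frames * fmt.getFrameSize();
    }
}
```

To play seconds 50-60, you would skip() bytesForSeconds(fmt, 50) bytes from the AudioInputStream, then write the next bytesForSeconds(fmt, 10) bytes to the line.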
As far as I can see, nothing happens when you just call start. You are responsible for pushing the bytes of your choice into the line. So open a RandomAccessFile, seek to the appropriate offset, and execute a loop that transports the file data to the SourceDataLine.
I'm trying to make a simple "virtual scratcher", but I don't know the theory behind it. Since I've found nothing useful on Google, I'm asking here:
What happens when I scratch (move the track forward)? Do I raise the pitch and/or the rate of the sample?
How can I emulate this phenomenon with audio processing algorithms?
Sample code / tutorials would be reeeally appreciated :-)
What happens when I scratch (move the track forward)? Do I raise the pitch and/or rate of the sample?
Think about what is actually happening: A record contains audio data. The record needle reads the audio data from the record. As the record spins, the playback position changes. (It's very similar to watching the playhead move through an audio file in a digital audio editor.)
When you physically spin the record faster, you are increasing the playback rate. The audio is both quicker and higher in pitch. Increase the playback rate by two and the audio will playback an octave higher.
When you physically spin the record slower, you are decreasing the playback rate. The audio is both slower and lower in pitch. Decrease the playback rate by two and the audio will playback an octave lower.
Records can only modify audio playback by speeding up or slowing down the physical record; this affects both pitch and playback rate together. Some audio software can change pitch and rate independently; record players cannot.
(Get a record player and experiment to hear what it sounds like.)
How can I emulate this phenomenon with audio processing algorithms?
To emulate a DJ scratching a record you need to be able to adjust the playback rate of the audio as the user is "scratching".
When the user speeds up the record, speed up the playback rate. When the user slows the record, slow the playback rate.
When the user stops the record, stop playback altogether.
When the user spins the record in reverse, reverse the playback.
You don't need to change the pitch of the audio. Changing the playback rate will do that automatically. Any further adjustments to the pitch will sound incorrect.
I don't have any advice with regard to libraries but something like this isn't too difficult to implement if you take your time.
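As a rough illustration of rate-controlled playback, assuming the track has already been decoded into an array of samples (the class, method, and parameter names are mine):

```java
public class ScratchPlayback {
    // The read head advances by `rate` samples per output sample:
    // rate > 1 -> faster and higher-pitched, 0 < rate < 1 -> slower and lower,
    // rate == 0 -> stopped, rate < 0 -> reverse. Pitch follows automatically.
    static double[] render(double[] track, double startPos, double rate, int nOut) {
        double[] out = new double[nOut];
        double pos = startPos;
        for (int i = 0; i < nOut; i++) {
            int idx = (int) pos; // truncation; a real player would interpolate
            if (idx >= 0 && idx < track.length) {
                out[i] = track[idx];
            }
            pos += rate; // driven by the user's "scratch" gesture
        }
        return out;
    }
}
```

The rate would come from the user's gesture (or the virtual platter's angular velocity) and be updated continuously while audio is being rendered.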
When you scratch, you're just moving the record back and forth under the needle. So it's equivalent to looping forward and backwards repeatedly over the same part of the audio file. I would guess that the speed is somewhere between a sine wave and a triangle wave. You'll need to do linear interpolation on the sample.
For a mobile app, I'd start by mapping one axis of the screen to a time range in the audio file. Keep a buffer containing the last MotionEvent. When the next MotionEvent arrives, calculate the mapped start and end positions based on the (X,Y) coordinates. Then, calculate the elapsed time between the two MotionEvents. This gives you enough information to play from the file at the right position and speed, constantly updating with each motion event.
You might need to do some kind of smoothing on the data, but this is a starting point.
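The linear interpolation mentioned above could be sketched like this (the helper name is mine; the waveform is treated as a loop):

```java
public class Interp {
    // Value of the waveform at a fractional position, by linearly
    // interpolating between the two neighboring samples.
    static double sampleAt(double[] wave, double pos) {
        int i = (int) Math.floor(pos);
        double frac = pos - i;
        double a = wave[Math.floorMod(i, wave.length)];
        double b = wave[Math.floorMod(i + 1, wave.length)];
        return a + frac * (b - a);
    }
}
```

A variable playback rate lands between integer sample positions, so the player calls something like this for every output sample instead of indexing the array directly.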
A variable rate resampler might be appropriate. This will speed up the play rate and increase the pitch by the same ratio.
You would track ratio between the angular velocity of the "scratch" movement and the normal rate of platter rotation, and use that as the local resampling ratio.
There are far better (higher audio quality) DSP methods for rate resampling than linear interpolation, such as using a variable-width sinc interpolation kernel.