How to emulate Vinyl Scratch effect in Audio Processing? - java

I'm trying to make a simple "virtual scratcher", but I don't know the theory behind it. Since I've found nothing useful on Google, I'm asking here:
What happens when I scratch (move the track forward)? Do I raise the pitch and/or the rate of the sample?
How can I emulate this phenomenon with audio processing algorithms?
Sample code / tutorials would be reeeally appreciated :-)

What happens when I scratch (move the track forward)? Do I raise the pitch and/or rate of the sample?
Think about what is actually happening: A record contains audio data. The record needle reads the audio data from the record. As the record spins, the playback position changes. (It's very similar to watching the playhead move through an audio file in a digital audio editor.)
When you physically spin the record faster, you are increasing the playback rate. The audio is both quicker and higher in pitch. Double the playback rate and the audio will play back an octave higher.
When you physically spin the record slower, you are decreasing the playback rate. The audio is both slower and lower in pitch. Halve the playback rate and the audio will play back an octave lower.
Records can only modify the audio playback by speeding up or slowing down the physical record; this affects both pitch and playback rate together. Some audio software can change the pitch and rate independently. Record players cannot.
(Get a record player and experiment to hear what it sounds like.)
How can I emulate this phenomenon with audio processing algorithms?
To emulate a DJ scratching a record you need to be able to adjust the playback rate of the audio as the user is "scratching".
When the user speeds up the record, speed up the playback rate. When the user slows the record, slow the playback rate.
When the user stops the record, stop playback altogether.
When the user spins the record in reverse, reverse the playback.
You don't need to change the pitch of the audio. Changing the playback rate will do that automatically. Any further adjustments to the pitch will sound incorrect.
I don't have any advice with regard to libraries, but something like this isn't too difficult to implement if you take your time.
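For illustration, here is a bare-bones sketch of variable-rate playback over an in-memory buffer (all names here are illustrative, not from any particular library). The rate can go to zero for a stopped platter and negative for reverse:

public class ScratchPlayer {
    private final float[] samples;      // decoded mono audio
    private double position;            // fractional read position, in frames
    private volatile double rate = 1.0; // 1.0 = normal, 0 = stopped, < 0 = reverse

    public ScratchPlayer(float[] samples) { this.samples = samples; }

    public void setRate(double rate) { this.rate = rate; }

    // Fill "out" with the next block of audio at the current rate.
    public void render(float[] out) {
        for (int i = 0; i < out.length; i++) {
            int i0 = (int) Math.floor(position);
            int i1 = i0 + 1;
            if (i0 < 0 || i1 >= samples.length) {
                out[i] = 0f; // silence past either end of the recording
            } else {
                double frac = position - i0;
                // Linear interpolation between the two neighbouring samples
                out[i] = (float) (samples[i0] * (1 - frac) + samples[i1] * frac);
            }
            position += rate; // advance (or rewind) the playhead
        }
    }
}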

When you scratch, you're just moving the record back and forth under the needle, so it's equivalent to looping forward and backward repeatedly over the same part of the audio file. I would guess that the speed profile is somewhere between a sine wave and a triangle wave. You'll need to do linear interpolation on the samples.
For a mobile app, I'd start by mapping one axis of the screen to a time range in the audio file. Keep a reference to the last MotionEvent. When the next MotionEvent arrives, calculate the mapped start and end positions based on the (X,Y) coordinates. Then calculate the elapsed time between the two MotionEvents. This gives you enough information to play the file at the right position and speed, updating continuously with each motion event.
You might need to do some kind of smoothing on the data, but this is a starting point.
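As a rough sketch of that mapping inside an Android View subclass (secondsSpanned, the amount of audio the screen height covers, and player are hypothetical fields; player could expose a setRate method like the sketch above):

private MotionEvent last;

@Override
public boolean onTouchEvent(MotionEvent e) {
    if (last != null) {
        // Map the vertical gesture to seconds of audio covered by the screen.
        double dAudioSec = (e.getY() - last.getY()) / getHeight() * secondsSpanned;
        double dRealSec = (e.getEventTime() - last.getEventTime()) / 1000.0;
        if (dRealSec > 0) {
            player.setRate(dAudioSec / dRealSec); // 1.0 = normal playback speed
        }
        last.recycle();
    }
    last = MotionEvent.obtain(e); // keep a copy; the framework reuses "e"
    return true;
}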

A variable rate resampler might be appropriate. This will speed up the play rate and increase the pitch by the same ratio.
You would track the ratio between the angular velocity of the "scratch" movement and the normal rate of platter rotation, and use that as the local resampling ratio.
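For example, a 33 1/3 RPM platter turns 360 * 33.33 / 60 = 200 degrees per second, so the ratio might be computed like this (measuredDegPerSec is a hypothetical value from your gesture tracking):

// Normal platter speed: 33 1/3 RPM = 200 degrees per second.
static final double NORMAL_DEG_PER_SEC = 200.0;

// Local resampling ratio: 1.0 at normal speed, 0 when held, negative in reverse.
double rate = measuredDegPerSec / NORMAL_DEG_PER_SEC;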
There are much better (higher audio quality) DSP methods for rate resampling than linear interpolation, such as using a variable-width sinc interpolation kernel.

Related

Noise cancelling program

If you were to write a program that takes microphone input, reverses it (sets it out of phase by making 1's 0's and 0's 1's), and plays it back out of the speakers, could that cancel out sound? Wave physics says that destructive interference occurs when crests align with troughs, so could that be used here to at least lessen the noise, if not cancel it out "completely"? I can imagine this wouldn't work, either because of complications in reversing the audio, or because it takes too long to reverse and play back, so the sound wave has already passed. If I had to pick a language to do this in, it would be either C++ or Java (I'm at least competent in both).
Yes, it will cancel out sound. That's more or less how surround sound works: by subtracting the left/right channels and playing the result through the third speaker, then inverting the samples and playing those out of the fourth, you get interesting spatial effects.
Also, you wouldn't simply want to toggle all the bits; that would give you noise. Instead, you want to negate each sample.
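For 16-bit PCM that looks something like this (a sketch; note that -(-32768) overflows a short, hence the clamp):

static void invertPolarity(short[] samples) {
    for (int i = 0; i < samples.length; i++) {
        int s = -samples[i];                          // arithmetic negation, not ~samples[i]
        if (s > Short.MAX_VALUE) s = Short.MAX_VALUE; // -(-32768) would overflow
        samples[i] = (short) s;
    }
}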
With a small sample buffer you'd be fast enough to cancel out waves of certain frequencies. During their attack and decay you'll be lagging, but as long as a wave sustains you can effectively cancel it out.
With bigger sample buffers, obviously the delay increases, since it takes longer to fill the buffer with samples. The size of the buffer determines how often a device interrupt occurs where the program would copy the input samples to an output buffer while applying an operation to them.
Typically recordings are made at 44.1 kHz, meaning that many samples per second. If you set the buffer to, say, 256 samples, you would be notified 44100/256 times a second that there are 256 samples to be processed.
At 256 samples you'd lag behind by 256/44100 = 0.0058 seconds, or 5.8 milliseconds. Sound travels at around 340 m/s, so the sound wave would have moved 1.97 meters (340 * 5.8 ms). That wavelength corresponds to a frequency of 172 Hz (44100/256). This means you can only effectively cancel out frequencies lower than that, because higher frequencies 'move' more than once during those 5.8 ms and are thus above the maximum 'sample rate', if you will.
For 64 samples, the frequency would be 44100/64 = 689 Hz, and again, that is the maximum frequency. That means you could cancel out bass and the fundamental frequency of the human voice, but not the harmonics.
A typical OS has its clock frequency set to either 500, 1000, or 2000 Hz, meaning that at best you could process one tick's worth of samples at a time (roughly 88, 44, or 22 samples at 44.1 kHz), giving you a maximum frequency of 500, 1000, or 2000 Hz. Telephones usually have a maximum frequency of about 3500 Hz.
You could get the system clock up to around 32 kHz and poll an ADC directly to reach such frequencies. However, you'd probably need to solder one to your LPT port and run a custom OS, which means Java is out of the question, or use a prefab real-time embedded system that runs Java (see the comment by #zapl for links).
One thing I forgot to mention is that you will need to take into account the positions of the sound source, the microphone, and the speaker. Ideally all three are in the same place, so there is no delay. But this is almost never the case, which means you'd get an interference pattern: there will be spots in the room where the sound is cancelled, and spots in between where it is not.
You cannot do this in software with C++, or even assembly: the latency of just mirroring the output on the speakers would be more than 6 ms on most computers. Even if you had a latency of only 0.1 ms, the resulting sound (assuming it is perfectly mixed) would at best sound like it was sampled at 10 kHz (not very good).

Play Audio from Microphone through Speakers

I'm wondering how I can, in Java, preferably using a DataLine, capture audio from the microphone and play it directly through the speakers, even if there is some delay.
Basically, I want to take the audio from the microphone, store a buffer of a finite number of samples, be able to modify each sample in some way, and play it back out through the speakers with as little time as possible between each sample being recorded and played. Sort of like writing a Java program to use my computer as an effects pedal; is this possible? (Assume I already know how to modify the samples.) Just to be clear, I don't want to record a finite number of samples from the microphone, stop recording, modify, then play; I want it to be continuously recording and playing.
This is a matter of reading from a TargetDataLine into a byte buffer and then writing it to a SourceDataLine in a loop indefinitely.
The resulting latency will be highly dependent on the size of audio buffer you use. The larger your buffer the larger the latency.
Take a look at the AudioLoop example here.
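In the meantime, a minimal version of that loop might look like this (the format parameters are assumptions; error handling omitted):

import javax.sound.sampled.*;

public class MicToSpeakers {
    public static void main(String[] args) throws Exception {
        // Assumed format: 44.1 kHz, 16-bit, mono, signed, little-endian.
        AudioFormat format = new AudioFormat(44100f, 16, 1, true, false);

        TargetDataLine mic = AudioSystem.getTargetDataLine(format);
        SourceDataLine speakers = AudioSystem.getSourceDataLine(format);
        mic.open(format);
        speakers.open(format);
        mic.start();
        speakers.start();

        byte[] buffer = new byte[1024]; // smaller buffer = lower latency, more CPU
        while (true) {
            int n = mic.read(buffer, 0, buffer.length); // blocks until filled
            // ...modify the samples here if desired...
            speakers.write(buffer, 0, n);
        }
    }
}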

How many audio clips can Java handle?

I'm making a game in Java. I want for there to be about 100 different samples and at any given time, 10 samples could be playing. However, for each of these 10 samples, I want to be able to manipulate their volume and pan.
As of right now, I request a line as follows: new DataLine.Info(Clip.class, format);
I do not specify the controls that I need for this line, but it appears that Clips always have MASTER_GAIN and BALANCE controls.
Is this correct?
Could I just create an array of 100 Clips and preload all of the samples? I don't quite understand whether Java's lines correspond to physical lines into a physical mixer or whether they are virtualized.
If I am limited, then how can I swap samples in and out of lines? Is there a way to do this so that all of my, say, 100 samples are preloaded? Or does preloading only help when you already have a line designated?
Again, if I am limited, is this the wrong approach? Should I either:
a. use a different programming language, and/or
b. combine audio streams manually and put them all through the same line.
Wow, that's a lot of questions. I didn't find answers in the documentation and I really hope that you guys can help. Please number your answers 1 to 4. Thank you very much!
1) I do NOT think it is safe to assume there will always be a BALANCE or even a MASTER_GAIN control. Maybe there is. My experience with Java's audio Controls was vexing and short. I quickly decided to write my own mixer, and have done so. I'm willing to share this code. It includes basic provisions for handling volume and panning.
Even when they work, the Java Controls have a granularity that is limited by the buffer size being used, and this severely limits how fast you can fade in or out without creating clicks, if you are trying to do fades. Setting and holding a single volume is no problem, though.
Another Java library (bare-bones but vetted by several game programmers at java-gaming.org) is "TinySound", which is available via GitHub. I've looked it over but not used it myself. It also mixes all sounds down to a single output SourceDataLine. I can't recall how volume or panning is handled. It includes provisions for Ogg Vorbis files.
2) I'm not sure how you are envisioning using Clips when you mention "samples". Yes, you can preload an array of 100 Clips, and you would directly play one or another of them on its own thread (assuming raw Java instead of an audio-mixing library), reset it back to frame 0, then play it again. But only one thread can play a given Clip at a time: they do not accommodate concurrent playback. (You can "retrigger", though, by stopping a given playback, moving the position back to frame 0, and replaying.)
How long are the Clips? 100 of them could be a LOT of memory. If each is a second long: 100 seconds * 44100 frames per second * 4 bytes per frame = 17,640,000 bytes (almost 18 MB of RAM dedicated just to sound!).
I guess, if you know you'll only need a few at a time and you can predict which ones will be needed, you can preload those and reuse them. But don't fall into the trap of thinking that Clips are meant to be loaded at the time of playback. If you are doing that, use SourceDataLines instead. They start playing more quickly since they don't have to wait until the entire external file has been loaded into memory (as Clips do). I'd recommend only using a Clip if you plan to reset it to frame 0 and replay it (or loop it)!
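As a sketch of the preload-and-retrigger pattern (file handling and error recovery omitted):

import javax.sound.sampled.*;
import java.io.File;

public class ClipBank {
    private final Clip[] clips;

    public ClipBank(File[] files) throws Exception {
        clips = new Clip[files.length];
        for (int i = 0; i < files.length; i++) {
            try (AudioInputStream in = AudioSystem.getAudioInputStream(files[i])) {
                clips[i] = AudioSystem.getClip();
                clips[i].open(in); // reads the entire file into RAM
            }
        }
    }

    // Retrigger: cut off any current playback, rewind to frame 0, play again.
    public void play(int i) {
        clips[i].stop();
        clips[i].setFramePosition(0);
        clips[i].start();
    }
}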
3) Once it is loaded as a Clip, it is basically ready to go; there is no additional stage. I can't think of any helpful intermediate stage between an external file and a Clip in memory.
Ah, another thought: You might want to create a thread pool ( = max number of concurrent sounds) and manage that. I don't know at what point the scaling justifies the extra management.
4) It IS possible to run concurrent SourceDataLines in many contexts, which relieves the need to hold the entire file in RAM. In that case, the only thing you can preload are the Strings for the file locations, I think. I may be wrong and you can preload the Files as well, but maybe not. You definitely can't reuse an AudioInputStream! On the plus side, a SourceDataLine kicks off pretty quickly compared to an UNLOADED Clip.
HOWEVER! There are systems (e.g., some Linux OSes) that limit you to a single output, which might be either a Clip or a SourceDataLine. That was the clincher for me when I decided to build my own mixer.
I think if only 8 or 10 tones are playing at one time, you will probably be okay as long as the graphics are not too ambitious (not counting the above mentioned Linux OS situation). You'll have to test it.
I don't know what alternative languages you are considering. Some flavor of C is the only alternative I know of; most everything else I know of, besides Java, is not low-level or fast enough to handle that much audio processing. But I am only modestly experienced, have no sound engineering background, and am self-taught.

Some signal processing /FFT questions

I need some help confirming some basic DSP steps. I'm in the process of implementing some smartphone accelerometer sensor signal processing software, but I've not worked in DSP before.
My program collects accelerometer data in real time at 32 Hz. The output should be the principal frequencies of the signal.
My specific questions are:
From the real-time stream, I am collecting a 256-sample window with 50% overlap, as I've read in the literature. That is, I add in 128 samples at a time to fill up a 256-sample window. Is this a correct approach?
The first figure below shows one such 256-sample window. The second figure shows the sample window after I applied a Hann/Hamming window function. I've read that applying a window function is a typical approach, so I went ahead and did it. Should I be doing so?
The third figure shows the power spectrum (?) from the output of an FFT library. I am really cobbling together bits and pieces I've read. Am I correct in understanding that the spectrum goes up to half the sampling rate (in this case 16 Hz, since my sampling rate is 32 Hz), and that the value of each spectrum point is spectrum[i] = sqrt(real[i]^2 + imaginary[i]^2)? Is this right?
Assuming what I did in question 3 is correct, is my understanding right that the third figure shows principal frequencies of about 3.25 Hz and 8.25 Hz? I know from collecting the data that I was running at about 3 Hz, so the spike at 3.25 Hz seems right. So there must be some noise or other factors causing the (erroneous) spike at 8.25 Hz. Are there any filters or other methods I can use to smooth away this and other spikes? If not, is there a way to distinguish "real" spikes from erroneous ones?
Making a decision on sample size and overlap is always a compromise between frequency accuracy and timeliness: the bigger the sample, the more FFT bins and hence absolute accuracy, but it takes longer. I'm guessing you want regular updates on the frequency you're detecting, and absolute accuracy is not too important: so a 256 sample FFT seems a pretty good choice. Having an overlap will give a higher resolution on the same data, but at the expense of processing: again, 50% seems fine.
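A minimal way to keep that 50% overlap going (a sketch: onNewSamples is assumed to be called with each fresh half-window, and analyze stands in for the window-plus-FFT step sketched further below):

private final double[] frame = new double[256];

void onNewSamples(double[] latest128) {              // 128 fresh samples
    System.arraycopy(frame, 128, frame, 0, 128);     // keep the newest half
    System.arraycopy(latest128, 0, frame, 128, 128); // append the fresh half
    analyze(frame);                                  // window + FFT, see below
}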
Applying a window will stop frequency artifacts appearing due to the abrupt start and finish of the sample (you are effectively applying a square window if you do nothing). A Hamming window is fairly standard as it gives a good compromise between having sharp signals and low side-lobes: some windows will reject the side-lobes better (multiples of the detected frequency) but the detected signal will be spread over more bins, and others the opposite. On a small sample size with the amount of noise you have on your signal, I don't think it really matters much: you might as well stick with a Hamming window.
Exactly right, with one terminology nit: the square root of the sum of the squares of the real and imaginary parts is the magnitude spectrum (the power spectrum is its square), but for peak detection either will do. Your assumption about the Nyquist frequency is also true: your scale will go up to 16 Hz. I assume you are using a real FFT algorithm, which returns 128 complex values (an FFT gives 256 values back, but because you are feeding it a real signal, half will be a mirror image of the other half), so each bin is 16/128 Hz wide. It is also common to show the power spectrum on a log scale, but that's irrelevant if you're just peak detecting.
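Putting the windowing and spectrum steps together (a sketch; fft stands in for whatever in-place transform your library provides):

static double[] magnitudeSpectrum(double[] frame, double sampleRate) {
    int n = frame.length;          // 256 here
    double[] real = new double[n];
    double[] imag = new double[n]; // stays zero: the input is real
    for (int i = 0; i < n; i++) {
        // Hamming window to soften the abrupt frame edges
        double w = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1));
        real[i] = frame[i] * w;
    }
    fft(real, imag);               // hypothetical in-place FFT from your library
    // Only bins 0..n/2 are unique for a real input (bin n/2 is Nyquist, 16 Hz here)
    double[] mag = new double[n / 2 + 1];
    for (int k = 0; k <= n / 2; k++) {
        mag[k] = Math.sqrt(real[k] * real[k] + imag[k] * imag[k]);
        // bin k corresponds to frequency k * sampleRate / n
    }
    return mag;
}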
The 8 Hz spike really is there: my guess is that a phone in the pocket of a moving person is more than a first-order system, so you are going to have other frequency components, but you should be able to detect the primary one. You could filter it out, but that's pointless if you are taking an FFT: just ignore those bins if you are sure they are erroneous.
You seem to be getting on fine. The only suggestion I would make is to develop some longer time heuristics on the results: look at successive outputs and reject short-term detected signals. Look for a principal component and see if you can track it as it moves around.
To answer a few of your questions:
Yes, you should be applying a window function. The idea here is that when you start and stop sampling a real-world signal, what you're doing anyway is applying a sharp rectangular window. Hann and Hamming windows are much better at reducing frequencies you don't want, so this is a good approach.
Yes, the strongest frequencies are around 3 and 8 Hz. I don't think the 8 Hz spike is erroneous. With such a short data set you almost certainly can't control the exact frequencies your signal will have.
Some insight on question 4 (from staring at accelerometer signals of people running for months of my life):
Are you running this analysis on a single accelerometer axis, or are you combining the axes to get the magnitude of acceleration? If you are interested in the overall magnitude of acceleration, you should combine x, y, and z, as in mag_acc = sqrt((x - 0g_offset)^2 + (y - 0g_offset)^2 + (z - 0g_offset)^2). This signal should sit at 1 g when the device is still. If you look at only a single axis, you will get components both from the dominant running motion and from the orientation of the phone changing (because the contribution from gravity shifts between axes as the device tilts). So if the phone's orientation moves around while you are running, depending on how you are holding it, it can contribute a significant amount to the signal, whereas the magnitude hides the orientation changes. A person running should show a really clean dominant frequency at the person's step rate.

Mixing Sound in Java?

How do I play the same sound more than once at any given moment, with numerous other sounds going on at the same time? Right now I've got a "Clip" playing, but it won't overlap with itself (I hear one bullet fire, the sound finishes, then it plays again). I'm writing a game with a fast bullet-firing system, but I can't get the sound to work nicely. It just doesn't sound "right" to hear only one bullet shot every half second when you spawn 20+ on the screen each second.
Any help? Pointers? :D
This seems to answer your question:
http://my.safaribooksonline.com/9781598634761/ch09lev1sec3
Quote:
"In other words, a single Clip object cannot mix with itself, only with other sounds. This process works quite well if you use short sound effects, but can sound odd if your sound clips are one second or more in length. [...] If you want to repeatedly mix a single clip, there are two significant options (and one other unlikely option):
1) Load the sound file into multiple Clip objects (such as an array), and then play each one in order. Whenever you need to play this specific sound, just iterate through the array and locate a clip that has finished playing, and then start playing it again."
So in principle Java does do mixing, just not inside a single clip.
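A sketch of option 1, a small pool of Clips for one sound so that overlapping shots can mix (the pool size and file are up to you):

import javax.sound.sampled.*;
import java.io.File;

public class SoundPool {
    private final Clip[] pool;

    public SoundPool(File wav, int voices) throws Exception {
        pool = new Clip[voices];
        for (int i = 0; i < voices; i++) {
            try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
                pool[i] = AudioSystem.getClip();
                pool[i].open(in);
            }
        }
    }

    public void play() {
        for (Clip c : pool) {
            if (!c.isRunning()) { // grab a voice that is free
                c.setFramePosition(0);
                c.start();
                return;
            }
        }
        pool[0].stop();           // all voices busy: steal one and restart it
        pool[0].setFramePosition(0);
        pool[0].start();
    }
}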
Playing 20 bullet clips at once might be a little CPU-intensive. It might be fine. I made a wind chime once that played 7 chimes, overlapping (each was about 3 or 4 seconds long), and got away with setting it to play about 100 chimes per 5-second block. But the program wasn't doing anything else.
With Clips, to do this you would need to make multiple copies, and all that audio data would be sitting there taking up RAM. If they are really short, it's not such a sacrifice. But for rapid fire, the solution most games use is to just cut off the sound and restart it. You don't have to play the sound through to the end:
myClip.stop();
myClip.setFramePosition(0);
myClip.start();
with each bullet fired. This is what is most often done. It uses a lot less CPU and less RAM than the overlapping-Clip solution.
AudioClip might be what you're looking for. I use it in games for playing short .wav sound effects; it's not perfect, but it works fine most of the time.
