frequency / pitch detection for dummies - java

While there are many questions on this site dealing with the concept of pitch detection, they all deal with this magical FFT with which I am not familiar. I am trying to build an Android application that needs to implement pitch detection. I have absolutely no understanding of the algorithms that are used to do this.
It can't be that hard can it? There are around 8 billion guitar tuner apps on the android market after all.
Can someone help?

The FFT is not really the best way to implement pitch detection or pitch tracking. One issue is that the loudest frequency is not always the fundamental frequency. Another is that the FFT, by itself, requires a pretty large amount of data and processing to obtain the resolution you need to tune an instrument, so it can appear slow to respond (i.e. latency). Yet another issue is that the result of an FFT is not exactly intuitive to work with: you get an array of complex numbers and you have to know how to interpret them.
If you really want to use an FFT, here is one approach:
Low-pass your signal. This will help prevent noise and higher harmonics from creating spurious results. Conceivably, you could skip this step and instead weight your results towards the lower values of the FFT. For some instruments with strong fundamental frequencies, this might not be necessary.
Window your signal. Windows should be at least 4096 samples in size. Larger is better up to a point, because it gives you better frequency resolution; if you go too large, it will end up increasing your computation time and latency. The Hann function is a good choice for your window. http://en.wikipedia.org/wiki/Hann_function
FFT the windowed signal as often as you can. Even overlapping windows are good.
The results of the FFT are complex numbers. Find the magnitude of each complex number using sqrt( real^2 + imag^2 ). The index in the FFT array with the largest magnitude is the index with your peak frequency.
You may want to average multiple FFTs for more consistent results.
How do you calculate the frequency from the index? Well, let's say you've got a window of size N. After you FFT, you will have N complex numbers. If your peak is the nth one, and your sample rate is 44100, then your peak frequency will be near 44100*n/N. Why near? Because you have a quantization error of up to half a bin, i.e. (44100/2)*1/N. For a window size of 4096, this is about 5.4 Hz -- easily audible at A440. You can improve on that by 1. taking phase into account (I've only described how to take magnitude into account), 2. using larger windows (which will increase latency and processing requirements, as the FFT is an N log N algorithm), or 3. using a better algorithm like YIN: http://www.ircam.fr/pcm/cheveign/pss/2002_JASA_YIN.pdf (The steps above are sketched in code below.)
You can skip the windowing step and just break the audio into discrete chunks of however many samples you want to analyze. This is equivalent to using a square window, which works, but you may get more noise in your results.
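Putting the windowing, magnitude and peak-picking steps together, here is a minimal Java sketch (low-pass filtering omitted). The fft() method is a stub standing in for whatever FFT library you choose; everything else runs as written.

    public class PeakDetector {

        // Window -> FFT -> magnitude -> peak index -> frequency, as described above.
        static double detectPeakHz(double[] samples, double sampleRate) {
            final int n = samples.length; // window size, e.g. 4096
            double[] windowed = new double[n];
            for (int i = 0; i < n; i++) {
                // Hann window: 0.5 * (1 - cos(2*pi*i / (n-1)))
                double hann = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
                windowed[i] = samples[i] * hann;
            }

            double[] re = new double[n];
            double[] im = new double[n];
            fft(windowed, re, im); // placeholder, see stub below

            int peak = 1;      // skip bin 0 (DC)
            double peakMag = 0.0;
            for (int k = 1; k < n / 2; k++) { // only the lower half is meaningful for real input
                double mag = Math.sqrt(re[k] * re[k] + im[k] * im[k]);
                if (mag > peakMag) { peakMag = mag; peak = k; }
            }
            // bin k of an n-point FFT sits near k * sampleRate / n Hz
            return peak * sampleRate / n;
        }

        static void fft(double[] in, double[] re, double[] im) {
            // delegate to a real FFT implementation (e.g. JTransforms)
            throw new UnsupportedOperationException("plug in an FFT library here");
        }
    }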
BTW: Many of those tuner apps license code from third parties, such as z-plane and iZotope.
Update: If you want C source code and a full tutorial for the FFT method, I've written one. The code compiles and runs on Mac OS X, and should be convertible to other platforms pretty easily. It's not designed to be the best, but it is designed to be easy to understand.

A Fast Fourier Transform changes a function from the time domain to the frequency domain. So instead of f(t), where f is the signal that you are getting from the microphone and t is the time index of that signal, you get g(θ), where g is the FFT of f and θ is the frequency. Once you have g(θ), you just need to find the θ with the highest amplitude, meaning the "loudest" frequency. That will be the primary pitch of the sound that you are picking up.
As for actually implementing the FFT, if you google "fast fourier transform sample code", you'll get a bunch of examples.

Related

Step detection in real-time 1D data

For a small project we're trying to implement an autopilot for a slot car. A gyro sensor is attached to the car and delivers the Z-value (meaning the amount of centrifugal force acting on the car/sensor) 20 times per second. One crucial part is detecting whether the car is in a curve or on a straight section, and exactly when it entered and left that section. Only then can we make reliable predictions about what will happen next.
As for now, we're working with a sliding window to smooth the data and then have hardcoded limits (-400 for a left curve and +400 for a right curve) to detect what kind of sector (left, right, straight) we're in.
Obviously this takes too long: because of the smoothing and the hardcoded limits, a couple of messages pass before the program detects a direction change.
Here's an example of two rounds on a simple track, starting at the checkered area:
A perfect algorithm would detect the sectors S R S R S L S R S R S R S for one round, with a delay of only a couple of data points.
We thought about using the first derivative of the gyro values, but in the sample graph right after the first left curve, the following right curve (between 22:36:40 and 22:36:42) shows signs of swerving. Here the first derivative would be close to 0 and indicate a straight part...
Also, we'd need to set a hardcoded threshold again, and with the noise in the data, a small bump in the track could produce enough noise that its derivative exceeds the threshold.
Now we're not sure about what would be the easiest/fastest/most reliable way to handle this sort of detection. Would using a derivative be a good idea? Is there a better way?
Any input would be greatly appreciated :)
The existing software is written in Java.
In such problems, you have to trade robustness for immediacy. If you don't know what happens in the future, you can only make assumptions. And these assumptions may or may not hold.
From the looks of your data, there shouldn't be any smoothing necessary. If you define a reasonable threshold, the curves should be recognized quite reliably. If, however, this is not the case, here are some things you could try:
You already mentioned smoothing. The crucial point is how you smooth. An asymmetric smoothing kernel is probably desirable (a half-triangle filter can be updated in constant time). You can trade robustness against immediacy directly by adjusting the kernel width.
A simple alternative to filtering is counting. If your data is above the curve threshold, don't call it a curve just yet. Count how many data points in a row are above the threshold. If there are more than n data points above the threshold, then you're most likely in a curve (see the sketch after these suggestions).
Using derivatives is potentially problematic. The main reason against derivatives is that a curve is not defined by any derivative at all (at least no derivative of the force). The second problem is that you can only estimate the derivatives numerically, which is quite unstable with lots of noise. So you would have to smooth your data (or find a numerical scheme for your noise model), which again requires some latency.
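Here is a minimal sketch of the counting idea. The threshold matches the question's hardcoded limits; the class name and MIN_RUN value are illustrative, not from the original project.

    class SectorDetector {
        enum Sector { LEFT, STRAIGHT, RIGHT }

        private static final double THRESHOLD = 400.0; // from the question's hardcoded limits
        private static final int MIN_RUN = 3;          // consecutive samples required (tune this)

        private Sector current = Sector.STRAIGHT;
        private Sector candidate = Sector.STRAIGHT;
        private int runLength = 0;

        Sector update(double gyroZ) {
            Sector observed = gyroZ <= -THRESHOLD ? Sector.LEFT
                            : gyroZ >=  THRESHOLD ? Sector.RIGHT
                            : Sector.STRAIGHT;
            if (observed == candidate) {
                runLength++;
            } else {
                candidate = observed; // reset the evidence counter on every change
                runLength = 1;
            }
            if (runLength >= MIN_RUN && candidate != current) {
                current = candidate;  // enough consecutive evidence: commit the change
            }
            return current;
        }
    }

At 20 samples per second, MIN_RUN = 3 adds only about 150 ms of latency while ignoring single-sample noise spikes.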

Level out FFT graph (Processing)

I am trying to make a music visualizer in Processing (not that that part is super important), and I'm using a Fast Fourier Transform through Minim. It's reading the data perfectly, but there is a large spike on the left (bass) end. What's the best way to 'level' this out?
My source code is here, if you want to take a look.
Thanks in advance,
-tlf
The spectrum you show looks fairly typical of a complex musical sound where you have a complex section at lower frequencies, but also some clear harmonics emerging from the low frequency mess. And, actually, these harmonics are atypically clear... music in general is complicated. Sometimes, for example, if a flute is playing a single clear note one will get a single nice peak or two, but it's much more common that transients and percussive sounds lead to a very complicated spectrum, especially at low frequencies.
As for comparing directly to the video: the video itself seems a bit odd to me. My guess is that the spectrum it shows is either a zoom into a small section of the spectrum far from zero, or just a graphical effect that is driven by the music but doesn't correspond to an actual spectrum. That is, if you really want something that looks very similar to this video, you'll need more than the spectrum, though the spectrum will likely be a good starting point. Here are a few points to note:
1) There is a prominent peak which occasionally appears right above the "N" in the word anchor. A single dominant peak like that should be clearly audible as an approximately pure tone.
2) Occasionally there's another peak that varies in time with the first one, which would normally be a sign that the second peak is a harmonic, but many times this second peak isn't there.
3) A good example of the odd behavior is at 2:26. This moment comes just after a little laser sound effect, and then there's basically a quiet hiss. A hiss should be a broad-spectrum sound without peaks, often weighted towards lower frequencies. At 2:26, though, there's just a single large peak above the "N" with nothing else.
It turns out what I had to do was multiply the data by
Math.log(i + 2) / 3
where i is the index of the data being referenced, zero-indexed from the left (bass).
You can see this in context here
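In code, the weighting is just applied per band before drawing. specSize() and getBand() are Minim's standard FFT accessors; the drawing itself is illustrative.

    // Scale each spectrum band by log(i + 2) / 3 to tame the bass spike.
    for (int i = 0; i < fft.specSize(); i++) {
        float leveled = fft.getBand(i) * (float) (Math.log(i + 2) / 3.0);
        line(i, height, i, height - leveled * 4); // 4 is an arbitrary display scale
    }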

How should I implement accurate pitch-detection in Java for Android phones?

I want to develop an application that would require accurate pitch detection for musical instruments through the Android phone's microphone. Most suggestions I've read involve using a Fast Fourier Transform (FFT), but they mention issues with accuracy and processing power (considering it should run smoothly on a smartphone). One answer suggested a 5 Hz error margin, which would be quite noticeable in the low-frequency range. Since pitch is perceived logarithmically rather than linearly in frequency, the error margin for each note should instead be less than 10 cents (1 cent = the 100th root of a semitone, i.e. the 1,200th root of an octave), which corresponds to a frequency ratio below about 1.005792941.
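For reference, the cent arithmetic above in Java (helper names are illustrative, not from any library):

    // An interval of c cents corresponds to a frequency ratio of 2^(c / 1200).
    static double centsToRatio(double cents) {
        return Math.pow(2.0, cents / 1200.0); // 10 cents -> ~1.0057929
    }

    // Signed error in cents between a detected and a target frequency.
    static double centsBetween(double detectedHz, double targetHz) {
        return 1200.0 * Math.log(detectedHz / targetHz) / Math.log(2.0);
    }

A flat 5 Hz error is about 77 cents at 110 Hz but only about 2 cents at 4 kHz, which is why a fixed-Hz margin is unacceptable for low notes.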
Are those issues real? If so, are there better approaches for pitch detection? Are there open-source libraries that do this, or will I have to write the algorithm on my own (while aiming to accurately detect possibly more than one note per second)? I have basic Java knowledge.
Also, what are the expected limitations for frequency range sensitivity of the average smartphone's microphone?

How to implement an Equalizer

I know there are a lot of questions about equalizers on SO, but I didn't find what I was looking for. What I want is an equalizer that modifies audio samples through an interface something like:
equalizer.eqAudio(audiosamples, band, gain)
I'm not sure if that is exactly the interface I want, because I know little about DSP in terms of implementing it (I've used filters, limiters and compressors, but never built them).
So, Googling about this, I read that I should apply an FFT to the samples to get data per frequency range instead of per-sample amplitudes, process it the way I want, and then apply the inverse FFT to get audio samples back. I looked for an implementation of this FFT and found JTransforms for Java. This library also includes an FFT-related algorithm called the Discrete Cosine Transform (DCT).
My questions are :
Am I on the right track?
Since the FFT gives me data about frequency, I should pass a chunk of samples to the FFT algorithm. How big must this chunk be?
Is there a good book about DSP programming that explains equalizers?
Thanks!
FFT wouldn't be my first choice for audio equalisation. I would default to building an EQ with IIR or FIR filters. FFT might be useful for special circumstances.
A commonly recommended reference is the Cookbook Formulae for Audio EQ Biquad Filter Coefficients.
A Java tutorial for programming biquad filters: http://arachnoid.com/BiQuadDesigner/index.html
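As a concrete starting point, here is a minimal peaking-EQ biquad in Java built from the cookbook's coefficient formulas, processed in Direct Form I. Treat it as a sketch, not a vetted implementation.

    class PeakingBiquad {
        private final double b0, b1, b2, a1, a2; // coefficients, normalized by a0
        private double x1, x2, y1, y2;           // filter state

        PeakingBiquad(double sampleRate, double centerHz, double q, double gainDb) {
            double A = Math.pow(10.0, gainDb / 40.0);           // cookbook: A = 10^(dBgain/40)
            double w0 = 2.0 * Math.PI * centerHz / sampleRate;
            double alpha = Math.sin(w0) / (2.0 * q);
            double a0 = 1.0 + alpha / A;
            b0 = (1.0 + alpha * A) / a0;
            b1 = (-2.0 * Math.cos(w0)) / a0;
            b2 = (1.0 - alpha * A) / a0;
            a1 = (-2.0 * Math.cos(w0)) / a0;
            a2 = (1.0 - alpha / A) / a0;
        }

        double process(double x) {
            // Direct Form I difference equation
            double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1; x1 = x;
            y2 = y1; y1 = y;
            return y;
        }
    }

One such filter per band gives roughly the equalizer.eqAudio(audiosamples, band, gain) shape from the question: run the samples through the chosen band's biquad.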
Is there a good book about DSP programming that explains equalizers?
Understanding Digital Signal Processing is a good introduction to DSP. There are chapters on FIR and IIR filters.
Introduction to Digital Filters with Audio Applications by Julius O. Smith III.
Graphic Equalizer Design Using Higher-Order Recursive Filters by Martin Holters and Udo Zölzer is a short paper detailing one EQ filter design approach.
There are many different ways to obtain an equalizer, and as Shannon explains, the IIR/FIR filter way is one of them. However, if your goal is to quickly get an equalizer up and running, going the FFT way might be easier for you, as there exist a wealth of reference implementations.
As to your question of FFT size, it depends on what frequency resolution you want your equalizer to have. If you choose a size of 16, you will get 9 channels in the frequency domain (7 complex plus 2 purely real, the DC and Nyquist bins), equally spaced from 0 to fs/2: the 1st is centered at 0 Hz and the 9th at fs/2. Note that some implementations return 16 channels, where the upper half is a mirrored, complex-conjugated copy of the lower half.
As to the implementation of the equalizer functionality: multiply each channel by the wanted gain, and if the spectrum has the mirrored part, mirror the gains as well; otherwise the result of the following IFFT will not be a real-valued signal. After the multiplication, apply the IFFT.
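A rough sketch of that chunk-wise process using JTransforms (which the question mentions). It skips the windowing/overlap-add a production EQ would need, and the package name varies between JTransforms releases.

    import org.jtransforms.fft.DoubleFFT_1D; // older releases: edu.emory.mathcs.jtransforms.fft

    class FftEq {
        // Applies one gain per frequency channel to a chunk, in place.
        // binGains.length must be n/2 + 1 for a chunk of n samples.
        static void eqChunk(double[] samples, double[] binGains) {
            int n = samples.length;
            double[] buf = new double[2 * n]; // interleaved re/im
            for (int i = 0; i < n; i++) buf[2 * i] = samples[i];

            DoubleFFT_1D fft = new DoubleFFT_1D(n);
            fft.complexForward(buf);

            for (int k = 0; k <= n / 2; k++) {
                buf[2 * k]     *= binGains[k];
                buf[2 * k + 1] *= binGains[k];
                if (k > 0 && k < n / 2) { // mirror the gain onto the conjugate half
                    buf[2 * (n - k)]     *= binGains[k];
                    buf[2 * (n - k) + 1] *= binGains[k];
                }
            }

            fft.complexInverse(buf, true); // true = scale by 1/n
            for (int i = 0; i < n; i++) samples[i] = buf[2 * i]; // imaginary parts are ~0
        }
    }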
As to the difference between an FFT-based and a filter-based equalizer, remember that an FFT is simply a fast way of calculating a bank of FIR filters with sinusoidal impulse responses, critically sampled (downsampled by the filter length) and with evenly spaced center frequencies.
Regards

Useful audio data from FFT

I'm trying to do a simple music visualization in java. I have two threads set up, one for playing the clip, and another for extracting a chunk of bytes from the clip to process with an FFT. The processed array can then be sent to the JFrame that will handle drawing, and used as a parameter for some sort of visual.
I'm not exactly sure what to do with the data, however. I've been just using a power spectrum for now, which gives me a very limited response, and I realize it is too general for what I am trying to do. I'm open to using any FFT library out there, if there is a specific one that will be especially helpful. But, in general: what can I get from my data after doing an FFT, and how can I use it to show decently accurate results in the visuals?
All FFTs will do pretty much the same thing given the same data. The FFT parameters you can vary are the scale factor, the length of the FFT (longer gives you higher frequency resolution, shorter gives you better time response), and (pre)windowing the data, which reduces the "splatter" or spectral leakage of spectral peaks. You can zero-pad an FFT to interpolate smoother-looking results. You can average the magnitude results of several successive FFTs to reduce the noise floor. And you can use a scaling function such as log scaling (or log-log, i.e. log on both axes) when presenting the FFT magnitude results.
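Two of those tricks, averaging successive magnitude spectra and log (dB) scaling, in a small sketch; the magnitude frames are assumed to come from your FFT of choice.

    import java.util.List;

    class SpectrumSmoother {
        // Averages several magnitude spectra, then converts to dB for display.
        static double[] averageToDb(List<double[]> frames) {
            int bins = frames.get(0).length;
            double[] avg = new double[bins];
            for (double[] frame : frames)
                for (int i = 0; i < bins; i++)
                    avg[i] += frame[i] / frames.size(); // the mean lowers the noise floor
            double[] db = new double[bins];
            for (int i = 0; i < bins; i++)
                db[i] = 20.0 * Math.log10(avg[i] + 1e-12); // epsilon avoids log(0)
            return db;
        }
    }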
The phase of a complex FFT is usually unimportant for any visualization unless you are doing some type of phase vocoder analysis+resynthesis.
