My goal is to be able to process a single note from a guitar (or other instrument), and convert it into a frequency value.
This does not have to be in real time- I figured it would be much easier to record a one-second sound and analyse the data afterwards.
I understand that to do this I need to use a Fourier transform (and have a class that will perform a FFT). However, I don't really understand the input / output of a FFT algorithm- the class I am using seems to use a complex vector input and give a complex vector output. What do these represent?
Also, could anyone recommend any Java classes that can detect and record an input (and if possible, give frequency or values that can be plugged into FFT?)?
Thanks in advance.
Input to your FFT will be a time-domain signal representing the audio. If you record some sound for a second from the mic, this will really contain a wave that is made up of various frequencies at different amounts - hopefully mostly the frequency/frequencies corresponding to the note which you are playing, plus some outside noise and noise introduced by the microphone and electronics. If in that 1 second you happen to have, say, 512 time points (so the mic can sample at 512 times a second), then each of those time points represents the intensity picked up by the mic. These sound intensity values can be turned from their time-domain representation to a frequency-domain representation using the FFT.
If you now give this to the FFT, then because the input is real-valued you will get a conjugate-symmetric complex output (mirrored around the central value), so you can ignore the second half of the complex output vector and use only the first half: the second half carries the same magnitudes as the first. The output represents the contribution of each frequency to the input waveform; in essence, each "bin" or array index holds the amplitude and phase of that frequency. To extract the amplitude you want to do:
magnitudeFFTData[i] = Math.sqrt((real * real) + (imaginary * imaginary));
where real and imaginary are the real and imaginary parts of the complex number at that frequency bin. To get the frequency corresponding to a given bin, you need the following:
frequency = i * Fs / N;
where i is bin or array index number, Fs the sampling frequency and N the number of data points. From a project of mine wherein I recently used the FFT:
for (int i = (curPersonFFTData.length / 64); i < (curPersonFFTData.length / 40); i++) {
double rr = (curPersonFFTData[i].getReal());
double ri = (curPersonFFTData[i].getImaginary());
magnitudeCurPersonFFTData[i] = Math.sqrt((rr * rr) + (ri * ri));
ds.addValue(magnitudeCurPersonFFTData[i]);
}
The divisions by 64 and 40 are arbitrary and useful for my case only, to only get certain frequency components, as opposed to all frequencies, which you might want. You can easily do all this in real time.
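For the original goal of turning one recorded note into a frequency value, the two formulas above can be combined into a small helper. This is only a sketch: the class and method names are made up for illustration, and it assumes your FFT output has already been split into separate real and imaginary arrays. It also returns the strongest bin, which for a real instrument may be a harmonic rather than the fundamental.

```java
/** Sketch: locate the strongest frequency bin in an FFT result (illustrative helper, not a library API). */
final class PeakFrequency {

    /** Magnitude of one complex bin: sqrt(re^2 + im^2). */
    static double magnitude(double re, double im) {
        return Math.hypot(re, im);
    }

    /** Frequency in Hz of bin i, for n samples taken at sampleRate Hz (frequency = i * Fs / N). */
    static double binFrequency(int i, double sampleRate, int n) {
        return i * sampleRate / n;
    }

    /**
     * Returns the frequency of the largest-magnitude bin in the first half of the spectrum.
     * Bin 0 (the DC offset) is skipped; the second half mirrors the first for real input.
     */
    static double peakFrequency(double[] re, double[] im, double sampleRate) {
        int n = re.length;
        int best = 1;
        for (int i = 1; i < n / 2; i++) {
            if (magnitude(re[i], im[i]) > magnitude(re[best], im[best])) {
                best = i;
            }
        }
        return binFrequency(best, sampleRate, n);
    }
}
```

The resolution is Fs / N per bin, so a one-second recording gives roughly 1 Hz resolution regardless of the sample rate.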
Initially I have an array of times and an array of voltages, and I have applied an FFT to convert the time-domain data into the frequency domain. After applying the FFT I got an array of frequencies. Now I have a cut-off frequency and I need to implement a low-pass filter on that data. I need to do this in Java. Could someone point me to an open-source implementation, or give me an idea of how to implement it? Any references that work from the frequency values and a cut-off frequency would help.
I am completely new to this topic, so my way of asking might be a little odd. Thanks in advance for your support!
Since you already have an array with the FFT values, you can implement a very crude low-pass filter by just setting the FFT coefficients that correspond to frequencies over your cut-off value to zero. If you need a nicer filter, you can implement a digital filter, or find an LPF implementation online and just use that.
EDIT: After computing the FFT you don't get an array of frequencies; you get an array of complex numbers representing the magnitude and phase of the data. You can work out which frequency each complex number in the array corresponds to, because the FFT result corresponds to evenly spaced frequencies ranging from 0 up to (but not including) f_s, where f_s is the sampling frequency you used to get your data.
A useful exercise might be to first plot a frequency spectrum, because after plotting it, it will be clear how you can discard high frequencies and so realise an LPF. This slightly similar post might help you: LINK
EDIT: 1) First you need to find the sampling frequency (f_s) of your data, which is the number of samples taken every second. It can be computed using f_s = 1/T, where T is the time interval between any two consecutive samples in the time domain.
2) After this you divide f_c by f_s, where f_c is the cut-off frequency, to get a constant k.
3) You then set all COMPLEX numbers above index (k times N) in your array to zero, where N is the number of elements in your array. That will give you a basic low-pass filter (LPF). If your input was real-valued and you plan to transform back to the time domain, leave roughly the last (k times N) entries alone as well: they mirror the low frequencies you are keeping.
Rough, indicative (pseudo)code below:
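Complex[] fftData = FFT(myData);    // FFT() and Complex stand in for whatever library you use
int N = fftData.length;
double T = 0.001;                   // time between two samples, in seconds
double f_c = 500;                   // f_c = 500Hz
double f_s = 1 / T;                 // f_s = 1000Hz
double k = f_c / f_s;
int index = (int) Math.ceil(k * N); // round up to the next integer
// Low pass filter: zero every bin from the cut-off up to its mirror image.
// For real input, the bins after N - index mirror the low frequencies and are kept.
for (int i = index; i <= N - index; i++)
    fftData[i] = new Complex(0, 0);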
The fftData you receive in your case will not already be in the form of elements from the Complex class, so make sure you know how your data is represented and which data elements to set to zero.
This is not really a good way to do it, though: a single frequency in your data can be spread over several bins because of leakage, so the results would be nasty in that case. Ideally you would design a proper digital filter or just use a software library. So if you need a very accurate LPF, you can go through the normal process of designing an analog LPF and then warping it to a digital filter, as discussed in THIS document.
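As a point of comparison, here is a minimal sketch of a time-domain digital filter. Note that this is a swapped-in technique: it is the simplest first-order (single-pole, RC-style) low-pass, not the bilinear-transform design from the linked document, and its roll-off is gentle (about 6 dB per octave). The class and method names are illustrative.

```java
/** Sketch of a first-order low-pass filter applied directly to time-domain samples. */
final class OnePoleLowPass {

    /**
     * Filters x (sampled at sampleRate Hz) with a cut-off near cutoffHz.
     * Uses y[i] = y[i-1] + alpha * (x[i] - y[i-1]), with alpha = dt / (RC + dt)
     * and RC = 1 / (2 * pi * cutoffHz).
     */
    static double[] filter(double[] x, double sampleRate, double cutoffHz) {
        double dt = 1.0 / sampleRate;
        double rc = 1.0 / (2.0 * Math.PI * cutoffHz);
        double alpha = dt / (rc + dt);
        double[] y = new double[x.length];
        if (x.length == 0) {
            return y;
        }
        y[0] = x[0]; // start from the first sample to avoid a start-up transient from zero
        for (int i = 1; i < x.length; i++) {
            y[i] = y[i - 1] + alpha * (x[i] - y[i - 1]);
        }
        return y;
    }
}
```

With the numbers from the pseudocode above you would call `OnePoleLowPass.filter(voltages, 1000.0, 500.0)`, where `voltages` is your time-domain array; no FFT is needed for this approach.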
I need to compare two audio signals, signal1 and signal2. Signal1 is white noise. Signal2 is signal1 with a particular equalization applied that cuts or attenuates some frequencies.
How can I get the ratio of the two audio signals in the frequency domain? (e.g.: at a frequency of 100Hz, signal2 is attenuated by 50% compared to signal1).
I need this information to process a third signal by applying the same equalization that was applied to transform signal1 into signal2.
I used this library to process my data and go from the time domain to the frequency domain. This code is the same for signal1 and signal2.
DoubleFFT_1D fft1 = new DoubleFFT_1D (FFT_SIZE);
double[] input1 = new double[FFT_SIZE];
double[] fftBuff1 = new double[FFT_SIZE * 2];
this.wavFileDoubleInputStreamMic.read(input1, 0, FFT_SIZE);
for (int i = 0; i < FFT_SIZE; i++){
fftBuff1[2*i] = input1[i];
fftBuff1[2*i+1] = 0;
}
fft1.complexForward(fftBuff1);
How can I use FFT results (from signal1 and signal2) to reach my goal?
You need to calculate the magnitude of each signal in the frequency domain to get a magnitude spectrum estimate for each, and then do a division, i.e.
get signal 1 and signal 2
apply a suitable window function to both signals (e.g. von Hann)
apply the FFT to the windowed signals
calculate the magnitude of the FFT output, mag = sqrt(re*re+im*im) - this gives a real-valued magnitude spectrum
divide the magnitude spectrum of signal 2 by the magnitude spectrum of signal 1 to get a real-valued ratio versus frequency (e.g. 0.5 at bins where signal 2 is attenuated by 50%)
To apply this correction to signal 3 you can use the overlap-add or overlap-save method - essentially you take the FFT of signal3, multiply each complex value by the real value obtained above, then use an inverse FFT to get back to the time domain. The only slight complication is the need to overlap successive sample windows and process this overlap correctly (see links to overlap-add/overlap-save methods above.)
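A minimal sketch of the window, magnitude and division steps follows. It assumes the spectrum is stored interleaved as [re0, im0, re1, im1, ...], which is the layout the buffer in the question uses; the class and method names are illustrative, and the FFT call itself is left to your library. The gain that turns signal 1 into signal 2 is taken here as |X2| / |X1| per bin, and bins where signal 1 has essentially no energy are given gain 0, which is an arbitrary choice.

```java
/** Sketch of the per-bin gain between two spectra; names are illustrative. */
final class SpectrumRatio {

    /** Returns a copy of x multiplied by a von Hann window. */
    static double[] hannWindow(double[] x) {
        int n = x.length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // For n == 1 the window is a single 1.0 (avoids dividing by zero).
            double w = (n > 1) ? 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1))) : 1.0;
            out[i] = x[i] * w;
        }
        return out;
    }

    /** Magnitudes of the first half of an interleaved spectrum [re0, im0, re1, im1, ...]. */
    static double[] magnitudes(double[] interleaved) {
        int bins = interleaved.length / 4; // N complex values, keep the first N / 2 bins
        double[] mag = new double[bins];
        for (int i = 0; i < bins; i++) {
            mag[i] = Math.hypot(interleaved[2 * i], interleaved[2 * i + 1]);
        }
        return mag;
    }

    /** Gain per bin: mag2[i] / mag1[i]. Bins where mag1 is not above floor get gain 0. */
    static double[] gain(double[] mag1, double[] mag2, double floor) {
        double[] g = new double[mag1.length];
        for (int i = 0; i < g.length; i++) {
            g[i] = (mag1[i] > floor) ? mag2[i] / mag1[i] : 0.0;
        }
        return g;
    }
}
```

Because signal 1 is white noise, a single FFT frame gives a very noisy estimate; averaging the magnitude spectra over many frames before dividing should give a much smoother gain curve.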
I'm not really sure what's the right title for my question
So here's the question
Suppose I have N number of samples, eg:
1
2
3
4
.
.
.
N
Now I want to "reduce" the size of the sample from N to M, by dumping (N-M) data from the N samples.
I want the dumping to be as "distributed" as possible,
so for example, if I have 100 samples and want to compress them to 50 samples, I would throw away every other sample. Another example: say the data is 100 samples and I want to compress it to 25 samples. I would keep 1 sample in each group of 100/25 = 4 samples, meaning I iterate through the samples and count, and every time my count reaches 4 I keep that sample and restart the count.
The problem is how to do this if the 4 above were 2.333, for example. How do I treat the fractional part so that the discarded samples are evenly distributed?
Thanks a lot..
The terms you are looking for are resampling, downsampling and decimation. Note that in the general case you can't just throw away a subset of your data without risking aliasing. You need to low pass filter your data first, prior to decimation, so that there is no information above your new Nyquist rate which would be aliased.
When you want to downsample by a non-integer value, e.g. 2.333 as per your example above you would normally do this by upsampling by an integer factor M and then downsampling by a different integer factor N, where the fraction M/N gives you the required resampling factor. In your example M = 3 and N = 7, so you would upsample by a factor of 3 and then downsample by a factor of 7.
You seem to be talking about sampling rates and digital signal processing.
Before you reduce, you normally filter the data to make sure high frequencies in your sample are not aliased to lower frequencies. For instance, in your example (take every fourth value), a frequency that repeats every four samples will alias to the "DC" or zero-cycle frequency: for example, "234123412341", starting with the first of every group, gives "2,2,2,2", which might not be what you want. A pattern that repeats every 3 samples would alias to a cycle like itself ("231231231231" => "231..."), unless I did that wrong because I'm tired. Filtering is a little beyond what I would like to discuss right now, as it's a pretty advanced topic.
If you can represent your "2.333" as some sort of fraction, let's see, that's 7/3. You were talking about 1 out of every 4 samples (1/4), so I would say you're taking 3 out of every 7 samples. So you might (take, drop, take, drop, take, drop, drop), but there might be other methods.
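One way to generate such a take/drop pattern for any N and M without floating point is an integer accumulator, in the style of Bresenham's line algorithm. The sketch below is an illustration with made-up names, not a resampler: it applies no low-pass filter, so the aliasing caveat above still holds.

```java
/** Sketch: keep m of n samples, spread as evenly as integer arithmetic allows. */
final class EvenDrop {

    /** Returns m samples chosen from in; exactly in.length - m samples are dropped. */
    static double[] keepEvenly(double[] in, int m) {
        int n = in.length;
        if (m < 0 || m > n) {
            throw new IllegalArgumentException("m must be between 0 and in.length");
        }
        double[] out = new double[m];
        long acc = 0; // accumulates m per input sample; a sample is kept each time it reaches n
        int k = 0;
        for (int i = 0; i < n; i++) {
            acc += m;
            if (acc >= n) {
                acc -= n;
                out[k++] = in[i];
            }
        }
        return out;
    }
}
```

For 7 samples reduced to 3 this keeps the 3rd, 5th and 7th samples (drop, drop, take, drop, take, drop, take), which is a rotation of the pattern suggested above.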
For audio data that you want to sound decent (as opposed to aliased and distorted in the frequency domain), see Paul R.'s answer involving resampling. One method of resampling is interpolation, such as using a windowed-Sinc interpolation kernel which will properly low-pass filter the data as well as allow creating interpolated intermediate values.
For non-sampled and non-audio data, where you just want to throw away some samples in a close-to-evenly distributed manner, and don't care about adding frequency domain noise and distortion, something like this might work:
float myRatio = (float)(N-1) / (float)(M-1); // check to make sure M > 1 beforehand
for (int i=0; i < M; i++) {
int j = (int)roundf(myRatio * (float)i); // nearest bin decimation
myNewArrayLengthM[i] = myOldArrayLengthN[j];
}
I am implementing MFCC algorithm in Java.
There is sample code in Matlab here: http://www.ee.columbia.edu/~dpwe/muscontent/practical/mfcc.m. However, I have some problems with the mel filter bank process. How do I generate the triangular windows, and how do I use them?
PS1: An article which has a part that describes MFCC: http://arxiv.org/pdf/1003.4083
PS2: If there is a document that describes the basic steps of the MFCC algorithm, that would be good.
PS3: My main question is related to this one: MFCC with Java Linear and Logarithmic Filters. Some implementations use both linear and logarithmic filters and some do not. What are those filters, and what is the center frequency concept? I am following this code: MFCC Java. What is the difference between it and this code: MFCC Matlab?
Triangular windows as frequency band filters aren't hard to implement. You basically want to integrate the FFT data within each band (defined as the frequency space between center frequency i-1 and center frequency i+1).
You're basically looking for something like,
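// centerFreqs holds numBands + 2 strictly increasing FFT bin indices:
// the lower edge, each band's center, and the upper edge. fftData holds bin magnitudes.
for (int bandIdx = 1; bandIdx <= numBands; bandIdx++) {
    int startFreqIdx = centerFreqs[bandIdx - 1];
    int centerFreqIdx = centerFreqs[bandIdx];
    int stopFreqIdx = centerFreqs[bandIdx + 1];
    // Rising edge: the weight climbs from 0 at startFreqIdx towards 1 at centerFreqIdx.
    for (int freq = startFreqIdx; freq < centerFreqIdx; freq++) {
        double magnitudeScale = centerFreqIdx - startFreqIdx;
        bandData[bandIdx - 1] += fftData[freq] * (freq - startFreqIdx) / magnitudeScale;
    }
    // Falling edge: the weight drops from 1 at centerFreqIdx to 0 at stopFreqIdx.
    for (int freq = centerFreqIdx; freq <= stopFreqIdx; freq++) {
        double magnitudeScale = centerFreqIdx - stopFreqIdx;
        bandData[bandIdx - 1] += fftData[freq] * (freq - stopFreqIdx) / magnitudeScale;
    }
}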
If you do not understand the concept of a "center frequency" or a "band" or a "filter," pick up an elementary signals textbook--you shouldn't be implementing this algorithm without understanding what it does.
As for what the exact center frequencies are, it's up to you. Experiment and pick (or find in publications) values that capture the information you want to isolate from the data. The reason that there are no definitive values, or even scale for values, is because this algorithm tries to approximate a human ear, which is a very complicated listening device. Whereas one scale may work better for, say, speech, another may work better for music, etc. It's up to you to choose what is appropriate.
Answer for the second PS: I found this tutorial that really helped me computing the MFCCs.
As for the triangular windows and the filterbanks, from what I understood, they do overlap, they do not extend to negative frequencies, and the whole process of computing them from the FFT spectrum and applying them back to it goes something like this:
Choose a minimum and a maximum frequency for the filters (for example, min freq = 300Hz, the minimum voice frequency, and max freq = your sample rate / 2. Maybe this is where you should choose the 1000Hz limit you were talking about)
Compute the mel values from the chosen min and max frequencies. Formula here.
Compute N equally spaced values between these two mel values. (I've seen examples with different values for N; you can even find an efficiency comparison for different values in this work. For my tests I picked 26)
Convert these values back to Hz. (you can find the formula on the same wiki page) => array of N + 2 filter values
Compute a filterbank (filter triangle) for each three consecutive values, either as Thomas suggested above (being careful with the indexes) or as in the tutorial recommended at the beginning of this post => an array of arrays, size NxM, assuming your FFT returned 2*M values and you only use M.
Pass the whole power spectrum (M values obtained from the FFT) through each triangular filter to get a "filterbank energy" for each filter (for each filterbank (N loop), multiply each magnitude obtained from the FFT by the corresponding value in the filterbank (M loop) and add up the M products) => N-sized array of energies.
These are your filterbank energies, to which you can then apply a log, apply the DCT and extract the MFCCs...
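The first four steps can be sketched as below. The mel formula linked above did not survive in this post, so the code assumes the commonly used form mel = 2595 * log10(1 + f / 700); check it against whichever reference you follow. The class, method and parameter names are illustrative.

```java
/** Sketch: mel-spaced filter edge frequencies (assumes mel = 2595 * log10(1 + f / 700)). */
final class MelScale {

    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /**
     * Returns numFilters + 2 frequencies in Hz, equally spaced on the mel scale
     * between minHz and maxHz. Filter j uses entries j, j + 1 and j + 2 as its
     * lower edge, center and upper edge.
     */
    static double[] filterEdgesHz(double minHz, double maxHz, int numFilters) {
        double lo = hzToMel(minHz);
        double hi = hzToMel(maxHz);
        double[] edges = new double[numFilters + 2];
        for (int i = 0; i < edges.length; i++) {
            double mel = lo + (hi - lo) * i / (numFilters + 1);
            edges[i] = melToHz(mel);
        }
        return edges;
    }

    /** Nearest FFT bin for a frequency, from frequency = bin * sampleRate / fftSize. */
    static int hzToBin(double hz, int fftSize, double sampleRate) {
        return (int) Math.round(hz * fftSize / sampleRate);
    }
}
```

Mapping each returned frequency through `hzToBin` gives the kind of bin-index array that the triangular-filter loop in the earlier answer expects.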
I want to plot the pitch of a sound into a graph.
Currently I can plot the amplitude. The graph below is created by the data returned by getUnscaledAmplitude():
AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(new BufferedInputStream(new FileInputStream(file)));
byte[] bytes = new byte[(int) (audioInputStream.getFrameLength()) * (audioInputStream.getFormat().getFrameSize())];
audioInputStream.read(bytes);
// Get amplitude values for each audio channel in an array.
graphData = type.getUnscaledAmplitude(bytes, 1);
public int[][] getUnscaledAmplitude(byte[] eightBitByteArray, int nbChannels)
{
int[][] toReturn = new int[nbChannels][eightBitByteArray.length / (2 * nbChannels)];
int index = 0;
for (int audioByte = 0; audioByte < eightBitByteArray.length;)
{
for (int channel = 0; channel < nbChannels; channel++)
{
// Do the byte to sample conversion.
int low = (int) eightBitByteArray[audioByte];
audioByte++;
int high = (int) eightBitByteArray[audioByte];
audioByte++;
int sample = (high << 8) + (low & 0x00ff);
toReturn[channel][index] = sample;
}
index++;
}
return toReturn;
}
But I need to show the audio's pitch, not amplitude. Fast Fourier transform appears to get the pitch, but it needs to know more variables than the raw bytes I have, and is very complex and mathematical.
Is there a way I can do this?
Frequency (an objective metric) is not the same as pitch (a subjective quantity). In general, pitch detection is a very tricky problem.
Assuming you just want to graph the frequency response for now, you have little choice but to use the FFT, as it is THE method to obtain the frequency response of time-domain data. (Well, there are other methods, such as the discrete cosine transform, but they're just as tricky to implement, and more tricky to interpret).
If you're struggling with the implementation of the FFT, note that it's really just an efficient algorithm for calculating the discrete Fourier transform (DFT); see http://en.wikipedia.org/wiki/Discrete_Fourier_transform. The basic DFT algorithm is much easier (just two nested loops), but runs a lot slower (O(N^2) rather than O(N log N)).
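For reference, the two nested loops look roughly like this. It is a sketch of the textbook DFT for real-valued input, with illustrative names; it is fine for experimenting with a few thousand samples but far too slow for long recordings.

```java
/** Sketch of the direct O(N^2) discrete Fourier transform for real input. */
final class NaiveDft {

    /** Returns {re, im}, where re[k] and im[k] are the real and imaginary parts of bin k. */
    static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {          // one pass per output frequency bin
            for (int t = 0; t < n; t++) {      // correlate the input with that frequency
                double angle = 2.0 * Math.PI * ((double) k * t) / n;
                re[k] += x[t] * Math.cos(angle);
                im[k] -= x[t] * Math.sin(angle);
            }
        }
        return new double[][] { re, im };
    }
}
```

Unlike most FFT implementations, this version does not require the number of samples to be a power of 2.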
If you wish to do anything more complex than simply plotting frequency content (like pitch detection, or windowing (as others have suggested)), I'm afraid you are going to have to learn what the maths means.
Fast Fourier Transform doesn't need to know more than the input bytes you have. Don't be scared off by the Wikipedia article. An FFT algorithm will take your input signal (with the common FFT algorithms the number of samples is required to be a power of 2, e.g. 256, 512, 1024) and return a vector of complex numbers with the same size. Because your input is real, not complex (imaginary portion set to zero), the returned vector will be conjugate-symmetric, so only half of it carries independent information. Since you do not care about the phase, you can simply take the magnitude of the complex numbers, which is sqrt(a^2+b^2). Just taking the absolute value of a complex number may also work; in some languages this is equivalent to the previous expression.
There are Java implementations of FFT available, e.g.: http://www.cs.princeton.edu/introcs/97data/FFT.java.html
Pseudo code will look something like:
Complex in[1024];
Complex out[1024];
Copy your signal into in
FFT(in, out)
for every member of out compute sqrt(a^2+b^2)
To find frequency with highest power scan for the maximum value in the first 512 points in out
The first half of the output will contain entries for frequencies between zero and half your sampling frequency.
Since FFT assumes a repeating signal you may want to apply a window to your input signal. But don't worry about this at first.
You can find more information on the web, e.g.: FFT for beginners
Also as Oli notes when multiple frequencies are present the perceived pitch is a more complex phenomenon.
There are several other questions on stackoverflow about this problem. Maybe these will help.
Instead, you could try to find a copy of Digital Audio with Java by Craig Lindley. I don't think it's in print anymore, but the copy on my desk has a section on the FFT and also a sample application of a guitar tuner.