I want to plot the pitch of a sound on a graph.
Currently I can plot the amplitude. The graph below is created from the data returned by getUnscaledAmplitude():
AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(new BufferedInputStream(new FileInputStream(file)));
byte[] bytes = new byte[(int) (audioInputStream.getFrameLength()) * (audioInputStream.getFormat().getFrameSize())];
audioInputStream.read(bytes);
// Get amplitude values for each audio channel in an array.
graphData = type.getUnscaledAmplitude(bytes, 1);
public int[][] getUnscaledAmplitude(byte[] eightBitByteArray, int nbChannels)
{
    int[][] toReturn = new int[nbChannels][eightBitByteArray.length / (2 * nbChannels)];
    int index = 0;

    for (int audioByte = 0; audioByte < eightBitByteArray.length;)
    {
        for (int channel = 0; channel < nbChannels; channel++)
        {
            // Do the byte to sample conversion.
            int low = (int) eightBitByteArray[audioByte];
            audioByte++;
            int high = (int) eightBitByteArray[audioByte];
            audioByte++;
            int sample = (high << 8) + (low & 0x00ff);

            toReturn[channel][index] = sample;
        }
        index++;
    }

    return toReturn;
}
But I need to show the audio's pitch, not its amplitude. A Fast Fourier transform appears to be the way to get the pitch, but it seems to need more variables than the raw bytes I have, and it looks very complex and mathematical.
Is there a way I can do this?
Frequency (an objective metric) is not the same as pitch (a subjective quantity). In general, pitch detection is a very tricky problem.
Assuming you just want to graph the frequency response for now, you have little choice but to use the FFT, as it is THE method to obtain the frequency response of time-domain data. (Well, there are other methods, such as the discrete cosine transform, but they're just as tricky to implement, and more tricky to interpret).
If you're struggling with the implementation of the FFT, note that it's really just an efficient algorithm for calculating the discrete Fourier transform (DFT); see http://en.wikipedia.org/wiki/Discrete_Fourier_transform. The basic DFT algorithm is much easier (just two nested loops), but runs a lot slower (O(N^2) rather than O(N log N)).
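To make that concrete, here is a minimal sketch of the two-nested-loop DFT in Java (naive O(N^2) magnitude spectrum; `dftMagnitudes` is a hypothetical helper name, not a library method):

```java
// Naive DFT magnitude spectrum: O(N^2), but only two nested loops.
// The input is a real-valued signal; the output is one magnitude per bin.
static double[] dftMagnitudes(double[] signal) {
    int n = signal.length;
    double[] mags = new double[n];
    for (int k = 0; k < n; k++) {          // each output frequency bin
        double re = 0, im = 0;
        for (int t = 0; t < n; t++) {      // each input sample
            double angle = 2 * Math.PI * k * t / n;
            re += signal[t] * Math.cos(angle);
            im -= signal[t] * Math.sin(angle);
        }
        mags[k] = Math.hypot(re, im);      // sqrt(re^2 + im^2)
    }
    return mags;
}
```

For a 16-sample sine wave at bin 3, the peak of the returned magnitudes lands at index 3 (with its mirror at index 13), which is exactly the symmetric half-spectrum behaviour described below.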
If you wish to do anything more complex than simply plotting frequency content (like pitch detection, or windowing (as others have suggested)), I'm afraid you are going to have to learn what the maths means.
The Fast Fourier transform doesn't need to know more than the input bytes you have. Don't be scared off by the Wikipedia article. An FFT algorithm will take your input signal (with the common FFT algorithms the number of samples is required to be a power of 2, e.g. 256, 512, 1024) and return a vector of complex numbers of the same size. Because your input is real, not complex (imaginary portion set to zero), the returned vector will be symmetric, and only half of it will contain data. Since you do not care about the phase, you can simply take the magnitude of each complex number, which is sqrt(a^2 + b^2). Just taking the absolute value of a complex number may also work; in some languages this is equivalent to the previous expression.
There are Java implementations of FFT available, e.g.: http://www.cs.princeton.edu/introcs/97data/FFT.java.html
Pseudocode will look something like this:
Complex in[1024];
Complex out[1024];
Copy your signal into in
FFT(in, out)
for every member of out compute sqrt(a^2+b^2)
To find frequency with highest power scan for the maximum value in the first 512 points in out
The output will contain entries for frequencies between zero and half your sampling frequency.
Since the FFT assumes a repeating signal, you may want to apply a window to your input signal. But don't worry about this at first.
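When you do get to windowing, a von Hann (Hann) window is only a few lines; this is a sketch with a hypothetical helper name:

```java
// Hann window: w[t] = 0.5 * (1 - cos(2*pi*t/(N-1))).
// Tapers the ends of the buffer towards zero to reduce the spectral
// leakage caused by the FFT's implicit assumption of a repeating signal.
static void applyHannWindow(double[] samples) {
    int n = samples.length;
    for (int t = 0; t < n; t++) {
        samples[t] *= 0.5 * (1 - Math.cos(2 * Math.PI * t / (n - 1)));
    }
}
```

Apply it to the input buffer just before the FFT call; the endpoints go to zero while the middle of the buffer is left almost unchanged.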
You can find more information on the web, e.g.: FFT for beginners
Also, as Oli notes, when multiple frequencies are present, the perceived pitch is a more complex phenomenon.
There are several other questions on stackoverflow about this problem. Maybe these will help.
Instead, you could try to find a copy of Digital Audio with Java by Craig Lindley. I don't think it's in print anymore, but the copy on my desk has a section on the FFT and also a sample application of a guitar tuner.
I have been reading the DSP Guide for what seems like a good 3 years now. Only recently, after getting comfortable enough with programming, did I decide to start experimenting with FFT-based convolution, 1D to be exact: one real input, a complex output.
I stumbled across some code for the FFT and IFFT and got it to work. I am getting a correct DFT in the complex frequency domain, which, after a little bit of manipulation (separating the complex array back into two separate real and imaginary arrays) and running it back through the IFFT followed by more of the same manipulation, yields the original input as expected.
Now maybe I have the concept of convolution all wrong, but simply multiplying like elements of the complex arrays and then running the result back through the IFFT is giving bogus numbers. To test, I am making a ramping kernel and a Dirac delta function as the input, and plugging the result into the IFFT; the code does not give back the original kernel as expected.
public void convolution(double[] xReal, double[] xImag,
                        double[] hReal, double[] hImag) {
    for (int i = 0; i < n; i++) {
        xReal[i] *= hReal[i];
        xImag[i] *= hImag[i];
    }
}
My question is: is this code for multiplying the elements of the complex DFT correct?
I have been looking for simple convolution code or simply the math behind it but have had no such luck. Everything I have found has just been "multiplication in the frequency domain equals convolution in the time domain".
Multiplication in the frequency domain performs a circular convolution, not a straight linear convolution, unless you zero-pad the inputs to the FFTs and the IFFT to at least the length of the time-domain signal plus the length of the full time-domain impulse response, minus one. (Zero-pad, because circularly wrapping around zeros is the same as no wrap-around.)
Also, for convolution, you need to multiply by the transform (FFT) of the zero-padded impulse response (its frequency response), not the impulse response itself.
Also, the multiplication has to be a complex multiply, not a multiply of just the 2 separated component vectors.
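For reference, an element-wise complex multiply follows (a + bi)(c + di) = (ac - bd) + (ad + bc)i; this is a corrected sketch of the method from the question, with the cross terms that the original version was missing:

```java
// Element-wise complex multiplication of two spectra, in place.
// The cross terms (real*imag) are what the original version dropped.
static void complexMultiply(double[] xReal, double[] xImag,
                            double[] hReal, double[] hImag) {
    for (int i = 0; i < xReal.length; i++) {
        double re = xReal[i] * hReal[i] - xImag[i] * hImag[i];
        double im = xReal[i] * hImag[i] + xImag[i] * hReal[i];
        xReal[i] = re;
        xImag[i] = im;
    }
}
```

With this in place of the original loop (plus the zero-padding described above), the IFFT of the product gives the expected linear convolution.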
Initially I have an array of times and an array of voltages, and I have applied an FFT to convert the time domain into the frequency domain. After applying the FFT I got an array of frequency-domain values. Now I have a cut-off frequency and I need to implement a low-pass filter. I need to do this in Java. Could someone please refer me to any open source available, or any idea of implementing the same? Any references that work from frequency values and a cut-off frequency would help.
I am completely new to this topic, so my approach to the question might be a little weird. Thanks in advance for your support!
Since you already have an array with the FFT values, you can implement a very crude low-pass filter by just setting the FFT coefficients that correspond to frequencies over your cut-off value to zero. If you need a nicer filter you can implement a digital filter or find an LPF implementation online and just use that.
EDIT: After computing the FFT you don't get an array of frequencies, you get an array of complex numbers representing the magnitude and phase of the data. You should be able to tell which frequency each complex number in the array corresponds to, because the FFT result corresponds to evenly spaced frequencies ranging from 0 to f_s, where f_s is the sampling frequency you used to get your data.
A useful exercise might be to first try and plot a frequency spectrum; after plotting it, it will be clear how you can discard high frequencies, thus realising an LPF. This slightly similar post might help you: LINK
EDIT: 1) First you need to find the sampling frequency (f_s) of your data; this is the number of samples taken every second. It can be computed as f_s = 1/T, where T is the time interval between any two consecutive samples in the time domain.
2) After this you divide f_c by f_s, where f_c is the cut-off frequency, to get a constant k.
3) You then set all complex numbers at or above index (k times N) in your array to zero, where N is the number of elements in your array. Simple as that; that gives you a basic low-pass filter (LPF).
Rough, indicative (pseudo)code below:
Complex[] fftData = FFT(myData);
int N = fftData.Length;

float T = 0.001f;   // sample interval in seconds
float f_c = 500;    // f_c = 500 Hz
float f_s = 1 / T;  // f_s = 1000 Hz
float k = f_c / f_s;

int index = RoundToNextLargestInteger(k * N);

// Low pass filter: zero every bin at or above the cut-off index
for (int i = index; i < N; i++)
    fftData[i] = 0;
The fftData you receive in your case will not already be in the form of elements of a Complex class, so make sure you know how your data is represented and which data elements to set to zero.
This is not really a good way to do it, though, as a single frequency in your data can be spread over several bins because of leakage, so the results would be nasty in that case. Ideally you would design a proper digital filter or just use a software library. So if you need a very accurate LPF, you can go through the normal process of designing an analog LPF and then warping it into a digital filter, as discussed in THIS document.
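As a concrete sketch of the bin-zeroing idea, here is a self-contained brick-wall LPF that uses a naive DFT/IDFT pair in place of an FFT library, and also zeroes the mirror bins so the inverse transform stays real (all names here are hypothetical):

```java
// Crude brick-wall low-pass filter: forward DFT, zero every bin whose
// frequency index is above the cut-off (including its mirror bin in the
// upper half, so the result of the inverse transform stays real),
// then inverse DFT. An FFT library would replace the naive loops.
static double[] lowPass(double[] x, int cutoffBin) {
    int n = x.length;
    double[] re = new double[n], im = new double[n];
    for (int k = 0; k < n; k++) {              // forward DFT
        for (int t = 0; t < n; t++) {
            double a = 2 * Math.PI * k * t / n;
            re[k] += x[t] * Math.cos(a);
            im[k] -= x[t] * Math.sin(a);
        }
    }
    for (int k = 0; k < n; k++) {              // zero bins above cut-off
        int freq = Math.min(k, n - k);         // mirror-aware bin frequency
        if (freq > cutoffBin) { re[k] = 0; im[k] = 0; }
    }
    double[] y = new double[n];
    for (int t = 0; t < n; t++) {              // inverse DFT (real part)
        for (int k = 0; k < n; k++) {
            double a = 2 * Math.PI * k * t / n;
            y[t] += (re[k] * Math.cos(a) - im[k] * Math.sin(a)) / n;
        }
    }
    return y;
}
```

Feeding in a mix of a low-frequency and a high-frequency sine and filtering at a cut-off between the two leaves only the low-frequency component, which is the behaviour the crude bin-zeroing approach promises (leakage aside, since these test tones fall exactly on bins).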
My goal is to be able to process a single note from a guitar (or other instrument), and convert it into a frequency value.
This does not have to be in real time- I figured it would be much easier to record a one-second sound and analyse the data afterwards.
I understand that to do this I need to use a Fourier transform (and I have a class that will perform an FFT). However, I don't really understand the input/output of an FFT algorithm: the class I am using seems to take a complex vector input and give a complex vector output. What do these represent?
Also, could anyone recommend any Java classes that can detect and record an input (and, if possible, give frequencies or values that can be plugged into an FFT)?
Thanks in advance.
Input to your FFT will be a time-domain signal representing the audio. If you record some sound for a second from the mic, this will contain a wave made up of various frequencies at different amounts - hopefully mostly the frequency/frequencies corresponding to the note you are playing, plus some outside noise and noise introduced by the microphone and electronics. If in that 1 second you happen to have, say, 512 time points (so the mic samples 512 times a second), then each of those time points represents the intensity picked up by the mic. These sound intensity values can be turned from their time-domain representation into a frequency-domain representation using the FFT.
If you now give this to the FFT, since it is a real-valued input, you will get a symmetric complex output (symmetric around the central value) and can ignore the second half of the complex output vector and use only the first half - i.e. the second half will be symmetric (and thus "identical") to the first half. The output represents the contribution of each frequency to the input waveform - in essence, each "bin" or array index contains information about that frequency's amplitude. To extract the amplitude you want to do:
magnitudeFFTData[i] = Math.sqrt((real * real) + (imaginary * imaginary));
where real and imaginary are the real and imaginary parts of the complex number at that frequency bin. To get the frequency corresponding to a given bin, you need the following:
frequency = i * Fs / N;
where i is bin or array index number, Fs the sampling frequency and N the number of data points. From a project of mine wherein I recently used the FFT:
for (int i = (curPersonFFTData.length / 64); i < (curPersonFFTData.length / 40); i++) {
    double rr = curPersonFFTData[i].getReal();
    double ri = curPersonFFTData[i].getImaginary();

    magnitudeCurPersonFFTData[i] = Math.sqrt((rr * rr) + (ri * ri));
    ds.addValue(magnitudeCurPersonFFTData[i]);
}
The divisions by 64 and 40 are arbitrary and useful for my case only, to only get certain frequency components, as opposed to all frequencies, which you might want. You can easily do all this in real time.
I need to compare two audio signals, signal1 and signal2. Signal1 is white noise. Signal2 is the same signal as signal1, with a particular equalization that cuts or attenuates some frequencies.
How can I get the ratio of the two audio signals in the frequency domain? (e.g.: at the frequency of 100Hz, signal2 is attenuated by 50% compared to signal1).
I need this information to process a third signal, applying the same equalization that transformed signal1 into signal2.
I used this library to process my data and pass from the time domain to the frequency domain. This code is the same for signal1 and signal2.
DoubleFFT_1D fft1 = new DoubleFFT_1D(FFT_SIZE);

double[] input1 = new double[FFT_SIZE];
double[] fftBuff1 = new double[FFT_SIZE * 2];

this.wavFileDoubleInputStreamMic.read(input1, 0, FFT_SIZE);

for (int i = 0; i < FFT_SIZE; i++) {
    fftBuff1[2 * i] = input1[i];
    fftBuff1[2 * i + 1] = 0;
}

fft1.complexForward(fftBuff1);
How can I use FFT results (from signal1 and signal2) to reach my goal?
You need to calculate the magnitude of each signal in the frequency domain to get a power spectrum estimate for each, and then do a division, i.e.
get signal 1 and signal 2
apply suitable window function to both signals (e.g. von Hann)
apply FFT to windowed signals
calculate magnitude of FFT output, mag = sqrt(re*re+im*im) - this gives a real-valued power spectrum
divide power spectrum of signal 1 by power spectrum of signal 2 to get real-valued ratio versus frequency
To apply this correction to signal 3 you can use the overlap-add or overlap-save method - essentially you take the FFT of signal3, multiply each complex value by the real value obtained above, then use an inverse FFT to get back to the time domain. The only slight complication is the need to overlap successive sample windows and process this overlap correctly (see links to overlap-add/overlap-save methods above.)
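The ratio step can be sketched against the interleaved (re, im, re, im, ...) buffer layout that the question's complexForward call produces; `spectrumRatio` is a hypothetical helper, with a divide-by-zero guard since individual noise bins can in principle be tiny:

```java
// Per-bin magnitude ratio of two FFT buffers stored in interleaved
// (re, im, re, im, ...) form, as filled by the code in the question.
// ratio[i] < 1 means signal2 is attenuated at bin i relative to signal1.
static double[] spectrumRatio(double[] fft1, double[] fft2) {
    int bins = fft1.length / 2;
    double[] ratio = new double[bins];
    for (int i = 0; i < bins; i++) {
        double mag1 = Math.hypot(fft1[2 * i], fft1[2 * i + 1]);
        double mag2 = Math.hypot(fft2[2 * i], fft2[2 * i + 1]);
        ratio[i] = (mag1 == 0) ? 0 : mag2 / mag1;  // guard empty bins
    }
    return ratio;
}
```

In practice you would average this ratio over many windows of the white-noise recording to smooth out the per-window variance before applying it to signal3.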
I am implementing MFCC algorithm in Java.
There is sample code here: http://www.ee.columbia.edu/~dpwe/muscontent/practical/mfcc.m in Matlab. However, I have some problems with the mel filter banking process. How do I generate triangular windows, and how do I use them?
PS1: An article which has a part that describes MFCC: http://arxiv.org/pdf/1003.4083
PS2: If there is a document about MFCC algorithms steps basically, it will be good.
PS3: My main question is related to this one: MFCC with Java Linear and Logarithmic Filters. Some implementations use both linear and logarithmic filters and some do not. What are these filters, and what is the center frequency concept? I follow this code: MFCC Java; what is the difference between it and this code: MFCC Matlab?
Triangular windows as frequency band filters aren't hard to implement. You basically want to integrate the FFT data within each band (defined as the frequency space between center frequency i-1 and center frequency i+1).
You're basically looking for something like,
for(int bandIdx = 0; bandIdx < numBands; bandIdx++) {
    // Assumes centerFreqs holds numBands + 2 bin indices:
    // the lower edge, the numBands centers, and the upper edge.
    int startFreqIdx = centerFreqs[bandIdx];
    int centerFreqIdx = centerFreqs[bandIdx + 1];
    int stopFreqIdx = centerFreqs[bandIdx + 2];

    // Rising slope: weight goes from 0 at the start edge to 1 at the center.
    for(int freq = startFreqIdx; freq < centerFreqIdx; freq++) {
        double magnitudeScale = centerFreqIdx - startFreqIdx;
        bandData[bandIdx] += fftData[freq] * (freq - startFreqIdx) / magnitudeScale;
    }
    // Falling slope: weight goes from 1 at the center to 0 at the stop edge.
    for(int freq = centerFreqIdx; freq <= stopFreqIdx; freq++) {
        double magnitudeScale = centerFreqIdx - stopFreqIdx;
        bandData[bandIdx] += fftData[freq] * (freq - stopFreqIdx) / magnitudeScale;
    }
}
If you do not understand the concept of a "center frequency" or a "band" or a "filter," pick up an elementary signals textbook--you shouldn't be implementing this algorithm without understanding what it does.
As for what the exact center frequencies are, it's up to you. Experiment and pick (or find in publications) values that capture the information you want to isolate from the data. The reason that there are no definitive values, or even scale for values, is because this algorithm tries to approximate a human ear, which is a very complicated listening device. Whereas one scale may work better for, say, speech, another may work better for music, etc. It's up to you to choose what is appropriate.
Answer for the second PS: I found this tutorial that really helped me compute the MFCCs.
As for the triangular windows and the filterbanks, from what I understood, they do overlap, they do not extend to negative frequencies, and the whole process of computing them from the FFT spectrum and applying them back to it goes something like this:
Choose a minimum and a maximum frequency for the filters (for example, min freq = 300Hz, the minimum voice frequency, and max freq = your sample rate / 2. Maybe this is where you should choose the 1000Hz limit you were talking about).
Compute the mel values from the min and max chosen frequencies. Formula here.
Compute N equally spaced values between these two mel values. (I've seen examples of different values for N; you can even find an efficiency comparison for different values in this work. For my tests I picked 26.)
Convert these values back to Hz (you can find the formula on the same wiki page) => array of N + 2 filter values.
Compute a filterbank (filter triangle) for each three consecutive values, either as Thomas suggested above (being careful with the indexes) or as in the tutorial recommended at the beginning of this post => an array of arrays, size NxM, assuming your FFT returned 2*M values and you only use M.
Pass the whole power spectrum (the M values obtained from the FFT) through each triangular filter to get a "filterbank energy" for each filter (for each filterbank (N loop), multiply each magnitude obtained after the FFT by each value in the corresponding filterbank (M loop) and add up the M obtained values) => an N-sized array of energies.
These are your filterbank energies, to which you can further apply a log, then the DCT, to extract the MFCCs...
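The mel-spacing steps above (2 through 4) can be sketched as follows; `melSpacedFrequencies` is a hypothetical helper using the standard mel formula mel = 2595 * log10(1 + f/700) and its inverse:

```java
// Returns n + 2 frequencies (in Hz) between fMin and fMax, equally
// spaced on the mel scale: the two edges plus n interior points.
// These are the N + 2 filter boundary values from the steps above.
static double[] melSpacedFrequencies(double fMin, double fMax, int n) {
    double melMin = 2595.0 * Math.log10(1.0 + fMin / 700.0);
    double melMax = 2595.0 * Math.log10(1.0 + fMax / 700.0);
    double[] freqs = new double[n + 2];
    for (int i = 0; i < n + 2; i++) {
        double mel = melMin + (melMax - melMin) * i / (n + 1);
        freqs[i] = 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0); // back to Hz
    }
    return freqs;
}
```

Each triple of consecutive values (freqs[i], freqs[i+1], freqs[i+2]) then defines one triangular filter: start edge, center, stop edge, as in Thomas's loop above.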